Get reads which map completely to reference using minimap (PacBio & Oxford Nanopore)

Minimap is a tool that finds approximate mapping positions between two sets of long noisy reads, or between a genome and long noisy reads such as PacBio or Oxford Nanopore data (https://github.com/lh3/minimap).

Standard output is in PAF format (https://github.com/lh3/miniasm/blob/master/PAF.md).

If you want to filter the PAF file for reads which map entirely (or almost entirely) to a genome, you can use this simple awk command:

awk '{if(($4-$3)>(0.9*$2)) print}' mapping.paf > mapping.filtered.paf

Nothing fancy at all. But it might help, for example, to figure out whether you resolved the repeats in a genome assembly with your data, by observing how many long reads span your repeats.
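
The repeat check mentioned above could be sketched roughly like this, assuming you already know where a repeat sits on the reference. The contig name `contig1`, the repeat coordinates 5000 to 8000, the 500 bp flank and the file `demo_repeat.paf` are all made up for illustration; the only thing taken from the PAF format is that columns 6, 8 and 9 hold the target name, target start and target end.

```shell
# Tiny made-up PAF file (12 tab-separated columns) so the sketch runs on its own.
printf 'read1\t10000\t50\t9900\t+\tcontig1\t50000\t1000\t10900\t9000\t9900\t60\n'      > demo_repeat.paf
printf 'read2\t6000\t0\t5900\t+\tcontig1\t50000\t6000\t11900\t5000\t5900\t60\n'       >> demo_repeat.paf
printf 'read3\t12000\t100\t11800\t-\tcontig1\t50000\t4000\t15700\t10000\t11700\t60\n' >> demo_repeat.paf

# Count distinct reads whose alignment starts at least `flank` bp before the
# repeat and ends at least `flank` bp after it (all values are hypothetical).
spanning=$(awk -v ctg=contig1 -v rep_s=5000 -v rep_e=8000 -v flank=500 \
    '$6 == ctg && $8 <= rep_s - flank && $9 >= rep_e + flank && !seen[$1]++ {n++}
     END {print n+0}' demo_repeat.paf)

echo "reads spanning the repeat: $spanning"
```

With real data you would point awk at the filtered PAF file instead of the demo file and plug in your own contig name and repeat coordinates.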

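The 0.9 in the awk command means that a read has to map over more than 90% of its length to the reference sequence (columns 2, 3 and 4 of a PAF line are the read length, query start and query end). The mapping was generated by the following command: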

minimap -x map-ont ref.fasta reads.fasta > mapping.paf
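
If you want to try cutoffs other than 90%, the fraction can be passed to awk as a variable. This is only a sketch of the same filter: the name `min_frac` is picked here for illustration, and `demo_frac.paf` is a made-up three-line file standing in for the real `mapping.paf`.

```shell
# Made-up PAF lines: read name, read length, query start, query end, strand,
# target name, target length, target start, target end, matches, block length, MAPQ.
printf 'readA\t10000\t100\t9800\t+\tcontig1\t50000\t2000\t11700\t9000\t9700\t60\n'   > demo_frac.paf
printf 'readB\t10000\t0\t4000\t+\tcontig1\t50000\t20000\t24000\t3500\t4000\t60\n'   >> demo_frac.paf
printf 'readC\t8000\t300\t7900\t-\tcontig1\t50000\t30000\t37600\t7000\t7600\t60\n'  >> demo_frac.paf

# Keep alignments whose aligned part of the read ($4 - $3) is longer than
# min_frac times the read length ($2).
min_frac=0.9
awk -v min_frac="$min_frac" '($4 - $3) > (min_frac * $2)' demo_frac.paf > demo_frac.filtered.paf

# Show which reads passed the filter.
cut -f1 demo_frac.filtered.paf
```

In this toy file readB is aligned over only 4000 of its 10000 bases, so it is dropped, while readA and readC pass.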
