Estimation of Contig Correctness Checker accuracy

To assess the accuracy of the script that evaluates the integrity of contigs, we selected a set of 600 contigs from the 454 shotgun assembly of N. gaditana. The first 200 were fused to form 100 chimeric contigs. You can download the dataset from here.
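As a rough illustration of how such a test set can be built, here is a minimal Python sketch. It assumes the chimeras are made by simply concatenating consecutive pairs of contigs in forward orientation; the post does not state how the fusion was actually done, and the function name `fuse_pairs` is hypothetical. Recording the junction position makes it possible to check later whether a contig was broken at the right place.

```python
def fuse_pairs(contigs, n_to_fuse=200):
    """Fuse the first n_to_fuse contigs two by two; leave the rest untouched.

    contigs: list of (name, sequence) tuples.
    Returns a list of (name, sequence, junction) tuples, where junction is
    the 0-based position at which the second contig starts inside the
    chimera, or None for contigs that were not fused.

    Assumption: plain concatenation of consecutive contigs, forward strand.
    """
    if n_to_fuse % 2 or n_to_fuse > len(contigs):
        raise ValueError("n_to_fuse must be even and <= number of contigs")

    fused = []
    for i in range(0, n_to_fuse, 2):
        (name_a, seq_a), (name_b, seq_b) = contigs[i], contigs[i + 1]
        fused.append((f"{name_a}+{name_b}", seq_a + seq_b, len(seq_a)))

    # The remaining contigs act as negative controls (no chimeric junction).
    fused.extend((name, seq, None) for name, seq in contigs[n_to_fuse:])
    return fused


if __name__ == "__main__":
    # Toy input: 6 short contigs, the first 4 fused into 2 chimeras.
    toy = [(f"c{i}", "ACGT" * (i + 1)) for i in range(6)]
    for name, seq, junction in fuse_pairs(toy, n_to_fuse=4):
        print(name, len(seq), junction)
```

With 600 input contigs and `n_to_fuse=200`, this yields 100 chimeric contigs plus 400 unmodified ones.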

We then ran all the steps already described, briefly summarized here:

pass -csfastq 2_F3.csfastq -d testset.fa -cpu 12 -uniq -fid 90 -gff > 2_F3.gff
pass -csfastq 2_R3.csfastq -d testset.fa -cpu 12 -uniq -fid 90 -gff > 2_R3.gff
pass_pair -gff1 2_F3.gff -gff2 2_R3.gff -range 900 3000 3001 -ref testset.fa -o ./


TRAP Tutorial

This tutorial explains how to use the TRAP pipeline (Telomeric Repeat Analysis Program) to spot telomeres from mate-paired reads in a genome under assembly.

The insert size of the mate-paired reads should be as long as possible, so that telomeres can be anchored to existing scaffolds.


Preparing alignments for ScaMPI: pairing

A pivotal step in preparing data for ScaMPI is the pairing of alignments, using the pass_pair program from the PASS suite. Once we have aligned both the first and the second mate against the reference contigs, we obtain two alignment files. When paired, the alignments are categorized as:

  • Unique pairs – if both mates align uniquely within the same contig (with proper mutual orientation and distance)
  • Unique pair out – if both mates align uniquely but in different contigs (and their distance to the edges is still acceptable)

All other pairs are discarded. The first category can be used to check the integrity of the contigs, the second for scaffolding.
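The two categories can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the actual pass_pair implementation: the `Hit` record, the `classify_pair` function, the `same_strand` switch and the way the edge distance is computed (sum of each mate's distance to its nearest contig end) are all assumptions made for the example, and the thresholds are illustrative values.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One aligned mate (hypothetical record, 1-based inclusive coordinates)."""
    contig: str
    start: int
    end: int
    strand: str   # "+" or "-"
    unique: bool  # True if the read has a single best alignment


def edge_distance(hit, contig_len):
    """Distance from the hit to the nearest end of its contig."""
    return min(hit.start - 1, contig_len - hit.end)


def classify_pair(a, b, contig_len, min_dist, max_dist, same_strand=True):
    """Return 'unique_pair', 'unique_pair_out' or 'discarded'.

    contig_len: dict mapping contig name -> length.
    same_strand: expected mutual orientation (library dependent, assumed).
    """
    if not (a.unique and b.unique):
        return "discarded"

    if a.contig == b.contig:
        oriented = (a.strand == b.strand) == same_strand
        span = max(a.end, b.end) - min(a.start, b.start) + 1
        if oriented and min_dist <= span <= max_dist:
            return "unique_pair"
        return "discarded"

    # Mates on different contigs: both must sit close enough to an edge
    # for the insert to bridge the two contigs.
    to_edges = (edge_distance(a, contig_len[a.contig])
                + edge_distance(b, contig_len[b.contig]))
    return "unique_pair_out" if to_edges <= max_dist else "discarded"


if __name__ == "__main__":
    lengths = {"c1": 5000, "c2": 4000}
    first = Hit("c1", 100, 150, "+", True)
    print(classify_pair(first, Hit("c1", 1600, 1650, "+", True), lengths, 900, 3000))
    print(classify_pair(first, Hit("c2", 3900, 3950, "+", True), lengths, 900, 3000))
```

A real implementation would also take the orientation of each mate into account when deciding which contig edge it points to.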


SAM2GFF utility to integrate other aligners

The ScaMPI suite comes with its own aligner (PASS) because of its support for color space reads and its ability to run without building an index, which turns out to be handy when testing alignments against many different contig sets.

Older versions of PASS produced GFF output instead of the now widespread SAM, so GFF is the alignment format ScaMPI uses to infer arcs between contigs. We now include a script for SAM-to-GFF conversion, which keeps only the information needed for scaffolding.
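To give an idea of what such a conversion involves, here is a minimal Python sketch that turns one SAM record into one GFF line. It is not the script shipped with ScaMPI: the function name `sam_line_to_gff`, the `source` and `feature` values and the `Name=` attribute are assumptions, and the exact attribute layout that ScaMPI expects from PASS may differ. Only the SAM side (flag bits, 1-based POS, CIGAR operations that consume the reference) follows the SAM specification.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance on the reference


def sam_line_to_gff(line, source="sam2gff", feature="match"):
    """Convert one SAM record to a 9-column GFF line.

    Returns None for header lines and unmapped reads.
    """
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname = fields[0], int(fields[1]), fields[2]
    pos, mapq, cigar = int(fields[3]), fields[4], fields[5]

    if flag & 0x4 or rname == "*" or cigar == "*":
        return None  # unmapped read: nothing to use for scaffolding

    # Alignment end = start + number of reference bases consumed - 1.
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar)
                  if op in REF_CONSUMING)
    end = pos + ref_len - 1
    strand = "-" if flag & 0x10 else "+"

    return "\t".join([rname, source, feature, str(pos), str(end),
                      mapq, strand, ".", f"Name={qname}"])


if __name__ == "__main__":
    record = "read1\t16\tcontig7\t101\t60\t5S40M2D8M\t*\t0\t0\tACGT\t*"
    print(sam_line_to_gff(record))
```

Everything not needed to place a read on a contig (sequence, qualities, optional tags) is dropped, in the spirit of keeping only scaffolding-relevant information.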


Contig correctness check


Checking the correctness of contigs is a pivotal step to perform prior to scaffolding, as chimeric sequences can lead to big mistakes. This is especially true when the contigs are generated by a shotgun approach with long reads, while a paired library of short reads is available.

The “” script will break apart contigs, basing its analysis on the physical coverage of the mates aligned against the reference contigs.
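The idea of physical coverage can be sketched as follows: every properly paired couple of mates covers the whole stretch between its leftmost and rightmost aligned base, and a position inside a contig that no pair spans is a candidate misjoin. The Python below is a minimal illustration under that assumption; the function names, the `min_cov` threshold and the `margin` excluded at the contig ends are hypothetical and do not describe the actual script's parameters.

```python
def physical_coverage(contig_len, pair_spans):
    """Per-base physical coverage of a contig.

    pair_spans: iterable of (start, end) intervals, 1-based inclusive,
    one per mate pair (leftmost base of one mate to rightmost of the other).
    """
    diff = [0] * (contig_len + 2)  # difference array, index 0 unused
    for start, end in pair_spans:
        diff[start] += 1
        diff[end + 1] -= 1

    coverage, running = [], 0
    for pos in range(1, contig_len + 1):
        running += diff[pos]
        coverage.append(running)
    return coverage


def low_coverage_regions(coverage, min_cov=1, margin=0):
    """Return 1-based inclusive (start, end) regions with coverage < min_cov.

    margin: number of bases ignored at each contig end, where physical
    coverage drops naturally because no pair can span the edge.
    """
    regions, start = [], None
    last = len(coverage) - margin
    for pos in range(margin + 1, last + 1):
        low = coverage[pos - 1] < min_cov
        if low and start is None:
            start = pos
        elif not low and start is not None:
            regions.append((start, pos - 1))
            start = None
    if start is not None:
        regions.append((start, last))
    return regions


if __name__ == "__main__":
    spans = [(1, 8), (3, 10), (13, 20), (12, 18)]
    cov = physical_coverage(20, spans)
    print(low_coverage_regions(cov))  # candidate break points
```

In this toy contig of 20 bases no pair spans position 11, so that is where the contig would be split.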

Note that steps 1 and 2 are identical to the alignment steps used for scaffolding: you can skip to step 3 if you have already performed them.
