Skip to content

Latest commit

 

History

History
19 lines (12 loc) · 1.11 KB

File metadata and controls

19 lines (12 loc) · 1.11 KB

N50 and N75

Two of the essential metrics for assembly are N50 and N75. Both of these are a measure of how long the contigs are. The idea is that you order the contigs from shortest to longest, and find the length of the contig that contains half (for N50) or three quarters (for N75) of the data. If you have a more complete assembly the numbers should be larger, while with shorter assemblies, these numbers will be less.

There is script called countfasta.py that we have provided that takes a single argument and counts the number of fasta characters in the file. (There is a similar metric, called L50 that reports the number of contigs shorter than the contig that contains the 50% point in sequence length, but no one uses this!).

Example usage of countfasta.py:

countfasta.py -f AlgaeAssembly/contigs.fasta 

Total length: 5426326
Shortest: 57 (NODE_5409_length_57_cov_18.5)
Longest: 47734 (NODE_1_length_47734_cov_9.85858)

N50: 1396 (NODE_1006_length_1396_cov_1.72782)
N75: 2183 (NODE_387_length_2183_cov_1.83224)