update readme
new histograms and note on --ubam
wdecoster committed Sep 5, 2023
1 parent 45aeb6b commit 435ff1f
Showing 1 changed file with 78 additions and 31 deletions.
109 changes: 78 additions & 31 deletions README.md
@@ -51,7 +51,7 @@ Path alignment/example.cram
Creation time 09/09/2022 10:53:36
```

A 140Gbase bam file is processed in 12 minutes, using <1Gbyte of memory. Note that the identity score above is defined as the [gap-compressed identity](https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity).
A 140Gbase bam file is processed in 12 minutes, using <1Gbyte of memory. Note that the identity score above is defined as the [gap-compressed identity](https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity). The `--ubam` flag will provide metrics for all reads in the file, regardless of whether they are aligned or not.
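
To make the gap-compressed identity definition concrete, here is a minimal Python sketch, not the tool's own code. It assumes the CIGAR string and the `NM` edit distance of an alignment are available, counts each insertion or deletion run as a single difference regardless of its length, and ignores `N` (reference skip) and clipping operations. The function name `gap_compressed_identity` is hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def gap_compressed_identity(cigar: str, nm: int) -> float:
    """Return the gap-compressed identity (in percent) of one alignment.

    cigar: CIGAR string of the alignment, e.g. "50M2D48M"
    nm:    value of the NM tag (mismatches + inserted + deleted bases)
    """
    aligned = 0    # columns where read and reference bases are paired (M, =, X)
    gap_bases = 0  # total number of inserted and deleted bases
    gap_opens = 0  # number of insertion/deletion runs
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned += n
        elif op in "ID":
            gap_bases += n
            gap_opens += 1
    mismatches = nm - gap_bases            # NM also counts every gap base
    differences = mismatches + gap_opens   # each gap counts once
    columns = aligned + gap_opens
    if columns == 0:
        raise ValueError("alignment has no aligned columns")
    return 100.0 * (1.0 - differences / columns)
```

For example, an alignment `50M2D48M` with `NM:i:4` has two mismatches and one gap, so three differences over 99 compressed columns, giving an identity of about 96.97.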

### Optional output

@@ -60,38 +60,85 @@ A 140Gbase bam file is processed in 12 minutes, using <1Gbyte of memory. Note th
* calculating a normalised number of reads per chromosome, e.g. to determine the sex or aneuploidies (`--karyotype`)
* information about the phase blocks. (`--phased`)
* information about number of splice sites. (`--spliced`)
* histograms of read lengths and read identities, as below. (`--hist`)
* histograms of read lengths and read identities, as below. (`--hist`). With `--phased`, also a histogram of phase block lengths. Please let me know if the histograms look inappropriately scaled for your data.

```text
70.97195691947476 .. 71.97292392225151 [ 122235 ]: ∎∎
71.97292392225151 .. 72.97389092502823 [ 136051 ]: ∎∎∎
72.97389092502823 .. 73.97485792780498 [ 145876 ]: ∎∎∎
73.97485792780498 .. 74.9758249305817 [ 157751 ]: ∎∎∎
74.9758249305817 .. 75.97679193335844 [ 179551 ]: ∎∎∎∎
75.97679193335844 .. 76.97775893613516 [ 171769 ]: ∎∎∎∎
76.97775893613516 .. 77.9787259389119 [ 159340 ]: ∎∎∎
77.9787259389119 .. 78.97969294168863 [ 151355 ]: ∎∎∎
78.97969294168863 .. 79.98065994446536 [ 146207 ]: ∎∎∎
79.98065994446536 .. 80.98162694724209 [ 142832 ]: ∎∎∎
80.98162694724209 .. 81.98259395001882 [ 140902 ]: ∎∎∎
81.98259395001882 .. 82.98356095279556 [ 143909 ]: ∎∎∎
82.98356095279556 .. 83.98452795557229 [ 149142 ]: ∎∎∎
83.98452795557229 .. 84.98549495834902 [ 158386 ]: ∎∎∎
84.98549495834902 .. 85.98646196112576 [ 176819 ]: ∎∎∎∎
85.98646196112576 .. 86.98742896390249 [ 199558 ]: ∎∎∎∎
86.98742896390249 .. 87.98839596667922 [ 234573 ]: ∎∎∎∎∎
87.98839596667922 .. 88.98936296945595 [ 280849 ]: ∎∎∎∎∎∎
88.98936296945595 .. 89.99032997223267 [ 348535 ]: ∎∎∎∎∎∎∎∎
89.99032997223267 .. 90.9912969750094 [ 445640 ]: ∎∎∎∎∎∎∎∎∎∎
90.9912969750094 .. 91.99226397778614 [ 583424 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
91.99226397778614 .. 92.99323098056287 [ 776111 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
92.99323098056287 .. 93.9941979833396 [ 1051370 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
93.9941979833396 .. 94.99516498611634 [ 1414103 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
94.99516498611634 .. 95.99613198889307 [ 1833438 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
95.99613198889307 .. 96.9970989916698 [ 2084833 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
96.9970989916698 .. 97.99806599444653 [ 1620179 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
97.99806599444653 .. 98.99903299722327 [ 416669 ]: ∎∎∎∎∎∎∎∎∎
98.99903299722327 .. 100 [ 39254 ]:
# Histogram for read lengths:
0-2000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
2000-4000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4000-6000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6000-8000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
8000-10000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
10000-12000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
12000-14000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
14000-16000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
16000-18000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
18000-20000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
20000-22000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
22000-24000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
24000-26000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
26000-28000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
28000-30000 ∎∎∎∎∎∎∎∎∎∎∎∎
30000-32000 ∎∎∎∎∎∎∎∎∎
32000-34000 ∎∎∎∎∎∎
34000-36000 ∎∎∎∎
36000-38000 ∎∎
38000-40000 ∎
40000-42000 ∎
42000-44000 ∎
44000-46000
46000-48000
48000-50000
50000-52000
52000-54000
54000-56000
56000-58000
58000-60000
60000+
# Histogram for Phred-scaled accuracies:
Q0-1
Q1-2
Q2-3
Q3-4
Q4-5
Q5-6 ∎∎∎
Q6-7 ∎∎∎∎∎∎∎∎∎∎∎∎
Q7-8 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q8-9 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q9-10 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q10-11 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q11-12 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q12-13 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q13-14 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q14-15 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q15-16 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q16-17 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q17-18 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
Q18-19 ∎∎∎∎
Q19-20 ∎
Q20-21
Q21-22
Q22-23
Q23-24
Q24-25
Q25-26
Q26-27
Q27-28
Q28-29
Q29-30
Q30-31
Q31-32
Q32-33
Q33-34
Q34-35
Q35-36
Q36-37
Q37-38
Q38-39
Q39-40
Q40+
```
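
Histograms in the style shown above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: identities are converted to Phred scale with Q = -10 * log10(1 - identity/100), values are placed in fixed-width bins with one overflow bin, and bars are scaled relative to the fullest bin. The function names, the `bar_width` parameter, and the scaling rule are all hypothetical.

```python
import math


def phred_from_identity(identity_percent: float) -> float:
    """Convert a percent identity to a Phred-scaled accuracy."""
    error = 1.0 - identity_percent / 100.0
    if error <= 0.0:
        return math.inf  # a perfect alignment has no finite Phred score
    return -10.0 * math.log10(error)


def text_histogram(values, bin_width, overflow_at, bar_width=60, label_prefix=""):
    """Return histogram lines for non-negative values.

    Bins are [0, bin_width), [bin_width, 2*bin_width), ... up to overflow_at;
    everything at or above overflow_at goes into a final "N+" bin.
    """
    n_bins = overflow_at // bin_width
    counts = [0] * (n_bins + 1)  # the last slot is the overflow bin
    for v in values:
        idx = n_bins if v >= overflow_at else int(v // bin_width)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, count in enumerate(counts):
        if i < n_bins:
            label = f"{label_prefix}{i * bin_width}-{(i + 1) * bin_width}"
        else:
            label = f"{label_prefix}{overflow_at}+"
        # Assumed scaling: the fullest bin gets bar_width symbols.
        bar = "∎" * (count * bar_width // peak) if peak else ""
        lines.append(f"{label} {bar}".rstrip())
    return lines
```

Calling `text_histogram(lengths, 2000, 60000)` would give labels from `0-2000` to `60000+`, and `text_histogram([phred_from_identity(i) for i in identities], 1, 40, label_prefix="Q")` would give labels from `Q0-1` to `Q40+`, matching the layout of the example output.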

## CITATION