From 435ff1f55af0281b9285b8a8efa80e023f30a7be Mon Sep 17 00:00:00 2001
From: wdecoster
Date: Tue, 5 Sep 2023 11:57:25 +0200
Subject: [PATCH] update readme new histograms and note on --ubam

---
 README.md | 109 ++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 78 insertions(+), 31 deletions(-)

diff --git a/README.md b/README.md
index d496842..37c09c3 100644
--- a/README.md
+++ b/README.md
@@ -51,7 +51,7 @@ Path alignment/example.cram
 Creation time 09/09/2022 10:53:36
 ```
 
-A 140Gbase bam file is processed in 12 minutes, using <1Gbyte of memory. Note that the identity score above is defined as the [gap-compressed identity](https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity).
+A 140Gbase bam file is processed in 12 minutes, using <1Gbyte of memory. Note that the identity score above is defined as the [gap-compressed identity](https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity). The `--ubam` flag will provide metrics for all reads in the file, regardless of whether they are aligned or not.
 
 ### Optional output
 
@@ -60,38 +60,85 @@ A 140Gbase bam file is processed in 12 minutes, using <1Gbyte of memory. Note th
 * calculating a normalised number of reads per chromosome, e.g. to determine the sex or aneuploidies (`--karyotype`)
 * information about the phase blocks. (`--phased`)
 * information about number of splice sites. (`--spliced`)
-* histograms of read lengths and read identities, as below. (`--hist`)
+* histograms of read lengths and read identities, as below. (`--hist`). With `--phased`, also a histogram of phase block lengths. Please let me know if the histograms look inappropriately scaled for your data.
 
 ```text
- 70.97195691947476 .. 71.97292392225151 [ 122235 ]: ∎∎
- 71.97292392225151 .. 72.97389092502823 [ 136051 ]: ∎∎∎
- 72.97389092502823 .. 73.97485792780498 [ 145876 ]: ∎∎∎
- 73.97485792780498 .. 74.9758249305817 [ 157751 ]: ∎∎∎
- 74.9758249305817 .. 75.97679193335844 [ 179551 ]: ∎∎∎∎
- 75.97679193335844 .. 76.97775893613516 [ 171769 ]: ∎∎∎∎
- 76.97775893613516 .. 77.9787259389119 [ 159340 ]: ∎∎∎
- 77.9787259389119 .. 78.97969294168863 [ 151355 ]: ∎∎∎
- 78.97969294168863 .. 79.98065994446536 [ 146207 ]: ∎∎∎
- 79.98065994446536 .. 80.98162694724209 [ 142832 ]: ∎∎∎
- 80.98162694724209 .. 81.98259395001882 [ 140902 ]: ∎∎∎
- 81.98259395001882 .. 82.98356095279556 [ 143909 ]: ∎∎∎
- 82.98356095279556 .. 83.98452795557229 [ 149142 ]: ∎∎∎
- 83.98452795557229 .. 84.98549495834902 [ 158386 ]: ∎∎∎
- 84.98549495834902 .. 85.98646196112576 [ 176819 ]: ∎∎∎∎
- 85.98646196112576 .. 86.98742896390249 [ 199558 ]: ∎∎∎∎
- 86.98742896390249 .. 87.98839596667922 [ 234573 ]: ∎∎∎∎∎
- 87.98839596667922 .. 88.98936296945595 [ 280849 ]: ∎∎∎∎∎∎
- 88.98936296945595 .. 89.99032997223267 [ 348535 ]: ∎∎∎∎∎∎∎∎
- 89.99032997223267 .. 90.9912969750094 [ 445640 ]: ∎∎∎∎∎∎∎∎∎∎
- 90.9912969750094 .. 91.99226397778614 [ 583424 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎
- 91.99226397778614 .. 92.99323098056287 [ 776111 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
- 92.99323098056287 .. 93.9941979833396 [ 1051370 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
- 93.9941979833396 .. 94.99516498611634 [ 1414103 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
- 94.99516498611634 .. 95.99613198889307 [ 1833438 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
- 95.99613198889307 .. 96.9970989916698 [ 2084833 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
- 96.9970989916698 .. 97.99806599444653 [ 1620179 ]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
- 97.99806599444653 .. 98.99903299722327 [ 416669 ]: ∎∎∎∎∎∎∎∎∎
- 98.99903299722327 .. 100 [ 39254 ]:
+# Histogram for read lengths:
+     0-2000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+  2000-4000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+  4000-6000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+  6000-8000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+ 8000-10000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+10000-12000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+12000-14000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+14000-16000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+16000-18000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+18000-20000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+20000-22000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+22000-24000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+24000-26000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+26000-28000 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+28000-30000 ∎∎∎∎∎∎∎∎∎∎∎∎
+30000-32000 ∎∎∎∎∎∎∎∎∎
+32000-34000 ∎∎∎∎∎∎
+34000-36000 ∎∎∎∎
+36000-38000 ∎∎
+38000-40000 ∎
+40000-42000 ∎
+42000-44000 ∎
+44000-46000
+46000-48000
+48000-50000
+50000-52000
+52000-54000
+54000-56000
+56000-58000
+58000-60000
+     60000+
+
+
+# Histogram for Phred-scaled accuracies:
+  Q0-1
+  Q1-2
+  Q2-3
+  Q3-4
+  Q4-5
+  Q5-6 ∎∎∎
+  Q6-7 ∎∎∎∎∎∎∎∎∎∎∎∎
+  Q7-8 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+  Q8-9 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+ Q9-10 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q10-11 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q11-12 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q12-13 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q13-14 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q14-15 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q15-16 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q16-17 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q17-18 ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
+Q18-19 ∎∎∎∎
+Q19-20 ∎
+Q20-21
+Q21-22
+Q22-23
+Q23-24
+Q24-25
+Q25-26
+Q26-27
+Q27-28
+Q28-29
+Q29-30
+Q30-31
+Q31-32
+Q32-33
+Q33-34
+Q34-35
+Q35-36
+Q36-37
+Q37-38
+Q38-39
+Q39-40
+  Q40+
 ```
 
 ## CITATION