MAGeCK MLE fails with non-trivial design matrix. #211

Open
andrewholding opened this issue Oct 16, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@andrewholding

Description of the bug

Using any design matrix with more than 3 samples, the pipeline exits from MAGeCK MLE with the following error.

Command error:
INFO @ Wed, 16 Oct 2024 15:53:18: Parameters: /usr/local/bin/mageck mle --threads 6 -k count_table.count.txt -n designmatrix-anh004 -d designmatrix-anh004.txt
INFO @ Wed, 16 Oct 2024 15:53:23: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO @ Wed, 16 Oct 2024 15:53:23: Design matrix:
INFO @ Wed, 16 Oct 2024 15:53:23: [[1. 1. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 0. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 0. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 1.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 1.]]
INFO @ Wed, 16 Oct 2024 15:53:23: Beta labels:baseline,common,hypoxiaVsNormoxia
INFO @ Wed, 16 Oct 2024 15:53:23: Included samples:day0-1,day0-2,control1,control2,hypoxia1,hypoxia2
INFO @ Wed, 16 Oct 2024 15:53:23: Loaded samples:day0-1;day0-2;control1;control2;hypoxia1;hypoxia2
INFO @ Wed, 16 Oct 2024 15:53:23: Sample index: 4;5;2;3;0;1
INFO @ Wed, 16 Oct 2024 15:53:23: Loaded 182 genes.
Error loading line 218
Error loading line 521
Error loading line 907
Traceback (most recent call last):
File "/usr/local/bin/mageck", line 66, in
main();
File "/usr/local/bin/mageck", line 43, in main
args=crisprseq_parseargs();
File "/usr/local/lib/python3.9/site-packages/mageck/argsParser.py", line 258, in crisprseq_parseargs
mageckmle_main(parsedargs=args); # ignoring the script path, and the sub command
File "/usr/local/lib/python3.9/site-packages/mageck/mlemageck.py", line 74, in mageckmle_main
allgenedict=read_gene_from_file(args.count_table,includesamples=args.include_samples)
File "/usr/local/lib/python3.9/site-packages/mageck/mleinstanceio.py", line 84, in read_gene_from_file
ginst.nb_count=np.matrix(ginst.nb_count)
File "/usr/local/lib/python3.9/site-packages/numpy/matrixlib/defmatrix.py", line 145, in new
arr = N.array(data, dtype=dtype, copy=copy)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.
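For readers unfamiliar with this NumPy message: it is raised when a nested list whose inner lists differ in length is converted to an array, because no rectangular shape fits. The detected shape `(6,)` matches the six samples. The sketch below is only an assumption about what `ginst.nb_count` might look like for one gene (the counts are invented and the MAGeCK internals are not confirmed here); it illustrates the ragged condition without calling NumPy.

```python
def row_length_summary(rows):
    """Count how many rows have each length; more than one key means the rows are ragged."""
    summary = {}
    for row in rows:
        summary[len(row)] = summary.get(len(row), 0) + 1
    return summary

# Hypothetical per-sample count lists for one gene: six samples, but the
# third sample has one sgRNA count fewer than the others (invented values).
nb_count = [[10, 12, 9], [11, 13, 8], [7, 9], [6, 8, 5], [3, 4, 2], [2, 5, 1]]

summary = row_length_summary(nb_count)
print(summary)            # {3: 5, 2: 1}
print(len(summary) > 1)   # True: ragged, so no rectangular array can be built
```

If this reading is right, the three "Error loading line" messages would be the place to look for rows that were only partly parsed.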

The following inputs were used:

Sample sheet
sample,fastq_1,fastq_2,condition
RM-231Hxn1,./seqdata231/RM-231Hxn1_R1_001.fastq.gz,,hypoxia1
RM-231Hxn2,./seqdata231/RM-231Hxn2_R1_001.fastq.gz,,hypoxia2
RM-231Nxn1,./seqdata231/RM-231Nxn1_R1_001.fastq.gz,,control1
RM-231Nxn2,./seqdata231/RM-231Nxn2_R1_001.fastq.gz,,control2
RM-231T0n1,./seqdata231/RM-231T0n1_R1_001.fastq.gz,,day0-1
RM-231T0n2,./seqdata231/RM-231T0n2_R1_001.fastq.gz,,day0-2

Guide library (head):
id target transcript gene symbol
1 ACCAGGGGAGCCAAGTGGA ATP1A1
2 GAAGGAGCCCCGAACCCGG ATP1A1
3 ggcggacacgtggcaacag ATP1A1
4 GAGGGAGCGCAGTAACGGG ATP1A1
5 acagcggtagcagcccggg ATP1A1
6 CCAGCCCGTCTGGGACAGT ATP1A2
7 GGGCTGTGGGTCTAACTGT ATP1A2
8 AGGGAAGGACTAGAGATGT ATP1A2
9 AGCCCACACCAGCCCGTCT ATP1A2

Design Matrix:
Samples baseline common hypoxiaVsNormoxia
day0-1 1 1 0
day0-2 1 1 0
control1 1 0 0
control2 1 0 0
hypoxia1 1 1 1
hypoxia2 1 1 1
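As a sanity check on sample naming, the `Sample index: 4;5;2;3;0;1` log line is consistent with the design matrix rows being matched by name to the count table columns. A minimal sketch, assuming the count table columns follow the order shown in the MAGECK_COUNT process label (hypoxia1, hypoxia2, control1, control2, day0-1, day0-2); the helper `sample_indices` is hypothetical and not part of MAGeCK:

```python
# Column order assumed from the MAGECK_COUNT process label in the Nextflow output.
count_table_samples = ["hypoxia1", "hypoxia2", "control1", "control2", "day0-1", "day0-2"]
# Row order of the design matrix above.
design_samples = ["day0-1", "day0-2", "control1", "control2", "hypoxia1", "hypoxia2"]

def sample_indices(design, table):
    """Map each design-matrix sample to its 0-based column position in the count table."""
    missing = [s for s in design if s not in table]
    if missing:
        raise ValueError(f"samples missing from count table: {missing}")
    return [table.index(s) for s in design]

indices = sample_indices(design_samples, count_table_samples)
print(";".join(map(str, indices)))  # 4;5;2;3;0;1
```

Since the indices reproduce the logged values, the sample names themselves appear to match, which points away from a naming mismatch between the two files.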

Command used and terminal output

nextflow run nf-core/crisprseq --analysis screening --input $sampleSheet -profile apptainer \
--library $guideLibrary --outdir $output \
--mle_design_matrix $designMatrix \
-w work-combined-day0

Relevant files

N E X T F L O W ~ version 23.10.0
Launching https://github.com/nf-core/crisprseq [loving_planck] DSL2 - revision: b2c583a [master]
WARN: Nextflow self-contained distribution allows only core plugins -- User config plugins will be ignored: [email protected]
WARN: Access to undefined parameter reference_fasta -- Initialise it to a default value eg. params.reference_fasta = some_value
WARN: Access to undefined parameter monochromeLogs -- Initialise it to a default value eg. params.monochromeLogs = some_value


                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
nf-core/crisprseq v2.2.1-gb2c583a

Core Nextflow options
revision : master
runName : loving_planck
containerEngine : apptainer
launchDir : /mnt/scratch/projects/biol-student-2023/NextFlowPipelineRM
workDir : /mnt/scratch/projects/biol-student-2023/NextFlowPipelineRM/work-combined-day0
projectDir : /users/anh524/.nextflow/assets/nf-core/crisprseq
userName : anh524
profile : apptainer
configFiles :

Input/output options
input : settings/samplesheet-anh003.csv
outdir : anh231-mle-day0
analysis : screening

Screening parameters
library : settings/guide_library-anh001.tsv
mle_design_matrix: settings/designmatrix-anh004.txt

!! Only displaying parameters that differ from the pipeline defaults !!

If you use nf-core/crisprseq for your analysis please cite:


executor > local (8)
[93/ee583d] process > NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:FASTQC (RM-231T0n2) [100%] 6 of 6 ✔
[99/4c6443] process > NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_COUNT (hypoxia1,hypoxia2,control1,control2,day0-1,day0-2) [100%] 1 of 1 ✔
[20/ebfc47] process > NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE_MATRIX (designmatrix-anh004) [100%] 1 of 1, failed: 1 ✘
[- ] process > NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_FLUTEMLE -
[- ] process > NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MULTIQC -
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/crisprseq] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE_MATRIX (designmatrix-anh004)'

Caused by:
Process NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE_MATRIX (designmatrix-anh004) terminated with an error exit status (1)

Command executed:

mageck
mle

--threads 6
-k count_table.count.txt
-n designmatrix-anh004
-d designmatrix-anh004.txt

cat <<-END_VERSIONS > versions.yml
"NFCORE_CRISPRSEQ:CRISPRSEQ_SCREENING:MAGECK_MLE_MATRIX":
mageck: $(mageck -v)
END_VERSIONS

Command exit status:
1

Command output:
Error loading line 218
Error loading line 521
Error loading line 907

Command error:
INFO @ Wed, 16 Oct 2024 15:53:18: Parameters: /usr/local/bin/mageck mle --threads 6 -k count_table.count.txt -n designmatrix-anh004 -d designmatrix-anh004.txt
INFO @ Wed, 16 Oct 2024 15:53:23: Cannot parse design matrix as a string; try to parse it as a file name ...
INFO @ Wed, 16 Oct 2024 15:53:23: Design matrix:
INFO @ Wed, 16 Oct 2024 15:53:23: [[1. 1. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 0. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 0. 0.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 1.]
INFO @ Wed, 16 Oct 2024 15:53:23: [1. 1. 1.]]
INFO @ Wed, 16 Oct 2024 15:53:23: Beta labels:baseline,common,hypoxiaVsNormoxia
INFO @ Wed, 16 Oct 2024 15:53:23: Included samples:day0-1,day0-2,control1,control2,hypoxia1,hypoxia2
INFO @ Wed, 16 Oct 2024 15:53:23: Loaded samples:day0-1;day0-2;control1;control2;hypoxia1;hypoxia2
INFO @ Wed, 16 Oct 2024 15:53:23: Sample index: 4;5;2;3;0;1
INFO @ Wed, 16 Oct 2024 15:53:23: Loaded 182 genes.
Error loading line 218
Error loading line 521
Error loading line 907
Traceback (most recent call last):
File "/usr/local/bin/mageck", line 66, in
main();
File "/usr/local/bin/mageck", line 43, in main
args=crisprseq_parseargs();
File "/usr/local/lib/python3.9/site-packages/mageck/argsParser.py", line 258, in crisprseq_parseargs
mageckmle_main(parsedargs=args); # ignoring the script path, and the sub command
File "/usr/local/lib/python3.9/site-packages/mageck/mlemageck.py", line 74, in mageckmle_main
allgenedict=read_gene_from_file(args.count_table,includesamples=args.include_samples)
File "/usr/local/lib/python3.9/site-packages/mageck/mleinstanceio.py", line 84, in read_gene_from_file
ginst.nb_count=np.matrix(ginst.nb_count)
File "/usr/local/lib/python3.9/site-packages/numpy/matrixlib/defmatrix.py", line 145, in new
arr = N.array(data, dtype=dtype, copy=copy)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (6,) + inhomogeneous part.

Work dir:
/mnt/scratch/projects/biol-student-2023/NextFlowPipelineRM/work-combined-day0/20/ebfc47d70920ccc3aa3c2ea08f6da1

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details

System information

No response

@andrewholding added the bug label Oct 16, 2024
@LaurenceKuhl
Contributor

Hi Andrew,

This seems to be coming more from MAGeCK MLE than from the pipeline. Could you by any chance double-check that everything is tab-separated?
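One way to check this is to count the tab-separated fields on every line and report any row whose width differs from the header. This is a rough sketch, not part of MAGeCK or the pipeline; the function name `find_inconsistent_rows` and the sample data are made up for illustration.

```python
import io

def find_inconsistent_rows(lines, sep="\t"):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    bad = []
    expected = None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(sep)
        if expected is None:
            expected = len(fields)  # the header defines the expected width
        elif len(fields) != expected:
            bad.append((number, len(fields)))
    return bad

# Tiny invented table: row 3 uses a space instead of a tab between two columns.
sample = "sgRNA\tGene\tday0-1\tday0-2\n1\tATP1A1\t10\t12\n2\tATP1A1\t11 13\n"
print(find_inconsistent_rows(io.StringIO(sample)))  # [(3, 3)]
```

On the real files this could be run as `find_inconsistent_rows(open("count_table.count.txt"))` and likewise for `designmatrix-anh004.txt`; an empty list would mean every row has the same number of fields.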

@andrewholding
Author

I have double-checked, and it is tab-separated. This has been picked up by @medmaca in the bioinformatics core at York.

@LaurenceKuhl
Contributor

Hi @medmaca,

Is there any way I could have a subset of the count table to try to run it outside of the pipeline? I'd like to double-check on my side and try a few things out! Thanks a bunch :)
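A small subset could be built from the header plus a few lines around each "Error loading line" number in the log. This is only a sketch: it is an assumption that those numbers are 1-based positions in `count_table.count.txt`, and it is unclear whether they count the header, which is why a window around each one is kept. The helper `subset_lines` is hypothetical.

```python
def subset_lines(lines, targets, context=2):
    """Keep the header plus a window of `context` lines around each 1-based target line."""
    keep = {1}  # always keep the header line
    for t in targets:
        keep.update(range(max(1, t - context), t + context + 1))
    return [line for number, line in enumerate(lines, start=1) if number in keep]

# Line numbers taken from the "Error loading line" messages in the log.
reported = [218, 521, 907]

# Demonstration on invented data: a header followed by 1000 numbered rows.
demo = ["sgRNA\tGene\tcount\n"] + [f"{i}\tGENE\t{i}\n" for i in range(2, 1002)]
subset = subset_lines(demo, reported)
print(len(subset))  # 16: the header plus three 5-line windows
```

For the real table, passing an open file handle for `count_table.count.txt` instead of `demo` and writing the returned lines to a new file should give something small enough to share.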
