About rmats_long.py #26

Open
border-info-nt opened this issue Nov 13, 2024 · 5 comments

@border-info-nt

border-info-nt commented Nov 13, 2024

Hello,

I've completed the analysis with ESPRESSO and am now planning to proceed with rMATS-long analysis. However, I generated the .esp and updated.gtf files individually for each sample, so I’m unable to perform a multi-sample analysis. Is it acceptable to merge these files, or should I start the analysis from scratch?

For example:

transcript_ID       transcript_name  gene_ID             gs689_1  gs689_2  gs689_3  pc3e_1  pc3e_2  pc3e_3
ENST00000428599.6   CTNND1-208       ENSG00000198561.16  23.66    16.47    30.81    0.60    0.36    0.31
ENST00000682814.1   CTNND1-245       ENSG00000198561.16  0        0        0        1.00    1.01    0
ENST00000681984.1   CTNND1-242       ENSG00000198561.16  0        0        1.54     0       0       1.27
ENST00000683906.1   CTNND1-250       ENSG00000198561.16  5.45     1.21     2.33     1.11    0       1.03
ENST00000426142.6   CTNND1-207       ENSG00000198561.16  2.89     8.82     4.42     15.44   15.73   6.54
ENST00000683769.1   CTNND1-249       ENSG00000198561.16  2.13     2.31     0        2.18    0       1.01
ENST00000532463.5   CTNND1-230       ENSG00000198561.16  5.90     0        7.31     1.75    6.94    6.54
ENST00000529986.5   CTNND1-222       ENSG00000198561.16  24.91    25.03    59.60    54.96   54.33   65.52
ENST00000358694.10  CTNND1-201       ENSG00000198561.16  48.22    73.37    111.83   0.60    0.36    0.31
ENST00000684704.1   CTNND1-252       ENSG00000198561.16  0        0        1.04     0       0       0
ENST00000530068.5   CTNND1-223       ENSG00000198561.16  0        0        0        1.00    0       0
ENST00000531007.2   CTNND1-227       ENSG00000198561.16  0        3.23     0        0       0.28    0.29
ENST00000534579.5   CTNND1-236       ENSG00000198561.16  5.09     0        5.01     0.60    0.36    0.31

How can I get ESPRESSO to output one abundance column per sample (gs689_1, gs689_2, gs689_3, pc3e_1, pc3e_2, pc3e_3) like this?
Or should I merge the results that were produced one sample at a time?

@EricKutschera
Contributor

Ideally the espresso run would include all of your samples and the .esp from espresso would then have all the sample names as columns. When you run espresso you can list the sample names along with the input files in the samples.tsv file described in the README: https://github.com/Xinglab/espresso/tree/v1.5.0?tab=readme-ov-file#basic-usage

It's not easy to merge the results from multiple espresso runs, since the novel isoforms detected in each run would have different IDs (of the form ESPRESSO:chr1:2:3). There may also be novel isoforms that have slightly different transcript start or end coordinates between runs but are otherwise the same.
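
As a rough illustration of the single-run setup described above, the sketch below writes a samples.tsv that lists every sample. It assumes the layout described in the ESPRESSO README (one tab-separated row per alignment file: path first, then sample name); the file paths and the helper name build_samples_tsv are hypothetical, and the sample names are taken from the column headers in the question.

```python
import csv
import io


def build_samples_tsv(sample_alignments):
    """Return samples.tsv text with one 'alignment_path<TAB>sample_name' row per file.

    The column order is an assumption based on the ESPRESSO README;
    check it against the version you are running.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for sample_name, alignment_path in sample_alignments:
        writer.writerow([alignment_path, sample_name])
    return buffer.getvalue()


# Hypothetical alignment paths; replace them with your own SAM/BAM files.
samples = [
    ("gs689_1", "/path/to/gs689_1.sam"),
    ("gs689_2", "/path/to/gs689_2.sam"),
    ("gs689_3", "/path/to/gs689_3.sam"),
    ("pc3e_1", "/path/to/pc3e_1.sam"),
    ("pc3e_2", "/path/to/pc3e_2.sam"),
    ("pc3e_3", "/path/to/pc3e_3.sam"),
]

print(build_samples_tsv(samples), end="")
```

Running ESPRESSO once with a file like this should give an abundance .esp with one column per sample name, which is the shape rmats_long.py expects for a multi-sample comparison.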

@border-info-nt
Author

border-info-nt commented Dec 6, 2024

Sorry, I have an additional question.

The run was forcibly terminated with the error message below.
male_N2_R0_updated.gtf is 99.1 MB and male_N2_R0_abundance.esp is 15.2 MB.

I would appreciate your advice.
Thank you.

1: Vectorized input to element_text() is not officially supported.
ℹ Results may be unexpected or may change in future versions of ggplot2.
2: Removed 132 rows containing missing values or values outside the scale range (geom_point()).
3: In get_plot_component(plot, "guide-box"):
Multiple components found; returning the first one. To return all, use return_all = TRUE.
4: Removed 132 rows containing missing values or values outside the scale range (geom_point()).
running: ['/miniconda3/bin/python', '/rMATS-long_v_1_0_0/scripts/classify_isoform_differences.py', '--updated-gtf', '/variant/male4/rmats_long_n1y594i8_tmp/ENSG00000131469.15_updated.gtf', '--out-tsv', '/西村/variant/male4/ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12.tsv', '--main-transcript-id', 'ENST00000589913.6', '--second-transcript-id', 'ENST00000253788.12']
running: ['/miniconda3/bin/python', '/rMATS-long_v_1_0_0/scripts/FindAltTSEvents.py', '-i', '/variant/male4/ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12guif3i92_tmp_isoform.gtf', '-o', '/variant/male4/ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12.tsv']
Traceback (most recent call last):
  File "/rMATS-long_v_1_0_0/scripts/classify_isoform_differences.py", line 176, in <module>
    main()
  File "/rMATS-long_v_1_0_0/scripts/classify_isoform_differences.py", line 172, in main
    classify_isoform_differences(args)
  File "/rMATS-long_v_1_0_0/scripts/classify_isoform_differences.py", line 55, in classify_isoform_differences
    classify_isoform_differences_with_temp_files(
  File "/rMATS-long_v_1_0_0/scripts/classify_isoform_differences.py", line 149, in classify_isoform_differences_with_temp_files
    rmats_long_utils.run_command(command)
  File "//rMATS-long_v_1_0_0/scripts/rmats_long_utils.py", line 278, in run_command
    subprocess.run(command, check=True)
  File "/miniconda3/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/miniconda3/bin/python', '/rMATS-long_v_1_0_0/scripts/FindAltTSEvents.py', '-i', '/variant/male4/ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12guif3i92_tmp_isoform.gtf', '-o', '/variant/male4/ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12.tsv']' died with <Signals.SIGKILL: 9>.
Traceback (most recent call last):
  File "/rMATS-long_v_1_0_0/scripts/rmats_long.py", line 624, in <module>
    main()
  File "/rMATS-long_v_1_0_0/scripts/rmats_long.py", line 618, in main
    summary = rmats_long(args)
              ^^^^^^^^^^^^^^^^
  File "/rMATS-long_v_1_0_0/scripts/rmats_long.py", line 607, in rmats_long
    process_genes(genes_to_process, temp_dir, sorted_paths,
  File "/rMATS-long_v_1_0_0/scripts/rmats_long.py", line 103, in process_genes
    process_genes_with_handles(
  File "/rMATS-long_v_1_0_0/scripts/rmats_long.py", line 144, in process_genes_with_handles
    process_gene(gene, temp_files_for_gene['abundance'],
  File "/rMATS-long_v_1_0_0/scripts/rmats_long.py", line 251, in process_gene
    classify_isoforms(gene_id, out_dir, updated_gtf, gencode_gtf,
  File "/rMATS-long_v_1_0_0/scripts/rmats_long.py", line 334, in classify_isoforms
    rmats_long_utils.run_command(command)
  File "/rMATS-long_v_1_0_0/scripts/rmats_long_utils.py", line 278, in run_command
    subprocess.run(command, check=True)
  File "/miniconda3/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/miniconda3/bin/python', '/rMATS-long_v_1_0_0/scripts/classify_isoform_differences.py', '--updated-gtf', '/variant/male4/rmats_long_n1y594i8_tmp/ENSG00000131469.15_updated.gtf', '--out-tsv', '/variant/male4/ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12.tsv', '--main-transcript-id', 'ENST00000589913.6', '--second-transcript-id', 'ENST00000253788.12']' returned non-zero exit status 1.

@EricKutschera
Contributor

It looks like the main error was:

subprocess.CalledProcessError: Command '['/miniconda3/bin/python', '/rMATS-long_v_1_0_0/scripts/FindAltTSEvents.py', '-i', '/variant/male4/ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12guif3i92_tmp_isoform.gtf', '-o', '/variant/male4/ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12.tsv']' died with <Signals.SIGKILL: 9>.

I was able to run FindAltTSEvents.py for those two transcripts based on the gencode v46 gtf and got this output:

transcript1	transcript2	event	coordinates
ENST00000253788.12	ENST00000589913.6	RI	chr17:42998472:42998748:+

I don't know why that command would get a SIGKILL. It may have been due to something else happening on your system. If you run rmats_long.py again do you get the same error for the same two transcript IDs?

@border-info-nt
Author

> If you run rmats_long.py again do you get the same error for the same two transcript IDs?

Yes, I get the same error. Can I remove this transcript pair?

transcript1 transcript2 event coordinates
ENST00000253788.12 ENST00000589913.6 RI chr17:42998472:42998748:+

When this ID is reached, the run suddenly stops, and I don't know why.
I can't tell whether it is caused by a lack of CPU resources or by a problem with the file itself.
Can you suggest a solution?

@EricKutschera
Contributor

You can replace os.remove(temp_file) with pass at this line: https://github.com/Xinglab/rMATS-long/blob/v1.0.0/scripts/classify_isoform_differences.py#L61

If you then run the rmats_long.py command again, it will no longer delete the temporary file (like ENSG00000131469.15_isoform_differences_ENST00000589913.6_to_ENST00000253788.12guif3i92_tmp_isoform.gtf) that is passed to the FindAltTSEvents.py command shown in the error message. Once you have that tmp gtf file, you can run FindAltTSEvents.py on it directly and see whether it gets a SIGKILL. If it does, can you post the tmp gtf file? If that doesn't reproduce the error, can you post the rmats_long.py command you are using and the input files you are using with that command?
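
One way to check whether the standalone FindAltTSEvents.py run is killed by a signal is to launch it from a small wrapper and inspect the return code. The sketch below is not part of rMATS-long; the helper names are made up for illustration, and it assumes a POSIX system, where subprocess reports a child killed by signal N as return code -N. It also prints the peak memory of child processes, on the assumption (not confirmed anywhere in this thread) that an out-of-memory kill is one possible cause of an unexpected SIGKILL.

```python
import resource
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative return code means the child died from that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "failed with exit status {}".format(returncode)


def run_and_describe(command):
    """Run a command without raising on failure and describe how it ended."""
    completed = subprocess.run(command, check=False)
    return describe_exit(completed.returncode)


def peak_child_memory():
    """Peak resident memory of finished child processes.

    Units are platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Demo child that sends SIGKILL to itself, standing in for the real command.
demo_command = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]
print(run_and_describe(demo_command))
print("peak child memory:", peak_child_memory())
```

To use it on the real case, replace demo_command with the FindAltTSEvents.py command from the error message, pointing -i at the tmp gtf file that was kept. A very large peak memory value just before a SIGKILL would be consistent with a memory limit rather than a CPU limit, but it does not prove it.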
