Advanced negative data generation: RNAs not involved in an interaction #19

Open
teresa-m opened this issue Apr 23, 2021 · 6 comments
Labels
question Further information is requested

Comments

@teresa-m
Member

teresa-m commented Apr 23, 2021

Idea is to:

  • Filter the data for RNAs that are highly expressed but not involved in an interaction
    • Use mRNA-Seq data produced under the same conditions to identify highly expressed RNAs
    • Find an expression cutoff for the RNAs
    • Create a DB of RNAs involved in RRIs (position, ID, or biotype)
    • Filter so we end up with a list of RNAs not involved in RRIs
  • For the RNAs without an RRI, find the protein binding profiles
  • Extract potential binding sites by ignoring positions that are bound by proteins
  • Check that they have the same binding profile?
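The first filtering step above could be sketched as follows. This is only a minimal illustration, not the project's implementation: the function names, the ID-based matching against the RRI database, and the quantile-based way of picking the expression cutoff are all assumptions.

```python
from typing import Dict, Iterable, List, Set


def quantile_cutoff(values: Iterable[float], q: float = 0.75) -> float:
    """Pick an expression cutoff as the value at quantile q.

    Assumption: a simple quantile is used here only for illustration;
    the issue does not say how the cutoff should be chosen.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no expression values given")
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def expressed_without_rri(
    expression: Dict[str, float],  # hypothetical: RNA ID -> expression level
    rri_ids: Set[str],             # hypothetical: IDs of RNAs found in any RRI
    cutoff: float,
) -> List[str]:
    """Return IDs of RNAs at or above the cutoff that are not in the RRI set."""
    return sorted(
        rna_id
        for rna_id, level in expression.items()
        if level >= cutoff and rna_id not in rri_ids
    )


# Toy data, made up for the example.
expression = {"rnaA": 100.0, "rnaB": 5.0, "rnaC": 80.0, "rnaD": 1.0}
rri_ids = {"rnaA"}
cutoff = quantile_cutoff(expression.values(), q=0.5)
candidates = expressed_without_rri(expression, rri_ids, cutoff)
```

Matching by ID is the simplest variant; matching by genomic position or biotype, as mentioned in the list, would need a different key.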

The binding profile for human can be found here: https://doi.org/10.1016/j.molcel.2012.05.021

@teresa-m
Member Author

Ideas from Rolf. Not sure if I summarized them correctly?

@teresa-m teresa-m added the question Further information is requested label Apr 23, 2021
@teresa-m teresa-m self-assigned this Apr 23, 2021
@teresa-m
Member Author

https://dorina.mdc-berlin.de/regulators -> get CLIP data from here or from Dominik

@teresa-m
Member Author

paper:
RBP coverage: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38355
mRNA-Seq: https://www.ncbi.nlm.nih.gov/gds?LinkName=biosample_gds&from_uid=997837
... total RNA-Seq might be better, to also get ncRNAs

@teresa-m
Member Author

teresa-m commented Jul 26, 2021

Downloaded data:
full mRNA-Seq:
| Run | Assay Type | AvgSpotLen | Bases | BioProject | BioSample | Bytes | Center Name | Consent | DATASTORE filetype | DATASTORE provider | DATASTORE region | Experiment | GEO_Accession (exp) | Instrument | Library Name | LibraryLayout | LibrarySelection | LibrarySource | Organism | Platform | purification | ReleaseDate | Sample Name | source_name | SRA Study | Treatment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRR500121 | RNA-Seq | 36 | 1368141804 | PRJNA167851 | SAMN00997837 | 847624134 | GEO | public | "sra fastq" | "gs s3 ncbi" | "gs.US s3.us-east-1 ncbi.public" | SRX149162 | GSM936076 | Illumina Genome Analyzer II | GSM936076: RNAseq_mRNA | SINGLE | cDNA | TRANSCRIPTOMIC | Homo sapiens | ILLUMINA | oligo(dT) | 2012-06-06T00:00:00Z | GSM936076 | HEK293 cell culture | SRP013463 | no treatment (mRNA) |

@teresa-m
Member Author

Next steps will be to generate the following position files:

  1. expressed positions of mRNA: take total mRNAs -> align using RNA STAR (built-in genome + GTF of the PARIS analysis) -> Blockbuster (find positions of reads over the threshold, TH) -> end up with a list of genomic positions
  2. list of mRNA interaction positions
  3. list of protein profiling positions

Generate potential negative mRNA binding sites by filtering all positions of lists 2 and 3 out of list 1
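
The filtering of lists 2 and 3 out of list 1 amounts to an interval subtraction. The sketch below is a pure-Python illustration under stated assumptions: positions are BED-like `(chromosome, start, end)` tuples with 0-based half-open coordinates, strand is ignored, and the function name and toy coordinates are made up. A dedicated tool such as `bedtools subtract` would be the more usual choice on real data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed layout: (chromosome, start, end), 0-based, half-open; strand is ignored.
Interval = Tuple[str, int, int]


def subtract_intervals(expressed: List[Interval], blocked: List[Interval]) -> List[Interval]:
    """Return the parts of `expressed` that are not covered by any `blocked` interval."""
    blocked_by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in blocked:
        blocked_by_chrom[chrom].append((start, end))
    for spans in blocked_by_chrom.values():
        spans.sort()  # sort by start so one left-to-right sweep is enough

    remaining: List[Interval] = []
    for chrom, start, end in sorted(expressed):
        cursor = start  # left edge of the part not yet blocked or emitted
        for b_start, b_end in blocked_by_chrom.get(chrom, []):
            if b_end <= cursor:
                continue  # blocked span lies entirely left of the cursor
            if b_start >= end:
                break     # this and all later spans start right of the interval
            if b_start > cursor:
                remaining.append((chrom, cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            remaining.append((chrom, cursor, end))
    return remaining


# Toy example: list 1 (expressed), list 2 (RRI positions), list 3 (protein-bound positions).
expressed_positions = [("chr1", 100, 200)]
rri_positions = [("chr1", 120, 130)]
protein_positions = [("chr1", 125, 150), ("chr1", 190, 250), ("chr2", 0, 1000)]
negative_candidates = subtract_intervals(expressed_positions, rri_positions + protein_positions)
```

If strand matters for the negative sites, the strand would have to become part of the grouping key alongside the chromosome.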

Next step: How to generate the negative RRIs?

@teresa-m
Member Author

teresa-m commented Aug 31, 2021

The tasks for constructing the data set are written up here: Generate trainings data using context #25
