Add tests on simulated data for karyotype #700

Open
jonbrenas opened this issue Dec 11, 2024 · 1 comment

Comments

@jonbrenas
Collaborator

Part of #689.

For the record:

It [the `karyotype` function] has tests on simulated data to be compliant with the rest of the package.

@jonbrenas
Collaborator Author

As @alimanfoo mentioned in #702, creating the tests "could be tricky because the simulated data is not guaranteed to generate data at the tag SNP positions."

Looking quickly at the code that simulates the SNP data, it looks like a "size" is chosen randomly for each chromosome (between 50 000 and 100 000 for Ag3 and between 80 000 and 120 000 for Af1) and the positions are then assigned starting at 1. All simulated positions are therefore at most 120 000, which is below every tag position, i.e., none of the tags is ever going to have simulated data.
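To make the reasoning above concrete, here is a minimal sketch of the problem. It is not the package's simulator: `simulate_positions` is a hypothetical stand-in that assumes consecutive positions starting at 1, and the values in `example_tags` are made up for illustration (the real tag positions come from the file shipped with the package).

```python
import random


def simulate_positions(min_size, max_size, rng):
    """Hypothetical stand-in for the simulator: pick a random contig
    size, then assign consecutive positions starting at 1."""
    size = rng.randint(min_size, max_size)
    return list(range(1, size + 1))


rng = random.Random(0)
ag3_pos = simulate_positions(50_000, 100_000, rng)  # size range quoted for Ag3
af1_pos = simulate_positions(80_000, 120_000, rng)  # size range quoted for Af1

# Made-up tag SNP positions, for illustration only.
example_tags = [2_500_000, 21_000_000]

# The largest simulated position can never exceed the upper size bound.
max_pos = max(ag3_pos[-1], af1_pos[-1])

# Tags that fall inside the simulated range: none, given the bounds above.
tags_covered = [t for t in example_tags if t <= max_pos]
```

Under these assumptions `tags_covered` is always empty, whatever the seed, because the upper bound on the simulated size is far below the example tag positions.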

I see a few possible solutions:

  1. Generate enough simulated SNPs to cover all the regions containing targets (i.e., use a minimum size > the highest value in the tags). That would blow up the size of the simulated data, which sounds sub-optimal.
  2. Generate extra data for exactly the tags. This would cause the generation of more simulated data but not at the same scale.
  3. Use a similar method to the one used for the AIMs, i.e., generate the targets on the fly instead of using the ones from the file. This would require a bit of recoding of karyotype.py as the path to the targets is hard-coded to a path in the package (i.e., it cannot be simulated) instead of a path in the data storage (i.e., it can be simulated).
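As a rough sketch of what option 3 could look like, the targets could be drawn from the simulated positions themselves, so that every tag is guaranteed to have data. Everything here is hypothetical: `simulate_tag_snps`, the `position` and `alt_allele` field names, and the record layout are illustrative assumptions, not the format of the actual tags file or the method used for the AIMs.

```python
import random


def simulate_tag_snps(sim_pos, n_tags, alleles=("A", "C", "G", "T"), seed=0):
    """Hypothetical helper: pick n_tags distinct positions from the
    simulated positions and attach a random alternate allele to each."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sim_pos, n_tags))
    return [{"position": p, "alt_allele": rng.choice(alleles)} for p in chosen]


# Simulated positions for one contig (assumed consecutive, starting at 1).
sim_pos = list(range(1, 60_001))

# Simulated tags: every one of them lies inside the simulated data.
tags = simulate_tag_snps(sim_pos, n_tags=10)
```

The test fixture would then write these simulated tags to the simulated data storage, and `karyotype.py` would read them from there instead of from the path in the package.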

I think 3) would make the most sense, but differing opinions are welcome.

Anything that I missed? Any better idea?
