multi_to_multi_fast5 #65

Open
nick-youngblut opened this issue Nov 28, 2021 · 2 comments

Comments

@nick-youngblut

single_to_multi_fast5 can be used to reduce the number of files per sequencing run (e.g., from hundreds of thousands down to just thousands, by choosing an appropriate --batch_size). However, if one later wants to change the number of sequences per fast5 file (e.g., to reduce the total number of files further), one cannot rerun single_to_multi_fast5 on the multi-fast5 files with a larger --batch_size.

It would be helpful to add a script (e.g., multi_to_multi_fast5) that could change the number of sequences per multi-fast5 file, either by combining or by splitting files, depending on the total number of fast5 files the user wants.
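The core of such a split/aggregate script is just planning how reads map to output files. Below is a minimal sketch of that planning step, assuming the user states a target number of files and the reads are regrouped in their existing order. The function names batch_size_for and regroup are hypothetical and not part of any existing tool; actually copying the reads between fast5 files is left out.

```python
import math
from typing import List, Sequence


def batch_size_for(n_reads: int, n_files: int) -> int:
    """Reads per file needed to spread n_reads over at most n_files files (hypothetical helper)."""
    if n_reads < 0 or n_files < 1:
        raise ValueError("need n_reads >= 0 and n_files >= 1")
    # Round up so that no more than n_files batches are produced.
    return max(1, math.ceil(n_reads / n_files))


def regroup(read_ids: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split a flat list of read IDs into consecutive batches of batch_size.

    Each inner list stands for one output multi-fast5 file; the last one may be
    smaller. This only plans the grouping; it does not read or write fast5 files.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [list(read_ids[i:i + batch_size]) for i in range(0, len(read_ids), batch_size)]
```

For example, 10 reads spread over 3 files gives a batch size of 4 and batches of 4, 4 and 2 reads. The same arithmetic covers both directions: a smaller target file count merges reads into fewer, larger files, and a larger one splits them.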

@fbrennen
Contributor

Hi @nick-youngblut -- you can do this with fast5_subset, though it requires you to give it a list of all the read_ids you currently have (which I believe you should be able to generate easily from your call to single_to_multi_fast5). We can certainly look into making the read_id list for fast5_subset optional, at which point it will do exactly what you're after.
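If the original single_to_multi_fast5 output is no longer at hand, the read ID list can also be rebuilt from the multi-fast5 files themselves. A minimal sketch, assuming h5py is installed and that each read sits under a top-level group named read_<read_id> (the multi-read fast5 layout). It also assumes fast5_subset accepts a plain file with one read ID per line; check fast5_subset --help for your version. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Multi-read fast5 files store each read under a top-level "read_<read_id>" group.
READ_GROUP_PREFIX = "read_"


def read_ids_from_group_names(names: Iterable[str]) -> List[str]:
    """Strip the 'read_' prefix from top-level group names; ignore any other keys."""
    return [n[len(READ_GROUP_PREFIX):] for n in names if n.startswith(READ_GROUP_PREFIX)]


def collect_read_ids(fast5_dir: str) -> List[str]:
    """Gather read IDs from every *.fast5 file under fast5_dir (requires h5py)."""
    import h5py  # imported lazily so the other helpers work without h5py installed

    ids: List[str] = []
    for path in sorted(Path(fast5_dir).rglob("*.fast5")):
        with h5py.File(str(path), "r") as f5:
            ids.extend(read_ids_from_group_names(f5.keys()))
    return ids


def write_read_id_list(read_ids: Iterable[str], out_path: str) -> None:
    """Write one read ID per line (assumed to be an accepted --read_id_list format)."""
    Path(out_path).write_text("".join(f"{rid}\n" for rid in read_ids))
```

The resulting file could then be passed to something like fast5_subset --input fast5_dir --save_path regrouped --read_id_list read_ids.txt --batch_size 8000, assuming those options match your installed ont_fast5_api version.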

@nick-youngblut
Author

Thanks for pointing out that option. I was looking for a computationally efficient and straightforward way of changing the number of sequences per fast5 file (more or fewer sequences per file), i.e., a split/aggregate script. I'm guessing that most users just keep the now-default 4k sequences per fast5 file and never want to change it, so maybe 4k per file is optimal for most or all situations.
