Allow multiple snp_transcripts in plot_diplotype_clustering_advanced() #703

Open
wants to merge 12 commits into
base: master

jonbrenas
Collaborator

@jonbrenas jonbrenas commented Dec 12, 2024

Resolves #600.

@KellyLBennett and Nana Amoako needed this functionality earlier today so I sped it up a bit. There are currently no tests for plot_diplotype_clustering_advanced() so the best I can say is that it worked in the notebook. I will add tests when I have time.

@jonbrenas jonbrenas self-assigned this Dec 12, 2024
@jonbrenas jonbrenas marked this pull request as draft December 12, 2024 15:48
@jonbrenas jonbrenas marked this pull request as ready for review December 13, 2024 13:02
Member

@alimanfoo alimanfoo left a comment

Thanks so much @jonbrenas. Couple of small comments. Could you also post a screenshot of this working?

Comment on lines 564 to 566
-snp_transcript: Optional[base_params.transcript] = None,
+snp_transcripts: Sequence[base_params.transcript] = [],
Member

Hi @jonbrenas, this will be a breaking API change. Instead we could leave the parameter name the same but widen the type so it can accept either an individual transcript or a sequence of transcripts.

Collaborator Author

@jonbrenas jonbrenas Dec 13, 2024

Yes, we had this debate a while back (I don't remember which function it was for): is it OK to break the API so that parameter names still make sense? I think we agreed that it is, but only as part of a new major version.

Given that we released a major version 2 weeks ago, it might make more sense to keep the parameter name unchanged, widen what it accepts, and do our best to remember to rename it for the next major release. Is that the decision? Either way, I widened the type of snp_transcript.
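
For readers following the thread, here is a minimal sketch of what "widening the type" could look like. It is not the code from this PR: it assumes transcripts are plain string identifiers, and the helper name `normalise_snp_transcript` and the alias `Transcript` are hypothetical.

```python
from typing import List, Optional, Sequence, Union

# Hypothetical alias: transcripts are assumed to be plain string identifiers.
Transcript = str


def normalise_snp_transcript(
    snp_transcript: Optional[Union[Transcript, Sequence[Transcript]]] = None,
) -> List[Transcript]:
    """Return a list of transcripts, whatever form the caller used."""
    if snp_transcript is None:
        # No transcript requested.
        return []
    if isinstance(snp_transcript, str):
        # A bare string is one transcript, not a sequence of characters.
        return [snp_transcript]
    # Any other sequence (list, tuple, ...) is copied into a list.
    return list(snp_transcript)


print(normalise_snp_transcript())
print(normalise_snp_transcript("transcript-A"))
print(normalise_snp_transcript(("transcript-A", "transcript-B")))
```

With this shape, existing callers that pass a single transcript keep working, and the plotting code only ever has to loop over a list.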

Member

Not clear why there are any changes to this file included in this PR?

Collaborator Author

@jonbrenas jonbrenas Dec 13, 2024

AnophelesDipClustAnalysis needs to inherit gene_cnv from AnophelesCnvFrequencyAnalysis (see comment below), which created a loop in the inheritance tree. Because AnophelesDipClustAnalysis is a subclass of both AnophelesCnvFrequencyAnalysis and AnophelesSnpFrequencyAnalysis, AnophelesDataResource picks up those two classes automatically when it inherits from AnophelesDipClustAnalysis, so they no longer need to be in its list of base classes.
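
One way a conflict of this kind can arise in Python is sketched below, using small stand-in classes rather than the real ones (all class bodies and the short class names are assumptions). Listing a base class ahead of its own subclass makes the method resolution order impossible to compute; this may or may not be the exact failure encountered in the PR. Inheriting only from the subclass avoids it and still brings in both parents.

```python
class CnvFrequencyAnalysis:
    def gene_cnv(self):
        return "gene_cnv result"


class SnpFrequencyAnalysis:
    def snp_frequencies(self):
        return "snp result"


class DipClustAnalysis(CnvFrequencyAnalysis, SnpFrequencyAnalysis):
    def plot(self):
        # gene_cnv is now part of this class's own inheritance tree.
        return self.gene_cnv()


# Listing a parent before its own subclass cannot be linearised,
# so Python refuses to create the class.
try:
    class BrokenResource(CnvFrequencyAnalysis, DipClustAnalysis):
        pass
    mro_error = None
except TypeError as exc:
    mro_error = exc


# Inheriting only from the subclass is enough: both parents come along.
class DataResource(DipClustAnalysis):
    pass


resource = DataResource()
print(type(mro_error).__name__)
print([cls.__name__ for cls in DataResource.__mro__])
```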

from .cnv_data import AnophelesCnvData
from .cnv_frq import AnophelesCnvFrequencyAnalysis
Member

Just wondering, why is this change here in this PR?

Collaborator Author

@jonbrenas jonbrenas Dec 13, 2024

gene_cnv used to be in anopheles.py, and the code for plot_diplotype_clustering_advanced contains a comment saying that it needs to be moved. This has been done, and it is now in cnv_frq.py. To use it as self.gene_cnv(), the class AnophelesDipClustAnalysis therefore has to inherit from AnophelesCnvFrequencyAnalysis. It worked previously because the API classes (Ag3 and Af1) inherit this function, but having a class call functions that are not part of its inheritance tree breaks the listing and coverage tests. There were no tests until now, so it wasn't raising an issue, but that was hardly the best solution.
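
The situation described above can be illustrated with stand-in classes (hypothetical names and bodies, not the library's real code). A class that calls `self.gene_cnv()` without inheriting it only works when some composed class further down supplies the method; on its own, which is how a unit test would typically exercise it, the call fails.

```python
class CnvFrequency:
    def gene_cnv(self):
        return "cnv data"


class DipClustBefore:
    # Calls self.gene_cnv() without inheriting it.
    def plot(self):
        return self.gene_cnv()


class DipClustAfter(CnvFrequency):
    # Same call, but gene_cnv is now in the inheritance tree.
    def plot(self):
        return self.gene_cnv()


class Api(DipClustBefore, CnvFrequency):
    # Stand-in for a composed API class that supplies gene_cnv itself.
    pass


print(Api().plot())  # works only because Api also inherits CnvFrequency

try:
    DipClustBefore().plot()
except AttributeError as exc:
    print("standalone failure:", exc)

print(DipClustAfter().plot())  # works on its own
```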

It may have been clearer to create a separate issue and PR to clean up AnophelesDipClustAnalysis and create the tests but given that the changes are relatively minor I thought it would be OK to do both at the same time.

Comment on lines 554 to 556
-snp_transcript="Plot amino acid variants for this transcript.",
+snp_transcripts="Plot amino acid variants for these transcripts.",
Member

Breaking API change could be avoided here, see similar comment below.

@jonbrenas
Collaborator Author

Here is a picture (the chosen genes were Cyp6aa1 and Cyp6p15p):

[Image: Screenshot 2024-12-13 at 18 02 24]

We might want to add titles to make things clearer.

…_clustering' of github.com:malariagen/malariagen-data-python into 600-adding-option-for-multiple-transcripts-to-diplotype_clustering
@jonbrenas jonbrenas marked this pull request as draft December 13, 2024 19:05
@jonbrenas jonbrenas marked this pull request as ready for review December 13, 2024 19:30
@sanjaynagi
Collaborator

awesome.

Development

Successfully merging this pull request may close these issues.

Allow multiple snp_transcripts in plot_diplotype_clustering_advanced()
3 participants