-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple snp_transcripts in plot_diplotype_clustering_advanced()
#703
base: master
Are you sure you want to change the base?
Allow multiple snp_transcripts in plot_diplotype_clustering_advanced()
#703
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @jonbrenas. Couple of small comments. Could you also post a screenshot of this working?
malariagen_data/anoph/dipclust.py
Outdated
snp_transcript: Optional[base_params.transcript] = None, | ||
snp_transcripts: Sequence[base_params.transcript] = [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jonbrenas, this will be a breaking API change. Instead we could leave the parameter name the same but widen the type so it can accept either an individual transcript or a sequence of transcripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we had this debate a while back (I don't remember which function it was for) about whether it was OK to break the API to make sure that the parameter names still make sense and (I think) we agreed that it was the case but that it would only be done for a new major version.
Given that we had a major version released 2 weeks ago, it might make more sense to keep the parameter name unchanged but change its content and do our best to remember to update it for the next major release. Is that the decision? Either way, I widened the type of snp_transcript
.
malariagen_data/anopheles.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear why there are any changes to this file included in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnophelesDipClustAnalysis
needs to inherit gene_cnv
from AnophelesCnvFrequencyAnalysis
(see comment below) which created a loop in the inheritance tree. Because AnophelesDipClustAnalysis
is a subclass of AnophelesCnvFrequencyAnalysis
and AnophelesSnpFrequencyAnalysis
, when AnophelesDataResource
inherits AnophelesDipClustAnalysis
, it also inherits them so they don't need to be included in the list of inherited classes anymore.
from .cnv_data import AnophelesCnvData | ||
from .cnv_frq import AnophelesCnvFrequencyAnalysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, why is this change here in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gene_cnv
used to be in anopheles.py
and the code for plot_diplotype_clustering_advanced
contains a comment saying that it needs to be moved. This has been done and it is now in cnv_frq.py
. To be able to use it as self.gene_cnv()
, the class AnophelesDipClustAnalysis
thus has to inherit the functions of AnophelesCnvFrequencyAnalysis
. It worked previously because the API classes (Ag3
and Af1
inherit this function) but it breaks the listing and coverage tests to have a class call functions that are not part of its inheritance tree. There were no tests until now so it wasn't raising an issue but that was hardly the best solution.
It may have been clearer to create a separate issue and PR to clean up AnophelesDipClustAnalysis
and create the tests but given that the changes are relatively minor I thought it would be OK to do both at the same time.
malariagen_data/anoph/dipclust.py
Outdated
snp_transcript="Plot amino acid variants for this transcript.", | ||
snp_transcripts="Plot amino acid variants for these transcripts.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking API change could be avoided here, see similar comment below.
…-to-diplotype_clustering
…_clustering' of github.com:malariagen/malariagen-data-python into 600-adding-option-for-multiple-transcripts-to-diplotype_clustering
awesome. |
…-to-diplotype_clustering
Resolves #600.
@KellyLBennett and Nana Amoako needed this functionality earlier today so I sped it up a bit. There are currently no tests for
plot_diplotype_clustering_advanced()
so the best I can say is that it worked in the notebook. I will add tests when I have time.