Train/Validate/Test Clarification #97
Comments
Hey @lcaronson, thanks for your interest in using MIScnn!
The DSC of 0.9544 for the kidney segmentation was automatically computed with our cross-validation function (https://github.com/frankkramer-lab/MIScnn/blob/master/miscnn/evaluation/cross_validation.py). With default parameters (i.e., without any callbacks), no validation monitoring is performed, so the returned cross-validation folds can be used as testing sets. However, you can always run a prediction call yourself and then compute the associated DSCs.
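For reference, here is a minimal sketch of how one could compute the DSC for a self-generated prediction. This is not the MIScnn evaluation code itself; it assumes the prediction and ground-truth masks are available as NumPy arrays, and the variable names are placeholders:

```python
# Hedged sketch (not the MIScnn API): Dice Similarity Coefficient (DSC)
# between a predicted binary mask and a ground-truth binary mask.
import numpy as np

def dice_coefficient(pred, truth, smooth=1e-7):
    """Compute the DSC between two binary segmentation masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + smooth) / (pred.sum() + truth.sum() + smooth)

# Usage (placeholder variables): binarize your soft prediction, then compare
# pred_mask = soft_prediction > 0.5
# dsc = dice_coefficient(pred_mask, gt_mask)
```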
For KiTS19, we didn't use any validation set and computed our scores purely on the testing sets from the 3-fold cross-validation. Also, we only used a subset of 120 samples, i.e. 3x (80 train & 40 test). However, in our more recent COVID-19 segmentation study based on limited data, we used a cross-validation (train/val) plus testing strategy. In that study, we performed a 5-fold cross-validation on only 20 samples and computed 5 models (each fold returning a model).
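To illustrate the fold sizes described above, here is a small sketch using scikit-learn's KFold rather than the exact MIScnn call; sample_ids is a hypothetical list of case identifiers:

```python
# Sketch of the 3-fold split on a 120-sample subset: each fold yields
# 80 training and 40 testing samples.
from sklearn.model_selection import KFold

sample_ids = [f"case_{i:05d}" for i in range(120)]  # hypothetical identifiers
kfold = KFold(n_splits=3, shuffle=True, random_state=42)

for fold, (train_idx, test_idx) in enumerate(kfold.split(sample_ids)):
    print(f"Fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
    # -> each fold: 80 train / 40 test
```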
Sadly, there is no clear answer to this question. Personally, I would highly recommend an 80/20 percentage split into train/test and then running a 3-fold or 5-fold cross-validation on the 80% training data. For testing, you can then utilize ensemble learning techniques. This is the state-of-the-art approach and will yield strong performance. Hope that I was able to give you some insights/feedback! :) Cheers,
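A rough sketch of that recommended strategy, using scikit-learn only for the splitting (model training is left abstract, and the dataset size and helper names are assumptions for illustration):

```python
# Sketch: 80/20 train/test holdout, then 5-fold cross-validation on the 80%
# training portion; the 5 resulting fold models are later combined on the
# held-out 20% via ensembling (e.g. pixelwise mean, see below).
from sklearn.model_selection import KFold, train_test_split

sample_ids = list(range(300))  # hypothetical dataset of 300 cases
train_ids, test_ids = train_test_split(sample_ids, test_size=0.2, random_state=42)

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (fit_idx, val_idx) in enumerate(kfold.split(train_ids)):
    fit_ids = [train_ids[i] for i in fit_idx]
    val_ids = [train_ids[i] for i in val_idx]
    print(f"Fold {fold}: {len(fit_ids)} fit / {len(val_ids)} val")
    # model = train_model(fit_ids, val_ids)   # hypothetical training routine
    # fold_models.append(model)

# At test time, each of the 5 fold models predicts on test_ids and the
# predictions are merged into one final mask.
```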
Hi, how did you average these predictions into one, please? Did you just average the metrics computed from the 5 predictions for each sample?
Hey @emmanuel-nwogu, correct. In this study, we just averaged the predictions pixelwise via mean. Cheers,
Thanks for the reply. From my understanding, you average the predicted binary masks to generate a final prediction mask. Is there a common name for this in the literature?
Happy to help! :) Absolutely correct! Sadly, to my knowledge, there is no community-accepted name for the functions used to combine predictions originating from ensemble learning. Last year, we published an experimental analysis of ensemble learning in medical image classification, in which I called the combination methods for merging multiple predictions "pooling functions" and referred to the averaging as mean pooling (which can be either unweighted or weighted). Hope this explains/helps a little bit regarding general ensemble learning in biomedical image analysis. Best Regards,
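As a small illustration of such a pooling function (a sketch, not code from the cited study), the pixelwise mean of several predictions can be computed with NumPy; the weights argument is optional and turns the unweighted mean into a weighted one:

```python
# Sketch: merge an ensemble of predictions (binary masks or soft probability
# maps, shape n_models x H x W [x D]) via (weighted) pixelwise mean pooling,
# then binarize the merged map with a threshold.
import numpy as np

def mean_pooling(predictions, weights=None, threshold=0.5):
    """Combine ensemble predictions via pixelwise (weighted) mean."""
    predictions = np.asarray(predictions, dtype=float)
    merged = np.average(predictions, axis=0, weights=weights)
    return (merged >= threshold).astype(np.uint8)

# Unweighted mean of 5 fold predictions (placeholder arrays p0..p4):
# final_mask = mean_pooling([p0, p1, p2, p3, p4])
# Weighted mean, e.g. weighting each fold by its validation DSC:
# final_mask = mean_pooling([p0, p1, p2, p3, p4],
#                           weights=[0.90, 0.95, 0.92, 0.88, 0.93])
```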
Thanks, I'll check it out. :)
Hello,
I just have a couple of clarifying questions to ask.
I am a little bit unclear on how you came up with the final 0.9544 Dice coefficient value in the published MIScnn paper. Is there some kind of additional function that can be used to compare the test data to the predictions? Or is that value returned during the cross-validation phase?
If the cross-validation phase is also doing the testing of the data, then how do we define the ratio of train/validate/test data? For example, my understanding is that, of the roughly 300 studies in the KiTS19 dataset, you used an 80/90/40 ratio? I am just trying to figure out how you set these parameters in the code.
As a final question for you, if I have a dataset of 60 studies, would a decent train/validate/test ratio be 30/15/15?