abstract -> 150 words
cgreene authored Nov 6, 2024
1 parent 99a154a commit 105560c
Showing 1 changed file with 3 additions and 4 deletions.
7 changes: 3 additions & 4 deletions in content/01.abstract.md
@@ -18,10 +18,9 @@ If the goal is to achieve robust performance across contexts or datasets, whenev
 
 Guidelines in statistical modeling for genomics hold that simpler models have advantages over more complex ones.
 Potential advantages include cost, interpretability, and improved generalization across datasets or biological contexts.
-Gene signatures in cancer transcriptomics tend to include small subsets of genes for these reasons, and algorithms for defining signatures are often designed with these ideas in mind.
-To directly test the latter assumption, that small gene signatures generalize better, we examined the generalization of mutation status prediction models across datasets (from cell lines to human tumors and vice-versa) and biological contexts (holding out entire cancer types from pan-cancer data).
-We compared two simple procedures for model selection, one that exclusively relies on cross-validation performance and one that combines cross-validation performance with regularization strength.
+We directly tested the assumption that small gene signatures generalize better by examining the generalization of mutation status prediction models across datasets (from cell lines to human tumors and vice-versa) and biological contexts (holding out entire cancer types from pan-cancer data).
+We compared model selection between solely cross-validation performance and combining cross-validation performance with regularization strength.
 We did not observe that more regularized signatures generalized better.
 This result held across both generalization problems and for both linear models (LASSO logistic regression) and non-linear ones (neural networks).
-When the goal of an analysis is to produce generalizable predictive models, we recommend choosing the ones that perform best on held-out data or in cross-validation, instead of those that are smaller or more regularized.
+When the goal of an analysis is to produce generalizable predictive models, we recommend choosing the ones that perform best on held-out data or in cross-validation instead of those that are smaller or more regularized.
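
The two selection procedures named in the revised abstract can be illustrated with a minimal sketch. This is a hypothetical example, not the authors' pipeline: it assumes that "cross-validation performance" means mean cross-validated AUROC, that "combining cross-validation performance with regularization strength" resembles the common one-standard-error rule (the most regularized model within one standard error of the best), and it uses synthetic data and an arbitrary parameter grid as placeholders.

```python
# Hypothetical illustration only; NOT the pipeline from the commit's repository.
# Sketches two ways to select a LASSO logistic regression model:
#   1. best mean cross-validation performance
#   2. most regularized model within one standard error of the best (assumed rule)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for expression features and mutation labels.
X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           random_state=0)

# Grid of inverse regularization strengths (smaller C = stronger LASSO penalty).
Cs = np.logspace(-3, 2, 20)
means, ses = [], []
for C in Cs:
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C, max_iter=5000)
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    means.append(scores.mean())
    ses.append(scores.std(ddof=1) / np.sqrt(len(scores)))
means, ses = np.array(means), np.array(ses)

# Procedure 1: the model with the best cross-validation performance.
best = int(means.argmax())

# Procedure 2: the most regularized model within one standard error of the best
# (Cs is sorted ascending, so the first qualifying index has the smallest C
# and therefore the strongest penalty).
one_se = int(np.where(means >= means[best] - ses[best])[0][0])

print(f"best-CV model: C={Cs[best]:.4g}, mean AUROC={means[best]:.3f}")
print(f"1-SE model:    C={Cs[one_se]:.4g}, mean AUROC={means[one_se]:.3f}")
```

Under the abstract's conclusion, the best-CV model would be expected to generalize at least as well as the more regularized 1-SE model when applied to a held-out dataset or cancer type.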
