Explore optimization of machine learning pipeline #7

Open
jessept opened this issue Nov 2, 2016 · 3 comments

@jessept
Collaborator

jessept commented Nov 2, 2016

The current classifier pipeline takes a long time to fit and may be fitting the same model multiple times. This should be looked at in hopes of finding some low-hanging fruit in performance gains.
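
One way to check whether preprocessing steps are refit redundantly is to count fit calls during a grid search. The following is a minimal sketch, not this project's pipeline: `CountingIdentity` and `FIT_CALLS` are hypothetical names introduced for illustration, and the synthetic data and parameter grid are assumptions.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Module-level log of fit calls. This only works because n_jobs=1 keeps
# every fit in the current process.
FIT_CALLS = []


class CountingIdentity(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through transformer that records each fit call."""

    def fit(self, X, y=None):
        FIT_CALLS.append(X.shape)
        return self

    def transform(self, X):
        return X


# Synthetic data standing in for the real feature matrix (an assumption).
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

pipeline = Pipeline([
    ("count", CountingIdentity()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Only the classifier's hyperparameter varies, so the transformer's output
# is identical across the four candidates within each fold.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=1, refit=True)
search.fit(X, y)

# 4 candidates x 3 folds, plus 1 refit on the full data.
n_fits = len(FIT_CALLS)
print(n_fits)
```

Only four distinct training sets are involved here (three folds plus the full data), so any transformer fit beyond four repeats work that has already been done.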

@dhimmel
Member

dhimmel commented Nov 2, 2016

The pipeline used to be much faster when we (incorrectly) did feature selection and standardization prior to cross-validation (grid search). The issue is that sklearn repeats these transformations verbatim for every parameter combination when they could be memoized: see scikit-learn/scikit-learn#7536 (comment).

There are two potential solutions:

  1. Use grid search from dask-learn (dklearn). dask-learn uses dask in the background. I'm excited about dask, but dask-learn development appears to have stalled and the pull request to include dask-learn in dask petered out. However, it may still be functional.

  2. @jnothman may know of a solution, based on scikit-learn/scikit-learn#7536 (comment) ("Finding which features are passed to the final estimator of an sklearn pipeline"), where he said:

    we've seen a couple of attempted contributions, as well as my generic remember_model wrapper which I've never formally submitted as a PR
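
For reference, later scikit-learn releases (0.19 onward) added a `memory` argument to `Pipeline` that caches fitted transformers, which targets the repeated-transformation cost described above. The sketch below assumes such a version is installed. The dataset, step names, and parameter grid are illustrative assumptions, not the configuration used in this project.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real feature matrix (an assumption).
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)

# Temporary directory where fitted transformers are cached on disk.
cache_dir = mkdtemp()
try:
    # Feature selection and standardization stay inside the pipeline, so
    # they are fit on training folds only, but cached fits are reused
    # whenever the same transformer sees the same training data.
    pipeline = Pipeline(
        steps=[
            ("select", SelectKBest(f_classif, k=10)),
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(max_iter=1000)),
        ],
        memory=cache_dir,
    )

    # Only the final estimator's parameter varies, so the two
    # preprocessing steps can be served from the cache across candidates.
    param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
    search = GridSearchCV(pipeline, param_grid, cv=3, n_jobs=1)
    search.fit(X, y)

    best_C = search.best_params_["clf__C"]
    n_selected = int(
        search.best_estimator_.named_steps["select"].get_support().sum()
    )
finally:
    # Remove the cache directory once the search is done.
    rmtree(cache_dir)
```

Caching only helps when the transformers' own parameters are not part of the grid. Any candidate that changes `select__k`, for example, would still trigger a fresh fit of that step and of every step after it.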

@jnothman

jnothman commented Nov 7, 2016

Pipeline memoising has been implemented in:

My more generic model memoiser requires scikit-learn/scikit-learn#5080, which I may push for again at some point :)

@jnothman

jnothman commented Nov 7, 2016

I think there's still potential for something like scikit-learn/scikit-learn#3951 to be merged.
