
feat: For long ML tasks, make intermediate saves #1024

Open
sylvaincom opened this issue Dec 27, 2024 · 1 comment
Labels
enhancement New feature or request user-reported

Comments

@sylvaincom
Contributor

sylvaincom commented Dec 27, 2024

Is your feature request related to a problem? Please describe.

As a data scientist, I might launch some long ML tasks on an unreliable server, and I could lose all my results if the server crashes.
This issue came up in a user interview.

Describe the solution you'd like

Save some intermediate results. For example, in a cross-validation with 5 splits, the results of the 1st split could be stored before the whole run finishes, so that the 1st split survives a crash in the middle of the 2nd split.

Related to #989

Edit: neptune.ai does continued tracking (but for foundation models)

@sylvaincom sylvaincom added enhancement New feature or request needs-triage This has been recently submitted and needs attention user-reported labels Dec 27, 2024
@glemaitre
Member

joblib.Memory allows caching results. In scikit-learn, the Pipeline exposes a memory parameter to enable this behaviour. It would be nice to go down to the estimator level to get more aggressive caching, but this is not a straightforward task because sometimes hashing the inputs is more costly than just calling the function itself.

So it would be great if skore could put a sensible caching mechanism in place.

Example regarding the caching: https://joblib.readthedocs.io/en/stable/auto_examples/memory_basic_usage.html#sphx-glr-auto-examples-memory-basic-usage-py

In #997, there is an in-memory caching mechanism. Persisting it on disk would be useful to avoid recomputing some of the results.

@tuscland tuscland removed the needs-triage This has been recently submitted and needs attention label Jan 3, 2025
@augustebaum augustebaum changed the title feat: For long ML tasks, make same intermediate saves feat: For long ML tasks, make intermediate saves Jan 8, 2025