
feat: For long ML tasks, make intermediate saves #1024

Open
sylvaincom opened this issue Dec 27, 2024 · 1 comment
Labels
enhancement New feature or request user-reported

Comments

@sylvaincom
Contributor

sylvaincom commented Dec 27, 2024

Is your feature request related to a problem? Please describe.

As a data scientist, I might launch some long ML tasks on an unreliable server, and I could lose all my results if the server crashes.
This issue came up in a user interview.

Describe the solution you'd like

Save some intermediate results. For example, in a cross-validation with 5 splits, the results of the 1st split could be stored before the whole run finishes, so that the 1st split survives a crash in the middle of the 2nd split.

Related to #989

Edit: neptune.ai does continued tracking (but for foundation models)

@sylvaincom sylvaincom added enhancement New feature or request needs-triage This has been recently submitted and needs attention user-reported labels Dec 27, 2024
@glemaitre
Member

joblib.Memory allows caching results. In scikit-learn, the Pipeline exposes a memory parameter to enable this behaviour. It would be nice to go down to the estimator level to get more aggressive caching, but this is not a straightforward task because sometimes hashing the inputs is more costly than just calling the function itself.

So it would be great if skore could put a sensible caching mechanism in place.

Example regarding the caching: https://joblib.readthedocs.io/en/stable/auto_examples/memory_basic_usage.html#sphx-glr-auto-examples-memory-basic-usage-py

In #997, there is an in-memory caching mechanism. Persisting it on disk would be useful to avoid recomputing some of the results.

@tuscland tuscland removed the needs-triage This has been recently submitted and needs attention label Jan 3, 2025
@augustebaum augustebaum changed the title feat: For long ML tasks, make same intermediate saves feat: For long ML tasks, make intermediate saves Jan 8, 2025