WIP: bundle #586

Open: 13 commits to be merged into base: master

@rhubert rhubert commented Sep 23, 2024

Bundle the results of all checkoutSteps needed to build a package and provide a bundle-config file so that the package can be rebuilt from the bundle alone.

Such a bundle can be used to build on an air-gapped system or to archive all sources of a build. When building from the bundle, only the bundle is extracted; no other checkoutScripts are run.

There is one issue with the current URL SCM extraction. The source workspace contains both the originally downloaded file and the extracted sources. This unnecessarily doubles the size of the bundle and, since the bundle extraction uses the UrlScm as well, produces a different workspace hash when the bundle is extracted. That's why I changed it to download the original file into workspace/../_download, where the .extracted file is placed as well. This change currently makes the unit tests fail, as the ../_download folder is always the same when using a tempDir.

I'm not sure how to proceed here:

  • fix the unit-tests
  • use a workspace folder (e.g. workspace/.bob-download/) for the downloaded files and exclude this directory when hashing / bundling?
  • leave the download location as is and ignore the downloaded + .extracted file?
  • ...?
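To make the second option more concrete, here is a minimal sketch of hashing a workspace while skipping a download directory. It is not Bob's actual hashing code: the function name `hash_tree`, the SHA-1 choice and the `.bob-download` default are assumptions for illustration only.

```python
import hashlib
import os


def hash_tree(root, exclude=(".bob-download",)):
    """Hash relative file names and contents below root.

    Hypothetical helper: top-level directories listed in `exclude`
    are skipped, so a kept download does not change the result.
    """
    digest = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune in place so os.walk does not descend into excluded dirs.
            dirnames[:] = [d for d in dirnames if d not in exclude]
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode("utf-8") + b"\0")
            with open(path, "rb") as f:
                digest.update(f.read())
            digest.update(b"\0")
    return digest.hexdigest()
```

With such an exclusion, a workspace that still holds the downloaded tarball and a workspace that was restored from a bundle (without the tarball) would hash identically, as long as the extracted sources match.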

@jkloetzke
Member

I would argue that the tarball download change is a welcome but unrelated optimization. I would move it to a separate PR that should probably be merged first.

Reusing the URL SCM for the bundles is IMHO not the right approach. It should instead work like the archive stuff. Right now binary artifacts are used for saving or restoring package steps. What we need here is to save and restore checkout steps. In the best case we can build on the archive module and reuse most code from there.
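As a rough illustration of "save and restore checkout steps", the following sketch stores a checkout workspace as a tarball under a key and restores it later. It does not use Bob's archive module; `save_checkout`, `restore_checkout`, the flat `<key>.tgz` layout and the key itself are all hypothetical.

```python
import os
import tarfile


def save_checkout(archive_dir, key, workspace):
    """Store the content of `workspace` as <archive_dir>/<key>.tgz."""
    os.makedirs(archive_dir, exist_ok=True)
    final = os.path.join(archive_dir, key + ".tgz")
    tmp = final + ".tmp"
    with tarfile.open(tmp, "w:gz") as tar:
        for name in sorted(os.listdir(workspace)):
            tar.add(os.path.join(workspace, name), arcname=name)
    os.replace(tmp, final)  # readers never see a half-written archive
    return final


def restore_checkout(archive_dir, key, workspace):
    """Extract <key>.tgz into `workspace`; return False if it is missing."""
    path = os.path.join(archive_dir, key + ".tgz")
    if not os.path.exists(path):
        return False
    os.makedirs(workspace, exist_ok=True)
    # Use the "data" extraction filter where this Python version offers it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(path, "r:gz") as tar:
        tar.extractall(workspace, **kwargs)
    return True
```

A caller would try `restore_checkout()` first and only run the real checkout, followed by `save_checkout()`, when it returns `False`.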

From a more general angle: should this work for indeterministic checkouts too? I would argue against this and only bundle deterministic checkouts. But if there is a good reason to decide otherwise I'm open to it. It's just that my gut feeling is that it will get nasty to get all corner cases correct...

codecov bot commented Sep 30, 2024

Codecov Report

Attention: Patch coverage is 89.90826% with 22 lines in your changes missing coverage. Please review.

Project coverage is 88.93%. Comparing base (2d829b4) to head (198136b).

Files with missing lines       Patch %   Lines
pym/bob/archive.py             84.93%    11 Missing ⚠️
pym/bob/scm/url.py             92.45%    8 Missing ⚠️
pym/bob/builder.py             92.59%    2 Missing ⚠️
pym/bob/cmds/build/build.py    90.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #586      +/-   ##
==========================================
+ Coverage   88.86%   88.93%   +0.06%     
==========================================
  Files          48       48              
  Lines       15474    15646     +172     
==========================================
+ Hits        13751    13914     +163     
- Misses       1723     1732       +9     

rhubert commented Jan 4, 2025

I added a new implementation using the archive stuff. Since files cannot be added to the final bundle in parallel using tarfile, this finalization step is necessary. Maybe this could be optimized somehow, but it seems to be impossible to synchronize this asyncio stuff: either asyncio.Future objects cannot be pickled, or the lock is created in the wrong loop. 🤷
With this finalization method it's fairly simple and it works. I don't think time is that relevant when a bundle is packed.
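The two-phase idea can be sketched as follows: per-step tarballs are packed concurrently, and a single sequential finalization step adds them to the bundle. This is not the code from this PR; `pack_step`, `pack_all` and `finalize` are hypothetical names, and worker threads via `asyncio.to_thread` (Python 3.9+) are an assumption made to keep the example small.

```python
import asyncio
import os
import tarfile


def pack_step(workspace, out_path):
    """Pack one checkout workspace into its own tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        for name in sorted(os.listdir(workspace)):
            tar.add(os.path.join(workspace, name), arcname=name)
    return out_path


async def pack_all(workspaces, tmp_dir):
    """Pack all workspaces concurrently; each job writes a separate file."""
    jobs = [
        asyncio.to_thread(pack_step, ws, os.path.join(tmp_dir, name + ".tgz"))
        for name, ws in sorted(workspaces.items())
    ]
    return await asyncio.gather(*jobs)  # results keep the order of `jobs`


def finalize(part_paths, bundle_path):
    """Sequentially add the per-step tarballs to the final bundle."""
    with tarfile.open(bundle_path, "w") as bundle:
        for part in part_paths:
            bundle.add(part, arcname=os.path.basename(part))
```

Because every concurrent job writes its own file, no lock around the final tar file is needed; only `finalize()` touches the bundle, and it runs once after all packing jobs are done.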

To get the blackbox test working, #606 is required. As of today I haven't tested bundling / unbundling a larger, real-world project. I'll do this in the next few days.

rhubert and others added 13 commits January 14, 2025 07:13
Separate the downloaded files from the extracted files by downloading
them into a `download` directory next to the workspace. The canary is
also generated there. The Gzip and XZ extractors always extract the files
into the directory of the compressed file. Therefore the compressed file
is copied into the workspace directory first. By removing `-k`, the
compressed files are no longer kept.

To trigger an attic move of old workspaces, version information is added
to the url-scm spec.

Delete the download directory if the scm-workspace is moved to attic.

'upload' and 'download' did not fit for all archives.

This special archive is used to bundle checkoutWorkspaces into a single
tar-file.

And use the builder functions to enable and finish bundling.