Managing Temporary Files #1083

Open
MohammadMahdiJavid opened this issue Feb 2, 2024 · 2 comments
Labels
backlog Not a priorty, but nice to have enhancement Not a bug or a feature request good-first-bug Bugs that are good for a first-time committer to tackle

Comments

@MohammadMahdiJavid

Hi,

I'm running large crawls, and I noticed that temp files are not removed as time passes or as the crawl moves forward:

openwpm_profile_archive_{some random number} --> each is roughly 2 GB or more

I was wondering whether I made a mistake in my experiments or whether this feature is not implemented.

Thanks

@vringar
Contributor

vringar commented Feb 7, 2024

So from a quick search around I can see the profile.tar getting generated here:

    crash_recovery = False
    # if this is restarting from a crash, update the tar location
    # to be a tar of the crashed browser's history
    if self.current_profile_path is not None:
        # tar contents of crashed profile to a temp dir
        tempdir = tempfile.mkdtemp(prefix="openwpm_profile_archive_")
        tar_path = Path(tempdir) / "profile.tar"
        dump_profile(
            browser_profile_path=self.current_profile_path,
            tar_path=tar_path,
            compress=False,
            browser_params=self.browser_params,
        )
        # make sure browser loads crashed profile
        self.browser_params.recovery_tar = tar_path
        crash_recovery = True
    self.logger.info("BROWSER %i: Launching browser..." % self.browser_id)

Which then gets used here:

    elif browser_params.recovery_tar:
        logger.debug(
            "BROWSER %i: Loading recovered browser profile from: %s"
            % (browser_params.browser_id, browser_params.recovery_tar)
        )
        load_profile(
            browser_profile_path,
            browser_params,
            browser_params.recovery_tar,
        )

And it is never cleaned up. Since the recovery_tar is by definition generated by OpenWPM, OpenWPM should clean it up once the browser has been restored after a crash. Doing an os.remove and unsetting browser_params.recovery_tar after the profile has been restored seems reasonable.
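
The suggested cleanup could look roughly like the sketch below. It is not OpenWPM code: `cleanup_recovery_tar` is a hypothetical helper name, and a `SimpleNamespace` stands in for the real `browser_params` object, which is not shown in full in this thread.

```python
import os
import tempfile
from pathlib import Path
from types import SimpleNamespace
from typing import Any


def cleanup_recovery_tar(browser_params: Any) -> None:
    """Delete the recovery tar (if one is set) and unset it on browser_params.

    Hypothetical helper; intended to be called after the profile was restored.
    """
    tar_path = getattr(browser_params, "recovery_tar", None)
    if tar_path is None:
        return
    try:
        os.remove(tar_path)
    except FileNotFoundError:
        pass  # the tar is already gone, nothing left to delete
    browser_params.recovery_tar = None


# Stand-in for the real browser_params object (assumption for this demo).
tempdir = tempfile.mkdtemp(prefix="openwpm_profile_archive_")
params = SimpleNamespace(recovery_tar=Path(tempdir) / "profile.tar")
params.recovery_tar.write_bytes(b"placeholder tar contents")
tar_path = params.recovery_tar

cleanup_recovery_tar(params)

# os.remove only deletes the tar itself; the directory created by
# mkdtemp is a separate thing and is removed here explicitly.
os.rmdir(tempdir)
```

Note that `os.remove` deletes only `profile.tar`; the enclosing `openwpm_profile_archive_*` directory needs its own removal.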

Do you have time to implement this?

vringar added the good-first-bug, enhancement, and backlog labels on Feb 7, 2024
@MohammadMahdiJavid
Author

Hi,
Thanks for your time and the great insight you provided.

    # and previous profile path.
    if success:
        self.logger.debug("BROWSER %i: Browser spawn successful!" % self.browser_id)
        previous_profile_path = self.current_profile_path
        self.current_profile_path = browser_profile_path
        if previous_profile_path is not None:
            shutil.rmtree(previous_profile_path, ignore_errors=True)
        if tempdir is not None:
            shutil.rmtree(tempdir, ignore_errors=True)

I see here that tempdir gets removed, although the variable name is not very readable :)
tempdir is the variable that holds the directory created here:

    # to be a tar of the crashed browser's history
    if self.current_profile_path is not None:
        # tar contents of crashed profile to a temp dir
        tempdir = tempfile.mkdtemp(prefix="openwpm_profile_archive_")
        tar_path = Path(tempdir) / "profile.tar"

I think the issue comes from the profile dump, since tempdir only gets removed when the spawn is successful. Looking further into the logs, I found several errors like:


  File "openwpm/commands/profile_commands.py", line 58, in dump_profile
    tar.add(browser_profile_path, arcname="")
    
  File "python3.9/tarfile.py", line 2172, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
    
  File "python3.9/tarfile.py", line 2150, in add
    tarinfo = self.gettarinfo(name, arcname)
    
  File "python3.9/tarfile.py", line 2023, in gettarinfo
    statres = os.lstat(name)
    
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/firefox_profile_mp57p7k5/prefs-41.js'

or similar errors for other files like

prefs-41.js

storage.sqlite-journal

WebDriverBiDiServer.json

I was wondering: when the profile is being dumped, the previous browser has already crashed and been closed, right? Does it maybe need a few seconds to remove its temp files, or something like that?
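
If files really do disappear while the tar is being written, one possible workaround is to add entries one by one and skip those that vanish. This is only a minimal sketch under that assumption, not how OpenWPM's `dump_profile` works; `tar_tolerating_vanished_files` is a made-up name, and the demo profile directory is a stand-in.

```python
import os
import tarfile
import tempfile
from pathlib import Path
from typing import List


def tar_tolerating_vanished_files(src_dir: Path, tar_path: Path) -> List[Path]:
    """Tar src_dir entry by entry, skipping entries that no longer exist.

    Returns the paths that were skipped. Hypothetical helper.
    """
    skipped: List[Path] = []
    with tarfile.open(tar_path, "w") as tar:
        for root, dirs, files in os.walk(src_dir):
            for name in dirs + files:
                full = Path(root) / name
                arcname = full.relative_to(src_dir).as_posix()
                try:
                    # recursive=False: os.walk already visits subdirectories
                    tar.add(str(full), arcname=arcname, recursive=False)
                except FileNotFoundError:
                    # entry was listed but removed before it could be added
                    skipped.append(full)
    return skipped


# Demo with a stand-in profile directory (nothing vanishes here).
profile_dir = Path(tempfile.mkdtemp(prefix="firefox_profile_"))
(profile_dir / "prefs.js").write_text("// prefs")
(profile_dir / "storage").mkdir()
(profile_dir / "storage" / "data.sqlite").write_bytes(b"\x00")

out_dir = Path(tempfile.mkdtemp(prefix="openwpm_profile_archive_"))
archive = out_dir / "profile.tar"
skipped = tar_tolerating_vanished_files(profile_dir, archive)
print("skipped entries:", skipped)
```

Skipping is only safe for transient files such as journals or numbered `prefs-*.js` copies; whether the restored profile is still usable without them is something I have not verified.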

I think this is what causes the archived profiles not to be removed.
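
To illustrate the idea: if the dump raises, the directory created by `mkdtemp` could be removed right away instead of waiting for a successful spawn. A rough sketch, assuming the dump step can be passed in as a callable; `archive_with_cleanup` and `failing_dump` are hypothetical names, not part of OpenWPM.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def archive_with_cleanup(dump: Callable[[Path], None]) -> Path:
    """Create the archive temp dir and remove it again if dumping fails."""
    tempdir = Path(tempfile.mkdtemp(prefix="openwpm_profile_archive_"))
    tar_path = tempdir / "profile.tar"
    try:
        dump(tar_path)
    except Exception:
        # the dump failed, so a partial tar must not be left behind
        shutil.rmtree(tempdir, ignore_errors=True)
        raise
    return tar_path


seen: List[Path] = []


def failing_dump(tar_path: Path) -> None:
    """Stand-in for a dump that dies halfway, like the traceback above."""
    seen.append(tar_path)
    tar_path.write_bytes(b"partial archive")
    raise FileNotFoundError("prefs-41.js vanished during the dump")


try:
    archive_with_cleanup(failing_dump)
except FileNotFoundError as err:
    print("dump failed:", err)

print("temp dir still exists:", seen[0].parent.exists())
```

The exception is re-raised on purpose, so the caller still sees the failure and only the leftover directory is dealt with.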
