-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set custom temporary directory used by FF and delete it #1093
base: master
Are you sure you want to change the base?
Set custom temporary directory used by FF and delete it #1093
Conversation
instance and delete it when the browser quits. This is a workaround for an issue with geckodriver. When the OpenWPM extension is installed via `WebDriver.install_addon()`, geckodriver makes a copy of the XPI file in TMPDIR. However, geckodriver never deletes that file. So on a stateless crawl, you end up with one copy of the XPI file for each site visited. This workaround sets TMPDIR in the environment before creating the geckodriver service, and then deletes the directory after `driver.quit()` returns in `BrowserManager.run()`. We use this indirection because we don't have access to the name of the temporary file, and it doesn't seem safe to just delete XPI files in /tmp willy-nilly.
- Temporary directory is created in `BrowserManagerHandle.launch_browser_manager`. - Temporary directory is deleted in `BrowserManagerHandle.shutdown_browser`. - Temporary directory is added to environment for `selenium.webdriver.firefox.service.Service` in `deploy_browsers.deploy_firefox()
`BrowserManagerHandle.close_browser_manager`.
`BrowserParams.tmpdir` is now `Optional[Path]`. Value `None` means no temporary directory has been set, `Some p` means it has been set to `p`. When the browser manager is launched, if `tmpdir` is not `None`, try to delete it, assuming that it is leftover from some failure that prevented normal deletion. Regardless, then make a new temporary directory. Make sure to set it to `None` when the temporary directory is deleted during normal cleanup.
BrowserManagerHandle.close_browser_manager.
instance and delete it when the browser quits. This is a workaround for an issue with geckodriver. When the OpenWPM extension is installed via `WebDriver.install_addon()`, geckodriver makes a copy of the XPI file in TMPDIR. However, geckodriver never deletes that file. So on a stateless crawl, you end up with one copy of the XPI file for each site visited. This workaround sets TMPDIR in the environment before creating the geckodriver service, and then deletes the directory after `driver.quit()` returns in `BrowserManager.run()`. We use this indirection because we don't have access to the name of the temporary file, and it doesn't seem safe to just delete XPI files in /tmp willy-nilly.
- Temporary directory is created in `BrowserManagerHandle.launch_browser_manager`. - Temporary directory is deleted in `BrowserManagerHandle.shutdown_browser`. - Temporary directory is added to environment for `selenium.webdriver.firefox.service.Service` in `deploy_browsers.deploy_firefox()
`BrowserManagerHandle.close_browser_manager`.
`BrowserParams.tmpdir` is now `Optional[Path]`. Value `None` means no temporary directory has been set, `Some p` means it has been set to `p`. When the browser manager is launched, if `tmpdir` is not `None`, try to delete it, assuming that it is leftover from some failure that prevented normal deletion. Regardless, then make a new temporary directory. Make sure to set it to `None` when the temporary directory is deleted during normal cleanup.
BrowserManagerHandle.close_browser_manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey,
thanks for you contribution.
Could you create a test where you assert that the original TMPDIR doesn't contain any .xpi and also address the mypy issues?
Or is this contribution done from your side and if I want those things I need to do them myself?
I'll try to work on these in the next couple of weeks. |
`webdriver.Firefox` in `deploy_browsers/deploy_firefox.py`.
… into delete_tmpdir
I have fixed the |
I'm a little perplexed about your suggested test. The original TMPDIR (typically |
… into delete_tmpdir
This pull request modifies OpenWPM as follows:
geckodriver
is started, the TMPDIR environment variable is set in the environment used bygeckodriver
.The proximate rationale for doing this is that when
geckodriver
adds the OpenWPM extension to FF, it makes a copy of the XPI file in its temporary directory. Unfortunately, it doesn't delete it. That means one copy of the XPI file for each browser that is launched is left behind in/tmp
. On a big stateless crawl, that's a lot of copies. We can't find out the name of the temporary file thatgeckodriver
creates to delete it ourselves, and we don't want to just delete all XPI files in/tmp
because we don't actually know to whom the all belong. This approach seems to resolve these problems.This patch is based on v.0.28.0.
See #1090.