Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition around subprocess inference state tidyup #3

Closed
wants to merge 11 commits into from

Conversation

PeterJCLaw
Copy link
Owner

@PeterJCLaw PeterJCLaw commented Jun 30, 2024

Summary

There was a race condition due to the combination of Python's object ids being re-usable and Jedi persisting such ids beyond the real lifeteime of some objects. This could lead to the subprocess' view of the lifetime of InferenceState contexts getting out of step with that in the parent process and resulting in errors when removing them. It is also possible that this could result in erroneous results being reported, however this was not directly observed.

The race was specifically:

  • InferenceState A created, gets id 1
  • InferenceStateSubprocess A' created, uses InferenceState A which it stores as a weakref and an id
  • InferenceStateSubprocess A' is used, the sub-process learns about an InferenceState with id 1
  • InferenceState A goes away, InferenceStateSubprocess A' is not yet garbage collected
  • InferenceState B created, gets id 1
  • InferenceStateSubprocess B' created, uses InferenceState B which it stores as a weakref and an id
  • InferenceStateSubprocess B' is used, the sub-process re-uses its entry for an InferenceState with id 1

At this point the order of operations between the two InferenceStateSubprocess instances going away is immaterial -- both will trigger a removal of a state with id 1. As long as B' doesn't try to use the sub-process again after the first removal has happened then the second removal will fail.

This change resolves the race condition by coupling the context in the subprocess to the corresponding manager class instance in the parent process, rather than to the consumer InferenceState.

See inline comments for further details.

Code review

Suggest review by commit. There's some small refactors in the early commits. The actual fix is the last commit on the branch.

Testing

I'm open to ideas on how to add a test for this.

Currently this is relying on Python 3.13 tests reproducing the issue fairly easily, though not entirely consistently.

Links

Currently builds on davidhalter#2003 (as it was discovered there), though it doesn't really need to be part of the upgrade process.

This should make it easier to add new entries as well as clarifying
the intent of this filter.
In Python 3.13 the `locals` function now returns a fresh mapping
each time it's called (when called in a function). We thus need
to store a reference to the mapping being used, rather than
re-fetching it each time.

Since we don't actually need to modify the locals within the scope
of the test function itself, it suffices to use our own mapping
here rather than the result of calling `locals`, which fully
isolates this test from the nature of that function.

Fixes davidhalter#2002
This moves to using the 3.13 grammar as well as testing 3.13 in CI.
Jedi passes pickles to subprocesses which are running the target
version of Python and thus may not be the same as the version
under which Jedi itself is running. In Python 3.13, pathlib is
being refactored to allow for easier extension and has thus moved
most of its internal implementation to a submodule. Unfortunately
this changes the paths of the symbols, causing pickles of those
types to fail to load in earlier versions of Python.

This commit introduces a custom unpickler which accounts for this
move, allowing bi-directional passing of pickles to work.
This removes some of the coupling between the management of the
underlying process and the inference state itself, which intends
to enable changing the origin of the id. This will be useful in
the next commit.
I'm not sure where this was used in the past, however it appears
to be unused now. Removing this simplifies a change I'm about to
make to _InferenceStateProcess.
@PeterJCLaw PeterJCLaw force-pushed the ensure-unique-subprocess-reference-ids branch 3 times, most recently from c3859c6 to 27d4938 Compare June 30, 2024 17:48
@PeterJCLaw PeterJCLaw changed the title Face race condition around subprocess inference state tidyup Fix race condition around subprocess inference state tidyup Jun 30, 2024
@PeterJCLaw PeterJCLaw force-pushed the ensure-unique-subprocess-reference-ids branch from 27d4938 to 0fb2b90 Compare June 30, 2024 20:10
This is derived from my understanding of the code, plus a bit of
experimentation.
There was a race condition due to the combination of Python's
object ids being re-usable and Jedi persisting such ids beyond
the real lifeteime of some objects. This could lead to the
subprocess' view of the lifetime of `InferenceState` contexts
getting out of step with that in the parent process and
resulting in errors when removing them. It is also possible
that this could result in erroneous results being reported,
however this was not directly observed.

The race was specifically:
- `InferenceState` A created, gets id 1
- `InferenceStateSubprocess` A' created, uses `InferenceState`
  A which it stores as a weakref and an id
- `InferenceStateSubprocess` A' is used, the sub-process learns
  about an `InferenceState` with id 1
- `InferenceState` A goes away, `InferenceStateSubprocess` A' is
  not yet garbage collected
- `InferenceState` B created, gets id 1
- `InferenceStateSubprocess` B' created, uses `InferenceState` B
  which it stores as a weakref and an id
- `InferenceStateSubprocess` B' is used, the sub-process re-uses
  its entry for an `InferenceState` with id 1

At this point the order of operations between the two
`InferenceStateSubprocess` instances going away is immaterial --
both will trigger a removal of a state with id 1. As long as B'
doesn't try to use the sub-process again after the first removal
has happened then the second removal will fail.

This commit resolves the race condition by coupling the context
in the subprocess to the corresponding manager class instance
in the parent process, rather than to the consumer `InferenceState`.

See inline comments for further details.
@PeterJCLaw PeterJCLaw force-pushed the ensure-unique-subprocess-reference-ids branch from 0fb2b90 to e8f7a4a Compare June 30, 2024 20:13
@PeterJCLaw PeterJCLaw closed this Jul 1, 2024
@PeterJCLaw
Copy link
Owner Author

-> davidhalter#2008

@PeterJCLaw PeterJCLaw deleted the ensure-unique-subprocess-reference-ids branch July 1, 2024 20:18
@PeterJCLaw
Copy link
Owner Author

Passed using Python 3.13.0-beta.2: https://github.com/PeterJCLaw/jedi/actions/runs/9734658689/job/26862957824

Testing whether 3.13.0-beta.3 might fail: https://github.com/PeterJCLaw/jedi/actions/runs/9768066424 (as have seen issues on beta 3 on davidhalter#2003)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant