Roll algorithm back to v2.5.3, then add back features to avoid a breaking change #130

notgull · 2024-04-11T04:44:44Z

I have to apologize. I know it sounds self-deprecating, but really, I've been over the line on this particular issue. This should hopefully fix it for now.

The new implementation for the event listener, #51, has been plagued with bugs from its inception. Frankly, I've gotten sick of papering over these bugs, and I know that prod customers are almost certainly impacted but haven't reported it.

This PR reverts the implementation of event-listener back to version 2.5.3; before I made any of the sweeping architectural changes to the crate. Then I re-add new changes to this crate in a "path of least resistance" fashion. The goal is to change the internal algorithm as little as possible, as we know this algorithm is stable and robust.

The only real diversion from this path is my adding of a no_std implementation. I've added a no_std implementation based on an atomic linked list built on an amortized atomically expanded vector. Since it doesn't use concurrent-queue it fixes #109.

Let me run a "crater test" on the smol ecosystem before this is merged. Expect a "post mortem" at some point in the near future.

notgull · 2024-04-14T04:59:05Z

Paging @smol-rs/admins for reviewing this PR. Also paging @mamaicode, @jbr and @beckend as I've re-written some of your commits here.

benches/bench.rs

src/lib.rs

src/notify.rs

tests/counter.rs

zeenix · 2024-04-15T12:15:33Z

While I understand why you did it the way you did (to be able to better test/bisect the changes) but it's very hard to review a big bunch of commits like this. Also since you'll likely be squashing, can you already squash all the reverting commits into one (you can add in the description if you only reverted specific commits or it's a hard reset) and then add the new features on top (with each commit describing the change and why it's different now). After that, please do not squash merge when it's time to merge, to maintain the history.

I'm sorry for asking you to do all that but I've very limited time these days for smol (as you might have noticed). :(

I'm man enough to know when I've been wrong. The intrusive-list-based list implementation is riddled with bugs, fails MIRI, and overall increases the complexity of this crate. I've invested a lot of time and energy into improving it; effort which I feel has been wasted. So I figure I'll cut my losses and revert. This PR undoes every commit made between now and the release of event-listener version 2.5.3. The next commits will be replaying newer features in order to avoid a breaking change. Signed-off-by: John Nunley <dev@notgullnet>

@taiki-e

@taiki-e's commits since v2.5.3 aren't the problem, as they only change CI/framing details. This commit brings back all work by @taiki-e. Co-authored-by: John Nunley <[email protected]> Signed-off-by: John Nunley <[email protected]>

This commit adds back the <T> parameter on Event and EventListener. So far nothing is done with the generic parameter. However it will eventually be used to bring back tagged events. Signed-off-by: John Nunley <[email protected]>

This commit re-adds the "notify.rs" file pretty much verbatim. This allows for tagged notifications to be used, i.e. convey additional control information along with the notification. The tags are stored in the links with the notification. This also adds some tests for the new functionality. Signed-off-by: John Nunley <[email protected]>

This commit adds back a feature that allows one to use portable-atomic instead of regular atomics. Kind of useless without no-std, but it's here again. Signed-off-by: John Nunley <[email protected]>

The Listener trait was previously used to abstract over heap and stack based event listeners. However at this point we only have heap based event listeners. Still, to avoid a breaking change we need to have this trait. Other minor fixups include: - Make the locking ignore poisoning. - Reduce formatting code Signed-off-by: John Nunley <[email protected]>

For now it just creates a listener on the heap. We can rework this later, maybe to include stack listeners. For now it's best to keep it this way, I feel. Signed-off-by: John Nunley <[email protected]>

This way added earlier as a way of telling how many listeners were woken up by a notify() call. It's very simple to implement with the mutex strategy. Signed-off-by: John Nunley <[email protected]>

This also makes sure that MIRI can run here. Signed-off-by: John Nunley <[email protected]>

notgull · 2024-04-16T03:14:23Z

While I understand why you did it the way you did (to be able to better test/bisect the changes) but it's very hard to review a big bunch of commits like this. Also since you'll likely be squashing, can you already squash all the reverting commits into one (you can add in the description if you only reverted specific commits or it's a hard reset)

The reverts are all squashed into 6368046. The rest of the commits are re-adding the functionality.

then add the new features on top (with each commit describing the change and why it's different now).

Done.

After that, please do not squash merge when it's time to merge, to maintain the history.

Sounds good!

I'm sorry for asking you to do all that but I've very limited time these days for smol (as you might have noticed). :(

All good, no worries

Replay of #31 with minor modifications Signed-off-by: John Nunley <[email protected]>

- Update Cargo.toml and CHANGELOG.md - Add back the smol-rs logo (#63) Signed-off-by: John Nunley <[email protected]>

cc #48 and #59 Signed-off-by: John Nunley <[email protected]>

cc #86, it's harder to get this info on no_std so I've ignored it for now Signed-off-by: John Nunley <[email protected]>

Mirror of #114 Signed-off-by: John Nunley <[email protected]>

.github/workflows/ci.yml

Cargo.toml

Co-authored-by: Jacob Rothstein <[email protected]> Signed-off-by: John Nunley <[email protected]>

beckend · 2024-04-30T23:37:06Z

@notgull what stops you from merging this?

notgull · 2024-05-01T00:06:26Z

@notgull what stops you from merging this?

I'm waiting for reviews from other smol-rs members. I'm aware it's a big PR and that they don't have much time to review it.

notgull · 2024-05-01T02:53:06Z

Actually, come to think of it this PR might be a little much.

@smol-rs/admins Would it be preferred if I split this up into smaller PRs, starting with an initial rollback followed by adding on more features?

zeenix · 2024-05-01T18:29:08Z

@smol-rs/admins Would it be preferred if I split this up into smaller PRs, starting with an initial rollback followed by adding on more features?

I can't say for others but what makes this PR hard to review for me, is it's hard to see what really changed in the end. I understand why you did it the way you did, to be able to figure out where the regression(s) happened but now that you know, I think it would be much better if we don't go back in history, but instead add the needed fixes on top of the current main branch (i-e not doing rollbacks, revert of specific commits is of course good). WDYT?

notgull · 2024-05-04T22:20:48Z

I understand why you did it the way you did, to be able to figure out where the regression(s) happened but now that you know, I think it would be much better if we don't go back in history, but instead add the needed fixes on top of the current main branch (i-e not doing rollbacks, revert of specific commits is of course good). WDYT?

The issue is that I'm not 100% sure what the current regressions are caused by, and the current advantages of the current implementation are not worth the effort of my part to investigate them.

beckend · 2024-05-04T22:23:04Z

If you have proven stability by adding all these tests, why not call it good enough of a change and merge it.

zeenix · 2024-05-05T12:13:06Z

The issue is that I'm not 100% sure what the current regressions are caused by, and the current advantages of the current implementation are not worth the effort of my part to investigate them.

Fair enough but then how do we ensure that the regressions weren't re-added by the "new" commits? For all we know, there could be a timing issue and tests don't catch the issue(s) still present. Finding out the issue and fixing it, would be the only certain way of actually fixing the problems. I would hate for us to have to break API again and again.

zeenix · 2024-05-05T12:17:50Z

Moreover, if there are still regressions/issue, merging this PR as is, would make it super hard to bisect the history to figure out the culprit commits/changes later on.

In any case, since I'm not that involved in the project anymore and you're doing the work, I'd leave it to you how you want to proceed. I would request that you consider my advice here.

notgull · 2024-05-05T15:34:23Z

Fair enough but then how do we ensure that the regressions weren't re-added by the "new" commits? For all we know, there could be a timing issue and tests don't catch the issue(s) still present.

I'm using the async-lock rwlock contention tests as a benchmark here. This current code passes this test and master does not. I'll admit that it's not a perfect benchmark but I think it generally works as an indicator.

But, fair enough. I'll do a deep dive into the current code and see if I can figure out what the issue is.

zeenix · 2024-05-06T17:05:50Z

But, fair enough. I'll do a deep dive into the current code and see if I can figure out what the issue is.

I appreciate your very constructive attitude to my rather blunt feedback. 👍

notgull · 2024-05-16T03:15:53Z

cc #133

The notify() function contains two optimizations: - If the `inner` field is not yet initialized (i.e. no listeners have been created yet), it returns `0` from the function early. - If the atomic `notified` field indicates that the current notification would do nothing, it returns `0` early as well. `loom` testing has exposed race conditions in these optimizations that can cause notifications to be missed, leading to deadlocks. This commit removes these optimizations in the hope of preventing these deadlocks. Ideally we can restore them in the future. Closes #138 cc #130 and #133 Signed-off-by: John Nunley <[email protected]>

notgull · 2024-05-27T22:27:33Z

Obsoleted by #139

The notify() function contains two optimizations: - If the `inner` field is not yet initialized (i.e. no listeners have been created yet), it returns `0` from the function early. - If the atomic `notified` field indicates that the current notification would do nothing, it returns `0` early as well. `loom` testing has exposed race conditions in these optimizations that can cause notifications to be missed, leading to deadlocks. This commit removes these optimizations in the hope of preventing these deadlocks. Ideally we can restore them in the future. Closes #138 cc #130 and #133 Signed-off-by: John Nunley <[email protected]>

notgull force-pushed the notgull/revert branch from 0d8f8e8 to eec2f6c Compare April 14, 2024 03:30

notgull changed the title ~~The rollback~~ Roll algorithm back to v2.5.3, then add back features to avoid a breaking change Apr 14, 2024

notgull marked this pull request as ready for review April 14, 2024 04:57

fogti reviewed Apr 14, 2024

View reviewed changes

benches/bench.rs Outdated Show resolved Hide resolved

src/lib.rs Show resolved Hide resolved

src/lib.rs Outdated Show resolved Hide resolved

src/notify.rs Show resolved Hide resolved

tests/counter.rs Show resolved Hide resolved

notgull and others added 9 commits April 15, 2024 19:58

chore: Re-apply Taiki's commits

a51a210

@taiki-e's commits since v2.5.3 aren't the problem, as they only change CI/framing details. This commit brings back all work by @taiki-e. Co-authored-by: John Nunley <[email protected]> Signed-off-by: John Nunley <[email protected]>

feat: Add back tag generic

b3a4ddb

This commit adds back the <T> parameter on Event and EventListener. So far nothing is done with the generic parameter. However it will eventually be used to bring back tagged events. Signed-off-by: John Nunley <[email protected]>

feat: Add back portable-atomic feature

5b0b36a

This commit adds back a feature that allows one to use portable-atomic instead of regular atomics. Kind of useless without no-std, but it's here again. Signed-off-by: John Nunley <[email protected]>

feat: Add back listener! macro

2d62017

For now it just creates a listener on the heap. We can rework this later, maybe to include stack listeners. For now it's best to keep it this way, I feel. Signed-off-by: John Nunley <[email protected]>

feat: Add back notify() returning the event count

375e5fd

This way added earlier as a way of telling how many listeners were woken up by a notify() call. It's very simple to implement with the mutex strategy. Signed-off-by: John Nunley <[email protected]>

tests: Add a greater breadth of tests

5511324

This also makes sure that MIRI can run here. Signed-off-by: John Nunley <[email protected]>

notgull force-pushed the notgull/revert branch from 70afc24 to f659391 Compare April 16, 2024 03:13

notgull force-pushed the notgull/revert branch 3 times, most recently from 46d4236 to aa63e03 Compare April 17, 2024 00:03

notgull and others added 6 commits April 17, 2024 17:58

bench: Add basic benchmarks

1339914

Replay of #31 with minor modifications Signed-off-by: John Nunley <[email protected]>

docs: Update documentation

3430e45

- Update Cargo.toml and CHANGELOG.md - Add back the smol-rs logo (#63) Signed-off-by: John Nunley <[email protected]>

Fix build badge and update links (#45)

2dd1fbe

feat: Re-add the is_notified method

99078d1

cc #48 and #59 Signed-off-by: John Nunley <[email protected]>

m: Update fmt::Debug to produce useful output

7bc92d7

cc #86, it's harder to get this info on no_std so I've ignored it for now Signed-off-by: John Nunley <[email protected]>

feat: Add a way to get the current number of listeners

434b390

Mirror of #114 Signed-off-by: John Nunley <[email protected]>

notgull force-pushed the notgull/revert branch from aa63e03 to e9060bd Compare April 18, 2024 05:01

taiki-e reviewed Apr 18, 2024

View reviewed changes

.github/workflows/ci.yml Outdated Show resolved Hide resolved

Cargo.toml Outdated Show resolved Hide resolved

notgull force-pushed the notgull/revert branch 2 times, most recently from e9c1f04 to 62ce8fe Compare April 19, 2024 02:27

notgull force-pushed the notgull/revert branch from 62ce8fe to a86b9c1 Compare April 19, 2024 02:30

tests: Add further testing

23b90f3

Co-authored-by: Jacob Rothstein <[email protected]> Signed-off-by: John Nunley <[email protected]>

notgull force-pushed the notgull/revert branch from a86b9c1 to 23b90f3 Compare April 19, 2024 02:33

notgull mentioned this pull request May 27, 2024

Disable early-out notify optimizations #139

Merged

notgull closed this May 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Roll algorithm back to v2.5.3, then add back features to avoid a breaking change #130

Roll algorithm back to v2.5.3, then add back features to avoid a breaking change #130

notgull commented Apr 11, 2024 •

edited

Loading

notgull commented Apr 14, 2024

zeenix commented Apr 15, 2024

notgull commented Apr 16, 2024

beckend commented Apr 30, 2024

notgull commented May 1, 2024

notgull commented May 1, 2024

zeenix commented May 1, 2024

notgull commented May 4, 2024

beckend commented May 4, 2024 •

edited

Loading

zeenix commented May 5, 2024

zeenix commented May 5, 2024 •

edited

Loading

notgull commented May 5, 2024

zeenix commented May 6, 2024

notgull commented May 16, 2024

notgull commented May 27, 2024

Roll algorithm back to v2.5.3, then add back features to avoid a breaking change #130

Roll algorithm back to v2.5.3, then add back features to avoid a breaking change #130

Conversation

notgull commented Apr 11, 2024 • edited Loading

notgull commented Apr 14, 2024

zeenix commented Apr 15, 2024

notgull commented Apr 16, 2024

beckend commented Apr 30, 2024

notgull commented May 1, 2024

notgull commented May 1, 2024

zeenix commented May 1, 2024

notgull commented May 4, 2024

beckend commented May 4, 2024 • edited Loading

zeenix commented May 5, 2024

zeenix commented May 5, 2024 • edited Loading

notgull commented May 5, 2024

zeenix commented May 6, 2024

notgull commented May 16, 2024

notgull commented May 27, 2024

notgull commented Apr 11, 2024 •

edited

Loading

beckend commented May 4, 2024 •

edited

Loading

zeenix commented May 5, 2024 •

edited

Loading