-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Occasionally stuck WebTransport streams #1451
Comments
Yes I'm interested. Each event should be atomically written. If you're able to pinpoint some issue in the qlog crate I'm happy to take a look. |
If you're waiting on me @LPardue .. I'm afraid I can't help you with that, sorry. :/ I haven't used the If you have some specific instructions, e.g. on how to capture a more detailed trace with Firefox, or questions about my reproducer, I'm all ears though! 😄 |
Thanks @ansemjo , I think we need on of the mozilla team to give a pointer how Firefox uses the qlog crate. It seems to me like there could be a multi-thread problem in how the app accesses qlog access that is causing a single even to be truncated. I'm not familiar enough with the neqo code base and lack the time right now to spend on figuring that out, but I am very interested in getting to the root cause and a fix. |
This is slightly offtopic to the "corrupted logs" issue but in my original issue I tried capturing some more logs to find the cause. The last capture in quic-go/webtransport-go#84 (comment) shows that both To me, that sounds like an issue somewhere at the interface between |
On closer inspection, neqo is using an old version of the qlog crate that provided a hack-ish way to do streaming JSON serialization,this is a possible cause of the weird logs. Since 0.4.0 we've moved to the JSON-SEQ format which could possibly fix this problem, see the qlog bump PR #1454 |
We're now on qlog 0.12.0. Closing this under the assumption that that has addressed the issue. |
I've checked my reproducer above and I don't have any corrupted qlog files anymore; all of them can be imported fine in qvis. However, the "blocked streams" from my original issue still occur. |
Thanks for the update. Reopening this. |
Thanks. I'm not sure this here is the right issue for it though? I've held off on reporting it to Mozilla directly because I didn't have the time for it back when I found it and I've personally only tested it with this one quic-go server. This particular issue was about the broken qlog files - and those seem to be fixed. |
Well, this repo is a Mozilla repo :-) I'll leave it up to you if you want to close this and reopen it when you have confirmed it is our issue. |
You're right. I was thinking of the Bugzilla issue tracker though. :P Sorry for staying silent on this for so long. I just updated the dependencies in my reproducer and retried with Firefox 124.0.1 on Linux. The issue still occurs, albeit much less frequently! Here are two fresh qlog files from Firefox and quic-go, where the unidirectional stream is stuck after sending 22 blobs or about 2 MB:
I guess, in order to further narrow it down to an actual Firefox issue:
It might also be entirely unrelated to the QUIC code and instead be an issue with AsyncIterators? At least the simple const reader = stream.getReader();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = value;
received += chunk.length;
console.log(...prefix, "CHUNK", received, chunk);
}; Could you maybe try to reproduce the issue on your end first? |
Am trying to. Am following the steps in your reproducer. What is |
Ah, sorry, that is simply opening the URL in a web browser (Firefox in this case). On Linux with a Gnome Desktop that usually opens any compatible program, which can handle |
Ah, got it. Is there any way to just test this with the neqo-client demo client? Would be much easier to debug. |
That's among my TODOs to narrow it down; haven't tried that yet and I haven't worked with neqo directly either yet. You can see a client implementation in Go in the gist but as far as I remember, I couldn't reproduce it with a |
I seem to able to reproduce this, but I am not entirely sure. |
Your screenshot shows a success, actually. Meaning your WebTransport stream did not hang. You can see that because the logged lines contain the total number of bytes received, which should be 32 * 100 KiB = 3276800 at the end. The server window is a better indicator because the output should just stop after some blob without ever printing the In the latest commit (revision 12 in the gist) I added a SUCCESS message to the browser console output. When you don't see that, your stream is stuck. Edit: I realize I should be clearer with my messages. 🙃 So, just to be abundantly clear: when you do see a SUCCESS or 3276800 received bytes, you did not hit this issue. When that's missing and your stream is stuck, that's when you ... successfully reproduced the issue. Sorry, I need to fix that wording when I'm home. I'm on another computer with Windows right now and reproduced it after a few tries. The binary was compiled with |
Sure!
And how it looked in the terminal:
And finally, the browser console: I hope that's helpful to you. |
CC @jesup |
Hi! I opened an issue in
quic-go/webtransport-go
two days ago because certain BidirectionalStreams got stuck when I sent too much data in a short interval and I suspected the Go server at first. However, I couldn't reproduce this issue with neither a Go client, nor Chromium. The reproducer I used is in this Gist of mine.@marten-seemann suggested that I record a qlog and visualize it with qvis. I set
network.http.http3.enable_qlog
in Firefox'about:config
and tried to visualize the log. But the recorded JSON is broken / corrupt. I've attached such a log from opening the reproducer above in a fresh Firefox profile here: localhost.qlog.txtOpening the file in VSCode highlights a few places where it looks like maybe writes to logfile were interleaved, so the objects were not properly printed?
I'm not sure how I can debug this further and I haven't worked with
neqo
directly yet but this seemed like the most sensible place to report this issue? For now, it seems I found a workaround for my initial issue, anyway.The text was updated successfully, but these errors were encountered: