MyCivetServer Seems to Contain Unprotected Critical Sections #1820

Open
cameronbracco opened this issue Dec 12, 2024 · 1 comment

Comments

@cameronbracco

Background Context

We recently encountered an issue where multiple connections to the Trick Web Server, specifically to the WebSocket-based Variable Server API, seemingly caused our entire simulation, which had previously been running in real time, to run extremely slowly (~1/600th real time). This continued for some time, with short jumps in performance every 30 seconds before it returned to running slowly. Profiling the code during this time showed that the Trick sim process was barely using any CPU and appeared to be blocked. Eventually (after we closed the connections and waited), the sim caught up and resumed real-time operation. The graph below shows two such instances of this failure during a long-duration sim run:

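[Graph: simulation real-time performance over a long-duration run, showing two instances of the slowdown]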

While we were able to reproduce this issue occasionally, we have been unable to reproduce it reliably, even with dedicated scripts, so the description above is mainly for context. If anyone has ideas about this failure I would be interested in hearing them, but we have gone in a different direction for now.

Overview

While investigating the above failure we were able to narrow it down to connections to the Trick Web Server. Digging into the code, I noticed that while almost all of the functions in MyCivetServer.cpp lock a mutex before accessing the connection map, the two functions responsible for handling messages do not:

https://github.com/nasa/trick/blob/master/trick_source/web/CivetServer/src/MyCivetServer.cpp#L374-L394

I believe this could cause problems if a connection were removed (logic that is wrapped in a critical section) while the code was simultaneously processing a message from that same connection.

I cannot claim that this sort of race condition would cause the behavior that originally led us to look at this code, but I figured it would be good to report it anyway.

@sharmeye
Contributor

Thanks for letting us know. The web server interface isn't very commonly used, so I'm not too surprised there are, shall we say "inefficiencies", in that code. We'll look into it and see if we can assess the problem and improve the performance.
