Insufficient cores leading to a deadlock or application hang #229

diegolovison · 2025-01-06T17:25:42Z

The application utilizes a Semaphore referred to as blockingSemaphore to control a shared resource.

AbstractShell.java

if(seconds > 0){
  acquired = blockingSemaphore.tryAcquire(seconds,TimeUnit.SECONDS);
} else {
    blockingSemaphore.acquire();//released in the observer
}

blockingConsumer = (response) -> {
    blockingResponse.setLength(0);
    blockingResponse.append(response);
    blockingSemaphore.release();
};

The ScheduledThreadPoolExecutor is configured with a limited number of cores.
Previous code: new ScheduledThreadPoolExecutor(Runtime.getRuntime().availableProcessors()/2,new ThreadFactory() {
New code: new ScheduledThreadPoolExecutor(getMinimumScheduleCorePoolSize(),new ThreadFactory() {

private static int getMinimumScheduleCorePoolSize() {
    int cores = Runtime.getRuntime().availableProcessors() / 2;
    if (cores < 3) {
        return 3;
    } else {
        return cores;
    }
}

The blockingSemaphore.acquire() and blockingSemaphore.release() are both created by the ScheduledThreadPoolExecutor mentioned above.

A situation like the following will hang if corePoolSize of the ScheduledThreadPoolExecutor is equal to 2.

scripts:
  doit:
    - sh: date
      then:
        - set-state: RUN.date
  update:
    - sh: echo 'uno' >> /tmp/foo.txt
    - sleep: 2s
    - sh: echo 'dos' >> /tmp/foo.txt
    - sleep: 2s
    - sh: echo 'tres' >> /tmp/foo.txt
    - set-state: RUN.update true
  foo:
    - sleep: 15s
    - sh: tail -f /tmp/foo.txt
      watch:
        - regex: dos
          then:
            - script:
                name: doit
                async: true
        - regex: tres
          then:
            - ctrlC
    - set-state: RUN.foo true
hosts:
  local: "<Insert host value here>"
roles:
  doit:
    hosts: [local]
    run-scripts: [update, foo]

If the number of cores is equals 2 (new ScheduledThreadPoolExecutor(2,new ThreadFactory() {) and you run the YAML above, two threads will be in the blockingSemaphore.acquire() state. Because of that, there is no space for the third thread calls the blockingSemaphore.release() and the application will hang forever.

Increasing the number of cores (e.g., to 3) resolves the issue temporarily but introduces the potential for another scenario where three cores calls the blockingSemaphore.acquire(), leading to a hang. This approach merely postpones the issue without addressing its root cause.

The text was updated successfully, but these errors were encountered:

willr3 · 2025-01-08T11:43:13Z

I can find 3 areas where tasks are submitted to the ScheduledThreadPoolExecutor:

The callback from SuffixStream. This occurs when qDup detects the prompt at the end of the stream from the remote shell. qDup uses a callback with a small delay because if there are additional characters from the remote stream then the prompt is not the end of the output from the current command. This delay catches cases like sh: echo "$PS1" && doSomething.sh The callback also runs the Sh.postRun() which calls AbstractShell.shSync to capture the exit code. The shSync call is what uses blockingSemaphore to ensure only one call to the remote shell at a time.
The sleep command pushes a task into the queue to pass execution onto the next command in the script after the desired delay. This is done through the Context
The timer on a command (usually on an sh) queues a task in the pool with the specified wait duration.

I think what you've uncovered is an error in the design. The callback from the suffix stream calling shSync means we need 2 free threads in the pool to complete the callback. 1 for running the callback with shSync and then the second for the other callback that identifies the prompt at the end of the shSync command.
I think we need to move the core callback that identifies the prompt at the end of ouptut into a separate thread pool and then execute the postRun portion of the callback in the pool with the user commads.

Does my rambling make sense or should I go back to staring at the code some more?

diegolovison mentioned this issue Jan 6, 2025

Running qDup hangs #178

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Insufficient cores leading to a deadlock or application hang #229

Insufficient cores leading to a deadlock or application hang #229

diegolovison commented Jan 6, 2025 •

edited

Loading

willr3 commented Jan 8, 2025

Insufficient cores leading to a deadlock or application hang #229

Insufficient cores leading to a deadlock or application hang #229

Comments

diegolovison commented Jan 6, 2025 • edited Loading

willr3 commented Jan 8, 2025

diegolovison commented Jan 6, 2025 •

edited

Loading