Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running qDup hangs #178

Open
galderz opened this issue Feb 2, 2023 · 10 comments
Open

Running qDup hangs #178

galderz opened this issue Feb 2, 2023 · 10 comments

Comments

@galderz
Copy link
Contributor

galderz commented Feb 2, 2023

Other times running helloworld example just hangs, e.g.

"Thread-2" #19 daemon prio=5 os_prio=31 cpu=1.57ms elapsed=40.46s tid=0x000000012b84ee00 nid=0xa603 runnable  [0x000000017156e000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes([email protected]/Native Method)
	at java.io.FileInputStream.read([email protected]/FileInputStream.java:276)
	at java.io.BufferedInputStream.read1([email protected]/BufferedInputStream.java:282)
	at java.io.BufferedInputStream.read([email protected]/BufferedInputStream.java:343)
	- locked <0x000000061cc17288> (a java.io.BufferedInputStream)
	at sun.nio.cs.StreamDecoder.readBytes([email protected]/StreamDecoder.java:270)
	at sun.nio.cs.StreamDecoder.implRead([email protected]/StreamDecoder.java:313)
	at sun.nio.cs.StreamDecoder.read([email protected]/StreamDecoder.java:188)
	- locked <0x000000061cc17bc0> (a java.io.InputStreamReader)
	at java.io.Reader.read([email protected]/Reader.java:197)
	at java.io.InputStreamReader.read([email protected]/InputStreamReader.java:157)
	at java.util.Scanner.readInput([email protected]/Scanner.java:882)
	at java.util.Scanner.findWithinHorizon([email protected]/Scanner.java:1796)
	at java.util.Scanner.hasNextLine([email protected]/Scanner.java:1610)
	at io.hyperfoil.tools.qdup.JsonServer.lambda$start$1(JsonServer.java:199)
	at io.hyperfoil.tools.qdup.JsonServer$$Lambda$302/0x0000000800ec5eb8.run(Unknown Source)
	at java.lang.Thread.run([email protected]/Thread.java:833)
...
"qdup-command-0" #22 prio=5 os_prio=31 cpu=3.44ms elapsed=40.40s tid=0x000000013cc25400 nid=0x8703 waiting on condition  [0x0000000171d9e000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	- parking to wait for  <0x0000000603469808> (a java.util.concurrent.FutureTask)
	at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
	at java.util.concurrent.FutureTask.awaitDone([email protected]/FutureTask.java:447)
	at java.util.concurrent.FutureTask.get([email protected]/FutureTask.java:190)
	at java.util.concurrent.AbstractExecutorService.invokeAll([email protected]/AbstractExecutorService.java:252)
	at io.hyperfoil.tools.qdup.cmd.Dispatcher.invokeAll(Dispatcher.java:317)
	at io.hyperfoil.tools.qdup.Run.queueRunScripts(Run.java:852)
	at io.hyperfoil.tools.qdup.Run.nextStage(Run.java:300)
	at io.hyperfoil.tools.qdup.Run.postStop(Run.java:258)
	at io.hyperfoil.tools.qdup.cmd.Dispatcher.lambda$checkActiveCount$14(Dispatcher.java:462)
	at io.hyperfoil.tools.qdup.cmd.Dispatcher$$Lambda$375/0x0000000800f1d6f0.accept(Unknown Source)
	at java.lang.Iterable.forEach([email protected]/Iterable.java:75)
	at io.hyperfoil.tools.qdup.cmd.Dispatcher.lambda$checkActiveCount$15(Dispatcher.java:462)
	at io.hyperfoil.tools.qdup.cmd.Dispatcher$$Lambda$374/0x0000000800f1d4c8.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
	at java.lang.Thread.run([email protected]/Thread.java:833)

"qdup-command-1" #23 prio=5 os_prio=31 cpu=409.77ms elapsed=40.39s tid=0x000000013c995400 nid=0xa103 waiting on condition  [0x0000000171faa000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	- parking to wait for  <0x00000006029a3f98> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:715)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly([email protected]/AbstractQueuedSynchronizer.java:1047)
	at java.util.concurrent.Semaphore.acquire([email protected]/Semaphore.java:318)
	at io.hyperfoil.tools.qdup.SshSession.shConnecting(SshSession.java:824)
	at io.hyperfoil.tools.qdup.SshSession.connect(SshSession.java:618)
	at io.hyperfoil.tools.qdup.SshSession.<init>(SshSession.java:285)
	at io.hyperfoil.tools.qdup.Run.lambda$queueRunScripts$18(Run.java:813)
	at io.hyperfoil.tools.qdup.Run$$Lambda$382/0x0000000800f1f098.call(Unknown Source)
	at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
	at java.lang.Thread.run([email protected]/Thread.java:833)

"sshd-SshClient[7ffc78f9]-timer-thread-1" #24 daemon prio=5 os_prio=31 cpu=18.92ms elapsed=40.15s tid=0x000000013c995a00 nid=0x8903 waiting on condition  [0x00000001721b6000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
	- parking to wait for  <0x0000000602f35200> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos([email protected]/LockSupport.java:252)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos([email protected]/AbstractQueuedSynchronizer.java:1672)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take([email protected]/ScheduledThreadPoolExecutor.java:1182)
	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take([email protected]/ScheduledThreadPoolExecutor.java:899)
	at java.util.concurrent.ThreadPoolExecutor.getTask([email protected]/ThreadPoolExecutor.java:1062)
	at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1122)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
	at java.lang.Thread.run([email protected]/Thread.java:833)

Full thread dump here.

This has been observed with both macOS and fedora clients on fedora server. Java version 17.

@johnaohara
Copy link
Member

johnaohara commented Feb 2, 2023

After enabling trace logging, I am seeing that executing this command hangs qDup on machines that are not using bash/sh for the terminal: https://github.com/Hyperfoil/qDup/blob/master/src/main/java/io/hyperfoil/tools/qdup/SshSession.java#L618

On a target machine using bash, the response buffer content from the remote machine returns are something like: ^.........[?2004h[user@localhost..]$. , whereas for non-bash shells, the response buffer contains ^.........[0m.[49m.[39m.[27m.[24m.[1m.[7m%.[27m.[1m.[0m.........

qDup can not decode the response from the remote machine correctly

The remote machine that caused the hang was running https://github.com/romkatv/powerlevel10k on top of zsh, using zsh also causes qDup to hang

@zakkak
Copy link
Contributor

zakkak commented Dec 1, 2023

I have had a similar issue in the past with Emacs in tramp mode and the work around there is to set a different prompt depending on the value of the TERM environment variable.

For instance I have the following in my .zshrc

case "$TERM" in
    xterm*|rxvt*|eterm*|screen*|st-256color*)
        # PS1="my fancy multi-line prompt > "
        ;;
    *)
        PS1="> "
        ;;
esac

Which sets the prompt to > when TERM doens't match any of the xterm*|rxvt*|eterm*|screen*|st-256color*.

Is there a way in qDup to set an environment variable for the ssh session?
It could help as a workaround till this gets fixes.

@zakkak
Copy link
Contributor

zakkak commented Dec 1, 2023

After having a look at the code, I now understand that qDup is setting the prompt itself so the above comment is irrelevant, but I think I figured out the actual issue.

The prompt used by qDup, <_#%@_qdup_@%#_>, is subject to expansion by default in zsh resulting in a prompt like <_#10:02_qdup_@%_> , as %@ is being expanded to the current time of the day [1]. This results in qDup failing to find the suffix in the SuffixStream, thus waiting forever.

Disabling the expansion when zsh is detected should allow qDup to set the prompt as expected. To do so one needs to run setopt NO_PROMPT_SUBST; setopt NO_PROMPT_PERCENT; setopt NO_PROMPT_BANG;

[1] https://zsh.sourceforge.io/Doc/Release/Prompt-Expansion.html#Date-and-time

@willr3
Copy link
Collaborator

willr3 commented Dec 7, 2023

I think we should change the prompt to something that does not expand by default. We picked <_#%@_qdup_@%#_> because it stood out when scanning the run.log and because it was unlikely to occur as output from other shell interaction.
I have to admit, i don't use zsh and we have not run tests on zsh in the past but I will see if I can recreate this in a container.

@willr3
Copy link
Collaborator

willr3 commented Dec 7, 2023

@zakkak Would changing the prompt to <_#_qdup_#_> trigger any default completion in zsh?
We still need to address the additional setopt commands but I would like to create a release that has a functional prompt while we explore better ways to support zsh.

@zakkak
Copy link
Contributor

zakkak commented Dec 7, 2023

@zakkak Would changing the prompt to <_#_qdup_#_> trigger any default completion in zsh?

Hi @willr3 , it seems to not trigger any default completion/expansions (AFAIK expansions are prefixed with %).

❯ export PS1='<_#_qdup_#_> '; set +o history; export HISTCONTROL="ignoreboth" 
set: no such option: history
<_#_qdup_#_> 

Note the no history option warning though.

@willr3
Copy link
Collaborator

willr3 commented Dec 8, 2023

qDup was developed and tested against bash so there are probably several places that need to be revisited when connecting to other shells.
Is there a zsh equivalent to disable the history for the session? We disable history to avoid polluting the users history with all the commands that run as part of the qDup execution.

@zakkak
Copy link
Contributor

zakkak commented Dec 8, 2023

Is there a zsh equivalent to disable the history for the session? We disable history to avoid polluting the users history with all the commands that run as part of the qDup execution.

According to man zshbuiltins it looks like this can be achieved with fc -p:

fc -p pushes the current history list onto a stack and switches to a new history list. If the -a option is also specified, this history list will be automatically popped when the current function scope is exited, which is a much better solution than creating a trap function to call `fc -P' manually. If no arguments are specified, the history list is left empty, $HISTFILE is unset, and $HISTSIZE & $SAVEHIST are set to their default values. If one argument is given, $HISTFILE is set to that file‐name, $HISTSIZE & $SAVEHIST are left unchanged, and the history file is read in (if it exists) to initialize the new list. If a second argument is specified, $HISTSIZE & $SAVEHIST are instead set to the single specified numeric value. Finally, if a third argument is specified, $SAVEHIST is set to a separate value from $HISTSIZE. You are free to change these environment values for the new history list however you desire in order to manipulate the new history list.

@diegolovison
Copy link
Contributor

Maybe related to: #229
I see in thread dump reported the following code:

at java.util.concurrent.Semaphore.acquire([email protected]/Semaphore.java:318)

@willr3
Copy link
Collaborator

willr3 commented Jan 8, 2025

I think the hanging observed in this issue was resolved when we changed qDups prompt to no longer include a sequence that zsh would auto-expand. This was more a "live lock" where a trigger did not occur to release the semaphore.
It looks like #229 identified a condition where the semaphore can deadlock (fail to acquire) when the scheduled threadpool is too small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants