Garnet server becomes unresponsive (deadlock) upon multiple cluster meet operations, when started with very few threads (4).
./GarnetServer --port 6000 --checkpointdir /tmp/checkpoints/6000 --cluster True --minthreads 4 --maxthreads 4
./GarnetServer --port 6001 --checkpointdir /tmp/checkpoints/6001 --cluster True --minthreads 4 --maxthreads 4
./GarnetServer --port 6002 --checkpointdir /tmp/checkpoints/6002 --cluster True --minthreads 4 --maxthreads 4
./GarnetServer --port 6003 --checkpointdir /tmp/checkpoints/6003 --cluster True --minthreads 4 --maxthreads 4
./GarnetServer --port 6004 --checkpointdir /tmp/checkpoints/6004 --cluster True --minthreads 4 --maxthreads 4
./GarnetServer --port 6005 --checkpointdir /tmp/checkpoints/6005 --cluster True --minthreads 4 --maxthreads 4
redis-cli -p 6000
127.0.0.1:6000> cluster meet 127.0.0.1 6001
OK
127.0.0.1:6000> cluster meet 127.0.0.1 6002
OK
127.0.0.1:6000> cluster meet 127.0.0.1 6003
OK
127.0.0.1:6000> cluster meet 127.0.0.1 6004
OK
127.0.0.1:6000> cluster meet 127.0.0.1 6005   <-- This one got stuck in my experiment.
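The steps above can be scripted so the hang is easier to reproduce. The following is only a sketch: the server flags, ports, and `cluster meet` commands are copied from the report, while the function names, the `DRY_RUN` switch (default `1`, which only prints the commands), and the two-second startup delay are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: start six Garnet nodes and introduce nodes 6001-6005 to node 6000.
# DRY_RUN is a hypothetical switch; when it is 1 the commands are only printed.

FIRST_PORT=6000
NODE_COUNT=6
DRY_RUN="${DRY_RUN:-1}"

# Print the server command line for one port (flags as given in the report).
server_cmd() {
  local port="$1"
  echo "./GarnetServer --port ${port} --checkpointdir /tmp/checkpoints/${port} --cluster True --minthreads 4 --maxthreads 4"
}

# Print the redis-cli command that introduces one node to the first node.
meet_cmd() {
  local port="$1"
  echo "redis-cli -p ${FIRST_PORT} cluster meet 127.0.0.1 ${port}"
}

# Execute a command line, or just print it in dry-run mode.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$1"
  else
    eval "$1"
  fi
}

main() {
  local port
  local last_port=$((FIRST_PORT + NODE_COUNT - 1))
  # Start every server in the background.
  for port in $(seq "${FIRST_PORT}" "${last_port}"); do
    run "$(server_cmd "${port}") &"
  done
  # Assumed startup delay before the first node accepts commands.
  [ "${DRY_RUN}" = "1" ] || sleep 2
  # Issue the meet commands from the first node, one per remaining node.
  for port in $(seq $((FIRST_PORT + 1)) "${last_port}"); do
    run "$(meet_cmd "${port}")"
  done
}

main
```

Run it with `DRY_RUN=0` from the directory containing `GarnetServer` to actually start the nodes. In the reported experiment the last `cluster meet` did not return, so the script would be expected to block at that point.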
The server should remain responsive.
Release version: v1.0.47
IDE: No response
OS version: No response
I took a thread snapshot of the stuck Garnet server.
Thread (0x4B33) has acquired the lock in ClusterManager and is stuck while writing to the file (nodes.conf): it is blocked in SemaphoreSlim.Wait() inside ClusterUtils.WriteInto.
The other threads (0x45FD, 0x4B3B, etc.) are stuck waiting for the ClusterManager lock itself.
Thread (0x4231):
  [Native Frames]
  System.Private.CoreLib!System.Threading.Thread.Sleep(int32)
  GarnetServer!Garnet.Program.Main(class System.String[])

Thread (0x4248):
  [Native Frames]
  System.Private.CoreLib!System.Threading.Monitor.Wait(class System.Object,int32)
  Microsoft.Extensions.Logging.Console!Microsoft.Extensions.Logging.Console.ConsoleLoggerProcessor.TryDequeue(value class Microsoft.Extensions.Logging.Console.LogMessageEntry&)
  Microsoft.Extensions.Logging.Console!Microsoft.Extensions.Logging.Console.ConsoleLoggerProcessor.ProcessLogQueue()

Thread (0x424C):
  [Native Frames]
  Garnet.cluster!Garnet.cluster.ClusterManager.FlushConfig()
  Garnet.cluster!Garnet.cluster.ClusterManager.TryMerge(class Garnet.cluster.ClusterConfig,bool)
  Garnet.cluster!Garnet.cluster.GarnetServerNode.<Gossip>b__26_0(class System.Threading.Tasks.Task`1<value class Garnet.common.MemoryResult`1<unsigned int8>>)
  System.Private.CoreLib!System.Threading.Tasks.ContinuationTaskFromResultTask`1[Garnet.common.MemoryResult`1[System.Byte]].InnerInvoke()
  System.Private.CoreLib!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading.Thread,class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object)
  System.Private.CoreLib!System.Threading.Tasks.Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread)
  System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue.Dispatch()
  System.Private.CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

Thread (0x424D):
  [Native Frames]
  System.Private.CoreLib!System.Threading.WaitHandle.WaitOneNoCheck(int32)
  System.Private.CoreLib!System.Threading.PortableThreadPool+GateThread.GateThreadStart()

Thread (0x424F):
  [Native Frames]
  ?!?
  System.Net.Sockets!System.Net.Sockets.SocketAsyncEngine.EventLoop()

Thread (0x4252):
  [Native Frames]
  System.Private.CoreLib!System.Threading.WaitHandle.WaitOneNoCheck(int32)
  System.Private.CoreLib!System.Threading.TimerQueue.TimerThread()

Thread (0x45FD):
  [Native Frames]
  Garnet.cluster!Garnet.cluster.ClusterManager.FlushConfig()
  Garnet.cluster!Garnet.cluster.ClusterManager.TryMerge(class Garnet.cluster.ClusterConfig,bool)
  Garnet.cluster!Garnet.cluster.GarnetServerNode.<Gossip>b__26_0(class System.Threading.Tasks.Task`1<value class Garnet.common.MemoryResult`1<unsigned int8>>)
  System.Private.CoreLib!System.Threading.Tasks.ContinuationTaskFromResultTask`1[Garnet.common.MemoryResult`1[System.Byte]].InnerInvoke()
  System.Private.CoreLib!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading.Thread,class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object)
  System.Private.CoreLib!System.Threading.Tasks.Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread)
  System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue.Dispatch()
  System.Private.CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

Thread (0x4B33):
  [Native Frames]
  System.Private.CoreLib!System.Threading.Monitor.Wait(class System.Object,int32)
  System.Private.CoreLib!System.Threading.SemaphoreSlim.WaitUntilCountOrTimeout(int32,unsigned int32,value class System.Threading.CancellationToken)
  System.Private.CoreLib!System.Threading.SemaphoreSlim.Wait(int32,value class System.Threading.CancellationToken)
  System.Private.CoreLib!System.Threading.SemaphoreSlim.Wait()
  Garnet.cluster!Garnet.cluster.ClusterUtils.WriteInto(class Tsavorite.core.IDevice,class Tsavorite.core.SectorAlignedBufferPool,unsigned int64,unsigned int8[],int32,class Microsoft.Extensions.Logging.ILogger)
  Garnet.cluster!Garnet.cluster.ClusterManager.FlushConfig()
  Garnet.cluster!Garnet.cluster.ClusterManager.TryMerge(class Garnet.cluster.ClusterConfig,bool)
  Garnet.cluster!Garnet.cluster.GarnetServerNode.<Gossip>b__26_0(class System.Threading.Tasks.Task`1<value class Garnet.common.MemoryResult`1<unsigned int8>>)
  System.Private.CoreLib!System.Threading.Tasks.ContinuationTaskFromResultTask`1[Garnet.common.MemoryResult`1[System.Byte]].InnerInvoke()
  System.Private.CoreLib!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading.Thread,class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object)
  System.Private.CoreLib!System.Threading.Tasks.Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread)
  System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue.Dispatch()
  System.Private.CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

Thread (0x4B3B):
  [Native Frames]
  Garnet.cluster!Garnet.cluster.ClusterManager.FlushConfig()
  Garnet.cluster!Garnet.cluster.ClusterManager.TryMerge(class Garnet.cluster.ClusterConfig,bool)
  Garnet.cluster!Garnet.cluster.GarnetServerNode.<Gossip>b__26_0(class System.Threading.Tasks.Task`1<value class Garnet.common.MemoryResult`1<unsigned int8>>)
  System.Private.CoreLib!System.Threading.Tasks.ContinuationTaskFromResultTask`1[Garnet.common.MemoryResult`1[System.Byte]].InnerInvoke()
  System.Private.CoreLib!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(class System.Threading.Thread,class System.Threading.ExecutionContext,class System.Threading.ContextCallback,class System.Object)
  System.Private.CoreLib!System.Threading.Tasks.Task.ExecuteWithThreadLocal(class System.Threading.Tasks.Task&,class System.Threading.Thread)
  System.Private.CoreLib!System.Threading.ThreadPoolWorkQueue.Dispatch()
  System.Private.CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
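To make the triage of such a snapshot repeatable, the thread IDs can be extracted with a small script. This is a sketch under assumptions: the snapshot is saved to a file with one `Thread (0x...):` header line per thread followed by one frame per line, and the helper names `threads_in_frame` and `lock_waiters` are hypothetical. Treating "in `FlushConfig` but not in `WriteInto`" as "waiting for the lock" follows the reading given in this report and is not something the snapshot proves by itself.

```shell
#!/usr/bin/env bash
# Sketch: find which threads in a saved snapshot have a given frame on their stack.

# Print the ID of every thread whose stack contains the frame substring.
threads_in_frame() {
  local frame="$1" snapshot="$2"
  awk -v frame="$frame" '
    # Header line: remember the thread ID, e.g. "0x4B33".
    /^Thread \(0x[0-9A-Fa-f]+\):/ {
      tid = $2
      sub(/^\(/, "", tid)
      sub(/\):.*$/, "", tid)
      seen = 0
      next
    }
    # Frame line: report the current thread once if the frame matches.
    index($0, frame) && !seen { print tid; seen = 1 }
  ' "$snapshot"
}

# Print threads that are in FlushConfig but not in WriteInto, i.e. the threads
# this report interprets as waiting for the ClusterManager lock.
lock_waiters() {
  local snapshot="$1" holder
  holder="$(threads_in_frame 'ClusterUtils.WriteInto' "$snapshot")"
  threads_in_frame 'ClusterManager.FlushConfig' "$snapshot" \
    | grep -vxF -- "${holder:-none}"
}
```

For example, `threads_in_frame 'ClusterUtils.WriteInto' snapshot.txt` lists the thread blocked in the config write, and `lock_waiters snapshot.txt` lists the remaining threads inside `FlushConfig`.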
I also tried starting the Garnet servers with these flags: --clean-cluster-config, --aof-null-device, --no-obj. I faced the same issue.
vazois