-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft cluster doesn't switch #164
Comments
Could you please provide more details.
|
Leader have this: second server have this: |
Thanks for report, this is not expected behavior, we have a log compaction, probably it's disabled in patroni by default, I will check. |
Thanks for your answer |
For now I haven't found the reason why file is growing, but I will continue investigating and fix this case. |
Hi,
Describe the bug
Raft leader doesn't switch also the Postgres leader switch in RO mode
We have four node for raft (two dedicated and two with postgres)
To Reproduce
have a raft folder without space
Expected behavior
The folder of our raft leader has been full (no space left) also the raft cluster have wait the previous leader and Patroni cluster fall in error (leader in RO mode and Replica can't start).
To fix this issue we have increase space of raft folder but we think it's not the best way.
How we can purge raft file to save space ?
Why raft cluster doesn't switch?
Environment
extract from Patroni configuration file
cat /etc/patroni/patroni.yml
scope: PGSQLGrafanaProd
namespace: /PGSQLGrafana/
name: PGSQLGrafanaProd1
raft:
data_dir: /apps/pgsql/raft
self_addr: 1.1.1.1:5010
partner_addrs: ['1.1.1.2:5010','1.1.1.3:5010','1.1.1.4:5010']
Have you checked Patroni logs?
Before crash:
Aug 22 20:39:51 s01vl9950891 systemd[1]: patroni.service: Failed with result 'signal'.
Aug 22 20:39:51 s01vl9950891 systemd[1]: patroni.service: Main process exited, code=killed, status=11/SEGV
Aug 22 20:39:46 s01vl9950891 patroni[1906]: 2022-08-22 20:39:46,414 INFO: no action. I am (PGSQLGrafanaProd1), the leader with the lock
Aug 22 20:39:45 s01vl9950891 patroni[1906]: 2022-08-22 20:39:45,672 WARNING: Failed to store full dump
Aug 22 20:39:40 s01vl9950891 patroni[1906]: 2022-08-22 20:39:40,751 WARNING: Failed to store full dump
Aug 22 20:39:36 s01vl9950891 patroni[1906]: 2022-08-22 20:39:36,413 INFO: no action. I am (PGSQLGrafanaProd1), the leader with the lock
Aug 22 20:39:35 s01vl9950891 patroni[1906]: 2022-08-22 20:39:35,927 WARNING: Failed to store full dump
Aug 22 20:39:30 s01vl9950891 patroni[1906]: 2022-08-22 20:39:30,973 WARNING: Failed to store full dump
When we have try to run daemon manually:
Aug 23 09:47:46 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:46,564 WARNING: Loop time exceeded, rescheduling immediately.
Aug 23 09:47:46 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:46,562 INFO: starting after demotion in progress
Aug 23 09:47:46 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:34,561 INFO: Lock owner: PGSQLGrafanaProd1; I am PGSQLGrafanaProd1
Aug 23 09:47:44 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - accepting connections
Aug 23 09:47:42 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - rejecting connections
Aug 23 09:47:41 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - rejecting connections
Aug 23 09:47:40 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - rejecting connections
Aug 23 09:47:39 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - rejecting connections
Aug 23 09:47:38 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - rejecting connections
Aug 23 09:47:37 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - rejecting connections
Aug 23 09:47:36 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - rejecting connections
Aug 23 09:47:36 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - rejecting connections
Aug 23 09:47:35 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:35.222 CEST [1212938] HINT: Future log output will appear in directory "/apps/pgsql/log".
Aug 23 09:47:35 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:35.222 CEST [1212938] LOG: redirecting log output to logging collector process
Aug 23 09:47:35 s01vl9950891 patroni[1212815]: 10.244.120.156:5432 - no response
Aug 23 09:47:35 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:35,064 INFO: postmaster pid=1212938
Aug 23 09:47:34 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:34,561 WARNING: Loop time exceeded, rescheduling immediately.
Aug 23 09:47:34 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:34,560 INFO: closed patroni connection to the postgresql cluster
Aug 23 09:47:34 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:34,559 INFO: demoted self because failed to update leader lock in DCS
Aug 23 09:47:32 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:32,324 INFO: Demoting self (immediate-nolock)
Aug 23 09:47:32 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:32,324 ERROR: failed to update leader lock
Aug 23 09:47:32 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:22,322 INFO: Lock owner: PGSQLGrafanaProd1; I am PGSQLGrafanaProd1
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:22,205 INFO: establishing a new patroni connection to the postgres cluster
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:22,100 INFO: No PostgreSQL configuration items changed, nothing to reload.
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: OSError: [Errno 28] No space left on device
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self.__incomingTransmissionFile.close()
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/var/lib/pgsql/.local/lib/python3.6/site-packages/pysyncobj/serializer.py", line 193, in setTransmissionData
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: if self.__serializer.setTransmissionData(serialized):
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/var/lib/pgsql/.local/lib/python3.6/site-packages/pysyncobj/syncobj.py", line 896, in __onMessageReceived
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self._onMessageReceivedCallback(node, message)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/var/lib/pgsql/.local/lib/python3.6/site-packages/pysyncobj/transport.py", line 107, in _onMessageReceived
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self.__onMessageReceived(message)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/var/lib/pgsql/.local/lib/python3.6/site-packages/pysyncobj/tcp_connection.py", line 173, in __processConnection
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self.__descrToCallbacks[descr](descr, eventMask)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/var/lib/pgsql/.local/lib/python3.6/site-packages/pysyncobj/poller.py", line 97, in poll
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self._poller.poll(timeToWait)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/var/lib/pgsql/.local/lib/python3.6/site-packages/pysyncobj/syncobj.py", line 616, in _onTick
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: super(DynMemberSyncObj, self)._onTick(timeToWait)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/usr/lib/python3.6/site-packages/patroni/dcs/raft.py", line 84, in _onTick
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: super(KVStoreTTL, self)._onTick(timeToWait)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/usr/lib/python3.6/site-packages/patroni/dcs/raft.py", line 238, in _onTick
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self._onTick(timeToWait)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/var/lib/pgsql/.local/lib/python3.6/site-packages/pysyncobj/syncobj.py", line 532, in doTick
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self.doTick(self.conf.autoTickPeriod)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/usr/lib/python3.6/site-packages/patroni/dcs/raft.py", line 248, in _autoTickThread
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self._target(*self._args, **self._kwargs)
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/usr/lib64/python3.6/threading.py", line 867, in run
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: self.run()
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: File "/usr/lib64/python3.6/threading.py", line 919, in _bootstrap_inner
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: Traceback (most recent call last):
Aug 23 09:47:22 s01vl9950891 patroni[1212815]: Exception in thread Thread-2:
Aug 23 09:47:17 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:17,084 INFO: waiting on raft
Aug 23 09:47:12 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:12,071 INFO: waiting on raft
Aug 23 09:47:07 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:07,060 INFO: waiting on raft
Aug 23 09:47:02 s01vl9950891 patroni[1212815]: 2022-08-23 09:47:02,049 INFO: waiting on raft
The text was updated successfully, but these errors were encountered: