
[UNTESTED] immediately abort chef-client if maintenance mode is set #13

Open

wants to merge 3 commits into crowbar:master from aspiers:abort-on-maintenance-mode
Conversation

@aspiers (Member) commented Sep 4, 2015

The reasoning is explained in the comments. I'm not sure why I ever thought it was a good idea to allow a chef-client run to proceed if maintenance mode is set. It's not even as if that approach could ever restore an ill node to full health, because we were deliberately checking whether maintenance mode was set prior to the chef-client run, and if so, leaving it set. This way, if a node is left in maintenance mode, it will be discovered sooner, and the cloud operator will be alerted to the degraded cluster sooner.

Also ported crowbar/barclamp-pacemaker#149 on top of this.

```ruby
if maintenance_mode?
  raise \
  "Pacemaker maintenance mode was already set on " +
  "#{node.hostname}; aborting! Please diagnose why this was the " +
```


Style/MultilineOperationIndentation: Use 2 (not 0) spaces for indenting an expression spanning multiple lines.

@aspiers force-pushed the abort-on-maintenance-mode branch from 2cc0c35 to 4352717 on September 4, 2015 19:12
"not by Chef::Provider::PacemakerService; leaving as is")
# This shouldn't happen, and suggests that something is
# interfering in a way it shouldn't.
raise "Something put node into maintenance mode during run!"
Member:


Not sure I would raise something here; if the admin did put the node in maintenance while chef-client was running, then, well, that's the way it is and why should chef fail because of that?

Member Author:


Yeah OK, I guess a Chef::Log.warn is more appropriate here.
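
A minimal sketch of that change, assuming the check stays where the raise was; the exact message wording here is my own:

```ruby
if maintenance_mode?
  # An operator may legitimately have put the node into maintenance
  # mode mid-run, so record the fact instead of failing the whole run.
  Chef::Log.warn("Node was put into maintenance mode during the " \
                 "chef-client run; leaving as is")
end
```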

Member Author:


Fixed.

@aspiers force-pushed the abort-on-maintenance-mode branch from 4352717 to d4eb66f on September 7, 2015 16:24
@vuntz (Member) commented Sep 7, 2015

+1

Adam Spiers added 2 commits September 11, 2015 00:58
@aspiers force-pushed the abort-on-maintenance-mode branch from d4eb66f to 8661acd on September 11, 2015 13:07
```ruby
end

Chef::Log.debug("Cluster is up")
return true
```


Style/RedundantReturn: Redundant return detected.
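
The fix is simply to drop the return keyword; a Ruby method returns the value of its last expression implicitly:

```ruby
Chef::Log.debug("Cluster is up")
true
```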

@aspiers (Member Author) commented Sep 11, 2015

Crap, raising an exception in the start handler doesn't abort the run :-(

```
[2015-09-11T13:09:48+00:00] ERROR: Report handler Chef::Pacemaker::StartHandler raised #<RuntimeError: Pacemaker maintenance mode was already set on d52-54-02-77-77-02; aborting! Please diagnose why this was the case, fix the root cause, and then unset maintenance mode via HAWK or by running 'crm node ready' on the node.>
[2015-09-11T13:09:48+00:00] ERROR: /var/chef/handlers/pacemaker_maintenance_handlers.rb:36:in `report'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:226:in `run_report_unsafe'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:214:in `run_report_safely'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:70:in `block in run_start_handlers'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:69:in `each'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:69:in `run_start_handlers'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:78:in `block in <class:Handler>'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:103:in `call'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:103:in `block in run_started'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:102:in `each'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:102:in `run_started'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:426:in `do_run'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:184:in `run'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application.rb:133:in `run_chef_client'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application/client.rb:306:in `block in run_application'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application/client.rb:294:in `loop'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application/client.rb:294:in `run_application'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application.rb:65:in `run'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/bin/chef-client:26:in `<top (required)>'
[2015-09-11T13:09:48+00:00] ERROR: /usr/bin/chef-client:23:in `load'
[2015-09-11T13:09:48+00:00] ERROR: /usr/bin/chef-client:23:in `<main>'
[2015-09-11T13:09:48+00:00] INFO: Start handlers complete.
[2015-09-11T13:09:48+00:00] INFO: Loading cookbooks [apache2, barclamp, bluepill, bmc-nat, ceph, cinder, corosync, crowbar-hacks, crowbar-openstack, crowbar-pacemaker, database, drbd, glance, haproxy, hawk, kernel-panic, keystone, logging, lvm, mysql, nagios, network, neutron, nsx, ntp, ohai, openssl, pacemaker, postgresql, provisioner, rabbitmq, repos, utils, uwsgi, xfs]
[2015-09-11T13:09:48+00:00] INFO: Storing updated cookbooks/crowbar-pacemaker/libraries/maintenance_mode_helpers.rb in the cache.
```

Need to find a way to do this; the trace above shows why the raise isn't enough, since Chef::Handler#run_report_safely rescues whatever #report raises and lets the run continue.
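
One conceivable approach (a sketch under assumptions, not necessarily what this branch ended up doing): since the handler machinery rescues exceptions, terminate the process directly with Chef::Application.fatal!, which logs the message and exits chef-client. The handler class name here is hypothetical, and maintenance_mode? is assumed to be mixed in from maintenance_mode_helpers.rb:

```ruby
require "chef/handler"
require "chef/application"

class MaintenanceModeStartHandler < Chef::Handler
  # maintenance_mode? is assumed to come from crowbar-pacemaker's
  # maintenance_mode_helpers.rb library.
  def report
    return unless maintenance_mode?
    # A plain raise is rescued by Chef::Handler#run_report_safely, so
    # exit the process instead; fatal! logs at FATAL and aborts the run.
    Chef::Application.fatal!(
      "Pacemaker maintenance mode was already set on #{node.hostname}; " \
      "aborting! Please diagnose why this was the case, fix the root " \
      "cause, and then unset maintenance mode via HAWK or by running " \
      "'crm node ready' on the node.")
  end
end
```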

Make #maintenance_mode? return a sensible value even when Pacemaker is
down or uninstalled.  This is especially helpful when it is invoked
by Chef's start_handler.

Cherry-picked from crowbar/barclamp-pacemaker@7dff6c1.
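
A hypothetical sketch of what a "sensible value" could look like; the binary path and crm_attribute invocation are assumptions on my part, not the cherry-picked code itself:

```ruby
require "mixlib/shellout"

def maintenance_mode?
  # If Pacemaker isn't installed, the node can't be in maintenance mode.
  return false unless ::File.exist?("/usr/sbin/crm_attribute")

  cmd = Mixlib::ShellOut.new(
    "crm_attribute --node #{node.hostname} --name maintenance " \
    "--query --quiet")
  cmd.run_command
  # Treat a failed query (e.g. the cluster stack is down) as not in
  # maintenance mode, rather than blowing up in the start handler.
  !cmd.error? && cmd.stdout.strip == "true"
end
```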
@aspiers force-pushed the abort-on-maintenance-mode branch from 8661acd to 43590c4 on September 11, 2015 13:13
@aplanas (Contributor) commented Oct 6, 2016

@aspiers still relevant? If so, can you rebase?

@coveralls commented

Coverage Status

Changes Unknown when pulling 43590c4 on aspiers:abort-on-maintenance-mode into crowbar:master.

@jsuchome (Member) commented

@aspiers is this still alive or should we close it?
