
[UNTESTED] immediately abort chef-client if maintenance mode is set #13

Open

wants to merge 3 commits into crowbar:master from aspiers:abort-on-maintenance-mode
Conversation

@aspiers (Member) commented Sep 4, 2015

The reasoning is explained in the comments. I'm not sure why I ever thought it was a good idea to allow a chef-client run to proceed if maintenance mode is set. It's not even as if that approach could ever restore an ill node to full health, because we were deliberately checking whether maintenance mode was set prior to the chef-client run, and if so, leaving it set. This way, if a node is left in maintenance mode, it will be discovered sooner, and the cloud operator will be alerted to the degraded cluster sooner.

Also ported crowbar/barclamp-pacemaker#149 on top of this.

```ruby
if maintenance_mode?
  raise \
  "Pacemaker maintenance mode was already set on " +
  "#{node.hostname}; aborting! Please diagnose why this was the " +
```


Style/MultilineOperationIndentation: Use 2 (not 0) spaces for indenting an expression spanning multiple lines.

@aspiers force-pushed the abort-on-maintenance-mode branch from 2cc0c35 to 4352717 on September 4, 2015 19:12
"not by Chef::Provider::PacemakerService; leaving as is")
# This shouldn't happen, and suggests that something is
# interfering in a way it shouldn't.
raise "Something put node into maintenance mode during run!"
Member:


Not sure I would raise something here; if the admin did put the node in maintenance while chef-client was running, then, well, that's the way it is and why should chef fail because of that?

Member Author:


Yeah OK, I guess a Chef::Log.warn is more appropriate here.
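
A minimal sketch of that change, assuming the check stays where the raise was; the exact message wording here is my own:

```ruby
if maintenance_mode?
  # An operator may legitimately have put the node into maintenance
  # mode mid-run, so record the fact instead of failing the whole run.
  Chef::Log.warn("Node was put into maintenance mode during the " \
                 "chef-client run; leaving as is")
end
```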

Member Author:


Fixed.

@aspiers force-pushed the abort-on-maintenance-mode branch from 4352717 to d4eb66f on September 7, 2015 16:24
@vuntz (Member) commented Sep 7, 2015

+1

Adam Spiers added 2 commits September 11, 2015 00:58
@aspiers force-pushed the abort-on-maintenance-mode branch from d4eb66f to 8661acd on September 11, 2015 13:07
```ruby
end

Chef::Log.debug("Cluster is up")
return true
```


Style/RedundantReturn: Redundant return detected.
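
The fix is simply to drop the return keyword; a Ruby method returns the value of its last expression implicitly:

```ruby
Chef::Log.debug("Cluster is up")
true
```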

@aspiers (Member Author) commented Sep 11, 2015

Crap, raising an exception in the start handler doesn't abort the run :-(

```
[2015-09-11T13:09:48+00:00] ERROR: Report handler Chef::Pacemaker::StartHandler raised #<RuntimeError: Pacemaker maintenance mode was already set on d52-54-02-77-77-02; aborting! Please diagnose why this was the case, fix the root cause, and then unset maintenance mode via HAWK or by running 'crm node ready' on the node.>
[2015-09-11T13:09:48+00:00] ERROR: /var/chef/handlers/pacemaker_maintenance_handlers.rb:36:in `report'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:226:in `run_report_unsafe'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:214:in `run_report_safely'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:70:in `block in run_start_handlers'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:69:in `each'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:69:in `run_start_handlers'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/handler.rb:78:in `block in <class:Handler>'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:103:in `call'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:103:in `block in run_started'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:102:in `each'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:102:in `run_started'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:426:in `do_run'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/client.rb:184:in `run'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application.rb:133:in `run_chef_client'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application/client.rb:306:in `block in run_application'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application/client.rb:294:in `loop'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application/client.rb:294:in `run_application'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/lib/chef/application.rb:65:in `run'
[2015-09-11T13:09:48+00:00] ERROR: /usr/lib64/ruby/gems/2.1.0/gems/chef-10.32.2/bin/chef-client:26:in `<top (required)>'
[2015-09-11T13:09:48+00:00] ERROR: /usr/bin/chef-client:23:in `load'
[2015-09-11T13:09:48+00:00] ERROR: /usr/bin/chef-client:23:in `<main>'
[2015-09-11T13:09:48+00:00] INFO: Start handlers complete.
[2015-09-11T13:09:48+00:00] INFO: Loading cookbooks [apache2, barclamp, bluepill, bmc-nat, ceph, cinder, corosync, crowbar-hacks, crowbar-openstack, crowbar-pacemaker, database, drbd, glance, haproxy, hawk, kernel-panic, keystone, logging, lvm, mysql, nagios, network, neutron, nsx, ntp, ohai, openssl, pacemaker, postgresql, provisioner, rabbitmq, repos, utils, uwsgi, xfs]
[2015-09-11T13:09:48+00:00] INFO: Storing updated cookbooks/crowbar-pacemaker/libraries/maintenance_mode_helpers.rb in the cache.
```

Need to find a way to do this; the trace above shows why the raise isn't enough, since Chef::Handler#run_report_safely rescues whatever #report raises and lets the run continue.
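
One conceivable approach (a sketch under assumptions, not necessarily what this branch ended up doing): since the handler machinery rescues exceptions, terminate the process directly with Chef::Application.fatal!, which logs the message and exits chef-client. The handler class name here is hypothetical, and maintenance_mode? is assumed to be mixed in from maintenance_mode_helpers.rb:

```ruby
require "chef/handler"
require "chef/application"

class MaintenanceModeStartHandler < Chef::Handler
  # maintenance_mode? is assumed to come from crowbar-pacemaker's
  # maintenance_mode_helpers.rb library.
  def report
    return unless maintenance_mode?
    # A plain raise is rescued by Chef::Handler#run_report_safely, so
    # exit the process instead; fatal! logs at FATAL and aborts the run.
    Chef::Application.fatal!(
      "Pacemaker maintenance mode was already set on #{node.hostname}; " \
      "aborting! Please diagnose why this was the case, fix the root " \
      "cause, and then unset maintenance mode via HAWK or by running " \
      "'crm node ready' on the node.")
  end
end
```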

Make #maintenance_mode? return a sensible value even when Pacemaker is
down or uninstalled.  This is especially helpful when it is invoked
by Chef's start_handler.

Cherry-picked from crowbar/barclamp-pacemaker@7dff6c1.
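
A hypothetical sketch of what a "sensible value" could look like; the binary path and crm_attribute invocation are assumptions on my part, not the cherry-picked code itself:

```ruby
require "mixlib/shellout"

def maintenance_mode?
  # If Pacemaker isn't installed, the node can't be in maintenance mode.
  return false unless ::File.exist?("/usr/sbin/crm_attribute")

  cmd = Mixlib::ShellOut.new(
    "crm_attribute --node #{node.hostname} --name maintenance " \
    "--query --quiet")
  cmd.run_command
  # Treat a failed query (e.g. the cluster stack is down) as not in
  # maintenance mode, rather than blowing up in the start handler.
  !cmd.error? && cmd.stdout.strip == "true"
end
```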
@aspiers force-pushed the abort-on-maintenance-mode branch from 8661acd to 43590c4 on September 11, 2015 13:13
@aplanas (Contributor) commented Oct 6, 2016

@aspiers still relevant? If so, can you rebase?

@coveralls commented

Coverage Status

Changes Unknown when pulling 43590c4 on aspiers:abort-on-maintenance-mode into crowbar:master.

@jsuchome (Member) commented

@aspiers is this still alive or should we close it?
