Block extensions disallowed by policy #3259

Open
wants to merge 30 commits into base: develop
Changes from 13 commits
Commits
c2cc2c6
Block disallowed extension processing
mgunnala Nov 8, 2024
151081d
Enable policy e2e tests
mgunnala Nov 8, 2024
edec2af
Pylint
mgunnala Nov 8, 2024
a37508f
Fix e2e test failures
mgunnala Nov 11, 2024
b0da554
Address review comments
mgunnala Nov 18, 2024
a4f5cab
Merge branch 'develop' into allowlist_2
mgunnala Nov 18, 2024
699b9ba
Address review comments
mgunnala Nov 20, 2024
86de0c5
Address test review comments
mgunnala Nov 21, 2024
c3e9b89
Remove status file for single-config
mgunnala Nov 22, 2024
65d7034
Add back status file for single-config
mgunnala Nov 22, 2024
95f247a
Run e2e tests on all endorsed
mgunnala Nov 22, 2024
3b18519
Fix UT failures
mgunnala Nov 23, 2024
63da127
Pylint
mgunnala Nov 26, 2024
471cd59
Merge branch 'develop' into allowlist_2
narrieta Nov 26, 2024
8ea989b
Address review comments for agent code
mgunnala Dec 3, 2024
83f6ff0
Tests
mgunnala Dec 3, 2024
b037e41
Revert "Tests"
mgunnala Dec 3, 2024
ba3869c
Address test comments
mgunnala Dec 6, 2024
dfcc158
Address test comments
mgunnala Dec 9, 2024
fe07ffa
Merge branch 'develop' into allowlist_2
mgunnala Dec 9, 2024
a31bdcf
Address test comments
mgunnala Dec 10, 2024
5198cf8
Cleanup existing extensions on test VMs
mgunnala Dec 12, 2024
4a0a4ef
Address comments and disable dependencies e2e tests
mgunnala Dec 16, 2024
daa8017
Merge branch 'develop' into allowlist_2
mgunnala Dec 16, 2024
bacc425
Add fixes for e2e tests
mgunnala Dec 17, 2024
3319916
Add back delete failure test case
mgunnala Dec 17, 2024
8c31798
Address comments round 3
mgunnala Dec 17, 2024
32ef5c1
Address comments
mgunnala Dec 17, 2024
f0895b7
Merge branch 'develop' into allowlist_2
mgunnala Dec 17, 2024
0c9f1c7
Pylint
mgunnala Dec 18, 2024
78 changes: 76 additions & 2 deletions azurelinuxagent/ga/exthandlers.py
@@ -39,6 +39,7 @@
from azurelinuxagent.common.agent_supported_feature import get_agent_supported_features_list_for_extensions, \
SupportedFeatureNames, get_supported_feature_by_name, get_agent_supported_features_list_for_crp
from azurelinuxagent.ga.cgroupconfigurator import CGroupConfigurator
from azurelinuxagent.ga.policy.policy_engine import ExtensionPolicyEngine, ExtensionPolicyError
from azurelinuxagent.common.datacontract import get_properties, set_properties
from azurelinuxagent.common.errorstate import ErrorState
from azurelinuxagent.common.event import add_event, elapsed_milliseconds, WALAEventOperation, \
@@ -482,10 +483,50 @@ def handle_ext_handlers(self, goal_state_id):
depends_on_err_msg = None
extensions_enabled = conf.get_extensions_enabled()

# Instantiate policy engine, and use same engine to handle all extension handlers.
# If an error is thrown during policy engine initialization, we block all extensions and report the error via handler/extension status for
# each extension.
policy_error = None
try:
policy_engine = ExtensionPolicyEngine()
except Exception as ex:
policy_error = ex

for extension, ext_handler in all_extensions:

handler_i = ExtHandlerInstance(ext_handler, self.protocol, extension=extension)

# Invoke policy engine to determine if extension is allowed. If not, block extension and report error on
# behalf of the extension.
policy_err_map = {
Member

seems like this is a constant... define it at the class level?

Author

Done!

ExtensionRequestedState.Enabled: ('enable', ExtensionErrorCodes.PluginEnableProcessingFailed),
Member

can we add a comment describing the elements in the tuple?

Author

Added and moved this to the class level.

Member

'enable' and 'disable' are internal CRP/Agent operations; users are not aware of them. They should not be propagated to error messages displayed to the user

Author

Updated this to "run" and "uninstall"

# Note: currently, when uninstall is requested for an extension, CRP polls until the agent does not
# report status for that extension, or until timeout is reached. In the case of a policy error, the
# agent reports failed status on behalf of the extension, which will cause CRP to poll for the full
# timeout, instead of failing fast.
#
# TODO: CRP does not currently have a terminal error code for uninstall. Once this code is added, use
# it instead of PluginDisableProcessingFailed below.
ExtensionRequestedState.Uninstall: ('uninstall', ExtensionErrorCodes.PluginDisableProcessingFailed),
ExtensionRequestedState.Disabled: ('disable', ExtensionErrorCodes.PluginDisableProcessingFailed),
}
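
The review suggestions in this thread (a class-level constant, a comment describing the tuple, and user-facing operation names) could be sketched roughly as follows. This is a standalone illustration under stated assumptions: the two enum classes are stand-ins for the agent's `ExtensionRequestedState` and `ExtensionErrorCodes` with placeholder values, and `_POLICY_ERROR_MAP`, `_DEFAULT_POLICY_ERROR`, `policy_error_for`, and the choice of "uninstall" for the Disabled state are hypothetical, not the merged code.

```python
from enum import Enum


class ExtensionRequestedState(Enum):
    # Stand-in for the agent's requested states; values are illustrative.
    Enabled = "enabled"
    Disabled = "disabled"
    Uninstall = "uninstall"


class ExtensionErrorCodes(Enum):
    # Stand-in for the agent's error codes; the numbers are placeholders.
    PluginEnableProcessingFailed = 1
    PluginDisableProcessingFailed = 2


class ExtHandlersHandler(object):
    # Requested state -> (user-facing operation name, error code to report).
    # Internal names such as 'enable'/'disable' are kept out of user-facing text.
    _POLICY_ERROR_MAP = {
        ExtensionRequestedState.Enabled: ("run", ExtensionErrorCodes.PluginEnableProcessingFailed),
        ExtensionRequestedState.Uninstall: ("uninstall", ExtensionErrorCodes.PluginDisableProcessingFailed),
        ExtensionRequestedState.Disabled: ("uninstall", ExtensionErrorCodes.PluginDisableProcessingFailed),
    }

    # Hypothetical fallback for an unrecognized state.
    _DEFAULT_POLICY_ERROR = ("process", ExtensionErrorCodes.PluginEnableProcessingFailed)

    @classmethod
    def policy_error_for(cls, state):
        # dict.get() returns None for a missing key, which cannot be unpacked
        # into two names, so an explicit default is supplied.
        return cls._POLICY_ERROR_MAP.get(state, cls._DEFAULT_POLICY_ERROR)
```

Supplying a default matters because a bare `.get(state)` followed by tuple unpacking raises a `TypeError` when the state is not in the map.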
policy_op, policy_err_code = policy_err_map.get(ext_handler.state)
if policy_error is not None:
err = ExtensionPolicyError(msg="", inner=policy_error, code=policy_err_code)
Member

what is the intention of creating an exception object here? seems like it is only used to pass the error code, but it is never raised

Author

I initially implemented the ExtensionPolicyError class to have a centralized error message for extensions blocked by policy, and also to pass the code. But you make a good point - since we never actually raise the exception, I've removed the ExtensionPolicyError class and now pass the code/message directly into the reporting function.

self.__handle_and_report_policy_error(handler_i, err, report_op=handler_i.operation, message=ustr(err),
extension=extension, report=True)
continue

extension_allowed = policy_engine.should_allow_extension(ext_handler.name)
if not extension_allowed:
msg = "failed to {0} extension '{1}' because extension is not specified in allowlist. To {0}, " \
Member

Suggested change
- msg = "failed to {0} extension '{1}' because extension is not specified in allowlist. To {0}, " \
+ msg = "failed to {0} extension '{1}' because it is not specified in allowlist. To {0}, " \

Author

Updated

"add extension to the allowed list in the policy file ('{2}').".format(policy_op,
Member

Suggested change
- "add extension to the allowed list in the policy file ('{2}').".format(policy_op,
+ "add the extension to the allowed list in the policy file ('{2}').".format(policy_op,

Author

Updated

ext_handler.name,
conf.get_policy_file_path())
err = ExtensionPolicyError(msg, code=policy_err_code)
self.__handle_and_report_policy_error(handler_i, err, report_op=handler_i.operation, message=ustr(err),
extension=extension, report=True)

Member

seems like we are missing a continue statement here

Contributor

I think a continue statement would break the dependency logic.

It's OK to use continue in the other condition because we know all extensions will fail (dependencies don't matter).

Author

Yes, in the case where a specific extension is disallowed by policy, we should log an error for dependencies as well (using the existing code). Adding a continue statement would skip this logic.

In the case of a policy failure, where all extensions should be blocked regardless of dependencies, we can skip this logic.

Member

OK. To make this clearer, can you do 'if not extension_allowed' after 'if depends_on_err_msg is not None'?

Author

Yes, I've updated
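
The control flow agreed on in this thread can be sketched as below: `continue` is used when the policy engine itself failed (every extension is blocked, so dependencies do not matter) and when a dependency already failed, while an extension blocked by the allowlist falls through as a failure so that its dependents are reported too. This is a simplified model, not the agent's code: `process_extensions`, `is_allowed`, and `run` are hypothetical names, and it assumes each extension depends on all earlier ones instead of using dependency levels.

```python
def process_extensions(extensions, is_allowed, run, engine_error=None):
    """Return a list of (extension name, outcome) in processing order."""
    results = []
    dependency_error = None

    for name in extensions:
        # Policy engine failed to initialize: block everything and skip the
        # dependency bookkeeping, since all extensions fail anyway.
        if engine_error is not None:
            results.append((name, "blocked: policy engine failed"))
            continue

        # A dependency failed earlier: report that first.
        if dependency_error is not None:
            results.append((name, "skipped: " + dependency_error))
            continue

        # Checked after the dependency error, as requested in the review.
        if not is_allowed(name):
            results.append((name, "blocked: not in allowlist"))
            success = False
        else:
            success = run(name)
            results.append((name, "succeeded" if success else "failed"))

        # No 'continue' above, so a policy block reaches this point and is
        # treated like any other failure for dependent extensions.
        if not success:
            dependency_error = "dependency '{0}' failed".format(name)

    return results
```

For example, with extensions `["a", "b", "c"]` where only `"b"` is disallowed, `"a"` runs, `"b"` is blocked, and `"c"` is skipped with a dependency error.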

# In case of extensions disabled, we skip processing extensions. But CRP is still waiting for some status
# back for the skipped extensions. In order to propagate the status back to CRP, we will report status back
# here with an error message.
@@ -527,7 +568,11 @@ def handle_ext_handlers(self, goal_state_id):
continue

# Process extensions and get if it was successfully executed or not
extension_success = self.handle_ext_handler(handler_i, extension, goal_state_id)
# If extension was blocked by policy, treat the extension as failed and do not process the handler.
if not extension_allowed:
Member

merge this 'if not extension_allowed:' with the one just above it?

Author

Good point, made this change, thanks!

extension_success = False
else:
extension_success = self.handle_ext_handler(handler_i, extension, goal_state_id)

dep_level = self.__get_dependency_level((extension, ext_handler))
if 0 <= dep_level < max_dep_level:
@@ -692,6 +737,32 @@ def __handle_and_report_ext_handler_errors(ext_handler_i, error, report_op, mess
add_event(name=name, version=handler_version, op=report_op, is_success=False, log_event=True,
message=message)

@staticmethod
def __handle_and_report_policy_error(ext_handler_i, error, report_op, message, report=True, extension=None):
Member

Suggested change
- def __handle_and_report_policy_error(ext_handler_i, error, report_op, message, report=True, extension=None):
+ def _report_policy_error(ext_handler_i, error, report_op, message, report=True, extension=None):

Author

Updated

# TODO: Consider merging this function with __handle_and_report_ext_handler_errors() above, after investigating
# the impact of this change.
#
# If extension status is present, CRP will ignore handler status and report extension status. In the case of policy errors,
# extensions are not processed, so collect_ext_status() reports transitioning status on behalf of the extension.
# However, extensions blocked by policy should fail fast, so agent should write a .status file for policy failures.
# Note that __handle_and_report_ext_handler_errors() does not create the file for single-config extensions, but changing
# it will require additional testing/investigation. As a temporary workaround, this separate function was created
# to write a status file for single-config extensions.

# Set handler status for all extensions (with and without settings)
ext_handler_i.set_handler_status(message=message, code=error.code)

# Create status file for extensions with settings (single and multi config).
if extension is not None:
ext_handler_i.create_status_file_if_not_exist(extension, status=ExtensionStatusValue.error, code=error.code,
Author

create_status_file_if_not_exist() will not overwrite existing status file (for the current sequence number). Is this behavior acceptable?

Member

we should overwrite the existing file with the policy error

Author

We now overwrite the existing file with the policy error. I've added an "overwrite" parameter and renamed the function to create_status_file().

operation=report_op, message=message)
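
The "overwrite" behavior discussed in this thread could look roughly like the sketch below. It is a minimal standalone illustration, not the agent's implementation: this `create_status_file` takes a file path directly, and the JSON layout it writes is an assumed, simplified shape.

```python
import json
import os


def create_status_file(path, status, code, operation, message, overwrite=False):
    """Write a status file; return True if the file was written."""
    # Preserve an existing file unless the caller explicitly asks to replace
    # it, e.g. to record a policy error over an earlier status.
    if os.path.exists(path) and not overwrite:
        return False

    # Illustrative layout only; the real status schema has more fields.
    contents = [{
        "status": {
            "status": status,
            "code": code,
            "operation": operation,
            "formattedMessage": {"lang": "en-US", "message": message},
        }
    }]
    with open(path, "w") as status_file:
        json.dump(contents, status_file)
    return True
```

Defaulting `overwrite` to `False` keeps the earlier "create if it does not exist" behavior for existing callers, while the policy path can pass `overwrite=True`.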

if report:
Member

when would report be False?

Author

Currently it is never False. I initially wrote it this way because I was copying the exact structure of __handle_and_report_ext_handler_errors(), but I've removed the parameter since it isn't being used for now.

name = ext_handler_i.get_extension_full_name(extension)
handler_version = ext_handler_i.ext_handler.version
add_event(name=name, version=handler_version, op=report_op, is_success=False, log_event=True,
message=message)

def handle_enable(self, ext_handler_i, extension):
"""
1- Ensure the handler is installed
@@ -990,7 +1061,10 @@ def report_ext_handler_status(self, vm_status, ext_handler, goal_state_changed):
# extension even if HandlerState == NotInstalled (Sample scenario: ExtensionsGoalStateError, DecideVersionError, etc)
# We also need to report extension status for an uninstalled handler if extensions are disabled because CRP
# waits for extension runtime status before failing the extension operation.
if handler_state != ExtHandlerState.NotInstalled or ext_handler.supports_multi_config or not conf.get_extensions_enabled():
# In the case of policy failures, we want to report extension status with a terminal code so CRP fails fast. If
Member

Let's change this

        # We also need to report extension status for an uninstalled handler if extensions are disabled because CRP
        # waits for extension runtime status before failing the extension operation.
        # In the case of policy failures, we want to report extension status with a terminal code so CRP fails fast. If
        # extension status is not present, collect_ext_status() will set a default transitioning status, and CRP will
        # wait for timeout.

to

        # We also need to report extension status for an uninstalled handler if extensions are disabled, or if the extension
        # failed due to policy, because CRP waits for extension runtime status before failing the extension operation.

Contributor

The intention of the change is to enter this condition when the extension fails due to policy, but this change means that we enter the condition whenever policy is enabled.

Is there any negative effect to calling ext_handler_i.get_extension_handler_statuses... whenever policy is enabled? Why is this behind the if condition in the first place?

Author

Discussed offline - removed this condition, because it would cause us to enter the if condition even for non-policy-related uninstall failures.

# extension status is not present, collect_ext_status() will set a default transitioning status, and CRP will
# wait for timeout.
if handler_state != ExtHandlerState.NotInstalled or ext_handler.supports_multi_config or not conf.get_extensions_enabled() or ExtensionPolicyEngine.get_policy_enforcement_enabled():

# Since we require reading the Manifest for reading the heartbeat, this would fail if HandlerManifest not found.
# Only try to read heartbeat if HandlerState != NotInstalled.
21 changes: 12 additions & 9 deletions azurelinuxagent/ga/policy/policy_engine.py
@@ -22,7 +22,7 @@
from azurelinuxagent.common import logger
from azurelinuxagent.common.event import WALAEventOperation, add_event
from azurelinuxagent.common import conf
from azurelinuxagent.common.exception import AgentError
from azurelinuxagent.common.exception import AgentError, ExtensionError
from azurelinuxagent.common.protocol.extensions_goal_state_from_vm_settings import _CaseFoldedDict
from azurelinuxagent.common.utils.flexible_version import FlexibleVersion

@@ -36,12 +36,6 @@
_MAX_SUPPORTED_POLICY_VERSION = "0.1.0"


class PolicyError(AgentError):
"""
Error raised during agent policy enforcement.
"""


class InvalidPolicyError(AgentError):
"""
Error raised if user-provided policy is invalid.
@@ -51,13 +45,22 @@ def __init__(self, msg, inner=None):
super(InvalidPolicyError, self).__init__(msg, inner)


Member

Can we add an INFO message just after the check for enabled stating that we are using Policy? This makes clearer the fact that we are now processing policies.

Author

__read_policy() is called right after the check for enabled, and it logs the following statement:

Policy enforcement is enabled. Enforcing policy using policy file found at '<path>'. File contents: <policy>

Is that sufficient, or do you think we need an additional log message?

Member

It's sufficient, but the message should probably be in the caller instead of read_policy. Who knows, as code evolves we may add other code before read_policy, or call read_policy multiple times.

Alternatively, the caller can log "Policy enforcement is enabled." and read_policy can log "Enforcing policy using policy file found at '<path>'. File contents: <policy>"

Author

Updated
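
The split suggested above, where the caller announces that enforcement is on and the reader logs only the file details, might be sketched like this. It is an assumption-laden illustration using Python's standard `logging` module instead of the agent's logger; `init_policy_engine` and this free-standing `read_policy` are hypothetical names.

```python
import logging

logger = logging.getLogger("policy_sketch")


def read_policy(path):
    # Logs only what it knows about: the file it read and its contents.
    with open(path) as policy_file:
        contents = policy_file.read()
    logger.info("Enforcing policy using policy file found at '%s'. File contents: %s",
                path, contents)
    return contents


def init_policy_engine(enabled, path):
    if not enabled:
        return None
    # The "enabled" message stays in the caller, so it is emitted exactly once
    # even if read_policy() is later called several times or moved.
    logger.info("Policy enforcement is enabled.")
    return read_policy(path)
```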

class ExtensionPolicyError(ExtensionError):
"""
Error raised during agent extension policy enforcement.
"""
def __init__(self, msg, code, inner=None):
Member

The 'code' and 'inner' parameters are not in the same order as in the base class, which can lead to subtle coding errors.

Author

I wrote it this way because I wanted "code" to be a required parameter in ExtensionPolicyError, but not "inner". But I can set a default value for "code", to keep them in the same order as in the base class.

Author

I ended up removing this class, based on the other comments

msg = "Extension will not be processed: {0}".format(msg)
super(ExtensionPolicyError, self).__init__(msg, inner, code)
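
The parameter-order concern raised in this thread can be illustrated with a small standalone sketch. The `ExtensionError` below is a stand-in that only mirrors the `(msg, inner, code)` order implied by the `super().__init__(msg, inner, code)` call in the diff; the default value of `-1` is a placeholder. The class was ultimately removed from the PR, so this shows the alternative the author described (a default for `code`), not the merged code.

```python
class ExtensionError(Exception):
    # Stand-in base class; only the parameter order matters here.
    def __init__(self, msg=None, inner=None, code=-1):
        super().__init__(msg)
        self.inner = inner
        self.code = code


class ExtensionPolicyError(ExtensionError):
    # Same order as the base class, so a positional call means the same
    # thing for both classes. 'code' gets a default instead of being moved
    # ahead of 'inner'.
    def __init__(self, msg, inner=None, code=-1):
        msg = "Extension will not be processed: {0}".format(msg)
        super().__init__(msg, inner, code)


# Passing by keyword avoids any ambiguity about which argument is which.
error = ExtensionPolicyError("not in allowlist", code=5)
```

With the order from the diff (`msg, code, inner`), a positional call written for the base class would silently store the inner exception as the error code.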


class _PolicyEngine(object):
"""
Implements base policy engine API.
"""
def __init__(self):
# Set defaults for policy
self._policy_enforcement_enabled = self.__get_policy_enforcement_enabled()
self._policy_enforcement_enabled = self.get_policy_enforcement_enabled()
if not self.policy_enforcement_enabled:
return

@@ -76,7 +79,7 @@ def _log_policy_event(msg, is_success=True, op=WALAEventOperation.Policy, send_e
add_event(op=op, message=msg, is_success=is_success, log_event=False)

@staticmethod
def __get_policy_enforcement_enabled():
def get_policy_enforcement_enabled():
"""
Policy will be enabled if (1) policy file exists at the expected location and (2) the conf flag "Debug.EnableExtensionPolicy" is true.
"""
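
The two conditions in the docstring above combine with a logical AND. Since the function body is not shown in this diff, the following is only a hedged sketch of that rule: the parameters `policy_file_path` and `enable_flag` are hypothetical stand-ins for however the agent obtains the file location and the "Debug.EnableExtensionPolicy" setting.

```python
import os


def get_policy_enforcement_enabled(policy_file_path, enable_flag):
    # Enabled only when the flag is true AND the policy file exists;
    # either condition alone is not enough.
    return bool(enable_flag) and os.path.exists(policy_file_path)
```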