From 6e854144f872af229a33d4de887d763f147b2cef Mon Sep 17 00:00:00 2001
From: Nageswara Nandigam <84482346+nagworld9@users.noreply.github.com>
Date: Wed, 27 Mar 2024 16:22:01 -0700
Subject: [PATCH] Release 2.10.0.0 (merge to Master) (#3099)

* Add support for Azure Clouds (#2795)
* Add support for Azure Clouds
  Co-authored-by: narrieta
* Check certificates only if certificates are included in goal state and update test-requirements to remove codecov (#2803)
* Update version to dummy 1.0.0.0
* Revert version change
* Only check certificates if goal state includes certs
* Fix code coverage deprecated issue
* Move condition to function call
* Add tests for no outbound connectivity (#2804)
* Add tests for no outbound connectivity
  Co-authored-by: narrieta
* Use cloud when validating test location (#2806)
* Use cloud when validating test location
  Co-authored-by: narrieta
* Redact access tokens from extension's output (#2811)
* Redact access tokens from extension's output
* python 2.6
  Co-authored-by: narrieta
* Add @gabstamsft as code owner (#2813)
  Co-authored-by: narrieta
* Fix name of single IB device when provisioning RDMA (#2814)
  The current code assumes the IPoIB interface name is ib0 when a single IB interface is provisioned. This is not always true when udev rules are used to rename it to other names like ibPxxxxx. Fix this by searching for any interface name starting with "ib".
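The IB-device fix in #2814 (accept any interface whose name starts with "ib" rather than assuming it is always ib0) can be sketched roughly as below. This is an illustrative sketch, not the agent's actual implementation: the function name and the use of /sys/class/net to enumerate interfaces are assumptions.

```python
import os

def find_ib_interface(interface_names=None):
    """Return the name of the first InfiniBand interface, or None if absent.

    Rather than assuming the IPoIB interface is always 'ib0', accept any
    interface whose name starts with 'ib', since udev rules may have
    renamed it (e.g. to something like 'ibP257p0s0').
    """
    if interface_names is None:
        # Illustrative way to enumerate network interfaces on Linux.
        interface_names = os.listdir("/sys/class/net")
    for name in sorted(interface_names):
        if name.startswith("ib"):
            return name
    return None
```

For example, `find_ib_interface(["lo", "eth0", "ibP257p0s0"])` returns `"ibP257p0s0"`, whereas a hard-coded `"ib0"` lookup would find nothing on such a machine.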
* Allow tests to run on random images (#2817)
* Allow tests to run on random images
* PR feedback
  Co-authored-by: narrieta
* Bug fixes for end-to-end tests (#2820)
  Co-authored-by: narrieta
* Enable all Azure clouds on end-to-end tests (#2821)
  Co-authored-by: narrieta
* Add Azure CLI to container image (#2822)
  Co-authored-by: narrieta
* Fixes for Azure clouds (#2823)
* Fixes for Azure clouds
* add debug info
  Co-authored-by: narrieta
* Add test for extensions disabled; refactor VirtualMachine and VmExtension (#2824)
* Add test for extensions disabled; refactor VirtualMachine and VmExtension
  Co-authored-by: narrieta
* Fixes for end-to-end tests (#2827)
  Co-authored-by: narrieta
* Add test for osProfile.linuxConfiguration.provisionVMAgent (#2826)
* Add test for osProfile.linuxConfiguration.provisionVMAgent
* add files
* pylint
* added messages
* ssh issue
  Co-authored-by: narrieta
* Enable suppression rules for waagent.log (#2829)
  Co-authored-by: narrieta
* Wait for service start when setting up test VMs; collect VM logs when setup fails (#2830)
  Co-authored-by: narrieta
* Add vm arch to heartbeat telemetry (#2818) (#2838)
* Add VM Arch to heartbeat telemetry
* Remove outdated vmsize heartbeat test
* Remove unused import
* Use platform to get vmarch
  (cherry picked from commit 66e8b3d782fdf2ebc443212bbb731a89599201f6)
* Add regular expression to match logs from very old agents (#2839)
  Co-authored-by: narrieta
* Increase concurrency level for end-to-end tests (#2841)
  Co-authored-by: narrieta
* Agent update refactor supports GA versioning (#2810)
* agent update refactor (#2706)
* agent update refactor
* address PR comments
* updated available agents
* fix pylint warn
* updated test case warning
* added kill switch flag
* fix pylint warning
* move last update attempt variables
* report GA versioning supported feature. (#2752)
* control agent updates in e2e tests and fix uts (#2743)
* disable agent updates in dcr and fix uts
* address comments
* fix uts
* report GA versioning feature
* Don't report SF flag if auto update is disabled (#2754)
* fix uts (#2759)
* agent versioning test_suite (#2770)
* agent versioning test_suite
* address PR comments
* fix pylint warning
* fix update assertion
* fix pylint error
* logging manifest type and don't log same error until next period in agent update. (#2778)
* improve logging and don't log same error until next period
* address comments
* update comment
* update comment
* Added self-update time window. (#2794)
* Added self-update time window
* address comment
* Wait and retry for rsm goal state (#2801)
* wait for rsm goal state
* address comments
* Not sharing agent update tests vms and added scenario to daily run (#2809)
* add own vm property
* add agent_update to daily run
* merge conflicts
* address comments
* address comments
* additional comments addressed
* fix pylint warning
* Add test for FIPS (#2842)
* Add test for FIPS
* add test
* increase sleep
* remove unused file
* added comment
* check uptime
  Co-authored-by: narrieta
* Eliminate duplicate list of test suites to run (#2844)
* Eliminate duplicate list of test suites to run
* fix paths
* add agent update
  Co-authored-by: narrieta
* Port NSBSD system to the latest version of waagent (#2828)
* nsbsd: adapt to recent dns.resolver
* osutil: Provide a get_root_username function for systems where it's not 'root' (like in nsbsd)
* nsbsd: tune the configuration filepath
* nsbsd: fix lib installation path
  Co-authored-by: Norberto Arrieta
* Fix method name in update test (#2845)
  Co-authored-by: narrieta
* Expose run name as a runbook variable (#2846)
  Co-authored-by: narrieta
* Collect test artifacts as a separate step in the test pipeline (#2848)
* Collect test artifacts as a separate step in the test pipeline
  Co-authored-by: narrieta
* remove agent
update test and py27 version from build (#2853)
* Fix infinite retry loop in end to end tests (#2855)
* Fix infinite retry loop
* fix message
  Co-authored-by: narrieta
* Remove empty "distro" module (#2854)
  Co-authored-by: narrieta
* Enable Python 2.7 for unit tests (#2856)
* Enable Python 2.7 for unit tests
  Co-authored-by: narrieta
* Skip downgrade if requested version below daemon version (#2850)
* skip downgrade for agent update
* add test
* report it in status
* address comments
* revert change
* improved error msg
* address comment
* update location schema and added skip clouds in suite yml (#2852)
* update location schema in suite yml
* address comments
* .
* pylint warn
* comment
* Do not collect LISA logs by default (#2857)
  Co-authored-by: narrieta
* Add check for noexec on Permission denied errors (#2859)
* Add check for noexec on Permission denied errors
* remove type annotation
  Co-authored-by: narrieta
* Wait for log message in AgentNotProvisioned test (#2861)
* Wait for log message in AgentNotProvisioned test
* hardcoded value
  Co-authored-by: narrieta
* Always collect logs on end-to-end tests (#2863)
* Always collect logs
* cleanup
  Co-authored-by: narrieta
* agent publish scenario (#2847)
* agent publish
* remove vm size
* address comments
* daemon version fallback
* daemon version fix
* address comments
* fix pylint error
* address comment
* added error handling
* add time window for agent manifest download (#2860)
* add time window for agent manifest download
* address comments
* address comments
* ignore 75-persistent-net-generator.rules in e2e tests (#2862)
* ignore 75-persistent-net-generator.rules in e2e tests
* address comment
* remove
* Always publish artifacts and test results (#2865)
  Co-authored-by: narrieta
* Add tests for extension workflow (#2843)
* Update version to dummy 1.0.0.0
* Revert version change
* Basic structure
* Test must run in SCUS for test ext
* Add GuestAgentDCRTest Extension id
* Test structure
* Update test file name
* test no location
* Test location as southcentralus
* Assert ext is installed
* Try changing version for dcr test ext
* Update expected message in instance view
* try changing message to string
* Limit images for ext workflow
* Update classes after refactor
* Update class name
* Refactor tests
* Rename extension_install to extension_workflow
* Assert ext status
* Assert operation sequence is expected
* Remove logger reference
* Pass ssh client
* Update ssh
* Add permission to run script
* Correct permissions
* Add execute permissions for helper script
* Make scripts executable
* Change args to string
* Add required parameter
* Add shebang for restart_agent
* Fix arg format
* Use restart utility
* Run restart with sudo
* Add enable scenario
* Attempt to remove start_time
* Only assert enable
* Add delete scenario
* Fix uninstall scenario
* Add extension update scenario
* Run assert scenario on update scenario
* Fix reference to ext
* Format args as str instead of arr
* Update test args
* Add test case for update without install
* Fix delete
* Keep changes
* Save changes
* Add special chars test case
* Fix dcr_ext issue
* Add validate no lag scenario
* Fix testguid reference
* Add additional log statements for debugging
* Fix message to check before encoding
* Encode setting name
* Correctly check data
* Make check data executable
* Fix command args for special char test
* Fix no lag time
* Fix ssh client reference
* Try message instead of text
* Remove unused method
* Start clean up
* Continue code cleanup
* Fix pylint errors
* Fix pylint errors
* Start refactor
* Debug agent lag
* Update lag logging
* Fix assert_that for lag
* Remove typo
* Add readme for extension_workflow scenario
* Reformat comment
* Improve logging
* Refactor assert scenario
* Remove unused constants
* Remove unused parameter in assert scenario
* Add logging
* Improve logging
* Improve logging
* Fix soft assertions issue
* Remove todo for delete polling
* Remove unnecessary new line
* removed unnecessary function
* Make special chars log more readable
* remove unnecessary log
* Add version to add or update log
* Remove unnecessary assert instance view
* Add empty log line
* Add update back to restart args to debug
* Add update back to restart args to debug
* Remove unused init
* Remove test_suites from pipeline yml
* Update location in test suite yml
* Add comment for location restriction
* Remove unused init and fix comments
* Improve method header
* Rename scripts
* Remove print_function
* Rename is_data_in_waagent_log
* Add comments describing assert operation sequence script
* add comments to scripts and type annotate assert operation sequence
* Add GuestAgentDcrExtension source code to repo
* Fix typing.dict error
* Fix typing issue
* Remove outdated comment
* Add comments to extension_workflow.py
* rename scripts to match test suite name
* Ignore pylint warnings on test ext
* Update pylint rc to ignore tests_e2e/GuestAgentDcrTestExtension
* Update pylint rc to ignore tests_e2e/GuestAgentDcrTestExtension
* disable all errors/warnings dcr test ext
* disable all errors/warnings dcr test ext
* Run workflow on debian
* Revert to dcr config distros
* Move enable increment to beginning of function
* Fix gs completed regex
* Remove unnecessary files from dcr test ext dir
* Update agent_ext_workflow.yml to skip China and Gov clouds (#2872)
* Update agent_ext_workflow.yml to skip China and Gov clouds
* Update tests_e2e/test_suites/agent_ext_workflow.yml
* fix daemon version (#2874)
* Wait for extension goal state processing before checking for lag in log (#2873)
* Update version to dummy 1.0.0.0
* Revert version change
* Add sleep time to allow goal state processing to complete before lag check
* Add retry logic to gs processing lag check
* Clean up retry logic
* Add back empty line
* Fix timestamp parsing issue
* Fix timestamp parsing issue
* Fix timestamp parsing issue
* Do 3 retries
* Extract tarball with
xvf during setup (#2880)
  In a pipeline run we saw an error when extracting the tarball on the test node. Adding v extracts the contents with verbose output.
* enable agent update in daily run (#2878)
* Create Network Security Group for test VMs (#2882)
* Create Network Security Group for test VMs
* error handling
  Co-authored-by: narrieta
* don't allow downgrades for self-update (#2881)
* don't allow downgrades for self-update
* address comments
* update comment
* add logger
* Suppress telemetry failures from check agent log (#2887)
  Co-authored-by: narrieta
* Install assertpy on test VMs (#2886)
* Install assertpy on test VMs
* set versions
  Co-authored-by: narrieta
* Add sample remote tests (#2888)
* Add sample remote tests
* add pass
* review feedback
  Co-authored-by: narrieta
* Enable Extensions.Enabled in tests (#2892)
* enable Extensions.Enabled
* address comment
* address comment
* use script
* improve msg
* improve msg
* Reorganize file structure of unit tests (#2894)
* Reorganize file structure of unit tests
* remove duplicate
* add init
* mocks
  Co-authored-by: narrieta
* Report useful message when extension processing is disabled (#2895)
* Update version to dummy 1.0.0.0
* Revert version change
* Fail GS fast in case of extensions disabled
* Update extensions_disabled scenario to look for GS failed instead of timeout when extensions are disabled
* Update to separate onHold and extensions enabled
* Report ext disabled error in handler status
* Try using GoalStateUnknownFailure
* Fix indentation error
* Try failing ext handler and checking logs
* Report ext processing error
* Attempt to fail fast
* Fix param name
* Init error
* Try to reuse current code
* Try to reuse current code
* Clean code
* Update scenario tests
* Add ext status file to fail fast
* Fail fast test
* Report error when ext disabled
* Update timeout to 20 mins
* Re-enable ext for debugging
* Re-enable ext for debugging
* Log agent status update
* Create ext status file with error code
* Create ext status file with error code
* We should report handler status even if not installed in case of extensions disabled
* Clean up code change
* Update tests for extensions disabled
* Update test comment
* Update test
* Remove unused line
* Remove unused timeout
* Test failing case
* Remove old case
* Remove unused import
* Test multiconfig ext
* Add multi-config test case
* Clean up test
* Improve logging
* Fix dir for testfile
* Remove ignore error rules
* Remove unused imports
* Set handler status to not ready explicitly
* Use OS Util to get agent conf path
* Retry tar operations after 'Unexpected EOF in archive' during node setup (#2891)
* Update version to dummy 1.0.0.0
* Revert version change
* Capture output of the copy commands during setup
* Add verbose to copy command
* Update typing for copy to node methods
* Print contents of tar before extracting
* Print contents of tar before extracting
* Print contents of tar before extracting
* Print contents of tar before extracting
* Retry copying tarball if contents on test node do not match
* Revert copy method def
* Revert copy method def
* Catch EOF error
* Retry tar operations if we see failure
* Revert target_path
* Remove accidental copy of exception
* Remove blank line
* tar cvf and copy commands overwrite
* Add log and telemetry event for extension disabled (#2897)
* Update version to dummy 1.0.0.0
* Revert version change
* Add logs and telemetry for processing extensions when extensions disabled
* Reformat string
* Agent status scenario (#2875)
* Update version to dummy 1.0.0.0
* Revert version change
* Create files for agent status scenario
* Add agent status test logic
* fix pylint error
* Add comment for retry
* Mark failures as exceptions
* Improve messages in logs
* Improve comments
* Update comments
* Check that agent status updates without processing additional goal states 3 times
* Remove unused agent status exception
* Update comment
* Clean up
comments, logs, and imports
* Exception should inherit from BaseException
* Import datetime
* Import datetime
* Import timedelta
* instance view time is already formatted
* Increase status update time
* Increase status update time
* Increase status update time
* Increase timeout
* Update comments and timeouts
* Allow retry if agent status timestamp isn't updated after 30s
* Remove unused import
* Update time value in comment
* address PR comments
* Check if properties are None
* Make types & errors more readable
* Re-use vm_agent variable
* Add comment for dot operator
* multi config scenario (#2898)
* Update version to dummy 1.0.0.0
* Revert version change
* multi config scenario bare bones
* multi config scenario bare bones
* Stash
* Add multi config test
* Run on arm64
* RCv2 is not supported on arm64
* Test should own VM
* Add single config ext to test
* Add single config ext to test
* Do not fail test if there are unexpected extensions on the vm
* Update comment for accuracy
* Make resource name parameter optional
* Clean up code
* agent and ext cgroups scenario (#2866)
* agent-cgroups scenario
* address comments
* address comments
* fix-pylint
* pylint warn
* address comments
* improved logging
* improved ext cgroups scenario
* new changes
* pylint fix
* updated
* address comments
* pylint warn
* address comment
* merge conflicts
* agent firewall scenario (#2879)
* agent firewall scenario
* address comments
* improved logging
* pylint warn
* address comments
* updated
* address comments
* pylint warning
* pylint warning
* address comment
* merge conflicts
* Add retry and improve the log messages in agent update test (#2890)
* add retry
* improve log messages
* merge conflicts
* Cleanup common directory (#2902)
  Co-authored-by: narrieta
* improved logging (#2893)
* skip test in mooncake and usgov (#2904)
* extension telemetry pipeline scenario (#2901)
* Update version to dummy 1.0.0.0
* Revert version change
* Barebones for etp
* Scenario should own VM because of conf change
* Add extension telemetry pipeline test
* Clean up code
* Improve log messages
* Fix pylint errors
* Improve logging
* Improve code comments
* VmAccess is not supported on flatcar
* Address PR comments
* Add support_distros in VmExtensionIdentifier
* Fix logic for support_distros in VmExtensionIdentifier
* Use run_remote_test for remote script
* Ignore logcollector fetch failure if it recovers (#2906)
* download_fail unit test should use agent version in common instead of 9.9.9.9 (#2908) (#2912)
  (cherry picked from commit ed80388c02471a1e196fd8d77cf0a74eab13c5c7)
* Download certs on FT GS after check_certificates only when missing from disk (#2907) (#2913)
* Download certs on FT GS only when missing from disk
* Improve telemetry for inconsistent GS
* Fix string format
  (cherry picked from commit c13f7500c4e3c93f081d0ff6cdb46c6ffdcdd43a)
* Update pipeline.yml to increase timeout to 90 minutes (#2910)
  Runs have been timing out after 60 minutes due to multiple scenarios sharing VMs
* Fix agent memory usage check (#2903)
* fix memory usage check
* add test
* added comment
* fix test
* disable ga versioning changes (#2917)
* Disable ga versioning changes (#2909)
* disable rsm changes
* add flag
  (cherry picked from commit 5a4fae833a92de4d44b1939e48678a043132fbd4)
* merge conflicts
* fix the ignore rule in agent update test (#2915) (#2918)
* ignore the agent installed version
* address comments
* address comments
* fixes
  (cherry picked from commit 8985a4207b8279b07fdc5186e22b001aaadbd27d)
* Use Mariner 2 in FIPS test (#2916)
* Use Mariner 2 in FIPS test
  Co-authored-by: narrieta
* Change pipeline timeout to 90 minutes (#2925)
* fix version checking (#2920)
  Co-authored-by: Norberto Arrieta
* mariner container image (#2926)
* mariner container image
* added packages repo
* addressed comments
* addressed comments
* Fix for "local variable _COLLECT_NOEXEC_ERRORS
referenced before assignment" (#2935)
* pylint
  Co-authored-by: narrieta
* fix agent manifest call frequency (#2923) (#2932)
* fix agent manifest call frequency
* new approach
  (cherry picked from commit 655403254331a7f7413c3d7448d83193daa08af3)
* enable rhel/centos cgroups (#2922)
* Add support for EC certificates (#2936)
* Add support for EC certificates
* pylint
* pylint
* typo
  Co-authored-by: narrieta
* Add Cpu Arch in local logs and telemetry events (#2938)
* Add cpu arch to telem and local logs
* Change get_vm_arch to static method
* update unit tests
* Remove e2e pipeline file
* Remove arch from heartbeat
* Move get_vm_arch to osutil
* fix syntax issue
* Fix unit test
* skip cgroup monitor (#2939)
* Clarify support status of installing from source. (#2941)
  Co-authored-by: narrieta
* agent cpu quota scenario (#2937)
* agent_cpu_quota scenario
* addressed comments
* addressed comments
* skip test version install (#2950)
* skip test install
* address comments
* pylint
* local run stuff
* undo
* Add support for VM Scale Sets to end-to-end tests (#2954)
  Co-authored-by: narrieta
* Ignore dependencies when the extension does not have any settings (#2957) (#2962)
* Ignore dependencies when the extension does not have any settings
* Remove message
  Co-authored-by: narrieta
  (cherry picked from commit 79bc12c8ca9f8aaacfb44a070812afe31123a600)
* Cache daemon version (#2942) (#2963)
* cache daemon version
* address comments
* test update
  (cherry picked from commit 279d55725c44550da610d7e29b0e38bbdcf9fab0)
* update warning message (#2946) (#2964)
  (cherry picked from commit 33552eecc277a44a875f862bb3ae6a6f40334c49)
* fix self-update frequency to spread over 24 hrs for regular type and 4 hrs for hotfix (#2948) (#2965)
* update self-update frequency
* address comment
* mark with comment
* addressed comment
  (cherry picked from commit f15e6ef7b4d17d514b639e8bc2e78507a2d71096)
* Reduce the firewall check period in agent firewall tests (#2966)
* reduce firewall check period
* reduce firewall check period
* undo get daemon version change (#2951) (#2967)
* undo daemon change
* pylint
  (cherry picked from commit fabe7e5843b796fcdddb303fd724d9d823d65727)
* disable agent update (#2953) (#2968)
  (cherry picked from commit 9b15b0486248090448cd69c46aebdd2b8f608694)
* Change agent_cgroups to own Vm (#2972)
* Change cgroups to own Vm
* Agent cgroups should own vm
* Check SSH connectivity during end-to-end tests (#2970)
  Co-authored-by: narrieta
* Gathering Guest ProxyAgent Log Files (#2975)
* Remove debug info from waagent.status.json (#2971)
* Remove debug info from waagent.status.json
* pylint warnings
* pylint
  Co-authored-by: narrieta
* Extension sequencing scenario (#2969)
* update tests
* cleanup
* .
* .
* .
* .
* .
* .
* .
* .
* .
* Add new test cases
* Update scenario to support new tests
* Scenario should support failing extensions and extensions with no settings
* Clean up test
* Remove locations from test suite yml
* Fix deployment issue
* Support creating multiple resource groups for vmss in one run
* AzureMonitorLinuxAgent is not supported on flatcar
* AzureMonitor is not supported on flatcar
* remove agent update
* Address PR comments
* Fix issue with getting random ssh client
* Address PR Comments
* Address PR Comments
* Address PR comments
* Do not keep rg count in runbook
* Use try/finally with lock
* only check logs after scenario starts
* Change to instance member
  Co-authored-by: narrieta
* rename log file for agent publish scenario (#2956)
* rename log file
* add param
* address comment
* Fix name collisions on resource groups created by AgentTestSuite (#2981)
  Co-authored-by: narrieta
* Save goal state history explicitly (#2977)
* Save goal state explicitly
* typo
* remove default value in internal method
  Co-authored-by: narrieta
* Handle errors when adding logs to the archive (#2982)
  Co-authored-by: narrieta
* Timing issue while checking cpu quota (#2976)
* timing issue
* fix pylint
* undo
* Use case-insensitive match when cleaning up test resource groups (#2986)
  Co-authored-by: narrieta
* Update supported Ubuntu versions (#2980)
* Fix pylint warning (#2988)
  Co-authored-by: narrieta
* Add information about HTTP proxies (#2985)
* Add information about HTTP proxies
* no_proxy
  Co-authored-by: narrieta
* agent persist firewall scenario (#2983)
* agent persist firewall scenario
* address comments
* new comments
* GA versioning refactor plus fetch new rsm properties. (#2974)
* GA versioning refactor
* added comment
* added abstract decorator
* undo abstract change
* update names
* addressed comments
* pylint
* agent family
* state name
* address comments
* conf change
* Run remote date command to get test case start time (#2993)
* Run remote date command to get test case start time
* Remove unused import
* ext_sequencing scenario: get enable time from extension status files (#2992)
* Get enable time from extension status files
* Check for empty array
* add status example in comments
* ssh connection retry on restarts (#3001)
* Add e2e test scenario for hostname monitoring (#3003)
* Validate hostname is published
* Run on distro without known issues
* Add comment about debugging network down
* Create e2e scenario for hostname monitoring
* Remove unused import
* Increase timeout for hostname change
* Add password to VM and check for agent status if ssh fails
* run scenario on all endorsed distros
* Use getdistro() to check distro
* Add comment to get_distro
* Add publish_hostname to runbook
* Make get_distro.py executable
* Address first round of PR comments
* Do not enable hostname monitoring on distros where it is disabled
* Skip test on ubuntu
* Update get-waagent-conf-value to remove unused variable
* AMA is not supported on cbl-mariner 1.0 (#3002)
* Cbl-mariner 1.0 is not supported by AMA
* Use get distro to check distro
* Add comment to get_distro
* log update time for self updater (#3004)
* add update time log
* log new
agent update time
* fix tests
* Fix publish hostname in china and gov clouds (#3005)
* Fix regex to parse china/gov domain names
* Improve regex
* Improve regex
* Self update e2e test (#3000)
* self-update test
* addressed comments
* fix tests
* log
* added comment
* merge conflicts
* Lisa should not cleanup failed environment if keep_environment=failed (#3006)
* Throw exception for test suite if a test failure occurs
* Remove unused import
* Clean up
* Add comment
* fix(ubuntu): Point to correct dhcp lease files (#2979)
  From Ubuntu 18.04, the default dhcp client was systemd-networkd. However, WALA has been checking for the dhclient lease files. This PR seeks to correct this bug. Interestingly, it was already configuring systemd-networkd but checking for dhclient lease files.
  Co-authored-by: Norberto Arrieta
* Use self-hosted pool for automation runs (#3007)
  Co-authored-by: narrieta
* Add distros which use Python 2.6 (for reference only) (#3009)
  Co-authored-by: narrieta
* Move cleanup pipeline to self-hosted pool (#3010)
  Co-authored-by: narrieta
* NM should not be restarted during hostname publish if NM_CONTROLLED=y (#3008)
* Only restart NM if NM_controlled=n
* Clean up code
* Clean up code
* improve logging
* Make check on NM_CONTROLLED value strict
* Install missing dependency (jq) on Azure Pipeline Agents (#3013)
* Install missing dependency (jq) on Azure Pipeline Agents
* use if statement
* remove if statement
  Co-authored-by: narrieta
* Do not reset the mode of an extension's log directory (#3014)
  Co-authored-by: narrieta
* Daemon should remove stale published_hostname file and log useful warning (#3016)
* Daemon should remove published_hostname file and log useful warning
* Clean up fast track file if vm id has changed
* Clean up initial_goal_state file if vm id has changed
* Clean up rsm_update file if vm id has changed
* Do not report TestFailedException in test results (#3019)
  Co-authored-by: narrieta
* skip agent update run on arm64 distros (#3018)
* Clean test VMs older than 12 hours (#3021)
  Co-authored-by: narrieta
* honor rsm update with no time when agent receives new GS (#3015)
* honor rsm update immediately
* pylint
* improve msg
* address comments
* address comments
* address comments
* added verbose logging
* Don't check Agent log from the top after each test suite (#3022)
* Don't check Agent log from the top after each test suite
* fix initialization of override
  Co-authored-by: narrieta
* update the proxy agent log folder for logcollector (#3028)
* Log instance view before asserting (#3029)
* Add config parameter to wait for cloud-init (Extensions.WaitForCloudInit) (#3031)
* Add config parameter to wait for cloud-init (Extensions.WaitForCloudInit)
  Co-authored-by: narrieta
* Revert changes to publish_hostname in RedhatOSModernUtil (#3032)
* Revert changes to publish_hostname in RedhatOSModernUtil
* Fix pylint bad-super-call
* Remove agent_wait_for_cloud_init from automated runs (#3034)
  Co-authored-by: narrieta
* Adding AutoUpdate.UpdateToLatestVersion new flag support (#3020)
* support new flag
* address comments
* added more info
* updated
* address comments
* resolving comment
* updated
* Retry get instance view if only name property is present (#3036)
* Retry get instance view if incomplete during assertions
* Retry getting instance view if only name property is present
* Fix regex in agent extension workflow (#3035)
* Recover primary nic if down after publishing hostname in RedhatOSUtil (#3024)
* Check nic state and recover if down
* Fix typo
* Fix state comparison
* Fix pylint errors
* Fix string comparison
* Report publish hostname failure in calling thread
* Add todo to check nic state for all distros where we reset network
* Update detection to check connection state and separate recover from publish
* Pylint unused argument
* refactor recover_nic argument
* Network interface e2e test
* e2e test for recovering the network interface on redhat distros
* Only run scenario on distros which use RedhatOSUtil
* Fix call to parent publish_hostname to include recover_nic arg
* Update comments in default os util
* Remove comment
* Fix comment
* Do not do detection/recover on RedhatOSModernUtil
* Resolve PR comments
* Make script executable
* Revert pypy change
* Fix publish hostname parameters
* Add recover_network_interface scenario to runbook (#3037)
* Implementation of new conf flag AutoUpdate.UpdateToLatestVersion support (#3027)
* GA update to latest version flag
* address comments
* resolving comments
* added TODO
* ignore warning
* resolving comment
* address comments
* config present check
* added a comment
* Fix daily pipeline failures for recover_network_interface (#3039)
* Fix daily pipeline failures for recover_network_interface
* Clear any unused settings properties when enabling cse
  Co-authored-by: Norberto Arrieta
* Keep failed VMs by default on pipeline runs (#3040)
* enable RSM e2e tests (#3030)
* enable RSM tests
* merge conflicts
* Check for 'Access denied' errors when testing SSH connectivity (#3042)
  Co-authored-by: narrieta
* Add Ubuntu 24 to end-to-end tests (#3041)
* Add Ubuntu 24 to end-to-end tests
* disable AzureMonitorLinuxAgent
  Co-authored-by: narrieta
* Skip capture of VM information on test runs (#3043)
  Co-authored-by: narrieta
* Create symlink for waagent.com on Flatcar (#3045)
  Co-authored-by: narrieta
* don't allow agent update if attempts reached max limit (#3033)
* set max update attempts
* download refactor
* pylint
* disable RSM updates (#3044)
* Skip test on alma and rocky until we investigate (#3047)
* fix agent update UT (#3051)
* version update to 2.10.0.8 (#3050)
* modify agent update flag (#3053)

Co-authored-by: Norberto Arrieta
Co-authored-by: maddieford <93676569+maddieford@users.noreply.github.com>
Co-authored-by: Long Li
Co-authored-by: sebastienb-stormshield
Co-authored-by: Zheyu Shen
Co-authored-by: Zhidong Peng
Co-authored-by: d1r3ct0r
Co-authored-by:
narrieta
---
 .github/workflows/ci_pr.yml | 40 +-
 .gitignore | 2 +-
 CODEOWNERS | 2 +-
 README.md | 112 +-
 azurelinuxagent/agent.py | 38 +-
 .../common/agent_supported_feature.py | 23 +-
 azurelinuxagent/common/conf.py | 56 +-
 azurelinuxagent/common/event.py | 12 +-
 azurelinuxagent/common/exception.py | 18 +
 azurelinuxagent/common/osutil/default.py | 22 +-
 azurelinuxagent/common/osutil/factory.py | 11 +-
 azurelinuxagent/common/osutil/gaia.py | 2 +-
 azurelinuxagent/common/osutil/iosxe.py | 4 +-
 azurelinuxagent/common/osutil/nsbsd.py | 7 +-
 azurelinuxagent/common/osutil/redhat.py | 114 +-
 azurelinuxagent/common/osutil/suse.py | 2 +-
 azurelinuxagent/common/osutil/ubuntu.py | 28 +-
 ...sions_goal_state_from_extensions_config.py | 9 +-
 .../extensions_goal_state_from_vm_settings.py | 33 +-
 azurelinuxagent/common/protocol/goal_state.py | 98 +-
 azurelinuxagent/common/protocol/restapi.py | 24 +-
 azurelinuxagent/common/protocol/util.py | 8 +-
 azurelinuxagent/common/protocol/wire.py | 18 +-
 azurelinuxagent/common/utils/cryptutil.py | 19 +-
 azurelinuxagent/common/utils/fileutil.py | 12 +-
 azurelinuxagent/common/utils/shellutil.py | 39 +-
 azurelinuxagent/common/version.py | 2 +-
 azurelinuxagent/daemon/main.py | 7 +-
 azurelinuxagent/ga/agent_update_handler.py | 241 ++
 azurelinuxagent/{common => ga}/cgroup.py | 0
 azurelinuxagent/{common => ga}/cgroupapi.py | 9 +-
 .../{common => ga}/cgroupconfigurator.py | 8 +-
 .../{common => ga}/cgroupstelemetry.py | 2 +-
 azurelinuxagent/ga/collect_logs.py | 20 +-
 .../ga/collect_telemetry_events.py | 2 +-
 azurelinuxagent/ga/env.py | 8 +-
 .../utils => ga}/extensionprocessutil.py | 68 +-
 azurelinuxagent/ga/exthandlers.py | 75 +-
 azurelinuxagent/ga/ga_version_updater.py | 182 +++
 azurelinuxagent/ga/guestagent.py | 331 +++++
 azurelinuxagent/{common => ga}/interfaces.py | 0
 .../{common => ga}/logcollector.py | 30 +-
 .../{common => ga}/logcollector_manifests.py | 5 +
 azurelinuxagent/ga/monitor.py | 8 +-
 .../{common => ga}/persist_firewall_rules.py | 0
 azurelinuxagent/ga/rsm_version_updater.py | 137 ++
 .../ga/self_update_version_updater.py | 184 +++
 azurelinuxagent/ga/send_telemetry_events.py | 2 +-
 azurelinuxagent/ga/update.py | 786 ++---
 azurelinuxagent/pa/deprovision/default.py | 14 +-
 azurelinuxagent/pa/provision/default.py | 8 +-
 azurelinuxagent/pa/rdma/centos.py | 2 +-
 azurelinuxagent/pa/rdma/factory.py | 2 +-
 azurelinuxagent/{common => pa/rdma}/rdma.py | 23 +-
 azurelinuxagent/pa/rdma/suse.py | 2 +-
 azurelinuxagent/pa/rdma/ubuntu.py | 2 +-
 config/alpine/waagent.conf | 6 +-
 config/arch/waagent.conf | 5 +-
 config/bigip/waagent.conf | 5 +-
 config/clearlinux/waagent.conf | 8 +-
 config/coreos/waagent.conf | 5 +-
 config/debian/waagent.conf | 5 +-
 config/devuan/waagent.conf | 5 +-
 config/freebsd/waagent.conf | 5 +-
 config/gaia/waagent.conf | 7 +
 config/iosxe/waagent.conf | 5 +-
 config/mariner/waagent.conf | 8 +-
 config/nsbsd/waagent.conf | 9 +-
 config/openbsd/waagent.conf | 5 +-
 config/photonos/waagent.conf | 8 +-
 config/suse/waagent.conf | 5 +-
 config/ubuntu/waagent.conf | 5 +-
 config/waagent.conf | 5 +-
 makepkg.py | 5 +-
 setup.py | 2 +-
 test-requirements.txt | 4 +-
 tests/common/dhcp/test_dhcp.py | 2 +-
 tests/common/osutil/test_alpine.py | 2 +-
 tests/common/osutil/test_arch.py | 2 +-
 tests/common/osutil/test_bigip.py | 2 +-
 tests/common/osutil/test_clearlinux.py | 2 +-
 tests/common/osutil/test_coreos.py | 2 +-
 tests/common/osutil/test_default.py | 24 +-
 tests/common/osutil/test_default_osutil.py | 2 +-
 tests/common/osutil/test_factory.py | 9 +-
 tests/common/osutil/test_freebsd.py | 2 +-
 tests/common/osutil/test_nsbsd.py | 2 +-
 tests/common/osutil/test_openbsd.py | 2 +-
 tests/common/osutil/test_openwrt.py | 2 +-
 tests/common/osutil/test_photonos.py | 2 +-
 tests/common/osutil/test_redhat.py | 2 +-
 tests/common/osutil/test_suse.py | 2 +-
 tests/common/osutil/test_ubuntu.py | 2 +-
 tests/{distro => common/protocol}/__init__.py | 0
 .../protocol/test_datacontract.py | 0
 ...sions_goal_state_from_extensions_config.py
| 58 +- ..._extensions_goal_state_from_vm_settings.py | 76 +- .../{ => common}/protocol/test_goal_state.py | 68 +- .../protocol/test_healthservice.py | 4 +- .../{ => common}/protocol/test_hostplugin.py | 20 +- .../protocol/test_image_info_matcher.py | 0 tests/{ => common}/protocol/test_imds.py | 4 +- .../test_metadata_server_migration_util.py | 2 +- .../protocol/test_protocol_util.py | 6 +- tests/{ => common}/protocol/test_wire.py | 177 +-- tests/common/test_agent_supported_feature.py | 20 +- tests/common/test_conf.py | 58 +- tests/common/test_errorstate.py | 2 +- tests/common/test_event.py | 19 +- tests/common/test_logger.py | 2 +- tests/common/test_singletonperthread.py | 2 +- tests/common/test_telemetryevent.py | 2 +- tests/common/test_version.py | 2 +- tests/{protocol => common/utils}/__init__.py | 0 tests/{ => common}/utils/test_archive.py | 2 +- tests/{ => common}/utils/test_crypt_util.py | 15 +- .../utils/test_extension_process_util.py | 26 +- tests/{ => common}/utils/test_file_util.py | 2 +- .../utils/test_flexible_version.py | 0 tests/{ => common}/utils/test_network_util.py | 2 +- tests/{ => common}/utils/test_passwords.txt | 0 tests/{ => common}/utils/test_rest_util.py | 2 +- tests/{ => common}/utils/test_shell_util.py | 11 +- tests/{ => common}/utils/test_text_util.py | 2 +- tests/daemon/test_daemon.py | 2 +- tests/daemon/test_resourcedisk.py | 122 +- tests/{distro => daemon}/test_scvmm.py | 2 +- tests/data/2 | 14 + .../config/waagent_auto_update_disabled.conf | 11 + ...led_update_to_latest_version_disabled.conf | 11 + ...bled_update_to_latest_version_enabled.conf | 11 + .../config/waagent_auto_update_enabled.conf | 11 + ...led_update_to_latest_version_disabled.conf | 11 + ...bled_update_to_latest_version_enabled.conf | 11 + ...ent_update_to_latest_version_disabled.conf | 11 + ...gent_update_to_latest_version_enabled.conf | 11 + ... 
=> WALinuxAgent-9.9.9.10-no_manifest.zip} | Bin ....xml => ext_conf-agent_family_version.xml} | 4 + .../ext_conf-rsm_version_properties_false.xml | 152 +++ ... => vm_settings-agent_family_version.json} | 4 + ...gs-requested_version_properties_false.json | 145 +++ tests/data/test_waagent.conf | 6 +- tests/data/wire/ec-key.pem | 5 + tests/data/wire/ec-key.pub.pem | 4 + tests/data/wire/ext_conf_missing_family.xml | 19 - ...d_version.xml => ext_conf_rsm_version.xml} | 4 + ...t_conf_version_missing_in_agent_family.xml | 31 + ... ext_conf_version_missing_in_manifest.xml} | 4 + .../wire/ext_conf_version_not_from_rsm.xml | 33 + ...t_conf_vm_not_enabled_for_rsm_upgrades.xml | 33 + tests/data/wire/ga_manifest.xml | 5 +- tests/data/wire/ga_manifest_no_uris.xml | 39 + tests/data/wire/rsa-key.pem | 28 + tests/data/wire/rsa-key.pub.pem | 9 + tests/distro/test_resourceDisk.py | 148 --- tests/ga/test_agent_update_handler.py | 537 ++++++++ tests/{common => ga}/test_cgroupapi.py | 34 +- .../{common => ga}/test_cgroupconfigurator.py | 42 +- tests/{common => ga}/test_cgroups.py | 4 +- tests/{common => ga}/test_cgroupstelemetry.py | 74 +- tests/ga/test_collect_logs.py | 14 +- tests/ga/test_collect_telemetry_events.py | 4 +- tests/ga/test_env.py | 2 +- tests/ga/test_extension.py | 330 ++--- tests/ga/test_exthandlers.py | 36 +- .../ga/test_exthandlers_download_extension.py | 8 +- .../ga/test_exthandlers_exthandlerinstance.py | 2 +- tests/ga/test_guestagent.py | 301 +++++ tests/{common => ga}/test_logcollector.py | 72 +- tests/ga/test_monitor.py | 20 +- tests/ga/test_multi_config_extension.py | 16 +- tests/ga/test_periodic_operation.py | 2 +- .../test_persist_firewall_rules.py | 10 +- tests/ga/test_remoteaccess.py | 10 +- tests/ga/test_remoteaccess_handler.py | 6 +- tests/ga/test_report_status.py | 119 +- tests/ga/test_send_telemetry_events.py | 14 +- tests/ga/test_update.py | 1144 ++++++----------- tests/{utils => lib}/__init__.py | 0 tests/{utils => lib}/cgroups_tools.py | 0 
tests/{utils => lib}/event_logger_tools.py | 10 +- tests/{ga => lib}/extension_emulator.py | 10 +- .../http_request_predicates.py} | 0 tests/{utils => lib}/miscellaneous_tools.py | 0 .../mock_cgroup_environment.py | 7 +- tests/{common => lib}/mock_command.py | 0 tests/{common => lib}/mock_environment.py | 2 +- .../mocks.py => lib/mock_update_handler.py} | 75 +- .../mocks.py => lib/mock_wire_protocol.py} | 16 +- tests/{ => lib}/tools.py | 5 +- .../wire_protocol_data.py} | 15 +- tests/pa/test_deprovision.py | 2 +- tests/pa/test_provision.py | 2 +- tests/test_agent.py | 20 +- .../GuestAgentDcrTest.py | 123 ++ .../HandlerManifest.json | 14 + tests_e2e/GuestAgentDcrTestExtension/Makefile | 8 + .../Utils/HandlerUtil.py | 387 ++++++ .../Utils/LogUtil.py | 50 + .../Utils/ScriptUtil.py | 140 ++ .../Utils/WAAgentUtil.py | 140 ++ .../Utils/test/MockUtil.py | 44 + .../Utils/test/env.py | 24 + .../Utils/test/mock.sh | 14 +- .../Utils/test/test_logutil.py | 35 + .../test/test_null_protected_settings.py | 48 + .../Utils/test/test_redacted_settings.py | 47 + .../Utils/test/test_scriptutil.py | 55 + .../GuestAgentDcrTestExtension/manifest.xml | 17 + .../GuestAgentDcrTestExtension/references | 2 + tests_e2e/orchestrator/docker/Dockerfile | 48 +- tests_e2e/orchestrator/lib/agent_junit.py | 18 +- .../orchestrator/lib/agent_test_loader.py | 192 ++- .../orchestrator/lib/agent_test_suite.py | 984 +++++++++----- .../lib/agent_test_suite_combinator.py | 685 +++++++--- .../lib/update_arm_template_hook.py | 88 ++ tests_e2e/orchestrator/runbook.yml | 200 ++- .../sample_runbooks/existing_vm.yml | 143 --- tests_e2e/orchestrator/scripts/agent-service | 92 ++ tests_e2e/orchestrator/scripts/collect-logs | 5 +- tests_e2e/orchestrator/scripts/install-agent | 82 +- tests_e2e/orchestrator/scripts/prepare-pypy | 56 + .../orchestrator/scripts/update-waagent-conf | 48 + .../orchestrator/scripts/waagent-version | 10 +- tests_e2e/orchestrator/templates/vmss.json | 253 ++++ 
tests_e2e/pipeline/pipeline-cleanup.yml | 90 +- tests_e2e/pipeline/pipeline.yml | 57 +- .../pipeline/scripts/collect_artifacts.sh | 69 + tests_e2e/pipeline/scripts/execute_tests.sh | 75 +- tests_e2e/pipeline/scripts/setup-agent.sh | 54 + tests_e2e/test_suites/agent_bvt.yml | 6 +- tests_e2e/test_suites/agent_cgroups.yml | 9 + tests_e2e/test_suites/agent_ext_workflow.yml | 14 + tests_e2e/test_suites/agent_firewall.yml | 15 + .../test_suites/agent_not_provisioned.yml | 12 + .../test_suites/agent_persist_firewall.yml | 19 + tests_e2e/test_suites/agent_publish.yml | 12 + tests_e2e/test_suites/agent_status.yml | 9 + tests_e2e/test_suites/agent_update.yml | 15 + .../test_suites/agent_wait_for_cloud_init.yml | 13 + tests_e2e/test_suites/ext_cgroups.yml | 13 + tests_e2e/test_suites/ext_sequencing.yml | 10 + .../test_suites/ext_telemetry_pipeline.yml | 9 + tests_e2e/test_suites/extensions_disabled.yml | 9 + tests_e2e/test_suites/fail.yml | 8 +- tests_e2e/test_suites/fips.yml | 16 + tests_e2e/test_suites/images.yml | 126 +- .../test_suites/keyvault_certificates.yml | 9 + tests_e2e/test_suites/multi_config_ext.yml | 9 + .../test_suites/no_outbound_connections.yml | 20 + tests_e2e/test_suites/pass.yml | 3 +- tests_e2e/test_suites/publish_hostname.yml | 8 + .../test_suites/recover_network_interface.yml | 17 + tests_e2e/test_suites/vmss.yml | 8 + .../extension_operations.py | 23 +- .../tests/{bvts => agent_bvt}/run_command.py | 19 +- .../tests/{bvts => agent_bvt}/vm_access.py | 22 +- .../tests/agent_cgroups/agent_cgroups.py | 43 + .../tests/agent_cgroups/agent_cpu_quota.py | 40 + tests_e2e/tests/agent_ext_workflow/README.md | 45 + .../agent_ext_workflow/extension_workflow.py | 443 +++++++ .../tests/agent_firewall/agent_firewall.py | 42 + .../agent_not_provisioned.py | 99 ++ .../disable_agent_provisioning.py | 63 + .../agent_persist_firewall.py | 78 ++ .../tests/agent_publish/agent_publish.py | 105 ++ tests_e2e/tests/agent_status/agent_status.py | 192 +++ 
tests_e2e/tests/agent_update/rsm_update.py | 279 ++++ tests_e2e/tests/agent_update/self_update.py | 172 +++ .../add_cloud_init_script.py | 63 + .../agent_wait_for_cloud_init.py | 91 ++ tests_e2e/tests/bvts/__init__.py | 0 tests_e2e/tests/ext_cgroups/ext_cgroups.py | 43 + .../tests/ext_cgroups/install_extensions.py | 112 ++ .../ext_sequencing/ext_seq_test_cases.py | 318 +++++ .../tests/ext_sequencing/ext_sequencing.py | 309 +++++ .../ext_telemetry_pipeline.py | 111 ++ .../extensions_disabled.py | 142 ++ tests_e2e/tests/fips/fips.py | 73 ++ .../keyvault_certificates.py | 95 ++ tests_e2e/tests/lib/agent_log.py | 115 +- tests_e2e/tests/lib/agent_test.py | 72 +- tests_e2e/tests/lib/agent_test_context.py | 185 ++- tests_e2e/tests/lib/azure_clouds.py | 24 + tests_e2e/tests/lib/azure_sdk_client.py | 59 + tests_e2e/tests/lib/cgroup_helpers.py | 150 +++ tests_e2e/tests/lib/firewall_helpers.py | 209 +++ tests_e2e/tests/lib/logging.py | 19 +- tests_e2e/tests/lib/network_security_rule.py | 182 +++ tests_e2e/tests/lib/remote_test.py | 48 + tests_e2e/tests/lib/resource_group_client.py | 74 ++ tests_e2e/tests/lib/retry.py | 53 +- tests_e2e/tests/lib/shell.py | 2 +- tests_e2e/tests/lib/ssh_client.py | 46 +- tests_e2e/tests/lib/update_arm_template.py | 141 ++ tests_e2e/tests/lib/virtual_machine.py | 143 --- tests_e2e/tests/lib/virtual_machine_client.py | 196 +++ ...py => virtual_machine_extension_client.py} | 174 +-- .../lib/virtual_machine_scale_set_client.py | 107 ++ ...ntifiers.py => vm_extension_identifier.py} | 47 +- .../multi_config_ext/multi_config_ext.py | 162 +++ .../check_fallback_to_hgap.py | 51 + .../check_no_outbound_connections.py | 59 + .../deny_outbound_connections.py | 47 + .../publish_hostname/publish_hostname.py | 209 +++ .../recover_network_interface.py | 139 ++ tests_e2e/tests/samples/error_remote_test.py | 32 + tests_e2e/tests/{ => samples}/error_test.py | 6 +- tests_e2e/tests/samples/fail_remote_test.py | 32 + tests_e2e/tests/{ => samples}/fail_test.py | 6 +- 
tests_e2e/tests/samples/pass_remote_test.py | 32 + tests_e2e/tests/{ => samples}/pass_test.py | 4 +- tests_e2e/tests/samples/vmss_test.py | 37 + .../agent_cgroups-check_cgroups_agent.py | 115 ++ .../agent_cpu_quota-check_agent_cpu_quota.py | 215 ++++ .../scripts/agent_cpu_quota-start_service.py | 96 ++ ..._ext_workflow-assert_operation_sequence.py | 183 +++ ...nt_ext_workflow-check_data_in_agent_log.py | 49 + ...g_between_agent_start_and_gs_processing.py | 117 ++ ...gent_firewall-verify_all_firewall_rules.py | 372 ++++++ .../agent_persist_firewall-access_wireserver | 85 ++ .../scripts/agent_persist_firewall-test_setup | 30 + ..._firewall-verify_firewall_rules_on_boot.py | 176 +++ ...firewall-verify_firewalld_rules_readded.py | 170 +++ ...verify_persist_firewall_service_running.py | 70 + .../scripts/agent_publish-check_update.py | 111 ++ ..._publish-get_agent_log_record_timestamp.py | 75 ++ .../agent_status-get_last_gs_processed.py | 47 + .../scripts/agent_update-modify_agent_version | 35 + .../scripts/agent_update-self_update_check.py | 62 + ...agent_update-self_update_latest_version.py | 69 + .../agent_update-self_update_test_setup | 74 ++ ...ate-verify_agent_reported_update_status.py | 61 + ...ate-verify_versioning_supported_feature.py | 54 + .../scripts/agent_update-wait_for_rsm_gs.py | 71 + .../ext_cgroups-check_cgroups_extensions.py | 224 ++++ .../ext_sequencing-get_ext_enable_time.py | 89 ++ ...telemetry_pipeline-add_extension_events.py | 224 ++++ .../tests/scripts/fips-check_fips_mariner | 56 + .../tests/scripts/fips-enable_fips_mariner | 53 + .../tests/scripts/get-waagent-conf-value | 41 + tests_e2e/tests/scripts/get_distro.py | 35 + ...ver_network_interface-get_nm_controlled.py | 39 + .../scripts/samples-error_remote_test.py | 36 + .../tests/scripts/samples-fail_remote_test.py | 37 + .../tests/scripts/samples-pass_remote_test.py | 36 + 346 files changed, 17459 insertions(+), 3956 deletions(-) create mode 100644 azurelinuxagent/ga/agent_update_handler.py 
rename azurelinuxagent/{common => ga}/cgroup.py (100%) rename azurelinuxagent/{common => ga}/cgroupapi.py (97%) rename azurelinuxagent/{common => ga}/cgroupconfigurator.py (99%) rename azurelinuxagent/{common => ga}/cgroupstelemetry.py (98%) rename azurelinuxagent/{common/utils => ga}/extensionprocessutil.py (68%) create mode 100644 azurelinuxagent/ga/ga_version_updater.py create mode 100644 azurelinuxagent/ga/guestagent.py rename azurelinuxagent/{common => ga}/interfaces.py (100%) rename azurelinuxagent/{common => ga}/logcollector.py (93%) rename azurelinuxagent/{common => ga}/logcollector_manifests.py (96%) rename azurelinuxagent/{common => ga}/persist_firewall_rules.py (100%) create mode 100644 azurelinuxagent/ga/rsm_version_updater.py create mode 100644 azurelinuxagent/ga/self_update_version_updater.py rename azurelinuxagent/{common => pa/rdma}/rdma.py (97%) rename tests/{distro => common/protocol}/__init__.py (100%) rename tests/{ => common}/protocol/test_datacontract.py (100%) rename tests/{ => common}/protocol/test_extensions_goal_state_from_extensions_config.py (54%) rename tests/{ => common}/protocol/test_extensions_goal_state_from_vm_settings.py (70%) rename tests/{ => common}/protocol/test_goal_state.py (90%) rename tests/{ => common}/protocol/test_healthservice.py (99%) rename tests/{ => common}/protocol/test_hostplugin.py (98%) rename tests/{ => common}/protocol/test_image_info_matcher.py (100%) rename tests/{ => common}/protocol/test_imds.py (99%) rename tests/{ => common}/protocol/test_metadata_server_migration_util.py (99%) rename tests/{ => common}/protocol/test_protocol_util.py (98%) rename tests/{ => common}/protocol/test_wire.py (88%) rename tests/{protocol => common/utils}/__init__.py (100%) rename tests/{ => common}/utils/test_archive.py (99%) rename tests/{ => common}/utils/test_crypt_util.py (83%) rename tests/{ => common}/utils/test_extension_process_util.py (92%) rename tests/{ => common}/utils/test_file_util.py (99%) rename tests/{ => 
common}/utils/test_flexible_version.py (100%) rename tests/{ => common}/utils/test_network_util.py (99%) rename tests/{ => common}/utils/test_passwords.txt (100%) rename tests/{ => common}/utils/test_rest_util.py (99%) rename tests/{ => common}/utils/test_shell_util.py (97%) rename tests/{ => common}/utils/test_text_util.py (99%) rename tests/{distro => daemon}/test_scvmm.py (98%) create mode 100644 tests/data/2 create mode 100644 tests/data/config/waagent_auto_update_disabled.conf create mode 100644 tests/data/config/waagent_auto_update_disabled_update_to_latest_version_disabled.conf create mode 100644 tests/data/config/waagent_auto_update_disabled_update_to_latest_version_enabled.conf create mode 100644 tests/data/config/waagent_auto_update_enabled.conf create mode 100644 tests/data/config/waagent_auto_update_enabled_update_to_latest_version_disabled.conf create mode 100644 tests/data/config/waagent_auto_update_enabled_update_to_latest_version_enabled.conf create mode 100644 tests/data/config/waagent_update_to_latest_version_disabled.conf create mode 100644 tests/data/config/waagent_update_to_latest_version_enabled.conf rename tests/data/ga/{WALinuxAgent-9.9.9.9-no_manifest.zip => WALinuxAgent-9.9.9.10-no_manifest.zip} (100%) rename tests/data/hostgaplugin/{ext_conf-requested_version.xml => ext_conf-agent_family_version.xml} (97%) create mode 100644 tests/data/hostgaplugin/ext_conf-rsm_version_properties_false.xml rename tests/data/hostgaplugin/{vm_settings-requested_version.json => vm_settings-agent_family_version.json} (97%) create mode 100644 tests/data/hostgaplugin/vm_settings-requested_version_properties_false.json create mode 100644 tests/data/wire/ec-key.pem create mode 100644 tests/data/wire/ec-key.pub.pem rename tests/data/wire/{ext_conf_requested_version.xml => ext_conf_rsm_version.xml} (89%) create mode 100644 tests/data/wire/ext_conf_version_missing_in_agent_family.xml rename tests/data/wire/{ext_conf_missing_requested_version.xml => 
ext_conf_version_missing_in_manifest.xml} (89%) create mode 100644 tests/data/wire/ext_conf_version_not_from_rsm.xml create mode 100644 tests/data/wire/ext_conf_vm_not_enabled_for_rsm_upgrades.xml create mode 100644 tests/data/wire/ga_manifest_no_uris.xml create mode 100644 tests/data/wire/rsa-key.pem create mode 100644 tests/data/wire/rsa-key.pub.pem delete mode 100644 tests/distro/test_resourceDisk.py create mode 100644 tests/ga/test_agent_update_handler.py rename tests/{common => ga}/test_cgroupapi.py (91%) rename tests/{common => ga}/test_cgroupconfigurator.py (96%) rename tests/{common => ga}/test_cgroups.py (98%) rename tests/{common => ga}/test_cgroupstelemetry.py (84%) create mode 100644 tests/ga/test_guestagent.py rename tests/{common => ga}/test_logcollector.py (85%) rename tests/{common => ga}/test_persist_firewall_rules.py (98%) rename tests/{utils => lib}/__init__.py (100%) rename tests/{utils => lib}/cgroups_tools.py (100%) rename tests/{utils => lib}/event_logger_tools.py (89%) rename tests/{ga => lib}/extension_emulator.py (98%) rename tests/{protocol/HttpRequestPredicates.py => lib/http_request_predicates.py} (100%) rename tests/{utils => lib}/miscellaneous_tools.py (100%) rename tests/{common => lib}/mock_cgroup_environment.py (96%) rename tests/{common => lib}/mock_command.py (100%) rename tests/{common => lib}/mock_environment.py (99%) rename tests/{ga/mocks.py => lib/mock_update_handler.py} (58%) rename tests/{protocol/mocks.py => lib/mock_wire_protocol.py} (93%) rename tests/{ => lib}/tools.py (99%) rename tests/{protocol/mockwiredata.py => lib/wire_protocol_data.py} (96%) create mode 100644 tests_e2e/GuestAgentDcrTestExtension/GuestAgentDcrTest.py create mode 100644 tests_e2e/GuestAgentDcrTestExtension/HandlerManifest.json create mode 100644 tests_e2e/GuestAgentDcrTestExtension/Makefile create mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/HandlerUtil.py create mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/LogUtil.py create 
mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/ScriptUtil.py create mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/WAAgentUtil.py create mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/test/MockUtil.py create mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/test/env.py rename azurelinuxagent/distro/__init__.py => tests_e2e/GuestAgentDcrTestExtension/Utils/test/mock.sh (79%) mode change 100644 => 100755 create mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_logutil.py create mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_null_protected_settings.py create mode 100644 tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_redacted_settings.py create mode 100755 tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_scriptutil.py create mode 100644 tests_e2e/GuestAgentDcrTestExtension/manifest.xml create mode 100644 tests_e2e/GuestAgentDcrTestExtension/references create mode 100644 tests_e2e/orchestrator/lib/update_arm_template_hook.py delete mode 100644 tests_e2e/orchestrator/sample_runbooks/existing_vm.yml create mode 100755 tests_e2e/orchestrator/scripts/agent-service create mode 100755 tests_e2e/orchestrator/scripts/prepare-pypy create mode 100755 tests_e2e/orchestrator/scripts/update-waagent-conf rename azurelinuxagent/distro/suse/__init__.py => tests_e2e/orchestrator/scripts/waagent-version (75%) mode change 100644 => 100755 create mode 100644 tests_e2e/orchestrator/templates/vmss.json create mode 100755 tests_e2e/pipeline/scripts/collect_artifacts.sh create mode 100755 tests_e2e/pipeline/scripts/setup-agent.sh create mode 100644 tests_e2e/test_suites/agent_cgroups.yml create mode 100644 tests_e2e/test_suites/agent_ext_workflow.yml create mode 100644 tests_e2e/test_suites/agent_firewall.yml create mode 100644 tests_e2e/test_suites/agent_not_provisioned.yml create mode 100644 tests_e2e/test_suites/agent_persist_firewall.yml create mode 100644 tests_e2e/test_suites/agent_publish.yml create mode 100644 
tests_e2e/test_suites/agent_status.yml create mode 100644 tests_e2e/test_suites/agent_update.yml create mode 100644 tests_e2e/test_suites/agent_wait_for_cloud_init.yml create mode 100644 tests_e2e/test_suites/ext_cgroups.yml create mode 100644 tests_e2e/test_suites/ext_sequencing.yml create mode 100644 tests_e2e/test_suites/ext_telemetry_pipeline.yml create mode 100644 tests_e2e/test_suites/extensions_disabled.yml create mode 100644 tests_e2e/test_suites/fips.yml create mode 100644 tests_e2e/test_suites/keyvault_certificates.yml create mode 100644 tests_e2e/test_suites/multi_config_ext.yml create mode 100644 tests_e2e/test_suites/no_outbound_connections.yml create mode 100644 tests_e2e/test_suites/publish_hostname.yml create mode 100644 tests_e2e/test_suites/recover_network_interface.yml create mode 100644 tests_e2e/test_suites/vmss.yml rename tests_e2e/tests/{bvts => agent_bvt}/extension_operations.py (81%) rename tests_e2e/tests/{bvts => agent_bvt}/run_command.py (80%) rename tests_e2e/tests/{bvts => agent_bvt}/vm_access.py (74%) create mode 100644 tests_e2e/tests/agent_cgroups/agent_cgroups.py create mode 100644 tests_e2e/tests/agent_cgroups/agent_cpu_quota.py create mode 100644 tests_e2e/tests/agent_ext_workflow/README.md create mode 100644 tests_e2e/tests/agent_ext_workflow/extension_workflow.py create mode 100644 tests_e2e/tests/agent_firewall/agent_firewall.py create mode 100755 tests_e2e/tests/agent_not_provisioned/agent_not_provisioned.py create mode 100755 tests_e2e/tests/agent_not_provisioned/disable_agent_provisioning.py create mode 100644 tests_e2e/tests/agent_persist_firewall/agent_persist_firewall.py create mode 100644 tests_e2e/tests/agent_publish/agent_publish.py create mode 100644 tests_e2e/tests/agent_status/agent_status.py create mode 100644 tests_e2e/tests/agent_update/rsm_update.py create mode 100644 tests_e2e/tests/agent_update/self_update.py create mode 100755 tests_e2e/tests/agent_wait_for_cloud_init/add_cloud_init_script.py create mode 
100755 tests_e2e/tests/agent_wait_for_cloud_init/agent_wait_for_cloud_init.py delete mode 100644 tests_e2e/tests/bvts/__init__.py create mode 100644 tests_e2e/tests/ext_cgroups/ext_cgroups.py create mode 100644 tests_e2e/tests/ext_cgroups/install_extensions.py create mode 100644 tests_e2e/tests/ext_sequencing/ext_seq_test_cases.py create mode 100644 tests_e2e/tests/ext_sequencing/ext_sequencing.py create mode 100755 tests_e2e/tests/ext_telemetry_pipeline/ext_telemetry_pipeline.py create mode 100755 tests_e2e/tests/extensions_disabled/extensions_disabled.py create mode 100755 tests_e2e/tests/fips/fips.py create mode 100755 tests_e2e/tests/keyvault_certificates/keyvault_certificates.py create mode 100644 tests_e2e/tests/lib/azure_clouds.py create mode 100644 tests_e2e/tests/lib/azure_sdk_client.py create mode 100644 tests_e2e/tests/lib/cgroup_helpers.py create mode 100644 tests_e2e/tests/lib/firewall_helpers.py create mode 100644 tests_e2e/tests/lib/network_security_rule.py create mode 100644 tests_e2e/tests/lib/remote_test.py create mode 100644 tests_e2e/tests/lib/resource_group_client.py create mode 100644 tests_e2e/tests/lib/update_arm_template.py delete mode 100644 tests_e2e/tests/lib/virtual_machine.py create mode 100644 tests_e2e/tests/lib/virtual_machine_client.py rename tests_e2e/tests/lib/{vm_extension.py => virtual_machine_extension_client.py} (51%) create mode 100644 tests_e2e/tests/lib/virtual_machine_scale_set_client.py rename tests_e2e/tests/lib/{identifiers.py => vm_extension_identifier.py} (55%) create mode 100644 tests_e2e/tests/multi_config_ext/multi_config_ext.py create mode 100755 tests_e2e/tests/no_outbound_connections/check_fallback_to_hgap.py create mode 100755 tests_e2e/tests/no_outbound_connections/check_no_outbound_connections.py create mode 100755 tests_e2e/tests/no_outbound_connections/deny_outbound_connections.py create mode 100644 tests_e2e/tests/publish_hostname/publish_hostname.py create mode 100644 
tests_e2e/tests/recover_network_interface/recover_network_interface.py create mode 100755 tests_e2e/tests/samples/error_remote_test.py rename tests_e2e/tests/{ => samples}/error_test.py (83%) create mode 100755 tests_e2e/tests/samples/fail_remote_test.py rename tests_e2e/tests/{ => samples}/fail_test.py (87%) create mode 100755 tests_e2e/tests/samples/pass_remote_test.py rename tests_e2e/tests/{ => samples}/pass_test.py (91%) create mode 100755 tests_e2e/tests/samples/vmss_test.py create mode 100755 tests_e2e/tests/scripts/agent_cgroups-check_cgroups_agent.py create mode 100755 tests_e2e/tests/scripts/agent_cpu_quota-check_agent_cpu_quota.py create mode 100755 tests_e2e/tests/scripts/agent_cpu_quota-start_service.py create mode 100755 tests_e2e/tests/scripts/agent_ext_workflow-assert_operation_sequence.py create mode 100755 tests_e2e/tests/scripts/agent_ext_workflow-check_data_in_agent_log.py create mode 100755 tests_e2e/tests/scripts/agent_ext_workflow-validate_no_lag_between_agent_start_and_gs_processing.py create mode 100755 tests_e2e/tests/scripts/agent_firewall-verify_all_firewall_rules.py create mode 100755 tests_e2e/tests/scripts/agent_persist_firewall-access_wireserver create mode 100755 tests_e2e/tests/scripts/agent_persist_firewall-test_setup create mode 100755 tests_e2e/tests/scripts/agent_persist_firewall-verify_firewall_rules_on_boot.py create mode 100755 tests_e2e/tests/scripts/agent_persist_firewall-verify_firewalld_rules_readded.py create mode 100755 tests_e2e/tests/scripts/agent_persist_firewall-verify_persist_firewall_service_running.py create mode 100755 tests_e2e/tests/scripts/agent_publish-check_update.py create mode 100755 tests_e2e/tests/scripts/agent_publish-get_agent_log_record_timestamp.py create mode 100755 tests_e2e/tests/scripts/agent_status-get_last_gs_processed.py create mode 100755 tests_e2e/tests/scripts/agent_update-modify_agent_version create mode 100755 tests_e2e/tests/scripts/agent_update-self_update_check.py create mode 100755 
tests_e2e/tests/scripts/agent_update-self_update_latest_version.py create mode 100755 tests_e2e/tests/scripts/agent_update-self_update_test_setup create mode 100755 tests_e2e/tests/scripts/agent_update-verify_agent_reported_update_status.py create mode 100755 tests_e2e/tests/scripts/agent_update-verify_versioning_supported_feature.py create mode 100755 tests_e2e/tests/scripts/agent_update-wait_for_rsm_gs.py create mode 100755 tests_e2e/tests/scripts/ext_cgroups-check_cgroups_extensions.py create mode 100755 tests_e2e/tests/scripts/ext_sequencing-get_ext_enable_time.py create mode 100755 tests_e2e/tests/scripts/ext_telemetry_pipeline-add_extension_events.py create mode 100755 tests_e2e/tests/scripts/fips-check_fips_mariner create mode 100755 tests_e2e/tests/scripts/fips-enable_fips_mariner create mode 100755 tests_e2e/tests/scripts/get-waagent-conf-value create mode 100755 tests_e2e/tests/scripts/get_distro.py create mode 100755 tests_e2e/tests/scripts/recover_network_interface-get_nm_controlled.py create mode 100755 tests_e2e/tests/scripts/samples-error_remote_test.py create mode 100755 tests_e2e/tests/scripts/samples-fail_remote_test.py create mode 100755 tests_e2e/tests/scripts/samples-pass_remote_test.py diff --git a/.github/workflows/ci_pr.yml b/.github/workflows/ci_pr.yml index e5592688c..84b3ab68e 100644 --- a/.github/workflows/ci_pr.yml +++ b/.github/workflows/ci_pr.yml @@ -8,7 +8,7 @@ on: workflow_dispatch: jobs: - test-legacy-python-versions: + test-python-2_6-and-3_4-versions: strategy: fail-fast: false @@ -50,16 +50,42 @@ jobs: ./ci/nosetests.sh exit $? 
+ test-python-2_7: + + strategy: + fail-fast: false + + name: "Python 2.7 Unit Tests" + runs-on: ubuntu-20.04 + defaults: + run: + shell: bash -l {0} + + env: + NOSEOPTS: "--verbose" + + steps: + - uses: actions/checkout@v3 + + - name: Install Python 2.7 + run: | + apt-get update + apt-get install -y curl bzip2 sudo + curl https://dcrdata.blob.core.windows.net/python/python-2.7.tar.bz2 -o python-2.7.tar.bz2 + sudo tar xjvf python-2.7.tar.bz2 --directory / + + - name: Test with nosetests + run: | + source /home/waagent/virtualenv/python2.7.16/bin/activate + ./ci/nosetests.sh + exit $? + test-current-python-versions: strategy: fail-fast: false matrix: include: - - - python-version: 2.7 - PYLINTOPTS: "--rcfile=ci/2.7.pylintrc --ignore=tests_e2e,makepkg.py" - - python-version: 3.5 PYLINTOPTS: "--rcfile=ci/3.6.pylintrc --ignore=tests_e2e,makepkg.py" @@ -123,6 +149,6 @@ jobs: - name: Upload Coverage if: matrix.python-version == 3.9 - uses: codecov/codecov-action@v2 + uses: codecov/codecov-action@v3 with: - file: ./coverage.xml \ No newline at end of file + file: ./coverage.xml diff --git a/.gitignore b/.gitignore index fd64d3314..79226a492 100644 --- a/.gitignore +++ b/.gitignore @@ -90,4 +90,4 @@ ENV/ # pyenv .python-version -.vscode/ +.vscode/ \ No newline at end of file diff --git a/CODEOWNERS b/CODEOWNERS index 8707e60a5..aebbe4c94 100644 --- a/CODEOWNERS +++ b/CODEOWNERS @@ -21,4 +21,4 @@ # # Linux Agent team # -* @narrieta @ZhidongPeng @nagworld9 @maddieford +* @narrieta @ZhidongPeng @nagworld9 @maddieford @gabstamsft diff --git a/README.md b/README.md index ae6a85106..5a5b126f2 100644 --- a/README.md +++ b/README.md @@ -58,13 +58,33 @@ The information flow from the platform to the agent occurs via two channels: * A TCP endpoint exposing a REST API used to obtain deployment and topology configuration. -The agent will use an HTTP proxy if provided via the `http_proxy` (for `http` requests) or -`https_proxy` (for `https` requests) environment variables. 
The `HttpProxy.Host` and -`HttpProxy.Port` configuration variables (see below), if used, will override the environment -settings. Due to limitations of Python, the agent *does not* support HTTP proxies requiring -authentication. Note that when the agent service is managed by systemd, environment variables -such as `http_proxy` and `https_proxy` should be defined using one the mechanisms provided by -systemd (e.g. by using Environment or EnvironmentFile in the service file). +### HTTP Proxy +The Agent will use an HTTP proxy if provided via the `http_proxy` (for `http` requests) or +`https_proxy` (for `https` requests) environment variables. Due to limitations of Python, +the agent *does not* support HTTP proxies requiring authentication. + +Similarly, the Agent will bypass the proxy if the environment variable `no_proxy` is set. + +Note that the way to define those environment variables for the Agent service varies across different distros. For distros +that use systemd, a common approach is to use Environment or EnvironmentFile in the [Service] section of the service +definition, for example using an override or a drop-in file (see "systemctl edit" for overrides). + +Example +```bash + # cat /etc/systemd/system/walinuxagent.service.d/http-proxy.conf + [Service] + Environment="http_proxy=http://proxy.example.com:80/" + Environment="https_proxy=http://proxy.example.com:80/" + # +``` + +The Agent passes its environment to the VM Extensions it executes, including `http_proxy` and `https_proxy`, so defining +a proxy for the Agent will also define it for the VM Extensions. + + +The [`HttpProxy.Host` and `HttpProxy.Port`](#httpproxyhost-httpproxyport) configuration variables, if used, override +the environment settings. Note that these configuration variables are local to the Agent process and are not passed to +VM Extensions. 
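The proxy behavior the README hunk above describes follows from Python's standard library: `urllib.request` picks up `http_proxy`, `https_proxy`, and `no_proxy` from the process environment. A minimal sketch of that mechanism (the proxy URL and the IMDS address in `no_proxy` are illustrative placeholders, not values from the patch):

```python
import os
import urllib.request

# Simulate the environment that a systemd drop-in like the one above would create.
os.environ["http_proxy"] = "http://proxy.example.com:80/"
os.environ["https_proxy"] = "http://proxy.example.com:80/"
os.environ["no_proxy"] = "169.254.169.254"  # hosts to reach directly, bypassing the proxy

# getproxies() returns a scheme -> proxy-URL mapping built from the environment.
proxies = urllib.request.getproxies()
print(proxies["http"])   # http://proxy.example.com:80/

# proxy_bypass() is truthy for hosts matched by no_proxy.
print(bool(urllib.request.proxy_bypass("169.254.169.254")))  # True
```

Because child processes inherit this environment, the same lookup in an extension process yields the same proxies, which is why configuring the proxy for the Agent also configures it for VM Extensions.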
## Requirements @@ -84,11 +104,13 @@ Waagent depends on some system packages in order to function properly: ## Installation -Installation via your distribution's package repository is preferred. -You can also customize your own RPM or DEB packages using the configuration -samples provided (see deb and rpm sections below). +Installing via your distribution's package repository is the only method that is supported. + +You can install from source for more advanced options, such as installing to a custom location or creating +custom images. Installing from source, though, may override customizations done to the Agent by your +distribution, and is meant only for advanced users. We provide very limited support for this method. -For more advanced installation options, such as installing to custom locations or prefixes, you can use **setuptools** to install from source by running: +To install from source, you can use **setuptools**: ```bash sudo python setup.py install --register-service @@ -108,11 +130,18 @@ You can view more installation options by running: The agent's log file is kept at `/var/log/waagent.log`. +Lastly, you can also customize your own RPM or DEB packages using the configuration +samples provided in the deb and rpm sections below. This method is also meant for advanced users and we +provide very limited support for it. + + ## Upgrade -Upgrading via your distribution's package repository is strongly preferred. +Upgrading via your distribution's package repository or using automatic updates are the only supported +methods. More information can be found here: [Update Linux Agent](https://learn.microsoft.com/en-us/azure/virtual-machines/extensions/update-linux-agent) -If upgrading manually, same with installation above by running: +To upgrade the Agent from source, you can use **setuptools**. Upgrading from source is meant for advanced +users and we provide very limited support for it. 
```bash sudo python setup.py install --force @@ -232,6 +261,30 @@ without the agent. In order to do that, the `provisionVMAgent` flag must be set provisioning time, via whichever API is being used. We will provide more details on this on our wiki when it is generally available. +#### __Extensions.WaitForCloudInit__ + +_Type: Boolean_ +_Default: n_ + +Waits for cloud-init to complete (cloud-init status --wait) before executing VM extensions. + +Both cloud-init and VM extensions are common ways to customize a VM during initial deployment. By +default, the agent will start executing extensions while cloud-init may still be in the 'config' +stage and won't wait for the 'final' stage to complete. Cloud-init and extensions may execute operations +that conflict with each other (for example, both of them may try to install packages). Setting this option +to 'y' ensures that VM extensions are executed only after cloud-init has completed all its stages. + +Note that using this option requires creating a custom image with the value of this option set to 'y', in +order to ensure that the wait is performed during the initial deployment of the VM. + +#### __Extensions.WaitForCloudInitTimeout__ + +_Type: Integer_ +_Default: 3600_ + +Timeout in seconds for the Agent to wait on cloud-init. If the timeout elapses, the Agent will continue +executing VM extensions. See Extensions.WaitForCloudInit for more details. + #### __Extensions.GoalStatePeriod__ _Type: Integer_ @@ -244,19 +297,38 @@ _Note_: setting up this parameter to more than a few minutes can make the state the VM be reported as unresponsive/unavailable on the Azure portal. Also, this setting affects how fast the agent starts executing extensions. -#### __AutoUpdate.Enabled__ +#### __AutoUpdate.UpdateToLatestVersion__ -_Type: Boolean_ +_Type: Boolean_ _Default: y_ -Enables auto-update of the Extension Handler. The Extension Handler is responsible +Enables auto-update of the Extension Handler. 
The Extension Handler is responsible
 for managing extensions and reporting VM status. The core functionality of the agent
-is contained in the Extension Handler, and we encourage users to enable this option
+is contained in the Extension Handler, and we encourage users to enable this option
 in order to maintain an up to date version.
+
+When this option is enabled, the Agent will install new versions when they become
+available. When disabled, the Agent will not install any new versions, but it will use
+the most recent version already installed on the VM.
-On most distros the default value is 'y'.
+_Notes_:
+1. This option was added in version 2.10.0.8 of the Agent. For previous versions, see AutoUpdate.Enabled.
+2. If both options are specified in waagent.conf, AutoUpdate.UpdateToLatestVersion overrides the value set for AutoUpdate.Enabled.
+3. Changing this option requires a service restart to pick up the updated setting.
-For more information on the agent version, see our [FAQ](https://github.com/Azure/WALinuxAgent/wiki/FAQ#what-does-goal-state-agent-mean-in-waagent---version-output).
+For more information on the agent version, see our [FAQ](https://github.com/Azure/WALinuxAgent/wiki/FAQ#what-does-goal-state-agent-mean-in-waagent---version-output).
+For more information on agent updates, see our [FAQ](https://github.com/Azure/WALinuxAgent/wiki/FAQ#how-auto-update-works-for-extension-handler).
+For more information on AutoUpdate.UpdateToLatestVersion vs. AutoUpdate.Enabled, see our [FAQ](https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion).
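The precedence between the two flags described in the notes above can be modeled in a few lines. This is a simplified sketch with a hypothetical `update_to_latest_version` helper, not the Agent's actual implementation (which lives in `azurelinuxagent/common/conf.py`):

```python
# Documented precedence rules, as a standalone model:
# - AutoUpdate.UpdateToLatestVersion, when present in waagent.conf, wins.
# - Otherwise AutoUpdate.Enabled (if present) decides.
# - If neither is present, the default is True (auto-update on).
def update_to_latest_version(settings):
    default = settings.get("AutoUpdate.Enabled", True)
    return settings.get("AutoUpdate.UpdateToLatestVersion", default)

print(update_to_latest_version({}))                             # True
print(update_to_latest_version({"AutoUpdate.Enabled": False}))  # False
print(update_to_latest_version({"AutoUpdate.Enabled": False,
                                "AutoUpdate.UpdateToLatestVersion": True}))  # True
```

The last call shows the override: even with the legacy flag off, the new flag re-enables updating to the latest version.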
+ +#### __AutoUpdate.Enabled__ + +_Type: Boolean_ +_Default: y_ + +Enables auto-update of the Extension Handler. This flag is supported for legacy reasons and we strongly recommend using AutoUpdate.UpdateToLatestVersion instead. +The difference between these 2 flags is that, when set to 'n', AutoUpdate.Enabled will use the version of the Extension Handler that is pre-installed on the image, while AutoUpdate.UpdateToLatestVersion will use the most recent version that has already been installed on the VM (via auto-update). + +On most distros the default value is 'y'. #### __Provisioning.Agent__ @@ -555,7 +627,7 @@ directory. _Type: String_ _Default: None_ -If set, the agent will use this proxy server to access the internet. These values +If set, the agent will use this proxy server for HTTP/HTTPS requests. These values *will* override the `http_proxy` or `https_proxy` environment variables. Lastly, `HttpProxy.Host` is required (if to be used) and `HttpProxy.Port` is optional. diff --git a/azurelinuxagent/agent.py b/azurelinuxagent/agent.py index 8c303482e..2811e215e 100644 --- a/azurelinuxagent/agent.py +++ b/azurelinuxagent/agent.py @@ -28,14 +28,15 @@ import subprocess import sys import threading -from azurelinuxagent.common import cgroupconfigurator, logcollector -from azurelinuxagent.common.cgroupapi import SystemdCgroupsApi +from azurelinuxagent.ga import logcollector, cgroupconfigurator +from azurelinuxagent.ga.cgroup import AGENT_LOG_COLLECTOR, CpuCgroup, MemoryCgroup +from azurelinuxagent.ga.cgroupapi import SystemdCgroupsApi import azurelinuxagent.common.conf as conf import azurelinuxagent.common.event as event import azurelinuxagent.common.logger as logger from azurelinuxagent.common.future import ustr -from azurelinuxagent.common.logcollector import LogCollector, OUTPUT_RESULTS_FILE_PATH +from azurelinuxagent.ga.logcollector import LogCollector, OUTPUT_RESULTS_FILE_PATH from azurelinuxagent.common.osutil import get_osutil from azurelinuxagent.common.utils 
import fileutil, textutil from azurelinuxagent.common.utils.flexible_version import FlexibleVersion @@ -104,7 +105,7 @@ def __init__(self, verbose, conf_file_path=None): if os.path.isfile(ext_log_dir): raise Exception("{0} is a file".format(ext_log_dir)) if not os.path.isdir(ext_log_dir): - fileutil.mkdir(ext_log_dir, mode=0o755, owner="root") + fileutil.mkdir(ext_log_dir, mode=0o755, owner=self.osutil.get_root_username()) except Exception as e: logger.error( "Exception occurred while creating extension " @@ -204,11 +205,10 @@ def collect_logs(self, is_full_mode): logger.info("Running log collector mode normal") # Check the cgroups unit - cpu_cgroup_path, memory_cgroup_path, log_collector_monitor = None, None, None - if CollectLogsHandler.should_validate_cgroups(): - cgroups_api = SystemdCgroupsApi() - cpu_cgroup_path, memory_cgroup_path = cgroups_api.get_process_cgroup_paths("self") - + log_collector_monitor = None + cgroups_api = SystemdCgroupsApi() + cpu_cgroup_path, memory_cgroup_path = cgroups_api.get_process_cgroup_paths("self") + if CollectLogsHandler.is_enabled_monitor_cgroups_check(): cpu_slice_matches = (cgroupconfigurator.LOGCOLLECTOR_SLICE in cpu_cgroup_path) memory_slice_matches = (cgroupconfigurator.LOGCOLLECTOR_SLICE in memory_cgroup_path) @@ -221,10 +221,24 @@ def collect_logs(self, is_full_mode): sys.exit(logcollector.INVALID_CGROUPS_ERRCODE) + def initialize_cgroups_tracking(cpu_cgroup_path, memory_cgroup_path): + cpu_cgroup = CpuCgroup(AGENT_LOG_COLLECTOR, cpu_cgroup_path) + msg = "Started tracking cpu cgroup {0}".format(cpu_cgroup) + logger.info(msg) + cpu_cgroup.initialize_cpu_usage() + memory_cgroup = MemoryCgroup(AGENT_LOG_COLLECTOR, memory_cgroup_path) + msg = "Started tracking memory cgroup {0}".format(memory_cgroup) + logger.info(msg) + return [cpu_cgroup, memory_cgroup] + try: - log_collector = LogCollector(is_full_mode, cpu_cgroup_path, memory_cgroup_path) - log_collector_monitor = get_log_collector_monitor_handler(log_collector.cgroups) 
- log_collector_monitor.run()
+        log_collector = LogCollector(is_full_mode)
+        # Run log collector resource (CPU, memory) monitoring only if the agent started the log collector.
+        # If the log collector was started by any other means, it will not be monitored.
+        if CollectLogsHandler.is_enabled_monitor_cgroups_check():
+            tracked_cgroups = initialize_cgroups_tracking(cpu_cgroup_path, memory_cgroup_path)
+            log_collector_monitor = get_log_collector_monitor_handler(tracked_cgroups)
+            log_collector_monitor.run()
 archive = log_collector.collect_logs_and_get_archive()
 logger.info("Log collection successfully completed. Archive can be found at {0} "
 "and detailed log output can be found at {1}".format(archive, OUTPUT_RESULTS_FILE_PATH))
diff --git a/azurelinuxagent/common/agent_supported_feature.py b/azurelinuxagent/common/agent_supported_feature.py
index d7f93e224..694c63639 100644
--- a/azurelinuxagent/common/agent_supported_feature.py
+++ b/azurelinuxagent/common/agent_supported_feature.py
@@ -14,6 +14,7 @@
 #
 # Requires Python 2.6+ and Openssl 1.0+
 #
+from azurelinuxagent.common import conf

 class SupportedFeatureNames(object):
@@ -23,6 +24,7 @@ class SupportedFeatureNames(object):
     MultiConfig = "MultipleExtensionsPerHandler"
     ExtensionTelemetryPipeline = "ExtensionTelemetryPipeline"
     FastTrack = "FastTrack"
+    GAVersioningGovernance = "VersioningGovernance"  # Guest Agent Versioning

 class AgentSupportedFeature(object):
@@ -72,9 +74,28 @@ def __init__(self):
                                                  supported=self.__SUPPORTED)

+class _GAVersioningGovernanceFeature(AgentSupportedFeature):
+    """
+    CRP drives the RSM update if the agent reports that it supports RSM upgrades via this flag; otherwise CRP falls back to the largest version.
+    The agent does not report this supported feature flag if auto-update is disabled, or if an old agent version that does not understand GA versioning is running.
+
+    Note: Windows in particular needs this flag to report to CRP that the GA does not support the updates, so Linux adopted the same flag to have a common solution.
+    """
+
+    __NAME = SupportedFeatureNames.GAVersioningGovernance
+    __VERSION = "1.0"
+    __SUPPORTED = conf.get_auto_update_to_latest_version()
+
+    def __init__(self):
+        super(_GAVersioningGovernanceFeature, self).__init__(name=self.__NAME,
+                                                             version=self.__VERSION,
+                                                             supported=self.__SUPPORTED)
+
+
 # This is the list of features that Agent supports and we advertise to CRP
 __CRP_ADVERTISED_FEATURES = {
-    SupportedFeatureNames.MultiConfig: _MultiConfigFeature()
+    SupportedFeatureNames.MultiConfig: _MultiConfigFeature(),
+    SupportedFeatureNames.GAVersioningGovernance: _GAVersioningGovernanceFeature()
 }
diff --git a/azurelinuxagent/common/conf.py b/azurelinuxagent/common/conf.py
index 46765ea98..666228531 100644
--- a/azurelinuxagent/common/conf.py
+++ b/azurelinuxagent/common/conf.py
@@ -87,6 +87,12 @@ def get_int(self, key, default_value):
         except ValueError:
             return self._get_default(default_value)

+    def is_present(self, key):
+        """
+        Returns True if the given flag is present in the configuration file, False otherwise.
+ """ + return self.values.get(key) is not None + __conf__ = ConfigurationProvider() @@ -117,6 +123,7 @@ def load_conf_from_file(conf_file_path, conf=__conf__): "Logs.Console": True, "Logs.Collect": True, "Extensions.Enabled": True, + "Extensions.WaitForCloudInit": False, "Provisioning.AllowResetSysUser": False, "Provisioning.RegenerateSshHostKeyPair": False, "Provisioning.DeleteRootPassword": False, @@ -128,6 +135,7 @@ def load_conf_from_file(conf_file_path, conf=__conf__): "ResourceDisk.EnableSwap": False, "ResourceDisk.EnableSwapEncryption": False, "AutoUpdate.Enabled": True, + "AutoUpdate.UpdateToLatestVersion": True, "EnableOverProvisioning": True, # # "Debug" options are experimental and may be removed in later @@ -138,7 +146,7 @@ def load_conf_from_file(conf_file_path, conf=__conf__): "Debug.CgroupDisableOnQuotaCheckFailure": True, "Debug.EnableAgentMemoryUsageCheck": False, "Debug.EnableFastTrack": True, - "Debug.EnableGAVersioning": False + "Debug.EnableGAVersioning": True } @@ -169,6 +177,7 @@ def load_conf_from_file(conf_file_path, conf=__conf__): __INTEGER_OPTIONS__ = { "Extensions.GoalStatePeriod": 6, "Extensions.InitialGoalStatePeriod": 6, + "Extensions.WaitForCloudInitTimeout": 3600, "OS.EnableFirewallPeriod": 300, "OS.RemovePersistentNetRulesPeriod": 30, "OS.RootDeviceScsiTimeoutPeriod": 30, @@ -227,6 +236,13 @@ def get_switch_default_value(option): raise ValueError("{0} is not a valid configuration parameter.".format(option)) +def is_present(key, conf=__conf__): + """ + Returns True if the given flag present in the configuration file, False otherwise. 
+ """ + return conf.is_present(key) + + def enable_firewall(conf=__conf__): return conf.get_switch("OS.EnableFirewall", False) @@ -371,6 +387,14 @@ def get_extensions_enabled(conf=__conf__): return conf.get_switch("Extensions.Enabled", True) +def get_wait_for_cloud_init(conf=__conf__): + return conf.get_switch("Extensions.WaitForCloudInit", False) + + +def get_wait_for_cloud_init_timeout(conf=__conf__): + return conf.get_switch("Extensions.WaitForCloudInitTimeout", 3600) + + def get_goal_state_period(conf=__conf__): return conf.get_int("Extensions.GoalStatePeriod", 6) @@ -502,6 +526,21 @@ def get_monitor_network_configuration_changes(conf=__conf__): return conf.get_switch("Monitor.NetworkConfigurationChanges", False) +def get_auto_update_to_latest_version(conf=__conf__): + """ + If set to True, agent will update to the latest version + NOTE: + when both turned on, both AutoUpdate.Enabled and AutoUpdate.UpdateToLatestVersion same meaning: update to latest version + when turned off, AutoUpdate.Enabled: reverts to pre-installed agent, AutoUpdate.UpdateToLatestVersion: uses latest version already installed on the vm and does not download new agents + Even we are deprecating AutoUpdate.Enabled, we still need to support if users explicitly setting it instead new flag. + If AutoUpdate.UpdateToLatestVersion is present, it overrides any value set for AutoUpdate.Enabled (if present). + If AutoUpdate.UpdateToLatestVersion is not present but AutoUpdate.Enabled is present and set to 'n', we adhere to AutoUpdate.Enabled flag's behavior + if both not present, we default to True. 
+ """ + default = get_autoupdate_enabled(conf=conf) + return conf.get_switch("AutoUpdate.UpdateToLatestVersion", default) + + def get_cgroup_check_period(conf=__conf__): """ How often to perform checks on cgroups (are the processes in the cgroups as expected, @@ -610,26 +649,25 @@ def get_etp_collection_period(conf=__conf__): return conf.get_int("Debug.EtpCollectionPeriod", 300) -def get_hotfix_upgrade_frequency(conf=__conf__): +def get_self_update_hotfix_frequency(conf=__conf__): """ - Determines the frequency to check for Hotfix upgrades (. version changed in new upgrades). + Determines the frequency to check for Hotfix upgrades ( version changed in new upgrades). NOTE: This option is experimental and may be removed in later versions of the Agent. """ - return conf.get_int("Debug.AutoUpdateHotfixFrequency", 4 * 60 * 60) + return conf.get_int("Debug.SelfUpdateHotfixFrequency", 4 * 60 * 60) -def get_normal_upgrade_frequency(conf=__conf__): +def get_self_update_regular_frequency(conf=__conf__): """ - Determines the frequency to check for Normal upgrades (. version changed in new upgrades). + Determines the frequency to check for regular upgrades (.. version changed in new upgrades). NOTE: This option is experimental and may be removed in later versions of the Agent. """ - return conf.get_int("Debug.AutoUpdateNormalFrequency", 24 * 60 * 60) + return conf.get_int("Debug.SelfUpdateRegularFrequency", 24 * 60 * 60) def get_enable_ga_versioning(conf=__conf__): """ - If True, the agent uses GA Versioning for auto-updating the agent vs automatically auto-updating to the highest version. - + If True, the agent looks for rsm updates(checking requested version in GS) otherwise it will fall back to self-update and finds the highest version from PIR. NOTE: This option is experimental and may be removed in later versions of the Agent. 
""" return conf.get_switch("Debug.EnableGAVersioning", False) diff --git a/azurelinuxagent/common/event.py b/azurelinuxagent/common/event.py index 1f903a9fa..435a95e27 100644 --- a/azurelinuxagent/common/event.py +++ b/azurelinuxagent/common/event.py @@ -75,6 +75,7 @@ class WALAEventOperation: CGroupsCleanUp = "CGroupsCleanUp" CGroupsDisabled = "CGroupsDisabled" CGroupsInfo = "CGroupsInfo" + CloudInit = "CloudInit" CollectEventErrors = "CollectEventErrors" CollectEventUnicodeErrors = "CollectEventUnicodeErrors" ConfigurationChange = "ConfigurationChange" @@ -94,6 +95,7 @@ class WALAEventOperation: HealthCheck = "HealthCheck" HealthObservation = "HealthObservation" HeartBeat = "HeartBeat" + HostnamePublishing = "HostnamePublishing" HostPlugin = "HostPlugin" HostPluginHeartbeat = "HostPluginHeartbeat" HostPluginHeartbeatExtended = "HostPluginHeartbeatExtended" @@ -104,9 +106,12 @@ class WALAEventOperation: InitializeHostPlugin = "InitializeHostPlugin" Log = "Log" LogCollection = "LogCollection" + NoExec = "NoExec" OSInfo = "OSInfo" + OpenSsl = "OpenSsl" Partition = "Partition" PersistFirewallRules = "PersistFirewallRules" + ProvisionAfterExtensions = "ProvisionAfterExtensions" PluginSettingsVersionMismatch = "PluginSettingsVersionMismatch" InvalidExtensionConfig = "InvalidExtensionConfig" Provision = "Provision" @@ -364,10 +369,14 @@ def __init__(self): # Parameters from OS osutil = get_osutil() + keyword_name = { + "CpuArchitecture": osutil.get_vm_arch() + } self._common_parameters.append(TelemetryEventParam(CommonTelemetryEventSchema.OSVersion, EventLogger._get_os_version())) self._common_parameters.append(TelemetryEventParam(CommonTelemetryEventSchema.ExecutionMode, AGENT_EXECUTION_MODE)) self._common_parameters.append(TelemetryEventParam(CommonTelemetryEventSchema.RAM, int(EventLogger._get_ram(osutil)))) self._common_parameters.append(TelemetryEventParam(CommonTelemetryEventSchema.Processors, int(EventLogger._get_processors(osutil)))) + 
self._common_parameters.append(TelemetryEventParam(CommonTelemetryEventSchema.KeywordName, json.dumps(keyword_name))) # Parameters from goal state self._common_parameters.append(TelemetryEventParam(CommonTelemetryEventSchema.TenantName, "TenantName_UNINITIALIZED")) @@ -595,8 +604,7 @@ def add_common_event_parameters(self, event, event_timestamp): TelemetryEventParam(CommonTelemetryEventSchema.OpcodeName, event_timestamp.strftime(logger.Logger.LogTimeFormatInUTC)), TelemetryEventParam(CommonTelemetryEventSchema.EventTid, threading.current_thread().ident), TelemetryEventParam(CommonTelemetryEventSchema.EventPid, os.getpid()), - TelemetryEventParam(CommonTelemetryEventSchema.TaskName, threading.current_thread().getName()), - TelemetryEventParam(CommonTelemetryEventSchema.KeywordName, '')] + TelemetryEventParam(CommonTelemetryEventSchema.TaskName, threading.current_thread().getName())] if event.eventId == TELEMETRY_EVENT_EVENT_ID and event.providerId == TELEMETRY_EVENT_PROVIDER_ID: # Currently only the GuestAgentExtensionEvents has these columns, the other tables dont have them so skipping diff --git a/azurelinuxagent/common/exception.py b/azurelinuxagent/common/exception.py index 048466232..42170db85 100644 --- a/azurelinuxagent/common/exception.py +++ b/azurelinuxagent/common/exception.py @@ -75,6 +75,24 @@ def __init__(self, msg=None, inner=None): super(AgentNetworkError, self).__init__(msg, inner) +class AgentUpdateError(AgentError): + """ + When agent failed to update. + """ + + def __init__(self, msg=None, inner=None): + super(AgentUpdateError, self).__init__(msg, inner) + + +class AgentFamilyMissingError(AgentError): + """ + When agent family is missing. + """ + + def __init__(self, msg=None, inner=None): + super(AgentFamilyMissingError, self).__init__(msg, inner) + + class CGroupsException(AgentError): """ Exception to classify any cgroups related issue. 
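The two exception classes added above follow the repository's existing `AgentError` pattern: each subclass simply forwards an optional message and inner exception to the base. A self-contained sketch of the pattern, where the base class is a simplified stand-in for the real `AgentError` in `azurelinuxagent/common/exception.py` (which formats its message differently):

```python
class AgentError(Exception):
    """Simplified stand-in for azurelinuxagent.common.exception.AgentError."""
    def __init__(self, msg=None, inner=None):
        super(AgentError, self).__init__(msg)
        self.inner = inner  # the underlying exception, if any


class AgentUpdateError(AgentError):
    """When agent failed to update."""
    def __init__(self, msg=None, inner=None):
        super(AgentUpdateError, self).__init__(msg, inner)


# Callers can catch the shared base class regardless of the concrete subtype.
try:
    raise AgentUpdateError("unable to download agent package", inner=IOError("timeout"))
except AgentError as e:
    print(type(e).__name__, e)  # AgentUpdateError unable to download agent package
```

Keeping `inner` lets callers log the root cause (for example, a network failure) alongside the higher-level update error.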
diff --git a/azurelinuxagent/common/osutil/default.py b/azurelinuxagent/common/osutil/default.py index 9fb97f157..c52146ca7 100644 --- a/azurelinuxagent/common/osutil/default.py +++ b/azurelinuxagent/common/osutil/default.py @@ -149,6 +149,14 @@ def get_systemd_unit_file_install_path(): def get_agent_bin_path(): return "/usr/sbin" + @staticmethod + def get_vm_arch(): + try: + return platform.machine() + except Exception as e: + logger.warn("Unable to determine cpu architecture: {0}", ustr(e)) + return "unknown" + def get_firewall_dropped_packets(self, dst_ip=None): # If a previous attempt failed, do not retry global _enable_firewall # pylint: disable=W0603 @@ -374,6 +382,9 @@ def get_userentry(username): except KeyError: return None + def get_root_username(self): + return "root" + def is_sys_user(self, username): """ Check whether use is a system user. @@ -1179,11 +1190,20 @@ def restart_if(self, ifname, retries=3, wait=5): else: logger.warn("exceeded restart retries") - def publish_hostname(self, hostname): + def check_and_recover_nic_state(self, ifname): + # TODO: This should be implemented for all distros where we reset the network during publishing hostname. Currently it is only implemented in RedhatOSUtil. + pass + + def publish_hostname(self, hostname, recover_nic=False): + """ + Publishes the provided hostname. 
+ """ self.set_dhcp_hostname(hostname) self.set_hostname_record(hostname) ifname = self.get_if_name() self.restart_if(ifname) + if recover_nic: + self.check_and_recover_nic_state(ifname) def set_scsi_disks_timeout(self, timeout): for dev in os.listdir("/sys/block"): diff --git a/azurelinuxagent/common/osutil/factory.py b/azurelinuxagent/common/osutil/factory.py index 83123e3f5..e2f15afb5 100644 --- a/azurelinuxagent/common/osutil/factory.py +++ b/azurelinuxagent/common/osutil/factory.py @@ -66,15 +66,14 @@ def _get_osutil(distro_name, distro_code_name, distro_version, distro_full_name) return ClearLinuxUtil() if distro_name == "ubuntu": - if Version(distro_version) in [Version("12.04"), Version("12.10")]: + ubuntu_version = Version(distro_version) + if ubuntu_version in [Version("12.04"), Version("12.10")]: return Ubuntu12OSUtil() - if Version(distro_version) in [Version("14.04"), Version("14.10")]: + if ubuntu_version in [Version("14.04"), Version("14.10")]: return Ubuntu14OSUtil() - if Version(distro_version) in [Version('16.04'), Version('16.10'), Version('17.04')]: + if ubuntu_version in [Version('16.04'), Version('16.10'), Version('17.04')]: return Ubuntu16OSUtil() - if Version(distro_version) in [Version('18.04'), Version('18.10'), - Version('19.04'), Version('19.10'), - Version('20.04')]: + if Version('18.04') <= ubuntu_version <= Version('24.04'): return Ubuntu18OSUtil() if distro_full_name == "Snappy Ubuntu Core": return UbuntuSnappyOSUtil() diff --git a/azurelinuxagent/common/osutil/gaia.py b/azurelinuxagent/common/osutil/gaia.py index 8271163c2..849d5d1fa 100644 --- a/azurelinuxagent/common/osutil/gaia.py +++ b/azurelinuxagent/common/osutil/gaia.py @@ -202,7 +202,7 @@ def set_hostname(self, hostname): def set_dhcp_hostname(self, hostname): logger.warn('set_dhcp_hostname is ignored on GAiA') - def publish_hostname(self, hostname): + def publish_hostname(self, hostname, recover_nic=False): logger.warn('publish_hostname is ignored on GAiA') def 
del_account(self, username): diff --git a/azurelinuxagent/common/osutil/iosxe.py b/azurelinuxagent/common/osutil/iosxe.py index ace28f073..4ff2b9d97 100644 --- a/azurelinuxagent/common/osutil/iosxe.py +++ b/azurelinuxagent/common/osutil/iosxe.py @@ -58,12 +58,12 @@ def set_hostname(self, hostname): logger.warn("[{0}] failed with error: {1}, attempting fallback".format(' '.join(hostnamectl_cmd), ustr(e))) DefaultOSUtil.set_hostname(self, hostname) - def publish_hostname(self, hostname): + def publish_hostname(self, hostname, recover_nic=False): """ Restart NetworkManager first before publishing hostname """ shellutil.run("service NetworkManager restart") - super(IosxeOSUtil, self).publish_hostname(hostname) + super(IosxeOSUtil, self).publish_hostname(hostname, recover_nic) def register_agent_service(self): return shellutil.run("systemctl enable waagent", chk_err=False) diff --git a/azurelinuxagent/common/osutil/nsbsd.py b/azurelinuxagent/common/osutil/nsbsd.py index 016f506f0..00723aa0b 100644 --- a/azurelinuxagent/common/osutil/nsbsd.py +++ b/azurelinuxagent/common/osutil/nsbsd.py @@ -28,6 +28,7 @@ class NSBSDOSUtil(FreeBSDOSUtil): def __init__(self): super(NSBSDOSUtil, self).__init__() + self.agent_conf_file_path = '/etc/waagent.conf' if self.resolver is None: # NSBSD doesn't have a system resolver, configure a python one @@ -37,7 +38,7 @@ def __init__(self): except ImportError: raise OSUtilError("Python DNS resolver not available. 
Cannot proceed!") - self.resolver = dns.resolver.Resolver() + self.resolver = dns.resolver.Resolver(configure=False) servers = [] cmd = "getconf /usr/Firewall/ConfigFiles/dns Servers | tail -n +2" ret, output = shellutil.run_get_output(cmd) # pylint: disable=W0612 @@ -47,6 +48,7 @@ def __init__(self): server = server[:-1] # remove last '=' cmd = "grep '{}' /etc/hosts".format(server) + " | awk '{print $1}'" ret, ip = shellutil.run_get_output(cmd) + ip = ip.strip() # Remove new line char servers.append(ip) self.resolver.nameservers = servers dns.resolver.override_system_resolver(self.resolver) @@ -74,6 +76,9 @@ def conf_sshd(self, disable_password): logger.info("{0} SSH password-based authentication methods." .format("Disabled" if disable_password else "Enabled")) + def get_root_username(self): + return "admin" + def useradd(self, username, expiration=None, comment=None): """ Create user account with 'username' diff --git a/azurelinuxagent/common/osutil/redhat.py b/azurelinuxagent/common/osutil/redhat.py index 312dd1608..a9a103477 100644 --- a/azurelinuxagent/common/osutil/redhat.py +++ b/azurelinuxagent/common/osutil/redhat.py @@ -117,12 +117,109 @@ def set_hostname(self, hostname): logger.warn("[{0}] failed, attempting fallback".format(' '.join(hostnamectl_cmd))) DefaultOSUtil.set_hostname(self, hostname) - def publish_hostname(self, hostname): + def get_nm_controlled(self, ifname): + filepath = "/etc/sysconfig/network-scripts/ifcfg-{0}".format(ifname) + nm_controlled_cmd = ['grep', 'NM_CONTROLLED=', filepath] + try: + result = shellutil.run_command(nm_controlled_cmd, log_error=False).rstrip() + + if result and len(result.split('=')) > 1: + # Remove trailing white space and ' or " characters + value = result.split('=')[1].replace("'", '').replace('"', '').rstrip() + if value == "n" or value == "no": + return False + except shellutil.CommandError as e: + # Command might fail because NM_CONTROLLED value is not in interface config file (exit code 1). 
+ # Log warning for any other exit code. + # NM_CONTROLLED=y by default if not specified. + if e.returncode != 1: + logger.warn("[{0}] failed: {1}.\nAgent will continue to publish hostname without NetworkManager restart".format(' '.join(nm_controlled_cmd), e)) + except Exception as e: + logger.warn("Unexpected error while retrieving value of NM_CONTROLLED in {0}: {1}.\nAgent will continue to publish hostname without NetworkManager restart".format(filepath, e)) + + return True + + def get_nic_operational_and_general_states(self, ifname): + """ + Checks the contents of /sys/class/net/{ifname}/operstate and the results of 'nmcli -g general.state device show {ifname}' to determine the state of the provided interface. + Raises an exception if the network interface state cannot be determined. + """ + filepath = "/sys/class/net/{0}/operstate".format(ifname) + nic_general_state_cmd = ['nmcli', '-g', 'general.state', 'device', 'show', ifname] + if not os.path.isfile(filepath): + msg = "Unable to determine primary network interface {0} state, because state file does not exist: {1}".format(ifname, filepath) + logger.warn(msg) + raise Exception(msg) + + try: + nic_oper_state = fileutil.read_file(filepath).rstrip().lower() + nic_general_state = shellutil.run_command(nic_general_state_cmd, log_error=True).rstrip().lower() + if nic_oper_state != "up": + logger.warn("The primary network interface {0} operational state is '{1}'.".format(ifname, nic_oper_state)) + else: + logger.info("The primary network interface {0} operational state is '{1}'.".format(ifname, nic_oper_state)) + if nic_general_state != "100 (connected)": + logger.warn("The primary network interface {0} general state is '{1}'.".format(ifname, nic_general_state)) + else: + logger.info("The primary network interface {0} general state is '{1}'.".format(ifname, nic_general_state)) + return nic_oper_state, nic_general_state + except Exception as e: + msg = "Unexpected error while determining the primary network interface 
state: {0}".format(e) + logger.warn(msg) + raise Exception(msg) + + def check_and_recover_nic_state(self, ifname): """ - Restart NetworkManager first before publishing hostname + Checks if the provided network interface is in an 'up' state. If the network interface is in a 'down' state, + attempt to recover the interface by restarting the Network Manager service. + + Raises an exception if an attempt to bring the interface into an 'up' state fails, or if the state + of the network interface cannot be determined. """ + nic_operstate, nic_general_state = self.get_nic_operational_and_general_states(ifname) + if nic_operstate == "down" or "disconnected" in nic_general_state: + logger.info("Restarting the Network Manager service to recover network interface {0}".format(ifname)) + self.restart_network_manager() + # Interface does not come up immediately after NetworkManager restart. Wait 5 seconds before checking + # network interface state. + time.sleep(5) + nic_operstate, nic_general_state = self.get_nic_operational_and_general_states(ifname) + # It is possible for network interface to be in an unknown or unmanaged state. Log warning if state is not + # down, disconnected, up, or connected + if nic_operstate != "up" or nic_general_state != "100 (connected)": + msg = "Network Manager restart failed to bring network interface {0} into 'up' and 'connected' state".format(ifname) + logger.warn(msg) + raise Exception(msg) + else: + logger.info("Network Manager restart successfully brought the network interface {0} into 'up' and 'connected' state".format(ifname)) + elif nic_operstate != "up" or nic_general_state != "100 (connected)": + # We already logged a warning with the network interface state in get_nic_operstate(). Raise an exception + # for the env thread to send to telemetry. 
+ raise Exception("The primary network interface {0} operational state is '{1}' and general state is '{2}'.".format(ifname, nic_operstate, nic_general_state)) + + def restart_network_manager(self): shellutil.run("service NetworkManager restart") - super(RedhatOSUtil, self).publish_hostname(hostname) + + def publish_hostname(self, hostname, recover_nic=False): + """ + Restart NetworkManager before publishing the hostname, but only if the network interface is not controlled by the + NetworkManager service (as determined by NM_CONTROLLED=n in the interface configuration). If the NetworkManager + service is restarted before the agent publishes the hostname, and NM_CONTROLLED=y, a race condition may occur + between the NetworkManager service and the Guest Agent making changes to the network interface configuration + simultaneously. + + Note: check_and_recover_nic_state(ifname) raises an Exception if an attempt to recover the network interface + fails, or if the network interface state cannot be determined. Callers should handle this exception by sending + an event to telemetry. + + TODO: Improve failure reporting and add success reporting to telemetry for hostname changes. Right now we only report failures to telemetry, by raising an Exception in publish_hostname for the calling thread to handle and report. + """ + ifname = self.get_if_name() + nm_controlled = self.get_nm_controlled(ifname) + if not nm_controlled: + self.restart_network_manager() + # TODO: Current recover logic is only effective when the NetworkManager manages the network interface. 
Update the recover logic so it is effective even when NM_CONTROLLED=n + super(RedhatOSUtil, self).publish_hostname(hostname, recover_nic and nm_controlled) def register_agent_service(self): return shellutil.run("systemctl enable {0}".format(self.service_name), chk_err=False) @@ -164,3 +261,14 @@ def restart_if(self, ifname, retries=3, wait=5): time.sleep(wait) else: logger.warn("exceeded restart retries") + + def check_and_recover_nic_state(self, ifname): + # TODO: Implement and test a way to recover the network interface for RedhatOSModernUtil + pass + + def publish_hostname(self, hostname, recover_nic=False): + # RedhatOSUtil was updated to conditionally run NetworkManager restart in response to a race condition between + # NetworkManager restart and the agent restarting the network interface during publish_hostname. Keeping the + # NetworkManager restart in RedhatOSModernUtil because the issue was not reproduced on these versions. + shellutil.run("service NetworkManager restart") + DefaultOSUtil.publish_hostname(self, hostname) diff --git a/azurelinuxagent/common/osutil/suse.py b/azurelinuxagent/common/osutil/suse.py index 52fd3ce56..ced0113dc 100644 --- a/azurelinuxagent/common/osutil/suse.py +++ b/azurelinuxagent/common/osutil/suse.py @@ -72,7 +72,7 @@ def __init__(self): super(SUSEOSUtil, self).__init__() self.dhclient_name = 'wickedd-dhcp4' - def publish_hostname(self, hostname): + def publish_hostname(self, hostname, recover_nic=False): self.set_dhcp_hostname(hostname) self.set_hostname_record(hostname) ifname = self.get_if_name() diff --git a/azurelinuxagent/common/osutil/ubuntu.py b/azurelinuxagent/common/osutil/ubuntu.py index 5a21511c9..2959464d0 100644 --- a/azurelinuxagent/common/osutil/ubuntu.py +++ b/azurelinuxagent/common/osutil/ubuntu.py @@ -16,6 +16,8 @@ # Requires Python 2.6+ and Openssl 1.0+ # +import glob +import textwrap import time import azurelinuxagent.common.logger as logger @@ -88,7 +90,7 @@ def unregister_agent_service(self): class 
Ubuntu18OSUtil(Ubuntu16OSUtil): """ - Ubuntu 18.04, 18.10, 19.04, 19.10, 20.04 + Ubuntu >=18.04 and <=24.04 """ def __init__(self): super(Ubuntu18OSUtil, self).__init__() @@ -132,6 +134,30 @@ def start_agent_service(self): def stop_agent_service(self): return shellutil.run("systemctl stop {0}".format(self.service_name), chk_err=False) + def get_dhcp_lease_endpoint(self): + pathglob = "/run/systemd/netif/leases/*" + logger.info("looking for leases in path [{0}]".format(pathglob)) + endpoint = None + for lease_file in glob.glob(pathglob): + try: + with open(lease_file) as f: + lease = f.read() + for line in lease.splitlines(): + if line.startswith("OPTION_245"): + option_245 = line.split("=")[1] + options = [int(i, 16) for i in textwrap.wrap(option_245, 2)] + endpoint = "{0}.{1}.{2}.{3}".format(*options) + logger.info("found endpoint [{0}]".format(endpoint)) + except Exception as e: + logger.info( + "Failed to parse {0}: {1}".format(lease_file, str(e)) + ) + if endpoint is not None: + logger.info("cached endpoint found [{0}]".format(endpoint)) + else: + logger.info("cached endpoint not found") + return endpoint + class UbuntuOSUtil(Ubuntu16OSUtil): def __init__(self): # pylint: disable=W0235 diff --git a/azurelinuxagent/common/protocol/extensions_goal_state_from_extensions_config.py b/azurelinuxagent/common/protocol/extensions_goal_state_from_extensions_config.py index a8bfa2505..2b98819a2 100644 --- a/azurelinuxagent/common/protocol/extensions_goal_state_from_extensions_config.py +++ b/azurelinuxagent/common/protocol/extensions_goal_state_from_extensions_config.py @@ -61,9 +61,16 @@ def _parse_extensions_config(self, xml_text, wire_client): for ga_family in ga_families: name = findtext(ga_family, "Name") version = findtext(ga_family, "Version") + is_version_from_rsm = findtext(ga_family, "IsVersionFromRSM") + is_vm_enabled_for_rsm_upgrades = findtext(ga_family, "IsVMEnabledForRSMUpgrades") uris_list = find(ga_family, "Uris") uris = findall(uris_list, "Uri") - family 
= VMAgentFamily(name, version) + family = VMAgentFamily(name) + family.version = version + if is_version_from_rsm is not None: # checking None because converting string to lowercase + family.is_version_from_rsm = is_version_from_rsm.lower() == "true" + if is_vm_enabled_for_rsm_upgrades is not None: # checking None because converting string to lowercase + family.is_vm_enabled_for_rsm_upgrades = is_vm_enabled_for_rsm_upgrades.lower() == "true" for uri in uris: family.uris.append(gettext(uri)) self._agent_families.append(family) diff --git a/azurelinuxagent/common/protocol/extensions_goal_state_from_vm_settings.py b/azurelinuxagent/common/protocol/extensions_goal_state_from_vm_settings.py index f6496bfd3..041ddedcd 100644 --- a/azurelinuxagent/common/protocol/extensions_goal_state_from_vm_settings.py +++ b/azurelinuxagent/common/protocol/extensions_goal_state_from_vm_settings.py @@ -22,6 +22,7 @@ from azurelinuxagent.common import logger from azurelinuxagent.common.AgentGlobals import AgentGlobals +from azurelinuxagent.common.event import WALAEventOperation, add_event from azurelinuxagent.common.future import ustr from azurelinuxagent.common.protocol.extensions_goal_state import ExtensionsGoalState, GoalStateChannel, VmSettingsParseError from azurelinuxagent.common.protocol.restapi import VMAgentFamily, Extension, ExtensionRequestedState, ExtensionSettings @@ -242,6 +243,8 @@ def _parse_agent_manifests(self, vm_settings): # { # "name": "Prod", # "version": "9.9.9.9", + # "isVersionFromRSM": true, + # "isVMEnabledForRSMUpgrades": true, # "uris": [ # "https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml", # "https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml" @@ -266,10 +269,15 @@ def _parse_agent_manifests(self, vm_settings): for family in families: name = family["name"] version = 
family.get("version") + is_version_from_rsm = family.get("isVersionFromRSM") + is_vm_enabled_for_rsm_upgrades = family.get("isVMEnabledForRSMUpgrades") uris = family.get("uris") if uris is None: uris = [] - agent_family = VMAgentFamily(name, version) + agent_family = VMAgentFamily(name) + agent_family.version = version + agent_family.is_version_from_rsm = is_version_from_rsm + agent_family.is_vm_enabled_for_rsm_upgrades = is_vm_enabled_for_rsm_upgrades for u in uris: agent_family.uris.append(u) self._agent_families.append(agent_family) @@ -492,11 +500,28 @@ def _parse_dependency_level(depends_on, extension): length = len(depends_on) if length > 1: raise Exception('dependsOn should be an array with exactly one item for single-config extensions ({0}) (got {1})'.format(extension.name, depends_on)) - elif length == 0: + if length == 0: logger.warn('dependsOn is an empty array for extension {0}; setting the dependency level to 0'.format(extension.name)) - extension.settings[0].dependencyLevel = 0 + dependency_level = 0 else: - extension.settings[0].dependencyLevel = depends_on[0]['dependencyLevel'] + dependency_level = depends_on[0]['dependencyLevel'] + depends_on_extension = depends_on[0].get('dependsOnExtension') + if depends_on_extension is None: + # TODO: Consider removing this check and its telemetry after a few releases if we do not receive any telemetry indicating + # that dependsOnExtension is actually missing from the vmSettings + message = 'Missing dependsOnExtension on extension {0}'.format(extension.name) + logger.warn(message) + add_event(WALAEventOperation.ProvisionAfterExtensions, message=message, is_success=False, log_event=False) + else: + message = '{0} depends on {1}'.format(extension.name, depends_on_extension) + logger.info(message) + add_event(WALAEventOperation.ProvisionAfterExtensions, message=message, is_success=True, log_event=False) + if len(extension.settings) == 0: + message = 'Extension {0} does not have any settings. 
Will ignore dependency (dependency level: {1})'.format(extension.name, dependency_level) + logger.warn(message) + add_event(WALAEventOperation.ProvisionAfterExtensions, message=message, is_success=False, log_event=False) + else: + extension.settings[0].dependencyLevel = dependency_level else: # multi-config settings_by_name = {} diff --git a/azurelinuxagent/common/protocol/goal_state.py b/azurelinuxagent/common/protocol/goal_state.py index 6b2a0c2cf..2eb89c1eb 100644 --- a/azurelinuxagent/common/protocol/goal_state.py +++ b/azurelinuxagent/common/protocol/goal_state.py @@ -70,7 +70,7 @@ def __init__(self, msg, inner=None): class GoalState(object): - def __init__(self, wire_client, goal_state_properties=GoalStateProperties.All, silent=False): + def __init__(self, wire_client, goal_state_properties=GoalStateProperties.All, silent=False, save_to_history=False): """ Fetches the goal state using the given wire client. @@ -84,6 +84,7 @@ def __init__(self, wire_client, goal_state_properties=GoalStateProperties.All, s try: self._wire_client = wire_client self._history = None + self._save_to_history = save_to_history self._extensions_goal_state = None # populated from vmSettings or extensionsConfig self._goal_state_properties = goal_state_properties self.logger = logger.Logger(logger.DEFAULT_LOGGER) @@ -185,8 +186,9 @@ def fetch_extension_manifest(self, extension_name, uris): def _fetch_manifest(self, manifest_type, name, uris): try: is_fast_track = self.extensions_goal_state.source == GoalStateSource.FastTrack - xml_text = self._wire_client.fetch_manifest(uris, use_verify_header=is_fast_track) - self._history.save_manifest(name, xml_text) + xml_text = self._wire_client.fetch_manifest(manifest_type, uris, use_verify_header=is_fast_track) + if self._save_to_history: + self._history.save_manifest(name, xml_text) return ExtensionManifest(xml_text) except Exception as e: raise ProtocolError("Failed to retrieve {0} manifest. 
Error: {1}".format(manifest_type, ustr(e))) @@ -208,9 +210,15 @@ def update(self, silent=False): try: self._update(force_update=False) except GoalStateInconsistentError as e: - self.logger.warn("Detected an inconsistency in the goal state: {0}", ustr(e)) + message = "Detected an inconsistency in the goal state: {0}".format(ustr(e)) + self.logger.warn(message) + add_event(op=WALAEventOperation.GoalState, is_success=False, message=message) + self._update(force_update=True) - self.logger.info("The goal state is consistent") + + message = "The goal state is consistent" + self.logger.info(message) + add_event(op=WALAEventOperation.GoalState, message=message) def _update(self, force_update): # @@ -219,7 +227,9 @@ def _update(self, force_update): timestamp = datetime.datetime.utcnow() if force_update: - self.logger.info("Refreshing goal state and vmSettings") + message = "Refreshing goal state and vmSettings" + self.logger.info(message) + add_event(op=WALAEventOperation.GoalState, message=message) incarnation, xml_text, xml_doc = GoalState._fetch_goal_state(self._wire_client) goal_state_updated = force_update or incarnation != self._incarnation @@ -255,11 +265,12 @@ def _update(self, force_update): # Start a new history subdirectory and capture the updated goal state tag = "{0}".format(incarnation) if vm_settings is None else "{0}-{1}".format(incarnation, vm_settings.etag) - self._history = GoalStateHistory(timestamp, tag) - if goal_state_updated: - self._history.save_goal_state(xml_text) - if vm_settings_updated: - self._history.save_vm_settings(vm_settings.get_redacted_text()) + if self._save_to_history: + self._history = GoalStateHistory(timestamp, tag) + if goal_state_updated: + self._history.save_goal_state(xml_text) + if vm_settings_updated: + self._history.save_vm_settings(vm_settings.get_redacted_text()) # # Continue fetching the rest of the goal state @@ -290,13 +301,11 @@ def _update(self, force_update): # Track goal state comes after that, the extensions will 
need the new certificate. The Agent needs to refresh the goal state in that # case, to ensure it fetches the new certificate. # - if self._extensions_goal_state.source == GoalStateSource.FastTrack: + if self._extensions_goal_state.source == GoalStateSource.FastTrack and self._goal_state_properties & GoalStateProperties.Certificates: self._check_certificates() + self._check_and_download_missing_certs_on_disk() def _check_certificates(self): - # Re-download certificates in case they have been removed from disk since last download - if self._goal_state_properties & GoalStateProperties.Certificates and self._certs_uri is not None: - self._download_certificates(self._certs_uri) # Check that certificates needed by extensions are in goal state certs.summary for extension in self.extensions_goal_state.extensions: for settings in extension.settings: @@ -318,15 +327,43 @@ def _download_certificates(self, certs_uri): if len(certs.warnings) > 0: self.logger.warn(certs.warnings) add_event(op=WALAEventOperation.GoalState, message=certs.warnings) - self._history.save_certificates(json.dumps(certs.summary)) + if self._save_to_history: + self._history.save_certificates(json.dumps(certs.summary)) return certs + def _check_and_download_missing_certs_on_disk(self): + # Re-download certificates if any have been removed from disk since last download + if self._certs_uri is not None: + certificates = self.certs.summary + certs_missing_from_disk = False + + for c in certificates: + cert_path = os.path.join(conf.get_lib_dir(), c['thumbprint'] + '.crt') + if not os.path.isfile(cert_path): + certs_missing_from_disk = True + message = "Certificate required by goal state is not on disk: {0}".format(cert_path) + self.logger.info(message) + add_event(op=WALAEventOperation.GoalState, message=message) + if certs_missing_from_disk: + # Try to re-download certs. 
Sometimes download may fail if certs_uri is outdated/contains wrong + # container id (for example, when the VM is moved to a new container after resuming from + # hibernation). If download fails we should report and continue with goal state processing, as some + # extensions in the goal state may succeed. + try: + self._download_certificates(self._certs_uri) + except Exception as e: + message = "Unable to download certificates. Goal state processing will continue, some " \ + "extensions requiring certificates may fail. Error: {0}".format(ustr(e)) + self.logger.warn(message) + add_event(op=WALAEventOperation.GoalState, is_success=False, message=message) + def _restore_wire_server_goal_state(self, incarnation, xml_text, xml_doc, vm_settings_support_stopped_error): msg = 'The HGAP stopped supporting vmSettings; will fetched the goal state from the WireServer.' self.logger.info(msg) add_event(op=WALAEventOperation.VmSettings, message=msg) - self._history = GoalStateHistory(datetime.datetime.utcnow(), incarnation) - self._history.save_goal_state(xml_text) + if self._save_to_history: + self._history = GoalStateHistory(datetime.datetime.utcnow(), incarnation) + self._history.save_goal_state(xml_text) self._extensions_goal_state = self._fetch_full_wire_server_goal_state(incarnation, xml_doc) if self._extensions_goal_state.created_on_timestamp < vm_settings_support_stopped_error.timestamp: self._extensions_goal_state.is_outdated = True @@ -336,7 +373,8 @@ def _restore_wire_server_goal_state(self, incarnation, xml_text, xml_doc, vm_set add_event(op=WALAEventOperation.VmSettings, message=msg) def save_to_history(self, data, file_name): - self._history.save(data, file_name) + if self._save_to_history: + self._history.save(data, file_name) @staticmethod def _fetch_goal_state(wire_client): @@ -431,21 +469,24 @@ def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc): else: xml_text = self._wire_client.fetch_config(extensions_config_uri, self._wire_client.get_header()) 
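The `_check_and_download_missing_certs_on_disk` hunk above walks the goal-state certificate summary and only triggers a re-download when some thumbprint's `.crt` file is missing. That check can be sketched as a standalone helper (the function name and `lib_dir` parameter are illustrative, not the agent's API; the summary shape — a list of dicts with a `thumbprint` key — follows the hunk):

```python
import os


def find_missing_certs(cert_summary, lib_dir):
    """Return the paths of certificates listed in the goal-state summary
    that are not present on disk under lib_dir."""
    missing = []
    for cert in cert_summary:
        cert_path = os.path.join(lib_dir, cert['thumbprint'] + '.crt')
        if not os.path.isfile(cert_path):
            missing.append(cert_path)
    return missing
```

A non-empty result would correspond to the `certs_missing_from_disk` flag in the hunk, where the re-download is attempted but failures are only reported, so goal-state processing can continue.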
extensions_config = ExtensionsGoalStateFactory.create_from_extensions_config(incarnation, xml_text, self._wire_client) - self._history.save_extensions_config(extensions_config.get_redacted_text()) + if self._save_to_history: + self._history.save_extensions_config(extensions_config.get_redacted_text()) hosting_env = None if GoalStateProperties.HostingEnv & self._goal_state_properties: hosting_env_uri = findtext(xml_doc, "HostingEnvironmentConfig") xml_text = self._wire_client.fetch_config(hosting_env_uri, self._wire_client.get_header()) hosting_env = HostingEnv(xml_text) - self._history.save_hosting_env(xml_text) + if self._save_to_history: + self._history.save_hosting_env(xml_text) shared_config = None if GoalStateProperties.SharedConfig & self._goal_state_properties: shared_conf_uri = findtext(xml_doc, "SharedConfig") xml_text = self._wire_client.fetch_config(shared_conf_uri, self._wire_client.get_header()) shared_config = SharedConfig(xml_text) - self._history.save_shared_conf(xml_text) + if self._save_to_history: + self._history.save_shared_conf(xml_text) # SharedConfig.xml is used by other components (Azsec and Singularity/HPC Infiniband), so save it to the agent's root directory as well shared_config_file = os.path.join(conf.get_lib_dir(), SHARED_CONF_FILE_NAME) try: @@ -464,7 +505,8 @@ def _fetch_full_wire_server_goal_state(self, incarnation, xml_doc): if remote_access_uri is not None: xml_text = self._wire_client.fetch_config(remote_access_uri, self._wire_client.get_header_for_cert()) remote_access = RemoteAccess(xml_text) - self._history.save_remote_access(xml_text) + if self._save_to_history: + self._history.save_remote_access(xml_text) self._incarnation = incarnation self._role_instance_id = role_instance_id @@ -547,8 +589,6 @@ def __init__(self, xml_text, my_logger): # The parsing process use public key to match prv and crt. 
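The `Certificates` hunk in this patch drops the `begin_prv`/`begin_crt` bookkeeping flags and splits the PEM bundle purely on `END` markers, since each block is buffered until its terminator regardless of where it began. A minimal standalone sketch of that splitting strategy (hypothetical helper, not the agent's code):

```python
import re


def split_pem_bundle(pem_text):
    """Split a PEM bundle into (kind, block) tuples, emitting a buffered
    block each time an END marker is seen -- no BEGIN-state tracking needed."""
    blocks, buf = [], []
    for line in pem_text.splitlines(True):  # keep line endings in the block
        buf.append(line)
        if re.match(r'[-]+END.*KEY[-]+', line):
            blocks.append(('prv', ''.join(buf)))
            buf = []
        elif re.match(r'[-]+END.*CERTIFICATE[-]+', line):
            blocks.append(('crt', ''.join(buf)))
            buf = []
    return blocks
```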
buf = [] - begin_crt = False # pylint: disable=W0612 - begin_prv = False # pylint: disable=W0612 prvs = {} thumbprints = {} index = 0 @@ -556,17 +596,12 @@ def __init__(self, xml_text, my_logger): with open(pem_file) as pem: for line in pem.readlines(): buf.append(line) - if re.match(r'[-]+BEGIN.*KEY[-]+', line): - begin_prv = True - elif re.match(r'[-]+BEGIN.*CERTIFICATE[-]+', line): - begin_crt = True - elif re.match(r'[-]+END.*KEY[-]+', line): + if re.match(r'[-]+END.*KEY[-]+', line): tmp_file = Certificates._write_to_tmp_file(index, 'prv', buf) pub = cryptutil.get_pubkey_from_prv(tmp_file) prvs[pub] = tmp_file buf = [] index += 1 - begin_prv = False elif re.match(r'[-]+END.*CERTIFICATE[-]+', line): tmp_file = Certificates._write_to_tmp_file(index, 'crt', buf) pub = cryptutil.get_pubkey_from_crt(tmp_file) @@ -581,7 +616,6 @@ def __init__(self, xml_text, my_logger): os.rename(tmp_file, os.path.join(conf.get_lib_dir(), crt)) buf = [] index += 1 - begin_crt = False # Rename prv key with thumbprint as the file name for pubkey in prvs: diff --git a/azurelinuxagent/common/protocol/restapi.py b/azurelinuxagent/common/protocol/restapi.py index 725e2d7bb..35b40cf13 100644 --- a/azurelinuxagent/common/protocol/restapi.py +++ b/azurelinuxagent/common/protocol/restapi.py @@ -22,7 +22,6 @@ from azurelinuxagent.common.datacontract import DataContract, DataContractList from azurelinuxagent.common.future import ustr -from azurelinuxagent.common.utils.flexible_version import FlexibleVersion from azurelinuxagent.common.utils.textutil import getattrib from azurelinuxagent.common.version import DISTRO_VERSION, DISTRO_NAME, CURRENT_VERSION @@ -69,23 +68,16 @@ def __init__(self): class VMAgentFamily(object): - def __init__(self, name, version=None): + def __init__(self, name): self.name = name - # This is the Requested version as specified by the Goal State, it defaults to 0.0.0.0 if not specified in GS - self.requested_version_string = VERSION_0 if version is None else version - 
self.uris = [] - - @property - def requested_version(self): - return FlexibleVersion(self.requested_version_string) + # Two-state: None, string. Set to None if version not specified in the GS + self.version = None + # Tri-state: None, True, False. Set to None if this property not specified in the GS. + self.is_version_from_rsm = None + # Tri-state: None, True, False. Set to None if this property not specified in the GS. + self.is_vm_enabled_for_rsm_upgrades = None - @property - def is_requested_version_specified(self): - """ - If we don't get any requested_version from the GS, we default it to 0.0.0.0. - This property identifies if a requested Version was passed in the GS or not. - """ - return self.requested_version > FlexibleVersion(VERSION_0) + self.uris = [] def __repr__(self): return self.__str__() diff --git a/azurelinuxagent/common/protocol/util.py b/azurelinuxagent/common/protocol/util.py index 7d7f90168..b90e9542c 100644 --- a/azurelinuxagent/common/protocol/util.py +++ b/azurelinuxagent/common/protocol/util.py @@ -188,7 +188,7 @@ def _clear_wireserver_endpoint(self): return logger.error("Failed to clear wiresever endpoint: {0}", e) - def _detect_protocol(self, init_goal_state=True): + def _detect_protocol(self, save_to_history, init_goal_state=True): """ Probe protocol endpoints in turn. """ @@ -217,7 +217,7 @@ def _detect_protocol(self, init_goal_state=True): try: protocol = WireProtocol(endpoint) - protocol.detect(init_goal_state=init_goal_state) + protocol.detect(init_goal_state=init_goal_state, save_to_history=save_to_history) self._set_wireserver_endpoint(endpoint) return protocol @@ -268,7 +268,7 @@ def clear_protocol(self): finally: self._lock.release() - def get_protocol(self, init_goal_state=True): + def get_protocol(self, init_goal_state=True, save_to_history=False): """ Detect protocol by endpoint. 
:returns: protocol instance @@ -296,7 +296,7 @@ def get_protocol(self, init_goal_state=True): logger.info("Detect protocol endpoint") - protocol = self._detect_protocol(init_goal_state=init_goal_state) + protocol = self._detect_protocol(save_to_history=save_to_history, init_goal_state=init_goal_state) IOErrorCounter.set_protocol_endpoint(endpoint=protocol.get_endpoint()) self._save_protocol(WIRE_PROTOCOL_NAME) diff --git a/azurelinuxagent/common/protocol/wire.py b/azurelinuxagent/common/protocol/wire.py index 38a3e0621..c93624cb1 100644 --- a/azurelinuxagent/common/protocol/wire.py +++ b/azurelinuxagent/common/protocol/wire.py @@ -73,7 +73,7 @@ def __init__(self, endpoint): raise ProtocolError("WireProtocol endpoint is None") self.client = WireClient(endpoint) - def detect(self, init_goal_state=True): + def detect(self, init_goal_state=True, save_to_history=False): self.client.check_wire_protocol_version() trans_prv_file = os.path.join(conf.get_lib_dir(), @@ -86,7 +86,7 @@ def detect(self, init_goal_state=True): # Initialize the goal state, including all the inner properties if init_goal_state: logger.info('Initializing goal state during protocol detection') - self.client.reset_goal_state() + self.client.reset_goal_state(save_to_history=save_to_history) def update_host_plugin_from_goal_state(self): self.client.update_host_plugin_from_goal_state() @@ -582,8 +582,8 @@ def call_storage_service(http_req, *args, **kwargs): def fetch_artifacts_profile_blob(self, uri): return self._fetch_content("artifacts profile blob", [uri], use_verify_header=False)[1] # _fetch_content returns a (uri, content) tuple - def fetch_manifest(self, uris, use_verify_header): - uri, content = self._fetch_content("manifest", uris, use_verify_header=use_verify_header) + def fetch_manifest(self, manifest_type, uris, use_verify_header): + uri, content = self._fetch_content("{0} manifest".format(manifest_type), uris, use_verify_header=use_verify_header) 
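The `save_to_history` parameter threaded through `get_protocol`, `_detect_protocol`, `detect`, and `reset_goal_state` in these hunks defaults to off, so callers that only probe the goal state do not write history files. The pattern reduces to a constructor flag guarding every persistence call; an illustrative sketch (class and method names are hypothetical, not the agent's API):

```python
class HistoryWriter(object):
    """Persists goal-state artifacts only when explicitly enabled,
    mirroring the save_to_history gating added by this patch."""

    def __init__(self, save_to_history=False):
        self._save_to_history = save_to_history
        self.saved = {}  # file_name -> data; stands in for files on disk

    def save(self, data, file_name):
        # Every save_* call site checks the flag, as in GoalState
        if self._save_to_history:
            self.saved[file_name] = data
```

Consumers that need the history opt in with `save_to_history=True`; everything else keeps the default and the save calls become no-ops.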
self.get_host_plugin().update_manifest_uri(uri) return content @@ -777,13 +777,13 @@ def update_host_plugin(self, container_id, role_config_name): self._host_plugin.update_container_id(container_id) self._host_plugin.update_role_config_name(role_config_name) - def update_goal_state(self, silent=False): + def update_goal_state(self, silent=False, save_to_history=False): """ Updates the goal state if the incarnation or etag changed """ try: if self._goal_state is None: - self._goal_state = GoalState(self, silent=silent) + self._goal_state = GoalState(self, silent=silent, save_to_history=save_to_history) else: self._goal_state.update(silent=silent) @@ -792,7 +792,7 @@ def update_goal_state(self, silent=False): except Exception as exception: raise ProtocolError("Error fetching goal state: {0}".format(ustr(exception))) - def reset_goal_state(self, goal_state_properties=GoalStateProperties.All, silent=False): + def reset_goal_state(self, goal_state_properties=GoalStateProperties.All, silent=False, save_to_history=False): """ Resets the goal state """ @@ -800,7 +800,7 @@ def reset_goal_state(self, goal_state_properties=GoalStateProperties.All, silent if not silent: logger.info("Forcing an update of the goal state.") - self._goal_state = GoalState(self, goal_state_properties=goal_state_properties, silent=silent) + self._goal_state = GoalState(self, goal_state_properties=goal_state_properties, silent=silent, save_to_history=save_to_history) except ProtocolError: raise @@ -936,7 +936,7 @@ def upload_status_blob(self): if extensions_goal_state.status_upload_blob is None: # the status upload blob is in ExtensionsConfig so force a full goal state refresh - self.reset_goal_state(silent=True) + self.reset_goal_state(silent=True, save_to_history=True) extensions_goal_state = self.get_goal_state().extensions_goal_state if extensions_goal_state.status_upload_blob is None: diff --git a/azurelinuxagent/common/utils/cryptutil.py b/azurelinuxagent/common/utils/cryptutil.py index 
5514cb505..b7c942274 100644 --- a/azurelinuxagent/common/utils/cryptutil.py +++ b/azurelinuxagent/common/utils/cryptutil.py @@ -53,10 +53,21 @@ def gen_transport_cert(self, prv_file, crt_file): def get_pubkey_from_prv(self, file_name): if not os.path.exists(file_name): raise IOError(errno.ENOENT, "File not found", file_name) - else: - cmd = [self.openssl_cmd, "rsa", "-in", file_name, "-pubout"] - pub = shellutil.run_command(cmd, log_error=True) - return pub + + # OpenSSL's pkey command may not be available on older versions so try 'rsa' first. + try: + command = [self.openssl_cmd, "rsa", "-in", file_name, "-pubout"] + return shellutil.run_command(command, log_error=False) + except shellutil.CommandError as error: + if not ("Not an RSA key" in error.stderr or "expecting an rsa key" in error.stderr): + logger.error( + "Command: [{0}], return code: [{1}], stdout: [{2}] stderr: [{3}]", + " ".join(command), + error.returncode, + error.stdout, + error.stderr) + raise + return shellutil.run_command([self.openssl_cmd, "pkey", "-in", file_name, "-pubout"], log_error=True) def get_pubkey_from_crt(self, file_name): if not os.path.exists(file_name): diff --git a/azurelinuxagent/common/utils/fileutil.py b/azurelinuxagent/common/utils/fileutil.py index 03090a427..94eb5cf1b 100644 --- a/azurelinuxagent/common/utils/fileutil.py +++ b/azurelinuxagent/common/utils/fileutil.py @@ -99,13 +99,15 @@ def get_line_startingwith(prefix, filepath): return None -def mkdir(dirpath, mode=None, owner=None): +def mkdir(dirpath, mode=None, owner=None, reset_mode_and_owner=True): if not os.path.isdir(dirpath): os.makedirs(dirpath) - if mode is not None: - chmod(dirpath, mode) - if owner is not None: - chowner(dirpath, owner) + reset_mode_and_owner = True # force setting the mode and owner + if reset_mode_and_owner: + if mode is not None: + chmod(dirpath, mode) + if owner is not None: + chowner(dirpath, owner) def chowner(path, owner): diff --git a/azurelinuxagent/common/utils/shellutil.py 
b/azurelinuxagent/common/utils/shellutil.py index 50fd4592f..d2bfd787e 100644 --- a/azurelinuxagent/common/utils/shellutil.py +++ b/azurelinuxagent/common/utils/shellutil.py @@ -18,9 +18,17 @@ # import os import subprocess +import sys import tempfile import threading +if sys.version_info[0] == 2: + # TimeoutExpired was introduced on Python 3; define a dummy class for Python 2 + class TimeoutExpired(Exception): + pass +else: + from subprocess import TimeoutExpired + import azurelinuxagent.common.logger as logger from azurelinuxagent.common.future import ustr @@ -206,7 +214,7 @@ def __run_command(command_action, command, log_error, encode_output): # W0622: Redefining built-in 'input' -- disabled: the parameter name mimics subprocess.communicate() -def run_command(command, input=None, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, log_error=False, encode_input=True, encode_output=True, track_process=True): # pylint:disable=W0622 +def run_command(command, input=None, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE, log_error=False, encode_input=True, encode_output=True, track_process=True, timeout=None): # pylint:disable=W0622 """ Executes the given command and returns its stdout. @@ -227,7 +235,9 @@ def run_command(command, input=None, stdin=None, stdout=subprocess.PIPE, stderr= value for these parameters is anything other than the default (subprocess.PIPE)), then the corresponding values returned by this function or the CommandError exception will be empty strings. - Note: This is the preferred method to execute shell commands over `azurelinuxagent.common.utils.shellutil.run` function. + NOTE: The 'timeout' parameter is ignored on Python 2 + + NOTE: This is the preferred method to execute shell commands over `azurelinuxagent.common.utils.shellutil.run` function. 
""" if input is not None and stdin is not None: raise ValueError("The input and stdin arguments are mutually exclusive") @@ -246,7 +256,30 @@ def command_action(): else: process = subprocess.Popen(command, stdin=popen_stdin, stdout=stdout, stderr=stderr, shell=False) - command_stdout, command_stderr = process.communicate(input=communicate_input) + try: + if sys.version_info[0] == 2: # communicate() doesn't support timeout on Python 2 + command_stdout, command_stderr = process.communicate(input=communicate_input) + else: + command_stdout, command_stderr = process.communicate(input=communicate_input, timeout=timeout) + except TimeoutExpired: + if log_error: + logger.error(u"Command [{0}] timed out", __format_command(command)) + + command_stdout, command_stderr = '', '' + + try: + process.kill() + # try to get any output from the command, but ignore any errors if we can't + try: + command_stdout, command_stderr = process.communicate() + # W0702: No exception type(s) specified (bare-except) + except: # pylint: disable=W0702 + pass + except Exception as exception: + if log_error: + logger.error(u"Can't terminate timed out process: {0}", ustr(exception)) + raise CommandError(command=__format_command(command), return_code=-1, stdout=command_stdout, stderr="command timeout\n{0}".format(command_stderr)) + if track_process: _on_command_completed(process.pid) diff --git a/azurelinuxagent/common/version.py b/azurelinuxagent/common/version.py index 8e12eff5f..02dc47430 100644 --- a/azurelinuxagent/common/version.py +++ b/azurelinuxagent/common/version.py @@ -209,7 +209,7 @@ def has_logrotate(): # # When doing a release, be sure to use the actual agent version. 
Current agent version: 2.4.0.0 # -AGENT_VERSION = '2.9.1.1' +AGENT_VERSION = '2.10.0.8' AGENT_LONG_VERSION = "{0}-{1}".format(AGENT_NAME, AGENT_VERSION) AGENT_DESCRIPTION = """ The Azure Linux Agent supports the provisioning and running of Linux diff --git a/azurelinuxagent/daemon/main.py b/azurelinuxagent/daemon/main.py index 1eb58ec99..342daf4ac 100644 --- a/azurelinuxagent/daemon/main.py +++ b/azurelinuxagent/daemon/main.py @@ -30,7 +30,7 @@ from azurelinuxagent.common.osutil import get_osutil from azurelinuxagent.common.protocol.goal_state import GoalState, GoalStateProperties from azurelinuxagent.common.protocol.util import get_protocol_util -from azurelinuxagent.common.rdma import setup_rdma_device +from azurelinuxagent.pa.rdma.rdma import setup_rdma_device from azurelinuxagent.common.utils import textutil from azurelinuxagent.common.version import AGENT_NAME, AGENT_LONG_NAME, \ AGENT_VERSION, \ @@ -105,9 +105,8 @@ def sleep_if_disabled(self): agent_disabled_file_path = conf.get_disable_agent_file_path() if os.path.exists(agent_disabled_file_path): import threading - logger.warn("Disabling the guest agent by sleeping forever; " - "to re-enable, remove {0} and restart" - .format(agent_disabled_file_path)) + logger.warn("Disabling the guest agent by sleeping forever; to re-enable, remove {0} and restart".format(agent_disabled_file_path)) + logger.warn("To enable VM extensions, also ensure that the VM's osProfile.allowExtensionOperations property is set to true.") self.running = False disable_event = threading.Event() disable_event.wait() diff --git a/azurelinuxagent/ga/agent_update_handler.py b/azurelinuxagent/ga/agent_update_handler.py new file mode 100644 index 000000000..8caec1087 --- /dev/null +++ b/azurelinuxagent/ga/agent_update_handler.py @@ -0,0 +1,241 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2020 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with 
the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Requires Python 2.6+ and Openssl 1.0+ +import os + +from azurelinuxagent.common import conf, logger +from azurelinuxagent.common.event import add_event, WALAEventOperation +from azurelinuxagent.common.exception import AgentUpgradeExitException, AgentUpdateError, AgentFamilyMissingError +from azurelinuxagent.common.future import ustr +from azurelinuxagent.common.protocol.restapi import VMAgentUpdateStatuses, VMAgentUpdateStatus, VERSION_0 +from azurelinuxagent.common.utils import textutil +from azurelinuxagent.common.utils.flexible_version import FlexibleVersion +from azurelinuxagent.common.version import get_daemon_version +from azurelinuxagent.ga.rsm_version_updater import RSMVersionUpdater +from azurelinuxagent.ga.self_update_version_updater import SelfUpdateVersionUpdater + + +def get_agent_update_handler(protocol): + return AgentUpdateHandler(protocol) + + +class AgentUpdateHandler(object): + """ + This class handles two types of agent updates. The handler initializes the updater to SelfUpdateVersionUpdater and switches to the appropriate updater based on the conditions below: + RSM update: This is an update requested by RSM. The contract between CRP and the agent is that we receive the following properties in the goal state: + version: the version to update to + isVersionFromRSM: True if the version comes from an RSM deployment. + isVMEnabledForRSMUpgrades: True if the VM is enabled for RSM upgrades. + If the VM is enabled for RSM upgrades, we use the RSM update path; but if the requested update is not from an RSM deployment, + we ignore the update.
+ Self update: We fall back to this if the conditions above are not met. This updates to the largest version available in the manifest. + Note: Self-update does not support downgrades. + + On every new goal state, the handler records whether the last update was done via RSM. Once the handler decides which updater to use, it + performs the following steps: + 1. Retrieve the agent version from the goal state. + 2. Check whether we are allowed to update to that version. + 3. Log the update message. + 4. Purge the extra agents from disk. + 5. Download the new agent. + 6. Proceed with the update. + + [Note: 1.0.8.147 is the minimum supported version of HGPA which will have the isVersionFromRSM and isVMEnabledForRSMUpgrades properties in vmsettings.] + """ + def __init__(self, protocol): + self._protocol = protocol + self._gs_id = "unknown" + self._ga_family_type = conf.get_autoupdate_gafamily() + self._daemon_version = self._get_daemon_version_for_update() + self._last_attempted_update_error_msg = "" + + # Restore the RSM update state; default to self-update if the last update was not done via RSM. + if not self._get_is_last_update_with_rsm(): + self._updater = SelfUpdateVersionUpdater(self._gs_id) + else: + self._updater = RSMVersionUpdater(self._gs_id, self._daemon_version) + + @staticmethod + def _get_daemon_version_for_update(): + daemon_version = get_daemon_version() + if daemon_version != FlexibleVersion(VERSION_0): + return daemon_version + # get_daemon_version() returns 0.0.0.0 if the daemon version is not specified. In that case, + # use 2.2.53 as the minimum version, since we started setting the daemon version in 2.2.53. + return FlexibleVersion("2.2.53") + + @staticmethod + def _get_rsm_update_state_file(): + """ + This file records whether the last attempted update was done via RSM.
+ """ + return os.path.join(conf.get_lib_dir(), "rsm_update.json") + + def _save_rsm_update_state(self): + """ + Save the rsm state empty file when we switch to RSM + """ + try: + with open(self._get_rsm_update_state_file(), "w"): + pass + except Exception as e: + logger.warn("Error creating the RSM state ({0}): {1}", self._get_rsm_update_state_file(), ustr(e)) + + def _remove_rsm_update_state(self): + """ + Remove the rsm state file when we switch to self-update + """ + try: + if os.path.exists(self._get_rsm_update_state_file()): + os.remove(self._get_rsm_update_state_file()) + except Exception as e: + logger.warn("Error removing the RSM state ({0}): {1}", self._get_rsm_update_state_file(), ustr(e)) + + def _get_is_last_update_with_rsm(self): + """ + Returns True if state file exists as this consider as last update with RSM is true + """ + return os.path.exists(self._get_rsm_update_state_file()) + + def _get_agent_family_manifest(self, goal_state): + """ + Get the agent_family from last GS for the given family + Returns: first entry of Manifest + Exception if no manifests found in the last GS and log it only on new goal state + """ + family = self._ga_family_type + agent_families = goal_state.extensions_goal_state.agent_families + family_found = False + agent_family_manifests = [] + for m in agent_families: + if m.name == family: + family_found = True + if len(m.uris) > 0: + agent_family_manifests.append(m) + + if not family_found: + raise AgentFamilyMissingError(u"Agent family: {0} not found in the goal state: {1}, skipping agent update \n" + u"[Note: This error is permanent for this goal state and Will not log same error until we receive new goal state]".format(family, self._gs_id)) + + if len(agent_family_manifests) == 0: + raise AgentFamilyMissingError( + u"No manifest links found for agent family: {0} for goal state: {1}, skipping agent update \n" + u"[Note: This error is permanent for this goal state and will not log same error until we receive new goal 
state]".format( + family, self._gs_id)) + return agent_family_manifests[0] + + def run(self, goal_state, ext_gs_updated): + + try: + # If auto update is disabled, we don't proceed with update + if not conf.get_auto_update_to_latest_version(): + return + + # Update the state only on new goal state + if ext_gs_updated: + self._gs_id = goal_state.extensions_goal_state.id + self._updater.sync_new_gs_id(self._gs_id) + + agent_family = self._get_agent_family_manifest(goal_state) + + # Updater will return True or False if we need to switch the updater + # If self-updater receives RSM update enabled, it will switch to RSM updater + # If RSM updater receives RSM update disabled, it will switch to self-update + # No change in updater if GS not updated + is_rsm_update_enabled = self._updater.is_rsm_update_enabled(agent_family, ext_gs_updated) + + if not is_rsm_update_enabled and isinstance(self._updater, RSMVersionUpdater): + msg = "VM not enabled for RSM updates, switching to self-update mode" + logger.info(msg) + add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False) + self._updater = SelfUpdateVersionUpdater(self._gs_id) + self._remove_rsm_update_state() + + if is_rsm_update_enabled and isinstance(self._updater, SelfUpdateVersionUpdater): + msg = "VM enabled for RSM updates, switching to RSM update mode" + logger.info(msg) + add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False) + self._updater = RSMVersionUpdater(self._gs_id, self._daemon_version) + self._save_rsm_update_state() + + # If updater is changed in previous step, we allow update as it consider as first attempt. 
If it did not change, the check below applies: + # RSM checks for a new goal state; self-update checks the manifest download interval + if not self._updater.is_update_allowed_this_time(ext_gs_updated): + return + + self._updater.retrieve_agent_version(agent_family, goal_state) + + if not self._updater.is_retrieved_version_allowed_to_update(agent_family): + return + self._updater.log_new_agent_update_message() + agent = self._updater.download_and_get_new_agent(self._protocol, agent_family, goal_state) + + # The condition below breaks the update loop if the new agent was left in a bad state by previous attempts. + # If the bad agent update has already been attempted 3 times, we don't want to continue with the update. + # Otherwise we allow the update, incrementing the update attempt count and clearing the bad state to mark the agent as good. + # [Note: As a result, this breaks the contract between RSM and the agent; we may NOT honor the RSM retries for that version] + if agent.get_update_attempt_count() >= 3: + msg = "Attempted enough update retries for version: {0} but the agent still has not recovered from a bad state.
So, we stop updating to this version".format(str(agent.version)) + raise AgentUpdateError(msg) + else: + agent.clear_error() + agent.inc_update_attempt_count() + msg = "Agent update attempt count: {0} for version: {1}".format(agent.get_update_attempt_count(), str(agent.version)) + logger.info(msg) + add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False) + + self._updater.purge_extra_agents_from_disk() + self._updater.proceed_with_update() + + except Exception as err: + log_error = True + if isinstance(err, AgentUpgradeExitException): + raise err + elif isinstance(err, AgentUpdateError): + error_msg = ustr(err) + elif isinstance(err, AgentFamilyMissingError): + error_msg = ustr(err) + # The agent-family-missing error is permanent in the given goal state, so we don't want to log it on every iteration of the main loop if there is no new goal state + log_error = ext_gs_updated + else: + error_msg = "Unable to update Agent: {0}".format(textutil.format_exception(err)) + if log_error: + logger.warn(error_msg) + add_event(op=WALAEventOperation.AgentUpgrade, is_success=False, message=error_msg, log_event=False) + self._last_attempted_update_error_msg = error_msg + + def get_vmagent_update_status(self): + """ + This function gets the VMAgent update status as of the last attempted update. + Returns: None if we fail to report the status, or if an update was never attempted with the RSM version specified in the goal state. + Note: We send the status regardless of updater type. Since we call this from the main loop, we want to avoid fetching the agent family to decide and send only if
+ """ + try: + if conf.get_enable_ga_versioning(): + if not self._last_attempted_update_error_msg: + status = VMAgentUpdateStatuses.Success + code = 0 + else: + status = VMAgentUpdateStatuses.Error + code = 1 + return VMAgentUpdateStatus(expected_version=str(self._updater.version), status=status, code=code, message=self._last_attempted_update_error_msg) + except Exception as err: + msg = "Unable to report agent update status: {0}".format(textutil.format_exception(err)) + logger.warn(msg) + add_event(op=WALAEventOperation.AgentUpgrade, is_success=False, message=msg, log_event=True) + return None diff --git a/azurelinuxagent/common/cgroup.py b/azurelinuxagent/ga/cgroup.py similarity index 100% rename from azurelinuxagent/common/cgroup.py rename to azurelinuxagent/ga/cgroup.py diff --git a/azurelinuxagent/common/cgroupapi.py b/azurelinuxagent/ga/cgroupapi.py similarity index 97% rename from azurelinuxagent/common/cgroupapi.py rename to azurelinuxagent/ga/cgroupapi.py index ca0ef3bb5..6f4bf4ab3 100644 --- a/azurelinuxagent/common/cgroupapi.py +++ b/azurelinuxagent/ga/cgroupapi.py @@ -23,15 +23,15 @@ import uuid from azurelinuxagent.common import logger -from azurelinuxagent.common.cgroup import CpuCgroup, MemoryCgroup -from azurelinuxagent.common.cgroupstelemetry import CGroupsTelemetry +from azurelinuxagent.ga.cgroup import CpuCgroup, MemoryCgroup +from azurelinuxagent.ga.cgroupstelemetry import CGroupsTelemetry from azurelinuxagent.common.conf import get_agent_pid_file_path from azurelinuxagent.common.exception import CGroupsException, ExtensionErrorCodes, ExtensionError, \ ExtensionOperationError from azurelinuxagent.common.future import ustr from azurelinuxagent.common.osutil import systemd from azurelinuxagent.common.utils import fileutil, shellutil -from azurelinuxagent.common.utils.extensionprocessutil import handle_process_completion, read_output, \ +from azurelinuxagent.ga.extensionprocessutil import handle_process_completion, read_output, \ 
TELEMETRY_MESSAGE_MAX_LEN from azurelinuxagent.common.utils.flexible_version import FlexibleVersion from azurelinuxagent.common.version import get_distro @@ -59,7 +59,8 @@ def cgroups_supported(): distro_version = FlexibleVersion(distro_info[1]) except ValueError: return False - return distro_name.lower() == 'ubuntu' and distro_version.major >= 16 + return (distro_name.lower() == 'ubuntu' and distro_version.major >= 16) or \ + (distro_name.lower() in ('centos', 'redhat') and 8 <= distro_version.major < 9) @staticmethod def track_cgroups(extension_cgroups): diff --git a/azurelinuxagent/common/cgroupconfigurator.py b/azurelinuxagent/ga/cgroupconfigurator.py similarity index 99% rename from azurelinuxagent/common/cgroupconfigurator.py rename to azurelinuxagent/ga/cgroupconfigurator.py index 767786f01..e52fc15d0 100644 --- a/azurelinuxagent/common/cgroupconfigurator.py +++ b/azurelinuxagent/ga/cgroupconfigurator.py @@ -23,15 +23,15 @@ from azurelinuxagent.common import conf from azurelinuxagent.common import logger -from azurelinuxagent.common.cgroup import CpuCgroup, AGENT_NAME_TELEMETRY, MetricsCounter, MemoryCgroup -from azurelinuxagent.common.cgroupapi import CGroupsApi, SystemdCgroupsApi, SystemdRunError, EXTENSION_SLICE_PREFIX -from azurelinuxagent.common.cgroupstelemetry import CGroupsTelemetry +from azurelinuxagent.ga.cgroup import CpuCgroup, AGENT_NAME_TELEMETRY, MetricsCounter, MemoryCgroup +from azurelinuxagent.ga.cgroupapi import CGroupsApi, SystemdCgroupsApi, SystemdRunError, EXTENSION_SLICE_PREFIX +from azurelinuxagent.ga.cgroupstelemetry import CGroupsTelemetry from azurelinuxagent.common.exception import ExtensionErrorCodes, CGroupsException, AgentMemoryExceededException from azurelinuxagent.common.future import ustr from azurelinuxagent.common.osutil import get_osutil, systemd from azurelinuxagent.common.version import get_distro from azurelinuxagent.common.utils import shellutil, fileutil -from azurelinuxagent.common.utils.extensionprocessutil import 
handle_process_completion +from azurelinuxagent.ga.extensionprocessutil import handle_process_completion from azurelinuxagent.common.event import add_event, WALAEventOperation AZURE_SLICE = "azure.slice" diff --git a/azurelinuxagent/common/cgroupstelemetry.py b/azurelinuxagent/ga/cgroupstelemetry.py similarity index 98% rename from azurelinuxagent/common/cgroupstelemetry.py rename to azurelinuxagent/ga/cgroupstelemetry.py index 7b6bba0aa..5943b45ad 100644 --- a/azurelinuxagent/common/cgroupstelemetry.py +++ b/azurelinuxagent/ga/cgroupstelemetry.py @@ -17,7 +17,7 @@ import threading from azurelinuxagent.common import logger -from azurelinuxagent.common.cgroup import CpuCgroup +from azurelinuxagent.ga.cgroup import CpuCgroup from azurelinuxagent.common.future import ustr diff --git a/azurelinuxagent/ga/collect_logs.py b/azurelinuxagent/ga/collect_logs.py index 95c42f3a7..4987d865e 100644 --- a/azurelinuxagent/ga/collect_logs.py +++ b/azurelinuxagent/ga/collect_logs.py @@ -21,16 +21,16 @@ import sys import threading import time -from azurelinuxagent.common import cgroupconfigurator, logcollector +from azurelinuxagent.ga import logcollector, cgroupconfigurator import azurelinuxagent.common.conf as conf from azurelinuxagent.common import logger -from azurelinuxagent.common.cgroup import MetricsCounter +from azurelinuxagent.ga.cgroup import MetricsCounter from azurelinuxagent.common.event import elapsed_milliseconds, add_event, WALAEventOperation, report_metric from azurelinuxagent.common.future import ustr -from azurelinuxagent.common.interfaces import ThreadHandlerInterface -from azurelinuxagent.common.logcollector import COMPRESSED_ARCHIVE_PATH, GRACEFUL_KILL_ERRCODE -from azurelinuxagent.common.cgroupconfigurator import CGroupConfigurator, LOGCOLLECTOR_MEMORY_LIMIT +from azurelinuxagent.ga.interfaces import ThreadHandlerInterface +from azurelinuxagent.ga.logcollector import COMPRESSED_ARCHIVE_PATH, GRACEFUL_KILL_ERRCODE +from azurelinuxagent.ga.cgroupconfigurator 
import CGroupConfigurator, LOGCOLLECTOR_MEMORY_LIMIT from azurelinuxagent.common.protocol.util import get_protocol_util from azurelinuxagent.common.utils import shellutil from azurelinuxagent.common.utils.shellutil import CommandError @@ -83,16 +83,16 @@ def get_thread_name(): return CollectLogsHandler._THREAD_NAME @staticmethod - def enable_cgroups_validation(): + def enable_monitor_cgroups_check(): os.environ[CollectLogsHandler.__CGROUPS_FLAG_ENV_VARIABLE] = "1" @staticmethod - def disable_cgroups_validation(): + def disable_monitor_cgroups_check(): if CollectLogsHandler.__CGROUPS_FLAG_ENV_VARIABLE in os.environ: del os.environ[CollectLogsHandler.__CGROUPS_FLAG_ENV_VARIABLE] @staticmethod - def should_validate_cgroups(): + def is_enabled_monitor_cgroups_check(): if CollectLogsHandler.__CGROUPS_FLAG_ENV_VARIABLE in os.environ: return os.environ[CollectLogsHandler.__CGROUPS_FLAG_ENV_VARIABLE] == "1" return False @@ -147,7 +147,7 @@ def daemon(self): time.sleep(_INITIAL_LOG_COLLECTION_DELAY) try: - CollectLogsHandler.enable_cgroups_validation() + CollectLogsHandler.enable_monitor_cgroups_check() if self.protocol_util is None or self.protocol is None: self.init_protocols() @@ -162,7 +162,7 @@ def daemon(self): except Exception as e: logger.error("An error occurred in the log collection thread; will exit the thread.\n{0}", ustr(e)) finally: - CollectLogsHandler.disable_cgroups_validation() + CollectLogsHandler.disable_monitor_cgroups_check() def collect_and_send_logs(self): if self._collect_logs(): diff --git a/azurelinuxagent/ga/collect_telemetry_events.py b/azurelinuxagent/ga/collect_telemetry_events.py index 01049ee87..e0144a639 100644 --- a/azurelinuxagent/ga/collect_telemetry_events.py +++ b/azurelinuxagent/ga/collect_telemetry_events.py @@ -31,7 +31,7 @@ CollectOrReportEventDebugInfo, EVENT_FILE_REGEX, parse_event from azurelinuxagent.common.exception import InvalidExtensionEventError, ServiceStoppedError from azurelinuxagent.common.future import ustr -from 
azurelinuxagent.common.interfaces import ThreadHandlerInterface +from azurelinuxagent.ga.interfaces import ThreadHandlerInterface from azurelinuxagent.common.telemetryevent import TelemetryEvent, TelemetryEventParam, \ GuestAgentGenericLogsSchema, GuestAgentExtensionEventsSchema from azurelinuxagent.common.utils import textutil diff --git a/azurelinuxagent/ga/env.py b/azurelinuxagent/ga/env.py index 5e1705934..0e73e7d3e 100644 --- a/azurelinuxagent/ga/env.py +++ b/azurelinuxagent/ga/env.py @@ -28,7 +28,7 @@ from azurelinuxagent.common.dhcp import get_dhcp_handler from azurelinuxagent.common.event import add_periodic, WALAEventOperation, add_event from azurelinuxagent.common.future import ustr -from azurelinuxagent.common.interfaces import ThreadHandlerInterface +from azurelinuxagent.ga.interfaces import ThreadHandlerInterface from azurelinuxagent.common.osutil import get_osutil from azurelinuxagent.common.protocol.util import get_protocol_util from azurelinuxagent.common.version import AGENT_NAME, CURRENT_VERSION @@ -171,7 +171,11 @@ def _operation(self): self._hostname, curr_hostname) self._osutil.set_hostname(curr_hostname) - self._osutil.publish_hostname(curr_hostname) + try: + self._osutil.publish_hostname(curr_hostname, recover_nic=True) + except Exception as e: + msg = "Error while publishing the hostname: {0}".format(e) + add_event(AGENT_NAME, op=WALAEventOperation.HostnamePublishing, is_success=False, message=msg, log_event=False) self._hostname = curr_hostname diff --git a/azurelinuxagent/common/utils/extensionprocessutil.py b/azurelinuxagent/ga/extensionprocessutil.py similarity index 68% rename from azurelinuxagent/common/utils/extensionprocessutil.py rename to azurelinuxagent/ga/extensionprocessutil.py index 9038f6145..d2b37551b 100644 --- a/azurelinuxagent/common/utils/extensionprocessutil.py +++ b/azurelinuxagent/ga/extensionprocessutil.py @@ -18,10 +18,13 @@ # import os +import re import signal import time +from azurelinuxagent.common import conf 
from azurelinuxagent.common import logger +from azurelinuxagent.common.event import WALAEventOperation, add_event from azurelinuxagent.common.exception import ExtensionErrorCodes, ExtensionOperationError, ExtensionError from azurelinuxagent.common.future import ustr @@ -73,7 +76,7 @@ def handle_process_completion(process, command, timeout, stdout, stderr, error_c process_output = read_output(stdout, stderr) if timed_out: - if cpu_cgroup is not None:# Report CPUThrottledTime when timeout happens + if cpu_cgroup is not None: # Report CPUThrottledTime when timeout happens raise ExtensionError("Timeout({0});CPUThrottledTime({1}secs): {2}\n{3}".format(timeout, throttled_time, command, process_output), code=ExtensionErrorCodes.PluginHandlerScriptTimedout) @@ -81,12 +84,65 @@ def handle_process_completion(process, command, timeout, stdout, stderr, error_c code=ExtensionErrorCodes.PluginHandlerScriptTimedout) if return_code != 0: - raise ExtensionOperationError("Non-zero exit code: {0}, {1}\n{2}".format(return_code, command, process_output), - code=error_code, exit_code=return_code) + noexec_warning = "" + if return_code == 126: # Permission denied + noexec_path = _check_noexec() + if noexec_path is not None: + noexec_warning = "\nWARNING: {0} is mounted with the noexec flag, which can prevent execution of VM Extensions.".format(noexec_path) + raise ExtensionOperationError( + "Non-zero exit code: {0}, {1}{2}\n{3}".format(return_code, command, noexec_warning, process_output), + code=error_code, + exit_code=return_code) return process_output +# +# Collect a sample of errors while checking for the noexec flag. Consider removing this telemetry after a few releases. +# +_COLLECT_NOEXEC_ERRORS = True + + +def _check_noexec(): + """ + Check if /var is mounted with the noexec flag. + """ + # W0603: Using the global statement (global-statement) + # OK to disable; _COLLECT_NOEXEC_ERRORS is used only within _check_noexec, but needs to persist across calls. 
+ global _COLLECT_NOEXEC_ERRORS # pylint: disable=W0603 + + try: + agent_dir = conf.get_lib_dir() + with open('/proc/mounts', 'r') as f: + while True: + line = f.readline() + if line == "": # EOF + break + # The mount point is on the second column, and the flags are on the fourth. e.g. + # + # # grep /var /proc/mounts + # /dev/mapper/rootvg-varlv /var xfs rw,seclabel,noexec,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0 + # + columns = line.split() + mount_point = columns[1] + flags = columns[3] + if agent_dir.startswith(mount_point) and "noexec" in flags: + message = "The noexec flag is set on {0}. This can prevent extensions from executing.".format(mount_point) + logger.warn(message) + add_event(op=WALAEventOperation.NoExec, is_success=False, message=message) + return mount_point + except Exception as e: + message = "Error while checking the noexec flag: {0}".format(e) + logger.warn(message) + if _COLLECT_NOEXEC_ERRORS: + _COLLECT_NOEXEC_ERRORS = False + add_event(op=WALAEventOperation.NoExec, is_success=False, log_event=False, message="Error while checking the noexec flag: {0}".format(e)) + return None + + +SAS_TOKEN_RE = re.compile(r'(https://\S+\?)((sv|st|se|sr|sp|sip|spr|sig)=\S+)+', flags=re.IGNORECASE) + + def read_output(stdout, stderr): """ Read the output of the process sent to stdout and stderr and trim them to the max appropriate length. 
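The `read_output` change above redacts query strings that look like SAS tokens before extension output reaches logs and telemetry. A minimal sketch of the same redaction, reusing `SAS_TOKEN_RE` from the diff (the sample URL below is made up for illustration):

```python
import re

# Regex from the diff above: a URL followed by a query string whose
# parameters look like SAS-token fields (sv, st, se, sr, sp, sip, spr, sig).
SAS_TOKEN_RE = re.compile(r'(https://\S+\?)((sv|st|se|sr|sp|sip|spr|sig)=\S+)+', flags=re.IGNORECASE)


def redact(s):
    # Keep the URL up to and including '?', drop the token parameters themselves.
    return SAS_TOKEN_RE.sub(r'\1', s)


print(redact("Downloading https://example.blob.core.windows.net/c/b?sv=2020-08-04&sig=SECRET done"))
# -> Downloading https://example.blob.core.windows.net/c/b? done
```

In the agent, this substitution is applied to both stdout and stderr before they are trimmed and formatted.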
@@ -103,7 +159,11 @@ def read_output(stdout, stderr): stderr = ustr(stderr.read(TELEMETRY_MESSAGE_MAX_LEN), encoding='utf-8', errors='backslashreplace') - return format_stdout_stderr(stdout, stderr) + def redact(s): + # redact query strings that look like SAS tokens + return SAS_TOKEN_RE.sub(r'\1', s) + + return format_stdout_stderr(redact(stdout), redact(stderr)) except Exception as e: return format_stdout_stderr("", "Cannot read stdout/stderr: {0}".format(ustr(e))) diff --git a/azurelinuxagent/ga/exthandlers.py b/azurelinuxagent/ga/exthandlers.py index 0aa4ed93d..fcb14d22b 100644 --- a/azurelinuxagent/ga/exthandlers.py +++ b/azurelinuxagent/ga/exthandlers.py @@ -33,11 +33,12 @@ from azurelinuxagent.common import conf from azurelinuxagent.common import logger +from azurelinuxagent.common.osutil import get_osutil from azurelinuxagent.common.utils import fileutil from azurelinuxagent.common import version from azurelinuxagent.common.agent_supported_feature import get_agent_supported_features_list_for_extensions, \ SupportedFeatureNames, get_supported_feature_by_name, get_agent_supported_features_list_for_crp -from azurelinuxagent.common.cgroupconfigurator import CGroupConfigurator +from azurelinuxagent.ga.cgroupconfigurator import CGroupConfigurator from azurelinuxagent.common.datacontract import get_properties, set_properties from azurelinuxagent.common.errorstate import ErrorState from azurelinuxagent.common.event import add_event, elapsed_milliseconds, WALAEventOperation, \ @@ -52,8 +53,7 @@ from azurelinuxagent.common.utils import textutil from azurelinuxagent.common.utils.archive import ARCHIVE_DIRECTORY_NAME from azurelinuxagent.common.utils.flexible_version import FlexibleVersion -from azurelinuxagent.common.version import AGENT_NAME, CURRENT_VERSION, \ - PY_VERSION_MAJOR, PY_VERSION_MICRO, PY_VERSION_MINOR +from azurelinuxagent.common.version import AGENT_NAME, CURRENT_VERSION _HANDLER_NAME_PATTERN = r'^([^-]+)' _HANDLER_VERSION_PATTERN = r'(\d+(?:\.\d+)*)' 
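For context, `_HANDLER_NAME_PATTERN` and `_HANDLER_VERSION_PATTERN` above are the building blocks exthandlers uses to parse handler names such as `Publisher.Type-1.2.3`. A hypothetical combined use (the module composes these fragments into larger patterns elsewhere; the composition shown here is only an illustration):

```python
import re

_HANDLER_NAME_PATTERN = r'^([^-]+)'
_HANDLER_VERSION_PATTERN = r'(\d+(?:\.\d+)*)'

# Illustrative composition: split "<Publisher>.<Type>-<version>" into name and version.
_HANDLER_PATTERN = _HANDLER_NAME_PATTERN + r'-' + _HANDLER_VERSION_PATTERN

m = re.match(_HANDLER_PATTERN, "Microsoft.Azure.Extensions.CustomScript-2.1.10")
print(m.group(1))  # name: everything before the first '-'
print(m.group(2))  # dotted version number
```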
@@ -299,7 +299,7 @@ def run(self): # we make a deep copy of the extensions, since changes are made to self.ext_handlers while processing the extensions self.ext_handlers = copy.deepcopy(egs.extensions) - if not self._extension_processing_allowed(): + if self._extensions_on_hold(): return utc_start = datetime.datetime.utcnow() @@ -433,17 +433,15 @@ def _cleanup_outdated_handlers(self): except OSError as e: logger.warn("Failed to remove extension package {0}: {1}".format(pkg, e.strerror)) - def _extension_processing_allowed(self): - if not conf.get_extensions_enabled(): - logger.verbose("Extension handling is disabled") - return False - + def _extensions_on_hold(self): if conf.get_enable_overprovisioning(): if self.protocol.get_goal_state().extensions_goal_state.on_hold: - logger.info("Extension handling is on hold") - return False + msg = "Extension handling is on hold" + logger.info(msg) + add_event(op=WALAEventOperation.ExtensionProcessing, message=msg) + return True - return True + return False @staticmethod def __get_dependency_level(tup): @@ -478,10 +476,30 @@ def handle_ext_handlers(self, goal_state_id): max_dep_level = self.__get_dependency_level(all_extensions[-1]) if any(all_extensions) else 0 depends_on_err_msg = None + extensions_enabled = conf.get_extensions_enabled() for extension, ext_handler in all_extensions: handler_i = ExtHandlerInstance(ext_handler, self.protocol, extension=extension) + # In case of extensions disabled, we skip processing extensions. But CRP is still waiting for some status + # back for the skipped extensions. In order to propagate the status back to CRP, we will report status back + # here with an error message. + if not extensions_enabled: + agent_conf_file_path = get_osutil().agent_conf_file_path + msg = "Extension will not be processed since extension processing is disabled. 
To enable extension " \ + "processing, set Extensions.Enabled=y in '{0}'".format(agent_conf_file_path) + ext_full_name = handler_i.get_extension_full_name(extension) + logger.info('') + logger.info("{0}: {1}".format(ext_full_name, msg)) + add_event(op=WALAEventOperation.ExtensionProcessing, message="{0}: {1}".format(ext_full_name, msg)) + handler_i.set_handler_status(status=ExtHandlerStatusValue.not_ready, message=msg, code=-1) + handler_i.create_status_file_if_not_exist(extension, + status=ExtensionStatusValue.error, + code=-1, + operation=handler_i.operation, + message=msg) + continue + # In case of depends-on errors, we skip processing extensions if there was an error processing dependent extensions. # But CRP is still waiting for some status back for the skipped extensions. In order to propagate the status back to CRP, # we will report status back here with the relevant error message for each of the dependent extension. @@ -945,33 +963,6 @@ def report_ext_handlers_status(self, goal_state_changed=False, vm_agent_update_s message=msg) return None - def get_ext_handlers_status_debug_info(self, vm_status): - status_blob_text = self.protocol.get_status_blob_data() - if status_blob_text is None: - status_blob_text = "" - - support_multi_config = {} - vm_status_data = get_properties(vm_status) - vm_handler_statuses = vm_status_data.get('vmAgent', {}).get('extensionHandlers') - for handler_status in vm_handler_statuses: - if handler_status.get('name') is not None: - support_multi_config[handler_status.get('name')] = handler_status.get('supports_multi_config') - - debug_text = json.dumps({ - "agentName": AGENT_NAME, - "daemonVersion": str(version.get_daemon_version()), - "pythonVersion": "Python: {0}.{1}.{2}".format(PY_VERSION_MAJOR, PY_VERSION_MINOR, PY_VERSION_MICRO), - "extensionSupportedFeatures": [name for name, _ in get_agent_supported_features_list_for_extensions().items()], - "supportsMultiConfig": support_multi_config - }) - - return '''{{ - "__comment__": "The 
__status__ property is the actual status reported to CRP", - "__status__": {0}, - "__debug__": {1} -}} -'''.format(status_blob_text, debug_text) - def report_ext_handler_status(self, vm_status, ext_handler, goal_state_changed): ext_handler_i = ExtHandlerInstance(ext_handler, self.protocol) @@ -991,7 +982,9 @@ def report_ext_handler_status(self, vm_status, ext_handler, goal_state_changed): # For MultiConfig, we need to report status per extension even for Handler level failures. # If we have HandlerStatus for a MultiConfig handler and GS is requesting for it, we would report status per # extension even if HandlerState == NotInstalled (Sample scenario: ExtensionsGoalStateError, DecideVersionError, etc) - if handler_state != ExtHandlerState.NotInstalled or ext_handler.supports_multi_config: + # We also need to report extension status for an uninstalled handler if extensions are disabled because CRP + # waits for extension runtime status before failing the extension operation. + if handler_state != ExtHandlerState.NotInstalled or ext_handler.supports_multi_config or not conf.get_extensions_enabled(): # Since we require reading the Manifest for reading the heartbeat, this would fail if HandlerManifest not found. # Only try to read heartbeat if HandlerState != NotInstalled. 
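The exthandlers changes above make the agent report a terminal error status for each skipped extension when extension processing is disabled, so CRP receives a status instead of waiting until timeout. Reduced to a sketch (the function name and return shape are illustrative, not the agent's API):

```python
def process_extensions(extensions, extensions_enabled):
    """Process each extension, or report an error status for all of them if processing is disabled."""
    statuses = {}
    for ext in extensions:
        if not extensions_enabled:
            # Still record a terminal status so the orchestrator (CRP)
            # is not left waiting for a result that never arrives.
            statuses[ext] = ("error", "extension processing is disabled")
            continue
        statuses[ext] = ("success", "")
    return statuses


print(process_extensions(["CustomScript"], extensions_enabled=False))
```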
@@ -1058,7 +1051,7 @@ def get_extension_full_name(self, extension=None): def __set_command_execution_log(self, extension, execution_log_max_size): try: - fileutil.mkdir(self.get_log_dir(), mode=0o755) + fileutil.mkdir(self.get_log_dir(), mode=0o755, reset_mode_and_owner=False) except IOError as e: self.logger.error(u"Failed to create extension log dir: {0}", e) else: diff --git a/azurelinuxagent/ga/ga_version_updater.py b/azurelinuxagent/ga/ga_version_updater.py new file mode 100644 index 000000000..82a621eac --- /dev/null +++ b/azurelinuxagent/ga/ga_version_updater.py @@ -0,0 +1,182 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2020 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# Requires Python 2.6+ and Openssl 1.0+ + +import glob +import os +import shutil + +from azurelinuxagent.common import conf, logger +from azurelinuxagent.common.exception import AgentUpdateError +from azurelinuxagent.common.future import ustr +from azurelinuxagent.common.protocol.extensions_goal_state import GoalStateSource +from azurelinuxagent.common.utils import fileutil +from azurelinuxagent.common.utils.flexible_version import FlexibleVersion +from azurelinuxagent.common.version import AGENT_NAME, AGENT_DIR_PATTERN, CURRENT_VERSION +from azurelinuxagent.ga.guestagent import GuestAgent, AGENT_MANIFEST_FILE + + +class GAVersionUpdater(object): + + def __init__(self, gs_id): + self._gs_id = gs_id + self._version = FlexibleVersion("0.0.0.0") # Initialized to zero; the actual version is retrieved from the goal state at a later stage + self._agent_manifest = None # Initialized to None; fetched from the goal state at a different stage by each updater + + def is_update_allowed_this_time(self, ext_gs_updated): + """ + Checks whether the agent is allowed to update at this time. + @param ext_gs_updated: True if the extension goal state was updated, else False + @return: False when updates are not allowed + """ + raise NotImplementedError + + def is_rsm_update_enabled(self, agent_family, ext_gs_updated): + """ + Returns True if we need to switch to RSM update from self-update, and vice versa. + @param agent_family: agent family + @param ext_gs_updated: True if the extension goal state was updated, else False + @return: False when the agent needs to stop RSM updates; + True when the agent needs to switch to RSM updates + """ + raise NotImplementedError + + def retrieve_agent_version(self, agent_family, goal_state): + """ + Fetches the agent version from the goal state for the given family.
+ @param agent_family: agent family + @param goal_state: goal state + """ + raise NotImplementedError + + def is_retrieved_version_allowed_to_update(self, agent_family): + """ + Checks all base conditions to decide whether the new version is allowed to update. + @param agent_family: agent family + @return: True if allowed to update, else False + """ + raise NotImplementedError + + def log_new_agent_update_message(self): + """ + Logs the update message once the agent has been confirmed as allowed to update. + """ + raise NotImplementedError + + def proceed_with_update(self): + """ + Performs the upgrade/downgrade. + Raises: AgentUpgradeExitException + """ + raise NotImplementedError + + @property + def version(self): + """ + Returns the version retrieved from the goal state. + """ + return self._version + + def sync_new_gs_id(self, gs_id): + """ + Updates the goal state id. + @param gs_id: goal state id + """ + self._gs_id = gs_id + + @staticmethod + def download_new_agent_pkg(package_to_download, protocol, is_fast_track_goal_state): + """ + Downloads the new agent package.
+ @param package_to_download: package to download + @param protocol: protocol object + @param is_fast_track_goal_state: True if goal state is fast track else False + """ + agent_name = "{0}-{1}".format(AGENT_NAME, package_to_download.version) + agent_dir = os.path.join(conf.get_lib_dir(), agent_name) + agent_pkg_path = ".".join((os.path.join(conf.get_lib_dir(), agent_name), "zip")) + agent_handler_manifest_file = os.path.join(agent_dir, AGENT_MANIFEST_FILE) + if not os.path.exists(agent_dir) or not os.path.isfile(agent_handler_manifest_file): + protocol.client.download_zip_package("agent package", package_to_download.uris, agent_pkg_path, agent_dir, use_verify_header=is_fast_track_goal_state) + else: + logger.info("Agent {0} was previously downloaded - skipping download", agent_name) + + if not os.path.isfile(agent_handler_manifest_file): + try: + # Clean up the agent directory if the manifest file is missing + logger.info("Agent handler manifest file is missing, cleaning up the agent directory: {0}".format(agent_dir)) + if os.path.isdir(agent_dir): + shutil.rmtree(agent_dir, ignore_errors=True) + except Exception as err: + logger.warn("Unable to delete Agent directory: {0}".format(err)) + raise AgentUpdateError("Downloaded agent package: {0} is missing agent handler manifest file: {1}".format(agent_name, agent_handler_manifest_file)) + + def download_and_get_new_agent(self, protocol, agent_family, goal_state): + """ + Function downloads the new agent and returns the downloaded version. 
+ @param protocol: protocol object + @param agent_family: agent family + @param goal_state: goal state + @return: GuestAgent: downloaded agent + """ + if self._agent_manifest is None: # Fetch agent manifest if it's not already done + self._agent_manifest = goal_state.fetch_agent_manifest(agent_family.name, agent_family.uris) + package_to_download = self._get_agent_package_to_download(self._agent_manifest, self._version) + is_fast_track_goal_state = goal_state.extensions_goal_state.source == GoalStateSource.FastTrack + self.download_new_agent_pkg(package_to_download, protocol, is_fast_track_goal_state) + agent = GuestAgent.from_agent_package(package_to_download) + return agent + + def purge_extra_agents_from_disk(self): + """ + Remove the agents from disk except current version and new agent version + """ + known_agents = [CURRENT_VERSION, self._version] + self._purge_unknown_agents_from_disk(known_agents) + + def _get_agent_package_to_download(self, agent_manifest, version): + """ + Returns the package of the given Version found in the manifest. 
If not found, returns exception + """ + for pkg in agent_manifest.pkg_list.versions: + if FlexibleVersion(pkg.version) == version: + # Found a matching package, only download that one + return pkg + + raise AgentUpdateError("No matching package found in the agent manifest for version: {0} in goal state incarnation: {1}, " + "skipping agent update".format(str(version), self._gs_id)) + + @staticmethod + def _purge_unknown_agents_from_disk(known_agents): + """ + Remove from disk all directories and .zip files of unknown agents + """ + path = os.path.join(conf.get_lib_dir(), "{0}-*".format(AGENT_NAME)) + + for agent_path in glob.iglob(path): + try: + name = fileutil.trim_ext(agent_path, "zip") + m = AGENT_DIR_PATTERN.match(name) + if m is not None and FlexibleVersion(m.group(1)) not in known_agents: + if os.path.isfile(agent_path): + logger.info(u"Purging outdated Agent file {0}", agent_path) + os.remove(agent_path) + else: + logger.info(u"Purging outdated Agent directory {0}", agent_path) + shutil.rmtree(agent_path) + except Exception as e: + logger.warn(u"Purging {0} raised exception: {1}", agent_path, ustr(e)) diff --git a/azurelinuxagent/ga/guestagent.py b/azurelinuxagent/ga/guestagent.py new file mode 100644 index 000000000..b4b2d05b3 --- /dev/null +++ b/azurelinuxagent/ga/guestagent.py @@ -0,0 +1,331 @@ +import json +import os +import shutil +import time + +from azurelinuxagent.common.event import add_event, WALAEventOperation +from azurelinuxagent.common.future import ustr +from azurelinuxagent.common.utils import textutil + +from azurelinuxagent.common import logger, conf +from azurelinuxagent.common.exception import UpdateError +from azurelinuxagent.common.utils.flexible_version import FlexibleVersion +from azurelinuxagent.common.version import AGENT_DIR_PATTERN, AGENT_NAME +from azurelinuxagent.ga.exthandlers import HandlerManifest + +AGENT_ERROR_FILE = "error.json" # File name for agent error record +AGENT_MANIFEST_FILE = "HandlerManifest.json" +MAX_FAILURE 
= 3 # Max failure allowed for agent before declare bad agent +AGENT_UPDATE_COUNT_FILE = "update_attempt.json" # File for tracking agent update attempt count + + +class GuestAgent(object): + def __init__(self, path, pkg): + """ + If 'path' is given, the object is initialized to the version installed under that path. + + If 'pkg' is given, the version specified in the package information is downloaded and the object is + initialized to that version. + + NOTE: Prefer using the from_installed_agent and from_agent_package methods instead of calling __init__ directly + """ + self.pkg = pkg + version = None + if path is not None: + m = AGENT_DIR_PATTERN.match(path) + if m is None: + raise UpdateError(u"Illegal agent directory: {0}".format(path)) + version = m.group(1) + elif self.pkg is not None: + version = pkg.version + + if version is None: + raise UpdateError(u"Illegal agent version: {0}".format(version)) + self.version = FlexibleVersion(version) + + location = u"disk" if path is not None else u"package" + logger.verbose(u"Loading Agent {0} from {1}", self.name, location) + + self.error = GuestAgentError(self.get_agent_error_file()) + self.error.load() + + self.update_attempt_data = GuestAgentUpdateAttempt(self.get_agent_update_count_file()) + self.update_attempt_data.load() + + try: + self._ensure_loaded() + except Exception as e: + # If we're unable to unpack the agent, delete the Agent directory + try: + if os.path.isdir(self.get_agent_dir()): + shutil.rmtree(self.get_agent_dir(), ignore_errors=True) + except Exception as err: + logger.warn("Unable to delete Agent files: {0}".format(err)) + msg = u"Agent {0} install failed with exception:".format( + self.name) + detailed_msg = '{0} {1}'.format(msg, textutil.format_exception(e)) + add_event( + AGENT_NAME, + version=self.version, + op=WALAEventOperation.Install, + is_success=False, + message=detailed_msg) + + @staticmethod + def from_installed_agent(path): + """ + Creates an instance of GuestAgent using the agent 
installed in the given 'path'. + """ + return GuestAgent(path, None) + + @staticmethod + def from_agent_package(package): + """ + Creates an instance of GuestAgent using the information provided in the 'package'; if that version of the agent is not installed, it installs it. + """ + return GuestAgent(None, package) + + @property + def name(self): + return "{0}-{1}".format(AGENT_NAME, self.version) + + def get_agent_cmd(self): + return self.manifest.get_enable_command() + + def get_agent_dir(self): + return os.path.join(conf.get_lib_dir(), self.name) + + def get_agent_error_file(self): + return os.path.join(conf.get_lib_dir(), self.name, AGENT_ERROR_FILE) + + def get_agent_update_count_file(self): + return os.path.join(conf.get_lib_dir(), self.name, AGENT_UPDATE_COUNT_FILE) + + def get_agent_manifest_path(self): + return os.path.join(self.get_agent_dir(), AGENT_MANIFEST_FILE) + + def get_agent_pkg_path(self): + return ".".join((os.path.join(conf.get_lib_dir(), self.name), "zip")) + + def clear_error(self): + self.error.clear() + self.error.save() + + @property + def is_available(self): + return self.is_downloaded and not self.is_blacklisted + + @property + def is_blacklisted(self): + return self.error is not None and self.error.is_blacklisted + + @property + def is_downloaded(self): + return self.is_blacklisted or \ + os.path.isfile(self.get_agent_manifest_path()) + + def mark_failure(self, is_fatal=False, reason=''): + try: + if not os.path.isdir(self.get_agent_dir()): + os.makedirs(self.get_agent_dir()) + self.error.mark_failure(is_fatal=is_fatal, reason=reason) + self.error.save() + if self.error.is_blacklisted: + msg = u"Agent {0} is permanently blacklisted".format(self.name) + logger.warn(msg) + add_event(op=WALAEventOperation.AgentBlacklisted, is_success=False, message=msg, log_event=False, + version=self.version) + except Exception as e: + logger.warn(u"Agent {0} failed recording error state: {1}", self.name, ustr(e)) + + def
inc_update_attempt_count(self): + try: + self.update_attempt_data.inc_count() + self.update_attempt_data.save() + except Exception as e: + logger.warn(u"Agent {0} failed recording update attempt: {1}", self.name, ustr(e)) + + def get_update_attempt_count(self): + return self.update_attempt_data.count + + def _ensure_loaded(self): + self._load_manifest() + self._load_error() + + def _load_error(self): + try: + self.error = GuestAgentError(self.get_agent_error_file()) + self.error.load() + logger.verbose(u"Agent {0} error state: {1}", self.name, ustr(self.error)) + except Exception as e: + logger.warn(u"Agent {0} failed loading error state: {1}", self.name, ustr(e)) + + def _load_manifest(self): + path = self.get_agent_manifest_path() + if not os.path.isfile(path): + msg = u"Agent {0} is missing the {1} file".format(self.name, AGENT_MANIFEST_FILE) + raise UpdateError(msg) + + with open(path, "r") as manifest_file: + try: + manifests = json.load(manifest_file) + except Exception as e: + msg = u"Agent {0} has a malformed {1} ({2})".format(self.name, AGENT_MANIFEST_FILE, ustr(e)) + raise UpdateError(msg) + if type(manifests) is list: + if len(manifests) <= 0: + msg = u"Agent {0} has an empty {1}".format(self.name, AGENT_MANIFEST_FILE) + raise UpdateError(msg) + manifest = manifests[0] + else: + manifest = manifests + + try: + self.manifest = HandlerManifest(manifest) # pylint: disable=W0201 + if len(self.manifest.get_enable_command()) <= 0: + raise Exception(u"Manifest is missing the enable command") + except Exception as e: + msg = u"Agent {0} has an illegal {1}: {2}".format( + self.name, + AGENT_MANIFEST_FILE, + ustr(e)) + raise UpdateError(msg) + + logger.verbose( + u"Agent {0} loaded manifest from {1}", + self.name, + self.get_agent_manifest_path()) + logger.verbose(u"Successfully loaded Agent {0} {1}: {2}", + self.name, + AGENT_MANIFEST_FILE, + ustr(self.manifest.data)) + return + + +class GuestAgentError(object): + def __init__(self, path): + self.last_failure = 
0.0 + self.was_fatal = False + if path is None: + raise UpdateError(u"GuestAgentError requires a path") + self.path = path + self.failure_count = 0 + self.reason = '' + + self.clear() + return + + def mark_failure(self, is_fatal=False, reason=''): + self.last_failure = time.time() + self.failure_count += 1 + self.was_fatal = is_fatal + self.reason = reason + return + + def clear(self): + self.last_failure = 0.0 + self.failure_count = 0 + self.was_fatal = False + self.reason = '' + return + + @property + def is_blacklisted(self): + return self.was_fatal or self.failure_count >= MAX_FAILURE + + def load(self): + if self.path is not None and os.path.isfile(self.path): + try: + with open(self.path, 'r') as f: + self.from_json(json.load(f)) + except Exception as error: + # The error.json file is only supposed to be written only by the agent. + # If for whatever reason the file is malformed, just delete it to reset state of the errors. + logger.warn( + "Ran into error when trying to load error file {0}, deleting it to clean state. 
Error: {1}".format( + self.path, textutil.format_exception(error))) + try: + os.remove(self.path) + except Exception: + # We try best case efforts to delete the file, ignore error if we're unable to do so + pass + return + + def save(self): + if os.path.isdir(os.path.dirname(self.path)): + with open(self.path, 'w') as f: + json.dump(self.to_json(), f) + return + + def from_json(self, data): + self.last_failure = max(self.last_failure, data.get(u"last_failure", 0.0)) + self.failure_count = max(self.failure_count, data.get(u"failure_count", 0)) + self.was_fatal = self.was_fatal or data.get(u"was_fatal", False) + reason = data.get(u"reason", '') + self.reason = reason if reason != '' else self.reason + return + + def to_json(self): + data = { + u"last_failure": self.last_failure, + u"failure_count": self.failure_count, + u"was_fatal": self.was_fatal, + u"reason": ustr(self.reason) + } + return data + + def __str__(self): + return "Last Failure: {0}, Total Failures: {1}, Fatal: {2}, Reason: {3}".format( + self.last_failure, + self.failure_count, + self.was_fatal, + self.reason) + + +class GuestAgentUpdateAttempt(object): + def __init__(self, path): + self.count = 0 + if path is None: + raise UpdateError(u"GuestAgentUpdateAttempt requires a path") + self.path = path + + self.clear() + + def inc_count(self): + self.count += 1 + + def clear(self): + self.count = 0 + + def load(self): + if self.path is not None and os.path.isfile(self.path): + try: + with open(self.path, 'r') as f: + self.from_json(json.load(f)) + except Exception as error: + # The update_attempt.json file is only supposed to be written only by the agent. + # If for whatever reason the file is malformed, just delete it to reset state of the errors. + logger.warn( + "Ran into error when trying to load error file {0}, deleting it to clean state. 
Error: {1}".format( + self.path, textutil.format_exception(error))) + try: + os.remove(self.path) + except Exception: + # We try best case efforts to delete the file, ignore error if we're unable to do so + pass + + def save(self): + if os.path.isdir(os.path.dirname(self.path)): + with open(self.path, 'w') as f: + json.dump(self.to_json(), f) + + def from_json(self, data): + self.count = data.get(u"count", 0) + + def to_json(self): + data = { + u"count": self.count + } + return data + + diff --git a/azurelinuxagent/common/interfaces.py b/azurelinuxagent/ga/interfaces.py similarity index 100% rename from azurelinuxagent/common/interfaces.py rename to azurelinuxagent/ga/interfaces.py diff --git a/azurelinuxagent/common/logcollector.py b/azurelinuxagent/ga/logcollector.py similarity index 93% rename from azurelinuxagent/common/logcollector.py rename to azurelinuxagent/ga/logcollector.py index fe62a7db6..e21b1f51f 100644 --- a/azurelinuxagent/common/logcollector.py +++ b/azurelinuxagent/ga/logcollector.py @@ -26,11 +26,10 @@ from datetime import datetime from heapq import heappush, heappop -from azurelinuxagent.common.cgroup import CpuCgroup, AGENT_LOG_COLLECTOR, MemoryCgroup from azurelinuxagent.common.conf import get_lib_dir, get_ext_log_dir, get_agent_log_file from azurelinuxagent.common.event import initialize_event_logger_vminfo_common_parameters from azurelinuxagent.common.future import ustr -from azurelinuxagent.common.logcollector_manifests import MANIFEST_NORMAL, MANIFEST_FULL +from azurelinuxagent.ga.logcollector_manifests import MANIFEST_NORMAL, MANIFEST_FULL # Please note: be careful when adding agent dependencies in this module. # This module uses its own logger and logs to its own file, not to the agent log. 
@@ -71,14 +70,13 @@ class LogCollector(object): _TRUNCATED_FILE_PREFIX = "truncated_" - def __init__(self, is_full_mode=False, cpu_cgroup_path=None, memory_cgroup_path=None): + def __init__(self, is_full_mode=False): self._is_full_mode = is_full_mode self._manifest = MANIFEST_FULL if is_full_mode else MANIFEST_NORMAL self._must_collect_files = self._expand_must_collect_files() self._create_base_dirs() self._set_logger() self._initialize_telemetry() - self.cgroups = self._set_resource_usage_cgroups(cpu_cgroup_path, memory_cgroup_path) @staticmethod def _mkdir(dirname): @@ -105,17 +103,6 @@ def _set_logger(): _LOGGER.addHandler(_f_handler) _LOGGER.setLevel(logging.INFO) - @staticmethod - def _set_resource_usage_cgroups(cpu_cgroup_path, memory_cgroup_path): - cpu_cgroup = CpuCgroup(AGENT_LOG_COLLECTOR, cpu_cgroup_path) - msg = "Started tracking cpu cgroup {0}".format(cpu_cgroup) - _LOGGER.info(msg) - cpu_cgroup.initialize_cpu_usage() - memory_cgroup = MemoryCgroup(AGENT_LOG_COLLECTOR, memory_cgroup_path) - msg = "Started tracking memory cgroup {0}".format(memory_cgroup) - _LOGGER.info(msg) - return [cpu_cgroup, memory_cgroup] - @staticmethod def _initialize_telemetry(): protocol = get_protocol_util().get_protocol(init_goal_state=False) @@ -373,9 +360,18 @@ def collect_logs_and_get_archive(self): try: compressed_archive = zipfile.ZipFile(COMPRESSED_ARCHIVE_PATH, "w", compression=zipfile.ZIP_DEFLATED) + max_errors = 8 + error_count = 0 for file_to_collect in files_to_collect: - archive_file_name = LogCollector._convert_file_name_to_archive_name(file_to_collect) - compressed_archive.write(file_to_collect.encode("utf-8"), arcname=archive_file_name) + try: + archive_file_name = LogCollector._convert_file_name_to_archive_name(file_to_collect) + compressed_archive.write(file_to_collect.encode("utf-8"), arcname=archive_file_name) + except Exception as e: + error_count += 1 + if error_count >= max_errors: + raise Exception("Too many errors, giving up. 
Last error: {0}".format(ustr(e))) + else: + _LOGGER.warning("Failed to add file %s to the archive: %s", file_to_collect, ustr(e)) compressed_archive_size = os.path.getsize(COMPRESSED_ARCHIVE_PATH) _LOGGER.info("Successfully compressed files. Compressed archive size is %s b", compressed_archive_size) diff --git a/azurelinuxagent/common/logcollector_manifests.py b/azurelinuxagent/ga/logcollector_manifests.py similarity index 96% rename from azurelinuxagent/common/logcollector_manifests.py rename to azurelinuxagent/ga/logcollector_manifests.py index e77da3d47..3548de4fc 100644 --- a/azurelinuxagent/common/logcollector_manifests.py +++ b/azurelinuxagent/ga/logcollector_manifests.py @@ -83,6 +83,7 @@ copy,/var/lib/dhcp/dhclient.eth0.leases copy,/var/lib/dhclient/dhclient-eth0.leases copy,/var/lib/wicked/lease-eth0-dhcp-ipv4.xml +copy,/run/systemd/netif/leases/2 echo, echo,### Gathering Log Files ### @@ -119,4 +120,8 @@ echo,### Gathering Disk Info ### diskinfo, + +echo,### Gathering Guest ProxyAgent Log Files ### +copy,/var/log/azure-proxy-agent/* +echo, """ diff --git a/azurelinuxagent/ga/monitor.py b/azurelinuxagent/ga/monitor.py index e2744bc43..1c123d70e 100644 --- a/azurelinuxagent/ga/monitor.py +++ b/azurelinuxagent/ga/monitor.py @@ -22,13 +22,13 @@ import azurelinuxagent.common.conf as conf import azurelinuxagent.common.logger as logger import azurelinuxagent.common.utils.networkutil as networkutil -from azurelinuxagent.common.cgroup import MetricValue, MetricsCategory, MetricsCounter -from azurelinuxagent.common.cgroupconfigurator import CGroupConfigurator -from azurelinuxagent.common.cgroupstelemetry import CGroupsTelemetry +from azurelinuxagent.ga.cgroup import MetricValue, MetricsCategory, MetricsCounter +from azurelinuxagent.ga.cgroupconfigurator import CGroupConfigurator +from azurelinuxagent.ga.cgroupstelemetry import CGroupsTelemetry from azurelinuxagent.common.errorstate import ErrorState from azurelinuxagent.common.event import add_event, 
WALAEventOperation, report_metric from azurelinuxagent.common.future import ustr -from azurelinuxagent.common.interfaces import ThreadHandlerInterface +from azurelinuxagent.ga.interfaces import ThreadHandlerInterface from azurelinuxagent.common.osutil import get_osutil from azurelinuxagent.common.protocol.healthservice import HealthService from azurelinuxagent.common.protocol.imds import get_imds_client diff --git a/azurelinuxagent/common/persist_firewall_rules.py b/azurelinuxagent/ga/persist_firewall_rules.py similarity index 100% rename from azurelinuxagent/common/persist_firewall_rules.py rename to azurelinuxagent/ga/persist_firewall_rules.py diff --git a/azurelinuxagent/ga/rsm_version_updater.py b/azurelinuxagent/ga/rsm_version_updater.py new file mode 100644 index 000000000..366f1d703 --- /dev/null +++ b/azurelinuxagent/ga/rsm_version_updater.py @@ -0,0 +1,137 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2020 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# Requires Python 2.6+ and Openssl 1.0+ + +import glob +import os + +from azurelinuxagent.common import conf, logger +from azurelinuxagent.common.event import add_event, WALAEventOperation +from azurelinuxagent.common.exception import AgentUpgradeExitException, AgentUpdateError +from azurelinuxagent.common.utils.flexible_version import FlexibleVersion +from azurelinuxagent.common.version import CURRENT_VERSION, AGENT_NAME +from azurelinuxagent.ga.ga_version_updater import GAVersionUpdater +from azurelinuxagent.ga.guestagent import GuestAgent + + +class RSMVersionUpdater(GAVersionUpdater): + def __init__(self, gs_id, daemon_version): + super(RSMVersionUpdater, self).__init__(gs_id) + self._daemon_version = daemon_version + + @staticmethod + def _get_all_agents_on_disk(): + path = os.path.join(conf.get_lib_dir(), "{0}-*".format(AGENT_NAME)) + return [GuestAgent.from_installed_agent(path=agent_dir) for agent_dir in glob.iglob(path) if + os.path.isdir(agent_dir)] + + def _get_available_agents_on_disk(self): + available_agents = [agent for agent in self._get_all_agents_on_disk() if agent.is_available] + return sorted(available_agents, key=lambda agent: agent.version, reverse=True) + + def is_update_allowed_this_time(self, ext_gs_updated): + """ + An RSM update is allowed only when there is a new goal state. + """ + return ext_gs_updated + + def is_rsm_update_enabled(self, agent_family, ext_gs_updated): + """ + Checks if there is a new goal state and decides whether to continue with the RSM update or switch to self-update. + First it checks whether the agent supports GA versioning; if not, we return False to switch to self-update. + If the VM is enabled for RSM upgrades, we continue with the RSM update; otherwise we return False to switch to self-update. + If isVersionFromRSM, isVMEnabledForRSMUpgrades, or the version is missing from the goal state, we treat the goal state as invalid and skip the update.
+ """ + if ext_gs_updated: + if not conf.get_enable_ga_versioning(): + return False + + if agent_family.is_vm_enabled_for_rsm_upgrades is None: + raise AgentUpdateError( + "Received invalid goal state:{0}, missing isVMEnabledForRSMUpgrades property. So, skipping agent update".format( + self._gs_id)) + elif not agent_family.is_vm_enabled_for_rsm_upgrades: + return False + else: + if agent_family.is_version_from_rsm is None: + raise AgentUpdateError( + "Received invalid goal state:{0}, missing isVersionFromRSM property. So, skipping agent update".format( + self._gs_id)) + if agent_family.version is None: + raise AgentUpdateError( + "Received invalid goal state:{0}, missing version property. So, skipping agent update".format( + self._gs_id)) + + return True + + def retrieve_agent_version(self, agent_family, goal_state): + """ + Gets the agent version from the goal state. + """ + self._version = FlexibleVersion(agent_family.version) + + def is_retrieved_version_allowed_to_update(self, agent_family): + """ + Once the version is retrieved from the goal state, we check whether we are allowed to update to it: + the update is allowed only if the new version differs from the current version, is not below the daemon version, + and came from an RSM request. + """ + + if not agent_family.is_version_from_rsm or self._version < self._daemon_version or self._version == CURRENT_VERSION: + return False + + return True + + def log_new_agent_update_message(self): + """ + Logs the update message once the version has been confirmed as allowed to update. + """ + msg = "New agent version:{0} requested by RSM in Goal state {1}, will update the agent before processing the goal state.".format( + str(self._version), self._gs_id) + logger.info(msg) + add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False) + + def proceed_with_update(self): + """ + Upgrades/downgrades to the new version.
+ Raises: AgentUpgradeExitException + """ + if self._version < CURRENT_VERSION: + # In case of a downgrade, we mark the current agent as bad version to avoid starting it back up ever again + # (the expectation here being that if we get request to a downgrade, + # there's a good reason for not wanting the current version). + prefix = "downgrade" + try: + # We should always have an agent directory for the CURRENT_VERSION + agents_on_disk = self._get_available_agents_on_disk() + current_agent = next(agent for agent in agents_on_disk if agent.version == CURRENT_VERSION) + msg = "Marking the agent {0} as bad version since a downgrade was requested in the GoalState, " \ + "suggesting that we really don't want to execute any extensions using this version".format( + CURRENT_VERSION) + logger.info(msg) + add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False) + current_agent.mark_failure(is_fatal=True, reason=msg) + except StopIteration: + logger.warn( + "Could not find a matching agent with current version {0} to blacklist, skipping it".format( + CURRENT_VERSION)) + else: + # In case of an upgrade, we don't need to exclude anything as the daemon will automatically + # start the next available highest version which would be the target version + prefix = "upgrade" + raise AgentUpgradeExitException( + "Current Agent {0} completed all update checks, exiting current process to {1} to the new Agent version {2}".format(CURRENT_VERSION, + prefix, self._version)) diff --git a/azurelinuxagent/ga/self_update_version_updater.py b/azurelinuxagent/ga/self_update_version_updater.py new file mode 100644 index 000000000..5a839851d --- /dev/null +++ b/azurelinuxagent/ga/self_update_version_updater.py @@ -0,0 +1,184 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2020 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Requires Python 2.6+ and Openssl 1.0+ + +import datetime + +from azurelinuxagent.common import conf, logger +from azurelinuxagent.common.event import add_event, WALAEventOperation +from azurelinuxagent.common.exception import AgentUpgradeExitException, AgentUpdateError +from azurelinuxagent.common.utils.flexible_version import FlexibleVersion +from azurelinuxagent.common.version import CURRENT_VERSION +from azurelinuxagent.ga.ga_version_updater import GAVersionUpdater + + +class SelfUpdateType(object): + """ + Enum for different modes of Self updates + """ + Hotfix = "Hotfix" + Regular = "Regular" + + +class SelfUpdateVersionUpdater(GAVersionUpdater): + def __init__(self, gs_id): + super(SelfUpdateVersionUpdater, self).__init__(gs_id) + self._last_attempted_manifest_download_time = datetime.datetime.min + self._last_attempted_self_update_time = datetime.datetime.min + + @staticmethod + def _get_largest_version(agent_manifest): + """ + Get the largest version from the agent manifest + """ + largest_version = FlexibleVersion("0.0.0.0") + for pkg in agent_manifest.pkg_list.versions: + pkg_version = FlexibleVersion(pkg.version) + if pkg_version > largest_version: + largest_version = pkg_version + return largest_version + + @staticmethod + def _get_agent_upgrade_type(version): + # We follow semantic versioning for the agent, if <major>.<minor>.<patch> is same, then <build> has changed. + # In this case, we consider it as a Hotfix upgrade. Else we consider it a Regular upgrade.
+ if version.major == CURRENT_VERSION.major and version.minor == CURRENT_VERSION.minor and version.patch == CURRENT_VERSION.patch: + return SelfUpdateType.Hotfix + return SelfUpdateType.Regular + + @staticmethod + def _get_next_process_time(last_val, frequency, now): + """ + Get the next upgrade time + """ + return now if last_val == datetime.datetime.min else last_val + datetime.timedelta(seconds=frequency) + + def _is_new_agent_allowed_update(self): + """ + This method ensures that an update is allowed only once per (Hotfix/Regular) upgrade frequency + """ + now = datetime.datetime.utcnow() + upgrade_type = self._get_agent_upgrade_type(self._version) + if upgrade_type == SelfUpdateType.Hotfix: + next_update_time = self._get_next_process_time(self._last_attempted_self_update_time, + conf.get_self_update_hotfix_frequency(), now) + else: + next_update_time = self._get_next_process_time(self._last_attempted_self_update_time, + conf.get_self_update_regular_frequency(), now) + + if self._version > CURRENT_VERSION: + message = "Self-update discovered new {0} upgrade WALinuxAgent-{1}; Will upgrade on or after {2}".format( + upgrade_type, str(self._version), next_update_time.strftime(logger.Logger.LogTimeFormatInUTC)) + logger.info(message) + add_event(op=WALAEventOperation.AgentUpgrade, message=message, log_event=False) + + if next_update_time <= now: + # Update the last upgrade check time even if no new agent is available for upgrade + self._last_attempted_self_update_time = now + return True + return False + + def _should_agent_attempt_manifest_download(self): + """ + The agent should attempt to download the manifest only if + it has not attempted a manifest download in the last 1 hour. + If we allow the download, we update the last attempted manifest download time + """ + now = datetime.datetime.utcnow() + + if self._last_attempted_manifest_download_time != datetime.datetime.min: + next_attempt_time = self._last_attempted_manifest_download_time + datetime.timedelta( +
seconds=conf.get_autoupdate_frequency()) + else: + next_attempt_time = now + + if next_attempt_time > now: + return False + self._last_attempted_manifest_download_time = now + return True + + def is_update_allowed_this_time(self, ext_gs_updated): + """ + Checks whether we are allowed to download the manifest, per the manifest download frequency + """ + if not self._should_agent_attempt_manifest_download(): + return False + return True + + def is_rsm_update_enabled(self, agent_family, ext_gs_updated): + """ + Checks whether there is a new goal state and decides whether to continue with self-update or switch to RSM update. + If the VM is not enabled for RSM updates, or the agent does not support GA versioning, we continue with self-update; otherwise we return True to switch to RSM update. + If isVersionFromRSM is missing but isVMEnabledForRSMUpgrades is present in the goal state, we ignore the update because we consider the goal state invalid. + """ + if ext_gs_updated: + if conf.get_enable_ga_versioning() and agent_family.is_vm_enabled_for_rsm_upgrades is not None and agent_family.is_vm_enabled_for_rsm_upgrades: + if agent_family.is_version_from_rsm is None: + raise AgentUpdateError( + "Received invalid goal state:{0}, missing isVersionFromRSM property. So, skipping agent update".format( + self._gs_id)) + else: + if agent_family.version is None: + raise AgentUpdateError( + "Received invalid goal state:{0}, missing version property.
So, skipping agent update".format( + self._gs_id)) + return True + + return False + + def retrieve_agent_version(self, agent_family, goal_state): + """ + Get the largest version from the agent manifest + """ + self._agent_manifest = goal_state.fetch_agent_manifest(agent_family.name, agent_family.uris) + largest_version = self._get_largest_version(self._agent_manifest) + self._version = largest_version + + def is_retrieved_version_allowed_to_update(self, agent_family): + """ + Checks that the update is spread out per the configured frequency (conf.get_self_update_hotfix_frequency() or conf.get_self_update_regular_frequency()) + and that the retrieved version is above the current version. + Returns False when we don't allow the update. + """ + if not self._is_new_agent_allowed_update(): + return False + + if self._version <= CURRENT_VERSION: + return False + + return True + + def log_new_agent_update_message(self): + """ + This function logs the update message after we check that the version is allowed to update. + """ + msg = "Self-update is ready to upgrade the new agent: {0} now before processing the goal state: {1}".format( + str(self._version), self._gs_id) + logger.info(msg) + add_event(op=WALAEventOperation.AgentUpgrade, message=msg, log_event=False) + + def proceed_with_update(self): + """ + Upgrade to the largest version. Downgrade is not supported.
+ Raises: AgentUpgradeExitException + """ + if self._version > CURRENT_VERSION: + # In case of an upgrade, we don't need to exclude anything as the daemon will automatically + # start the next available highest version which would be the target version + raise AgentUpgradeExitException( + "Current Agent {0} completed all update checks, exiting current process to upgrade to the new Agent version {1}".format(CURRENT_VERSION, + self._version)) diff --git a/azurelinuxagent/ga/send_telemetry_events.py b/azurelinuxagent/ga/send_telemetry_events.py index c2e277769..2923a43b1 100644 --- a/azurelinuxagent/ga/send_telemetry_events.py +++ b/azurelinuxagent/ga/send_telemetry_events.py @@ -24,7 +24,7 @@ from azurelinuxagent.common.event import add_event, WALAEventOperation from azurelinuxagent.common.exception import ServiceStoppedError from azurelinuxagent.common.future import ustr, Queue, Empty -from azurelinuxagent.common.interfaces import ThreadHandlerInterface +from azurelinuxagent.ga.interfaces import ThreadHandlerInterface from azurelinuxagent.common.utils import textutil diff --git a/azurelinuxagent/ga/update.py b/azurelinuxagent/ga/update.py index 2b0975b05..a8d34f7c4 100644 --- a/azurelinuxagent/ga/update.py +++ b/azurelinuxagent/ga/update.py @@ -17,7 +17,6 @@ # Requires Python 2.6+ and Openssl 1.0+ # import glob -import json import os import platform import re @@ -34,18 +33,17 @@ from azurelinuxagent.common import logger from azurelinuxagent.common.protocol.imds import get_imds_client from azurelinuxagent.common.utils import fileutil, textutil -from azurelinuxagent.common.agent_supported_feature import get_supported_feature_by_name, SupportedFeatureNames -from azurelinuxagent.common.cgroupconfigurator import CGroupConfigurator +from azurelinuxagent.common.agent_supported_feature import get_supported_feature_by_name, SupportedFeatureNames, \ + get_agent_supported_features_list_for_crp +from azurelinuxagent.ga.cgroupconfigurator import CGroupConfigurator from 
azurelinuxagent.common.event import add_event, initialize_event_logger_vminfo_common_parameters, \ WALAEventOperation, EVENTS_DIRECTORY -from azurelinuxagent.common.exception import UpdateError, ExitException, AgentUpgradeExitException, AgentMemoryExceededException +from azurelinuxagent.common.exception import ExitException, AgentUpgradeExitException, AgentMemoryExceededException from azurelinuxagent.common.future import ustr from azurelinuxagent.common.osutil import get_osutil, systemd -from azurelinuxagent.common.persist_firewall_rules import PersistFirewallRulesHandler -from azurelinuxagent.common.protocol.goal_state import GoalStateSource +from azurelinuxagent.ga.persist_firewall_rules import PersistFirewallRulesHandler from azurelinuxagent.common.protocol.hostplugin import HostPluginProtocol, VmSettingsNotSupported -from azurelinuxagent.common.protocol.restapi import VMAgentUpdateStatus, VMAgentUpdateStatuses, ExtHandlerPackageList, \ - VERSION_0 +from azurelinuxagent.common.protocol.restapi import VERSION_0 from azurelinuxagent.common.protocol.util import get_protocol_util from azurelinuxagent.common.utils import shellutil from azurelinuxagent.common.utils.archive import StateArchiver, AGENT_STATUS_FILE @@ -55,16 +53,16 @@ from azurelinuxagent.common.version import AGENT_LONG_NAME, AGENT_NAME, AGENT_DIR_PATTERN, CURRENT_AGENT, AGENT_VERSION, \ CURRENT_VERSION, DISTRO_NAME, DISTRO_VERSION, get_lis_version, \ has_logrotate, PY_VERSION_MAJOR, PY_VERSION_MINOR, PY_VERSION_MICRO, get_daemon_version +from azurelinuxagent.ga.agent_update_handler import get_agent_update_handler from azurelinuxagent.ga.collect_logs import get_collect_logs_handler, is_log_collection_allowed from azurelinuxagent.ga.collect_telemetry_events import get_collect_telemetry_events_handler from azurelinuxagent.ga.env import get_env_handler -from azurelinuxagent.ga.exthandlers import HandlerManifest, ExtHandlersHandler, list_agent_lib_directory, \ +from azurelinuxagent.ga.exthandlers import 
ExtHandlersHandler, list_agent_lib_directory, \ ExtensionStatusValue, ExtHandlerStatusValue +from azurelinuxagent.ga.guestagent import GuestAgent from azurelinuxagent.ga.monitor import get_monitor_handler from azurelinuxagent.ga.send_telemetry_events import get_send_telemetry_events_handler -AGENT_ERROR_FILE = "error.json" # File name for agent error record -AGENT_MANIFEST_FILE = "HandlerManifest.json" AGENT_PARTITION_FILE = "partition" CHILD_HEALTH_INTERVAL = 15 * 60 @@ -72,8 +70,6 @@ CHILD_LAUNCH_RESTART_MAX = 3 CHILD_POLL_INTERVAL = 60 -MAX_FAILURE = 3 # Max failure allowed for agent before blacklisted - GOAL_STATE_PERIOD_EXTENSIONS_DISABLED = 5 * 60 ORPHAN_POLL_INTERVAL = 3 @@ -122,14 +118,6 @@ def __str__(self): return ustr(self.summary) -class AgentUpgradeType(object): - """ - Enum for different modes of Agent Upgrade - """ - Hotfix = "Hotfix" - Normal = "Normal" - - def get_update_handler(): return UpdateHandler() @@ -144,11 +132,6 @@ def __init__(self): self._is_running = True - # Member variables to keep track of the Agent AutoUpgrade - self.last_attempt_time = None - self._last_hotfix_upgrade_time = None - self._last_normal_upgrade_time = None - self.agents = [] self.child_agent = None @@ -162,9 +145,12 @@ def __init__(self): self._heartbeat_id = str(uuid.uuid4()).upper() self._heartbeat_counter = 0 - self._last_check_memory_usage = datetime.min + self._initial_attempt_check_memory_usage = True + self._last_check_memory_usage_time = time.time() self._check_memory_usage_last_error_report = datetime.min + self._cloud_init_completed = False # Only used when Extensions.WaitForCloudInit is enabled; note that this variable is always reset on service start. + # VM Size is reported via the heartbeat, default it here. 
self._vm_size = None @@ -331,9 +317,14 @@ def run(self, debug=False): logger.info("OS: {0} {1}", DISTRO_NAME, DISTRO_VERSION) logger.info("Python: {0}.{1}.{2}", PY_VERSION_MAJOR, PY_VERSION_MINOR, PY_VERSION_MICRO) + vm_arch = self.osutil.get_vm_arch() + logger.info("CPU Arch: {0}", vm_arch) + os_info_msg = u"Distro: {dist_name}-{dist_ver}; "\ - u"OSUtil: {util_name}; AgentService: {service_name}; "\ + u"OSUtil: {util_name}; "\ + u"AgentService: {service_name}; "\ u"Python: {py_major}.{py_minor}.{py_micro}; "\ + u"Arch: {vm_arch}; "\ u"systemd: {systemd}; "\ u"LISDrivers: {lis_ver}; "\ u"logrotate: {has_logrotate};".format( @@ -341,7 +332,7 @@ def run(self, debug=False): util_name=type(self.osutil).__name__, service_name=self.osutil.service_name, py_major=PY_VERSION_MAJOR, py_minor=PY_VERSION_MINOR, - py_micro=PY_VERSION_MICRO, systemd=systemd.is_systemd(), + py_micro=PY_VERSION_MICRO, vm_arch=vm_arch, systemd=systemd.is_systemd(), lis_ver=get_lis_version(), has_logrotate=has_logrotate() ) logger.info(os_info_msg) @@ -350,7 +341,7 @@ def run(self, debug=False): # Initialize the goal state; some components depend on information provided by the goal state and this # call ensures the required info is initialized (e.g. telemetry depends on the container ID.) # - protocol = self.protocol_util.get_protocol() + protocol = self.protocol_util.get_protocol(save_to_history=True) self._initialize_goal_state(protocol) @@ -359,6 +350,7 @@ def run(self, debug=False): # Send telemetry for the OS-specific info. 
add_event(AGENT_NAME, op=WALAEventOperation.OSInfo, message=os_info_msg) + self._log_openssl_info() # # Perform initialization tasks @@ -369,6 +361,7 @@ def run(self, debug=False): from azurelinuxagent.ga.remoteaccess import get_remote_access_handler remote_access_handler = get_remote_access_handler(protocol) + agent_update_handler = get_agent_update_handler(protocol) self._ensure_no_orphans() self._emit_restart_event() @@ -379,7 +372,6 @@ def run(self, debug=False): self._ensure_extension_telemetry_state_configured_properly(protocol) self._ensure_firewall_rules_persisted(dst_ip=protocol.get_endpoint()) self._add_accept_tcp_firewall_rule_if_not_enabled(dst_ip=protocol.get_endpoint()) - self._reset_legacy_blacklisted_agents() self._cleanup_legacy_goal_state_history() # Get all thread handlers @@ -402,7 +394,7 @@ def run(self, debug=False): while self.is_running: self._check_daemon_running(debug) self._check_threads_running(all_thread_handlers) - self._process_goal_state(exthandlers_handler, remote_access_handler) + self._process_goal_state(exthandlers_handler, remote_access_handler, agent_update_handler) self._send_heartbeat_telemetry(protocol) self._check_agent_memory_usage() time.sleep(self._goal_state_period) @@ -424,6 +416,29 @@ def run(self, debug=False): self._shutdown() sys.exit(0) + @staticmethod + def _log_openssl_info(): + try: + version = shellutil.run_command(["openssl", "version"]) + message = "OpenSSL version: {0}".format(version) + logger.info(message) + add_event(op=WALAEventOperation.OpenSsl, message=message, is_success=True) + except Exception as e: + message = "Failed to get OpenSSL version: {0}".format(e) + logger.info(message) + add_event(op=WALAEventOperation.OpenSsl, message=message, is_success=False, log_event=False) + # + # Collect telemetry about the 'pkey' command. CryptUtil get_pubkey_from_prv() uses the 'pkey' command only as a fallback after trying 'rsa'. 
+ # 'pkey' also works for RSA keys, but it may not be available on older versions of OpenSSL. Check telemetry after a few releases and if there + # are no versions of OpenSSL that do not support 'pkey' consider removing the use of 'rsa' altogether. + # + try: + shellutil.run_command(["openssl", "help", "pkey"]) + except Exception as e: + message = "OpenSSL does not support the pkey command: {0}".format(e) + logger.info(message) + add_event(op=WALAEventOperation.OpenSsl, message=message, is_success=False, log_event=False) + def _initialize_goal_state(self, protocol): # # Block until we can fetch the first goal state (self._try_update_goal_state() does its own logging and error handling). @@ -444,6 +459,22 @@ def _initialize_goal_state(self, protocol): logger.info("The current Fabric goal state is older than the most recent FastTrack goal state; will skip it.\nFabric: {0}\nFastTrack: {1}", egs.created_on_timestamp, last_fast_track_timestamp) + def _wait_for_cloud_init(self): + if conf.get_wait_for_cloud_init() and not self._cloud_init_completed: + message = "Waiting for cloud-init to complete..." + logger.info(message) + add_event(op=WALAEventOperation.CloudInit, message=message) + try: + output = shellutil.run_command(["cloud-init", "status", "--wait"], timeout=conf.get_wait_for_cloud_init_timeout()) + message = "cloud-init completed\n{0}".format(output) + logger.info(message) + add_event(op=WALAEventOperation.CloudInit, message=message) + except Exception as e: + message = "An error occurred while waiting for cloud-init; will proceed to execute VM extensions. 
Extensions that have conflicts with cloud-init may fail.\n{0}".format(ustr(e)) + logger.error(message) + add_event(op=WALAEventOperation.CloudInit, message=message, is_success=False, log_event=False) + self._cloud_init_completed = True # Mark as completed even on error since we will proceed to execute extensions + def _get_vm_size(self, protocol): """ Including VMSize is meant to capture the architecture of the VM (i.e. arm64 VMs will @@ -489,7 +520,7 @@ def _try_update_goal_state(self, protocol): try: max_errors_to_log = 3 - protocol.client.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log) + protocol.client.update_goal_state(silent=self._update_goal_state_error_count >= max_errors_to_log, save_to_history=True) self._goal_state = protocol.get_goal_state() @@ -523,80 +554,6 @@ def _try_update_goal_state(self, protocol): return True - def __update_guest_agent(self, protocol): - """ - This function checks for new Agent updates and raises AgentUpgradeExitException if available. - There are 2 different ways the agent checks for an update - - 1) Requested Version is specified in the Goal State. - - In this case, the Agent will download the requested version and upgrade/downgrade instantly. - 2) No requested version. - - In this case, the agent will periodically check (1 hr) for new agent versions in GA Manifest. - - If available, it will download all versions > CURRENT_VERSION. 
- - Depending on the highest version > CURRENT_VERSION, - the agent will update within 4 hrs (for a Hotfix update) or 24 hrs (for a Normal update) - """ - - def log_next_update_time(): - next_normal_time, next_hotfix_time = self.__get_next_upgrade_times() - upgrade_type = self.__get_agent_upgrade_type(available_agent) - next_time = next_hotfix_time if upgrade_type == AgentUpgradeType.Hotfix else next_normal_time - message_ = "Discovered new {0} upgrade {1}; Will upgrade on or after {2}".format( - upgrade_type, available_agent.name, - datetime.utcfromtimestamp(next_time).strftime(logger.Logger.LogTimeFormatInUTC)) - add_event(AGENT_NAME, op=WALAEventOperation.AgentUpgrade, version=CURRENT_VERSION, is_success=True, - message=message_, log_event=False) - logger.info(message_) - - def handle_updates_for_requested_version(): - if requested_version < CURRENT_VERSION: - prefix = "downgrade" - # In case of a downgrade, we blacklist the current agent to avoid starting it back up ever again - # (the expectation here being that if RSM is asking us to a downgrade, - # there's a good reason for not wanting the current version). 
- try: - # We should always have an agent directory for the CURRENT_VERSION - # (unless the CURRENT_VERSION == daemon version, but since we don't support downgrading - # below daemon version, we will never reach this code path if that's the scenario) - current_agent = next(agent for agent in self.agents if agent.version == CURRENT_VERSION) - msg = "Blacklisting the agent {0} since a downgrade was requested in the GoalState, " \ - "suggesting that we really don't want to execute any extensions using this version".format( - CURRENT_VERSION) - logger.info(msg) - current_agent.mark_failure(is_fatal=True, reason=msg) - except StopIteration: - logger.warn( - "Could not find a matching agent with current version {0} to blacklist, skipping it".format( - CURRENT_VERSION)) - else: - # In case of an upgrade, we don't need to blacklist anything as the daemon will automatically - # start the next available highest version which would be the requested version - prefix = "upgrade" - raise AgentUpgradeExitException( - "Exiting current process to {0} to the request Agent version {1}".format(prefix, requested_version)) - - # Skip the update if there is no goal state yet or auto-update is disabled - if self._goal_state is None or not conf.get_autoupdate_enabled(): - return False - - if self._download_agent_if_upgrade_available(protocol): - # The call to get_latest_agent_greater_than_daemon() also finds all agents in directory and sets the self.agents property. - # This state is used to find the GuestAgent object with the current version later if requested version is available in last GS. 
- available_agent = self.get_latest_agent_greater_than_daemon() - requested_version, _ = self.__get_requested_version_and_agent_family_from_last_gs() - if requested_version is not None: - # If requested version specified, upgrade/downgrade to the specified version instantly as this is - # driven by the goal state (as compared to the agent periodically checking for new upgrades every hour) - handle_updates_for_requested_version() - elif available_agent is None: - # Legacy behavior: The current agent can become unavailable and needs to be reverted. - # In that case, self._upgrade_available() returns True and available_agent would be None. Handling it here. - raise AgentUpgradeExitException( - "Agent {0} is reverting to the installed agent -- exiting".format(CURRENT_AGENT)) - else: - log_next_update_time() - - self.__upgrade_agent_if_permitted() - def _processing_new_incarnation(self): """ True if we are currently processing a new incarnation (i.e. WireServer goal state) @@ -607,21 +564,22 @@ def _processing_new_extensions_goal_state(self): """ True if we are currently processing a new extensions goal state """ - egs = self._goal_state.extensions_goal_state - return self._goal_state is not None and egs.id != self._last_extensions_gs_id and not egs.is_outdated + return self._goal_state is not None and self._goal_state.extensions_goal_state.id != self._last_extensions_gs_id and not self._goal_state.extensions_goal_state.is_outdated - def _process_goal_state(self, exthandlers_handler, remote_access_handler): + def _process_goal_state(self, exthandlers_handler, remote_access_handler, agent_update_handler): protocol = exthandlers_handler.protocol # update self._goal_state if not self._try_update_goal_state(protocol): - # agent updates and status reporting should be done even when the goal state is not updated - self.__update_guest_agent(protocol) - self._report_status(exthandlers_handler) + agent_update_handler.run(self._goal_state, 
self._processing_new_extensions_goal_state()) + # status reporting should be done even when the goal state is not updated + self._report_status(exthandlers_handler, agent_update_handler) return # check for agent updates - self.__update_guest_agent(protocol) + agent_update_handler.run(self._goal_state, self._processing_new_extensions_goal_state()) + + self._wait_for_cloud_init() try: if self._processing_new_extensions_goal_state(): @@ -639,7 +597,7 @@ def _process_goal_state(self, exthandlers_handler, remote_access_handler): CGroupConfigurator.get_instance().check_cgroups(cgroup_metrics=[]) # report status before processing the remote access, since that operation can take a long time - self._report_status(exthandlers_handler) + self._report_status(exthandlers_handler, agent_update_handler) if self._processing_new_incarnation(): remote_access_handler.run() @@ -668,54 +626,19 @@ def _cleanup_legacy_goal_state_history(): except Exception as exception: logger.warn("Error removing legacy history files: {0}", ustr(exception)) - def __get_vmagent_update_status(self, goal_state_changed): - """ - This function gets the VMAgent update status as per the last GoalState. 
- Returns: None if the last GS does not ask for requested version else VMAgentUpdateStatus - """ - if not conf.get_enable_ga_versioning(): - return None - - update_status = None - - try: - requested_version, manifest = self.__get_requested_version_and_agent_family_from_last_gs() - if manifest is None and goal_state_changed: - logger.info("Unable to report update status as no matching manifest found for family: {0}".format( - conf.get_autoupdate_gafamily())) - return None - - if requested_version is not None: - if CURRENT_VERSION == requested_version: - status = VMAgentUpdateStatuses.Success - code = 0 - else: - status = VMAgentUpdateStatuses.Error - code = 1 - update_status = VMAgentUpdateStatus(expected_version=manifest.requested_version_string, status=status, - code=code) - except Exception as error: - if goal_state_changed: - err_msg = "[This error will only be logged once per goal state] " \ - "Ran into error when trying to fetch updateStatus for the agent, skipping reporting update satus. 
Error: {0}".format( - textutil.format_exception(error)) - logger.warn(err_msg) - add_event(op=WALAEventOperation.AgentUpgrade, is_success=False, message=err_msg, log_event=False) - - return update_status - - def _report_status(self, exthandlers_handler): - vm_agent_update_status = self.__get_vmagent_update_status(self._processing_new_extensions_goal_state()) + def _report_status(self, exthandlers_handler, agent_update_handler): # report_ext_handlers_status does its own error handling and returns None if an error occurred vm_status = exthandlers_handler.report_ext_handlers_status( goal_state_changed=self._processing_new_extensions_goal_state(), - vm_agent_update_status=vm_agent_update_status, vm_agent_supports_fast_track=self._supports_fast_track) + vm_agent_update_status=agent_update_handler.get_vmagent_update_status(), vm_agent_supports_fast_track=self._supports_fast_track) if vm_status is not None: self._report_extensions_summary(vm_status) if self._goal_state is not None: - agent_status = exthandlers_handler.get_ext_handlers_status_debug_info(vm_status) - self._goal_state.save_to_history(agent_status, AGENT_STATUS_FILE) + status_blob_text = exthandlers_handler.protocol.get_status_blob_data() + if status_blob_text is None: + status_blob_text = "{}" + self._goal_state.save_to_history(status_blob_text, AGENT_STATUS_FILE) if self._goal_state.extensions_goal_state.is_outdated: exthandlers_handler.protocol.client.get_host_plugin().clear_fast_track_state() @@ -831,6 +754,16 @@ def log_if_op_disabled(name, value): if not value: log_event("{0} is set to False, not processing the operation".format(name)) + def log_if_agent_versioning_feature_disabled(): + supports_ga_versioning = False + for _, feature in get_agent_supported_features_list_for_crp().items(): + if feature.name == SupportedFeatureNames.GAVersioningGovernance: + supports_ga_versioning = True + break + if not supports_ga_versioning: + msg = "Agent : {0} doesn't support GA Versioning".format(CURRENT_VERSION) + 
log_event(msg) + log_if_int_changed_from_default("Extensions.GoalStatePeriod", conf.get_goal_state_period(), "Changing this value affects how often extensions are processed and status for the VM is reported. Too small a value may report the VM as unresponsive") log_if_int_changed_from_default("Extensions.InitialGoalStatePeriod", conf.get_initial_goal_state_period(), @@ -838,6 +771,12 @@ def log_if_op_disabled(name, value): log_if_op_disabled("OS.EnableFirewall", conf.enable_firewall()) log_if_op_disabled("Extensions.Enabled", conf.get_extensions_enabled()) log_if_op_disabled("AutoUpdate.Enabled", conf.get_autoupdate_enabled()) + log_if_op_disabled("AutoUpdate.UpdateToLatestVersion", conf.get_auto_update_to_latest_version()) + + if conf.is_present("AutoUpdate.Enabled") and conf.get_autoupdate_enabled() != conf.get_auto_update_to_latest_version(): + msg = "AutoUpdate.Enabled property is **Deprecated** now but it's set to different value from AutoUpdate.UpdateToLatestVersion. Please consider removing it if added by mistake" + logger.warn(msg) + add_event(AGENT_NAME, op=WALAEventOperation.ConfigurationChange, message=msg) if conf.enable_firewall(): log_if_int_changed_from_default("OS.EnableFirewallPeriod", conf.get_enable_firewall_period()) @@ -851,6 +790,8 @@ def log_if_op_disabled(name, value): if conf.get_lib_dir() != "/var/lib/waagent": log_event("lib dir is in an unexpected location: {0}".format(conf.get_lib_dir())) + log_if_agent_versioning_feature_disabled() + except Exception as e: logger.warn("Failed to log changes in configuration: {0}", ustr(e)) @@ -1071,173 +1012,6 @@ def _shutdown(self): str(e)) return - def __get_requested_version_and_agent_family_from_last_gs(self): - """ - Get the requested version and corresponding manifests from last GS if supported - Returns: (Requested Version, Manifest) if supported and available - (None, None) if no manifests found in the last GS - (None, manifest) if not supported or not specified in GS - """ - family_name = 
conf.get_autoupdate_gafamily() - agent_families = self._goal_state.extensions_goal_state.agent_families - agent_families = [m for m in agent_families if m.name == family_name and len(m.uris) > 0] - if len(agent_families) == 0: - return None, None - if conf.get_enable_ga_versioning() and agent_families[0].is_requested_version_specified: - return agent_families[0].requested_version, agent_families[0] - return None, agent_families[0] - - def _download_agent_if_upgrade_available(self, protocol, base_version=CURRENT_VERSION): - """ - This function downloads the new agent if an update is available. - If a requested version is available in goal state, then only that version is downloaded (new-update model) - Else, we periodically (1hr by default) checks if new Agent upgrade is available and download it on filesystem if available (old-update model) - rtype: Boolean - return: True if current agent is no longer available or an agent with a higher version number is available - else False - """ - - def report_error(msg_, version_=CURRENT_VERSION, op=WALAEventOperation.Download): - logger.warn(msg_) - add_event(AGENT_NAME, op=op, version=version_, is_success=False, message=msg_, log_event=False) - - def can_proceed_with_requested_version(): - if not gs_updated: - # If the goal state didn't change, don't process anything. - return False - - # With the new model, we will get a new GS when CRP wants us to auto-update using required version. - # If there's no new goal state, don't proceed with anything - msg_ = "Found requested version in manifest: {0} for goal state {1}".format( - requested_version, goal_state_id) - logger.info(msg_) - add_event(AGENT_NAME, op=WALAEventOperation.AgentUpgrade, is_success=True, message=msg_, log_event=False) - - if requested_version < daemon_version: - # Don't process the update if the requested version is lesser than daemon version, - # as we don't support downgrades below daemon versions. 
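The legacy check removed above rejects any requested version below the daemon's own version. That decision reduces to a single comparison; a hedged sketch with plain tuples standing in for `FlexibleVersion` (the version strings are illustrative, not from the agent):

```python
def parse_version(v):
    # Assumes purely numeric, dot-separated version strings
    return tuple(int(p) for p in v.split("."))

def can_process_requested_version(requested, daemon):
    # Downgrades below the daemon version are not supported, so reject them
    return parse_version(requested) >= parse_version(daemon)

print(can_process_requested_version("2.9.0.0", "2.2.53"))  # -> True (at or above daemon)
print(can_process_requested_version("2.1.0.0", "2.2.53"))  # -> False (below daemon, rejected)
```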
- report_error( - "Can't process the upgrade as the requested version: {0} is < current daemon version: {1}".format( - requested_version, daemon_version), op=WALAEventOperation.AgentUpgrade) - return False - - return True - - def agent_upgrade_time_elapsed(now_): - if self.last_attempt_time is not None: - next_attempt_time = self.last_attempt_time + conf.get_autoupdate_frequency() - else: - next_attempt_time = now_ - if next_attempt_time > now_: - return False - return True - - agent_family_name = conf.get_autoupdate_gafamily() - gs_updated = False - daemon_version = self.__get_daemon_version_for_update() - try: - # Fetch the agent manifests from the latest Goal State - goal_state_id = self._goal_state.extensions_goal_state.id - gs_updated = self._processing_new_extensions_goal_state() - requested_version, agent_family = self.__get_requested_version_and_agent_family_from_last_gs() - if agent_family is None: - logger.verbose( - u"No manifest links found for agent family: {0} for goal state {1}, skipping update check".format( - agent_family_name, goal_state_id)) - return False - except Exception as err: - # If there's some issues in fetching the agent manifests, report it only on goal state change - msg = u"Exception retrieving agent manifests: {0}".format(textutil.format_exception(err)) - if gs_updated: - report_error(msg) - else: - logger.verbose(msg) - return False - - if requested_version is not None: - # If GA versioning is enabled and requested version present in GS, and it's a new GS, follow new logic - if not can_proceed_with_requested_version(): - return False - else: - # If no requested version specified in the Goal State, follow the old auto-update logic - # Note: If the first Goal State contains a requested version, this timer won't start (i.e. self.last_attempt_time won't be updated). 
- # If any subsequent goal state does not contain requested version, this timer will start then, and we will - # download all versions available in PIR and auto-update to the highest available version on that goal state. - now = time.time() - if not agent_upgrade_time_elapsed(now): - return False - - logger.info("No requested version specified, checking for all versions for agent update (family: {0})", - agent_family_name) - self.last_attempt_time = now - - try: - # If we make it to this point, then either there is a requested version in a new GS (new auto-update model), - # or the 1hr time limit has elapsed for us to check the agent manifest for updates (old auto-update model). - pkg_list = ExtHandlerPackageList() - - # If the requested version is the current version, don't download anything; - # the call to purge() below will delete all other agents from disk - # In this case, no need to even fetch the GA family manifest as we don't need to download any agent. - if requested_version is not None and requested_version == CURRENT_VERSION: - packages_to_download = [] - msg = "The requested version is running as the current version: {0}".format(requested_version) - logger.info(msg) - add_event(AGENT_NAME, op=WALAEventOperation.AgentUpgrade, is_success=True, message=msg) - else: - agent_manifest = self._goal_state.fetch_agent_manifest(agent_family.name, agent_family.uris) - pkg_list = agent_manifest.pkg_list - packages_to_download = pkg_list.versions - - # Verify the requested version is in GA family manifest (if specified) - if requested_version is not None and requested_version != CURRENT_VERSION: - for pkg in pkg_list.versions: - if FlexibleVersion(pkg.version) == requested_version: - # Found a matching package, only download that one - packages_to_download = [pkg] - break - else: - msg = "No matching package found in the agent manifest for requested version: {0} in goal state {1}, skipping agent update".format( - requested_version, goal_state_id) - 
report_error(msg, version_=requested_version) - return False - - # Set the agents to those available for download at least as current as the existing agent - # or to the requested version (if specified) - is_fast_track_goal_state = self._goal_state.extensions_goal_state.source == GoalStateSource.FastTrack - agents_to_download = [GuestAgent.from_agent_package(pkg, protocol, is_fast_track_goal_state) for pkg in packages_to_download] - - # Filter out the agents that were downloaded/extracted successfully. If the agent was not installed properly, - # we delete the directory and the zip package from the filesystem - self._set_and_sort_agents([agent for agent in agents_to_download if agent.is_available]) - - # Remove from disk any agent no longer needed in the VM. - # If requested version is provided, this would delete all other agents present on the VM except - - # - the current version and the requested version if requested version != current version - # - only the current version if requested version == current version - # Note: - # The code leaves on disk available, but blacklisted, agents to preserve the state. - # Otherwise, those agents could be downloaded again and inappropriately retried. 
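The selection step above (`_set_and_sort_agents` over `agent.is_available`) reduces to: keep only agents that downloaded and extracted cleanly and are not blacklisted, then sort by version with the highest first, so `self.agents[0]` is the best candidate. A rough standalone sketch under simplified assumptions (the agent uses its own `GuestAgent` and `FlexibleVersion` classes; the names below are illustrative):

```python
class FakeAgent(object):
    """Illustrative stand-in for a GuestAgent; not the agent's real class."""

    def __init__(self, version, downloaded=True, blacklisted=False):
        # Represent the version as a tuple of ints so comparisons sort numerically.
        self.version = tuple(int(x) for x in version.split("."))
        self.is_available = downloaded and not blacklisted


def set_and_sort_agents(candidates):
    # Keep only agents that were downloaded/extracted successfully and are not
    # blacklisted, then sort highest version first so agents[0] is the best candidate.
    agents = [a for a in candidates if a.is_available]
    agents.sort(key=lambda a: a.version, reverse=True)
    return agents


agents = set_and_sort_agents([
    FakeAgent("2.9.1.1"),
    FakeAgent("2.10.0.0"),
    FakeAgent("2.11.0.0", blacklisted=True),  # excluded: blacklisted
    FakeAgent("2.8.0.11", downloaded=False),  # excluded: download failed
])
print([a.version for a in agents])  # [(2, 10, 0, 0), (2, 9, 1, 1)]
```

Keeping blacklisted agents on disk but out of this list is what prevents them from being re-downloaded and retried, as the comment above notes.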
- self._purge_agents() - self._filter_blacklisted_agents() - - # If there are no agents available to upgrade/downgrade to, return False - if len(self.agents) == 0: - return False - - if requested_version is not None: - # In case of requested version, return True if an agent with a different version number than the - # current version is available that is higher than the current daemon version - return self.agents[0].version != base_version and self.agents[0].version > daemon_version - else: - # Else, return True if the highest agent is > base_version (CURRENT_VERSION) - return self.agents[0].version > base_version - - except Exception as err: - msg = u"Exception downloading agents for update: {0}".format(textutil.format_exception(err)) - report_error(msg) - return False - def _write_pid_file(self): pid_files = self._get_pid_files() @@ -1269,13 +1043,10 @@ def _send_heartbeat_telemetry(self, protocol): if datetime.utcnow() >= (self._last_telemetry_heartbeat + UpdateHandler.TELEMETRY_HEARTBEAT_PERIOD): dropped_packets = self.osutil.get_firewall_dropped_packets(protocol.get_endpoint()) auto_update_enabled = 1 if conf.get_autoupdate_enabled() else 0 - # Include vm architecture in the heartbeat message because the kusto table does not have - # a separate column for it. 
- vmarch = self._get_vm_arch() - telemetry_msg = "{0};{1};{2};{3};{4};{5}".format(self._heartbeat_counter, self._heartbeat_id, dropped_packets, + telemetry_msg = "{0};{1};{2};{3};{4}".format(self._heartbeat_counter, self._heartbeat_id, dropped_packets, self._heartbeat_update_goal_state_error_count, - auto_update_enabled, vmarch) + auto_update_enabled) debug_log_msg = "[DEBUG HeartbeatCounter: {0};HeartbeatId: {1};DroppedPackets: {2};" \ "UpdateGSErrors: {3};AutoUpdate: {4}]".format(self._heartbeat_counter, self._heartbeat_id, dropped_packets, @@ -1298,8 +1069,11 @@ def _check_agent_memory_usage(self): """ try: if conf.get_enable_agent_memory_usage_check() and self._extensions_summary.converged: - if self._last_check_memory_usage == datetime.min or datetime.utcnow() >= (self._last_check_memory_usage + UpdateHandler.CHECK_MEMORY_USAGE_PERIOD): - self._last_check_memory_usage = datetime.utcnow() + # Delay the first memory usage check so that the current agent is not blacklisted for restarting too frequently (restarts are triggered when the memory limit is reached) + if (self._initial_attempt_check_memory_usage and time.time() - self._last_check_memory_usage_time > CHILD_LAUNCH_INTERVAL) or \ + (not self._initial_attempt_check_memory_usage and time.time() - self._last_check_memory_usage_time > conf.get_cgroup_check_period()): + self._last_check_memory_usage_time = time.time() + self._initial_attempt_check_memory_usage = False CGroupConfigurator.get_instance().check_agent_memory_usage() except AgentMemoryExceededException as exception: msg = "Check on agent memory usage:\n{0}".format(ustr(exception)) @@ -1424,355 +1198,3 @@ def _execute_run_command(command): except Exception as e: msg = "Error while checking ip table rules:{0}".format(ustr(e)) logger.error(msg) - - def __get_next_upgrade_times(self): - """ - Get the next upgrade times - return: Next Normal Upgrade Time, Next Hotfix Upgrade Time - """ - - def get_next_process_time(last_val, frequency): - return now if last_val is
None else last_val + frequency - - now = time.time() - next_hotfix_time = get_next_process_time(self._last_hotfix_upgrade_time, conf.get_hotfix_upgrade_frequency()) - next_normal_time = get_next_process_time(self._last_normal_upgrade_time, conf.get_normal_upgrade_frequency()) - - return next_normal_time, next_hotfix_time - - @staticmethod - def __get_agent_upgrade_type(available_agent): - # We follow semantic versioning for the agent, if . is same, then . has changed. - # In this case, we consider it as a Hotfix upgrade. Else we consider it a Normal upgrade. - if available_agent.version.major == CURRENT_VERSION.major and available_agent.version.minor == CURRENT_VERSION.minor: - return AgentUpgradeType.Hotfix - return AgentUpgradeType.Normal - - def __upgrade_agent_if_permitted(self): - """ - Check every 4hrs for a Hotfix Upgrade and 24 hours for a Normal upgrade and upgrade the agent if available. - raises: ExitException when a new upgrade is available in the relevant time window, else returns - """ - - next_normal_time, next_hotfix_time = self.__get_next_upgrade_times() - now = time.time() - # Not permitted to update yet for any of the AgentUpgradeModes - if next_hotfix_time > now and next_normal_time > now: - return - - # Update the last upgrade check time even if no new agent is available for upgrade - self._last_hotfix_upgrade_time = now if next_hotfix_time <= now else self._last_hotfix_upgrade_time - self._last_normal_upgrade_time = now if next_normal_time <= now else self._last_normal_upgrade_time - - available_agent = self.get_latest_agent_greater_than_daemon() - if available_agent is None or available_agent.version <= CURRENT_VERSION: - logger.verbose("No agent upgrade discovered") - return - - upgrade_type = self.__get_agent_upgrade_type(available_agent) - upgrade_message = "{0} Agent upgrade discovered, updating to {1} -- exiting".format(upgrade_type, - available_agent.name) - - if (upgrade_type == AgentUpgradeType.Hotfix and next_hotfix_time <= now) or ( 
- upgrade_type == AgentUpgradeType.Normal and next_normal_time <= now): - raise AgentUpgradeExitException(upgrade_message) - - def _reset_legacy_blacklisted_agents(self): - # Reset the state of all blacklisted agents that were blacklisted by legacy agents (i.e. not during auto-update) - - # Filter legacy agents which are blacklisted but do not contain a `reason` in their error.json files - # (this flag signifies that this agent was blacklisted by the newer agents). - try: - legacy_blacklisted_agents = [agent for agent in self._load_agents() if - agent.is_blacklisted and agent.error.reason == ''] - for agent in legacy_blacklisted_agents: - agent.clear_error() - except Exception as err: - logger.warn("Unable to reset legacy blacklisted agents due to: {0}".format(err)) - - -class GuestAgent(object): - def __init__(self, path, pkg, protocol, is_fast_track_goal_state): - """ - If 'path' is given, the object is initialized to the version installed under that path. - - If 'pkg' is given, the version specified in the package information is downloaded and the object is - initialized to that version. - - 'is_fast_track_goal_state' and 'protocol' are used only when a package is downloaded. 
- - NOTE: Prefer using the from_installed_agent and from_agent_package methods instead of calling __init__ directly - """ - self._is_fast_track_goal_state = is_fast_track_goal_state - self.pkg = pkg - self._protocol = protocol - version = None - if path is not None: - m = AGENT_DIR_PATTERN.match(path) - if m is None: - raise UpdateError(u"Illegal agent directory: {0}".format(path)) - version = m.group(1) - elif self.pkg is not None: - version = pkg.version - - if version is None: - raise UpdateError(u"Illegal agent version: {0}".format(version)) - self.version = FlexibleVersion(version) - - location = u"disk" if path is not None else u"package" - logger.verbose(u"Loading Agent {0} from {1}", self.name, location) - - self.error = GuestAgentError(self.get_agent_error_file()) - self.error.load() - - try: - self._ensure_downloaded() - self._ensure_loaded() - except Exception as e: - # If we're unable to download/unpack the agent, delete the Agent directory - try: - if os.path.isdir(self.get_agent_dir()): - shutil.rmtree(self.get_agent_dir(), ignore_errors=True) - except Exception as err: - logger.warn("Unable to delete Agent files: {0}".format(err)) - msg = u"Agent {0} install failed with exception:".format( - self.name) - detailed_msg = '{0} {1}'.format(msg, textutil.format_exception(e)) - add_event( - AGENT_NAME, - version=self.version, - op=WALAEventOperation.Install, - is_success=False, - message=detailed_msg) - - @staticmethod - def from_installed_agent(path): - """ - Creates an instance of GuestAgent using the agent installed in the given 'path'. - """ - return GuestAgent(path, None, None, False) - - @staticmethod - def from_agent_package(package, protocol, is_fast_track_goal_state): - """ - Creates an instance of GuestAgent using the information provided in the 'package'; if that version of the agent is not installed it, it installs it. 
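The NOTE above points at a common pattern: `__init__` is overloaded for two construction paths (installed directory vs. downloaded package), so named static factories keep call sites readable and stop callers from passing placeholder arguments directly. The same pattern in isolation, with hypothetical names rather than the agent's API:

```python
class Artifact(object):
    def __init__(self, path, package):
        # Prefer the named factories below; calling __init__ directly forces the
        # caller to supply None for whichever construction path is unused.
        if path is not None:
            # e.g. "agent-2.10.0.0" -> "2.10.0.0"
            self.version = path.rsplit("-", 1)[-1]
        elif package is not None:
            self.version = package["version"]
        else:
            raise ValueError("either a path or a package is required")

    @staticmethod
    def from_installed(path):
        """Build from an existing on-disk install directory."""
        return Artifact(path, None)

    @staticmethod
    def from_package(package):
        """Build from downloaded package metadata."""
        return Artifact(None, package)


print(Artifact.from_installed("agent-2.10.0.0").version)      # 2.10.0.0
print(Artifact.from_package({"version": "2.9.1.1"}).version)  # 2.9.1.1
```

Each factory documents its own preconditions, which is harder to do with a single multi-purpose constructor signature.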
- """ - return GuestAgent(None, package, protocol, is_fast_track_goal_state) - - @property - def name(self): - return "{0}-{1}".format(AGENT_NAME, self.version) - - def get_agent_cmd(self): - return self.manifest.get_enable_command() - - def get_agent_dir(self): - return os.path.join(conf.get_lib_dir(), self.name) - - def get_agent_error_file(self): - return os.path.join(conf.get_lib_dir(), self.name, AGENT_ERROR_FILE) - - def get_agent_manifest_path(self): - return os.path.join(self.get_agent_dir(), AGENT_MANIFEST_FILE) - - def get_agent_pkg_path(self): - return ".".join((os.path.join(conf.get_lib_dir(), self.name), "zip")) - - def clear_error(self): - self.error.clear() - self.error.save() - - @property - def is_available(self): - return self.is_downloaded and not self.is_blacklisted - - @property - def is_blacklisted(self): - return self.error is not None and self.error.is_blacklisted - - @property - def is_downloaded(self): - return self.is_blacklisted or \ - os.path.isfile(self.get_agent_manifest_path()) - - def mark_failure(self, is_fatal=False, reason=''): - try: - if not os.path.isdir(self.get_agent_dir()): - os.makedirs(self.get_agent_dir()) - self.error.mark_failure(is_fatal=is_fatal, reason=reason) - self.error.save() - if self.error.is_blacklisted: - msg = u"Agent {0} is permanently blacklisted".format(self.name) - logger.warn(msg) - add_event(op=WALAEventOperation.AgentBlacklisted, is_success=False, message=msg, log_event=False, - version=self.version) - except Exception as e: - logger.warn(u"Agent {0} failed recording error state: {1}", self.name, ustr(e)) - - def _ensure_downloaded(self): - logger.verbose(u"Ensuring Agent {0} is downloaded", self.name) - - if self.is_downloaded: - logger.verbose(u"Agent {0} was previously downloaded - skipping download", self.name) - return - - if self.pkg is None: - raise UpdateError(u"Agent {0} is missing package and download URIs".format( - self.name)) - - self._download() - - msg = u"Agent {0} downloaded 
successfully".format(self.name) - logger.verbose(msg) - add_event( - AGENT_NAME, - version=self.version, - op=WALAEventOperation.Install, - is_success=True, - message=msg) - - def _ensure_loaded(self): - self._load_manifest() - self._load_error() - - def _download(self): - try: - self._protocol.client.download_zip_package("agent package", self.pkg.uris, self.get_agent_pkg_path(), self.get_agent_dir(), use_verify_header=self._is_fast_track_goal_state) - except Exception as exception: - msg = "Unable to download Agent {0}: {1}".format(self.name, ustr(exception)) - add_event( - AGENT_NAME, - op=WALAEventOperation.Download, - version=CURRENT_VERSION, - is_success=False, - message=msg) - raise UpdateError(msg) - - def _load_error(self): - try: - self.error = GuestAgentError(self.get_agent_error_file()) - self.error.load() - logger.verbose(u"Agent {0} error state: {1}", self.name, ustr(self.error)) - except Exception as e: - logger.warn(u"Agent {0} failed loading error state: {1}", self.name, ustr(e)) - - def _load_manifest(self): - path = self.get_agent_manifest_path() - if not os.path.isfile(path): - msg = u"Agent {0} is missing the {1} file".format(self.name, AGENT_MANIFEST_FILE) - raise UpdateError(msg) - - with open(path, "r") as manifest_file: - try: - manifests = json.load(manifest_file) - except Exception as e: - msg = u"Agent {0} has a malformed {1} ({2})".format(self.name, AGENT_MANIFEST_FILE, ustr(e)) - raise UpdateError(msg) - if type(manifests) is list: - if len(manifests) <= 0: - msg = u"Agent {0} has an empty {1}".format(self.name, AGENT_MANIFEST_FILE) - raise UpdateError(msg) - manifest = manifests[0] - else: - manifest = manifests - - try: - self.manifest = HandlerManifest(manifest) # pylint: disable=W0201 - if len(self.manifest.get_enable_command()) <= 0: - raise Exception(u"Manifest is missing the enable command") - except Exception as e: - msg = u"Agent {0} has an illegal {1}: {2}".format( - self.name, - AGENT_MANIFEST_FILE, - ustr(e)) - raise 
UpdateError(msg) - - logger.verbose( - u"Agent {0} loaded manifest from {1}", - self.name, - self.get_agent_manifest_path()) - logger.verbose(u"Successfully loaded Agent {0} {1}: {2}", - self.name, - AGENT_MANIFEST_FILE, - ustr(self.manifest.data)) - return - - -class GuestAgentError(object): - def __init__(self, path): - self.last_failure = 0.0 - self.was_fatal = False - if path is None: - raise UpdateError(u"GuestAgentError requires a path") - self.path = path - self.failure_count = 0 - self.reason = '' - - self.clear() - return - - def mark_failure(self, is_fatal=False, reason=''): - self.last_failure = time.time() - self.failure_count += 1 - self.was_fatal = is_fatal - self.reason = reason - return - - def clear(self): - self.last_failure = 0.0 - self.failure_count = 0 - self.was_fatal = False - self.reason = '' - return - - @property - def is_blacklisted(self): - return self.was_fatal or self.failure_count >= MAX_FAILURE - - def load(self): - if self.path is not None and os.path.isfile(self.path): - try: - with open(self.path, 'r') as f: - self.from_json(json.load(f)) - except Exception as error: - # The error.json file is only supposed to be written only by the agent. - # If for whatever reason the file is malformed, just delete it to reset state of the errors. - logger.warn( - "Ran into error when trying to load error file {0}, deleting it to clean state. 
Error: {1}".format( - self.path, textutil.format_exception(error))) - try: - os.remove(self.path) - except Exception: - # We try best case efforts to delete the file, ignore error if we're unable to do so - pass - return - - def save(self): - if os.path.isdir(os.path.dirname(self.path)): - with open(self.path, 'w') as f: - json.dump(self.to_json(), f) - return - - def from_json(self, data): - self.last_failure = max(self.last_failure, data.get(u"last_failure", 0.0)) - self.failure_count = max(self.failure_count, data.get(u"failure_count", 0)) - self.was_fatal = self.was_fatal or data.get(u"was_fatal", False) - reason = data.get(u"reason", '') - self.reason = reason if reason != '' else self.reason - return - - def to_json(self): - data = { - u"last_failure": self.last_failure, - u"failure_count": self.failure_count, - u"was_fatal": self.was_fatal, - u"reason": ustr(self.reason) - } - return data - - def __str__(self): - return "Last Failure: {0}, Total Failures: {1}, Fatal: {2}, Reason: {3}".format( - self.last_failure, - self.failure_count, - self.was_fatal, - self.reason) diff --git a/azurelinuxagent/pa/deprovision/default.py b/azurelinuxagent/pa/deprovision/default.py index 89492b75e..35b4ae82e 100644 --- a/azurelinuxagent/pa/deprovision/default.py +++ b/azurelinuxagent/pa/deprovision/default.py @@ -26,11 +26,11 @@ import azurelinuxagent.common.conf as conf import azurelinuxagent.common.utils.fileutil as fileutil from azurelinuxagent.common import version -from azurelinuxagent.common.cgroupconfigurator import _AGENT_DROP_IN_FILE_SLICE, _DROP_IN_FILE_CPU_ACCOUNTING, \ +from azurelinuxagent.ga.cgroupconfigurator import _AGENT_DROP_IN_FILE_SLICE, _DROP_IN_FILE_CPU_ACCOUNTING, \ _DROP_IN_FILE_CPU_QUOTA, _DROP_IN_FILE_MEMORY_ACCOUNTING, LOGCOLLECTOR_SLICE from azurelinuxagent.common.exception import ProtocolError from azurelinuxagent.common.osutil import get_osutil, systemd -from azurelinuxagent.common.persist_firewall_rules import PersistFirewallRulesHandler +from 
azurelinuxagent.ga.persist_firewall_rules import PersistFirewallRulesHandler from azurelinuxagent.common.protocol.util import get_protocol_util from azurelinuxagent.ga.exthandlers import HANDLER_COMPLETE_NAME_PATTERN @@ -131,6 +131,10 @@ def del_dhcp_lease(self, warnings, actions): actions.append(DeprovisionAction(fileutil.rm_files, ["/var/lib/NetworkManager/dhclient-*.lease"])) + # For Ubuntu >= 18.04, using systemd-networkd + actions.append(DeprovisionAction(fileutil.rm_files, + ["/run/systemd/netif/leases/*"])) + def del_ext_handler_files(self, warnings, actions): # pylint: disable=W0613 ext_dirs = [d for d in os.listdir(conf.get_lib_dir()) if os.path.isdir(os.path.join(conf.get_lib_dir(), d)) @@ -154,7 +158,11 @@ def del_lib_dir_files(self, warnings, actions): # pylint: disable=W0613 'partition', 'Protocol', 'SharedConfig.xml', - 'WireServerEndpoint' + 'WireServerEndpoint', + 'published_hostname', + 'fast_track.json', + 'initial_goal_state', + 'rsm_update.json' ] known_files_glob = [ 'Extensions.*.xml', diff --git a/azurelinuxagent/pa/provision/default.py b/azurelinuxagent/pa/provision/default.py index 91fe04eda..a872d70fd 100644 --- a/azurelinuxagent/pa/provision/default.py +++ b/azurelinuxagent/pa/provision/default.py @@ -172,9 +172,11 @@ def check_provisioned_file(self): s = fileutil.read_file(ProvisionHandler.provisioned_file_path()).strip() if not self.osutil.is_current_instance_id(s): if len(s) > 0: - logger.warn("VM is provisioned, " - "but the VM unique identifier has changed -- " - "clearing cached state") + msg = "VM is provisioned, but the VM unique identifier has changed. 
This indicates the VM may be " \ + "created from an image that was not properly deprovisioned or generalized, which can result in " \ + "unexpected behavior from the guest agent -- clearing cached state" + logger.warn(msg) + self.report_event(msg) from azurelinuxagent.pa.deprovision \ import get_deprovision_handler deprovision_handler = get_deprovision_handler() diff --git a/azurelinuxagent/pa/rdma/centos.py b/azurelinuxagent/pa/rdma/centos.py index 87e2eff74..5e82acf53 100644 --- a/azurelinuxagent/pa/rdma/centos.py +++ b/azurelinuxagent/pa/rdma/centos.py @@ -23,7 +23,7 @@ import time import azurelinuxagent.common.logger as logger import azurelinuxagent.common.utils.shellutil as shellutil -from azurelinuxagent.common.rdma import RDMAHandler +from azurelinuxagent.pa.rdma.rdma import RDMAHandler class CentOSRDMAHandler(RDMAHandler): diff --git a/azurelinuxagent/pa/rdma/factory.py b/azurelinuxagent/pa/rdma/factory.py index c114dc380..ec4a8bc48 100644 --- a/azurelinuxagent/pa/rdma/factory.py +++ b/azurelinuxagent/pa/rdma/factory.py @@ -18,7 +18,7 @@ from distutils.version import LooseVersion as Version # pylint: disable=no-name-in-module, import-error import azurelinuxagent.common.logger as logger -from azurelinuxagent.common.rdma import RDMAHandler +from azurelinuxagent.pa.rdma.rdma import RDMAHandler from azurelinuxagent.common.version import DISTRO_FULL_NAME, DISTRO_VERSION from .centos import CentOSRDMAHandler from .suse import SUSERDMAHandler diff --git a/azurelinuxagent/common/rdma.py b/azurelinuxagent/pa/rdma/rdma.py similarity index 97% rename from azurelinuxagent/common/rdma.py rename to azurelinuxagent/pa/rdma/rdma.py index 299b1a8a5..aabd05541 100644 --- a/azurelinuxagent/common/rdma.py +++ b/azurelinuxagent/pa/rdma/rdma.py @@ -419,28 +419,33 @@ def update_iboip_interfaces(self, mac_ip_array): @staticmethod def update_iboip_interface(ipv4_addr, timeout_sec, check_interval_sec): - logger.info("Wait for ib0 become available") + logger.info("Wait for ib become 
available") total_retries = timeout_sec / check_interval_sec n = 0 - found_ib0 = None - while not found_ib0 and n < total_retries: + found_ib = None + while not found_ib and n < total_retries: ret, output = shellutil.run_get_output("ifconfig -a") if ret != 0: raise Exception("Failed to list network interfaces") - found_ib0 = re.search("ib0", output, re.IGNORECASE) - if found_ib0: + found_ib = re.search(r"(ib\S+):", output, re.IGNORECASE) + if found_ib: break time.sleep(check_interval_sec) n += 1 - if not found_ib0: - raise Exception("ib0 is not available") + if not found_ib: + raise Exception("ib is not available") + + ibname = found_ib.groups()[0] + if shellutil.run("ifconfig {0} up".format(ibname)) != 0: + raise Exception("Could not run ifconfig {0} up".format(ibname)) netmask = 16 logger.info("RDMA: configuring IPv4 addr and netmask on ipoib interface") addr = '{0}/{1}'.format(ipv4_addr, netmask) - if shellutil.run("ifconfig ib0 {0}".format(addr)) != 0: - raise Exception("Could set addr to {0} on ib0".format(addr)) + if shellutil.run("ifconfig {0} {1}".format(ibname, addr)) != 0: + raise Exception("Could not set addr to {0} on {1}".format(addr, ibname)) + logger.info("RDMA: ipoib address and netmask configured on interface") @staticmethod diff --git a/azurelinuxagent/pa/rdma/suse.py b/azurelinuxagent/pa/rdma/suse.py index 66e8b3720..bcf971482 100644 --- a/azurelinuxagent/pa/rdma/suse.py +++ b/azurelinuxagent/pa/rdma/suse.py @@ -21,7 +21,7 @@ import azurelinuxagent.common.logger as logger import azurelinuxagent.common.utils.shellutil as shellutil -from azurelinuxagent.common.rdma import RDMAHandler +from azurelinuxagent.pa.rdma.rdma import RDMAHandler from azurelinuxagent.common.version import DISTRO_VERSION from distutils.version import LooseVersion as Version diff --git a/azurelinuxagent/pa/rdma/ubuntu.py b/azurelinuxagent/pa/rdma/ubuntu.py index a56a4be4e..bef152f2e 100644 --- a/azurelinuxagent/pa/rdma/ubuntu.py +++ b/azurelinuxagent/pa/rdma/ubuntu.py @@ -24,7 
+24,7 @@ import azurelinuxagent.common.conf as conf import azurelinuxagent.common.logger as logger import azurelinuxagent.common.utils.shellutil as shellutil -from azurelinuxagent.common.rdma import RDMAHandler +from azurelinuxagent.pa.rdma.rdma import RDMAHandler class UbuntuRDMAHandler(RDMAHandler): diff --git a/config/alpine/waagent.conf b/config/alpine/waagent.conf index d813ee5ca..a8620b5c4 100644 --- a/config/alpine/waagent.conf +++ b/config/alpine/waagent.conf @@ -75,7 +75,11 @@ OS.OpensslPath=None OS.SshDir=/etc/ssh # Enable or disable goal state processing auto-update, default is enabled -# AutoUpdate.Enabled=y +# When turned off, it remains on latest version installed on the vm +# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option +# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details +# TODO: Update the wiki link and point to readme page or public facing doc +# AutoUpdate.UpdateToLatestVersion=y # Determine the update family, this should not be changed # AutoUpdate.GAFamily=Prod diff --git a/config/arch/waagent.conf b/config/arch/waagent.conf index ef914e9f8..947da9ba6 100644 --- a/config/arch/waagent.conf +++ b/config/arch/waagent.conf @@ -100,7 +100,10 @@ OS.SshDir=/etc/ssh # OS.EnableRDMA=y # Enable or disable goal state processing auto-update, default is enabled -# AutoUpdate.Enabled=y +# When turned off, it remains on latest version installed on the vm +# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option +# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details +# AutoUpdate.UpdateToLatestVersion=y # Determine the update family, this should not be changed # AutoUpdate.GAFamily=Prod diff --git a/config/bigip/waagent.conf 
b/config/bigip/waagent.conf index fe56c4d58..2bed138b9 100644 --- a/config/bigip/waagent.conf +++ b/config/bigip/waagent.conf @@ -82,7 +82,10 @@ OS.SshdConfigPath=/config/ssh/sshd_config OS.EnableRDMA=n # Enable or disable goal state processing auto-update, default is enabled -AutoUpdate.Enabled=y +# When turned off, it remains on latest version installed on the vm +# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option +# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details +# AutoUpdate.UpdateToLatestVersion=y # Determine the update family, this should not be changed # AutoUpdate.GAFamily=Prod diff --git a/config/clearlinux/waagent.conf b/config/clearlinux/waagent.conf index 0b70d2621..61d3830df 100644 --- a/config/clearlinux/waagent.conf +++ b/config/clearlinux/waagent.conf @@ -73,8 +73,12 @@ OS.OpensslPath=None # Set the path to SSH keys and configuration files OS.SshDir=/etc/ssh -# Enable or disable self-update, default is enabled -AutoUpdate.Enabled=y +# Enable or disable goal state processing auto-update, default is enabled +# When turned off, it remains on latest version installed on the vm +# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option +# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details +# AutoUpdate.UpdateToLatestVersion=y + AutoUpdate.GAFamily=Prod # Determine if the overprovisioning feature is enabled. 
If yes, hold extension diff --git a/config/coreos/waagent.conf b/config/coreos/waagent.conf index 003482ab0..a7b217403 100644 --- a/config/coreos/waagent.conf +++ b/config/coreos/waagent.conf @@ -104,7 +104,10 @@ OS.OpensslPath=None # OS.EnableRDMA=y # Enable or disable goal state processing auto-update, default is enabled -# AutoUpdate.Enabled=y +# When turned off, it remains on latest version installed on the vm +# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option +# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details +# AutoUpdate.UpdateToLatestVersion=y # Determine the update family, this should not be changed # AutoUpdate.GAFamily=Prod diff --git a/config/debian/waagent.conf b/config/debian/waagent.conf index dfd7afcd6..40a92b92b 100644 --- a/config/debian/waagent.conf +++ b/config/debian/waagent.conf @@ -110,7 +110,10 @@ OS.SshDir=/etc/ssh # OS.EnableRDMA=y # Enable or disable goal state processing auto-update, default is enabled -# AutoUpdate.Enabled=y +# When turned off, it remains on latest version installed on the vm +# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option +# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details +# AutoUpdate.UpdateToLatestVersion=y # Determine the update family, this should not be changed # AutoUpdate.GAFamily=Prod diff --git a/config/devuan/waagent.conf b/config/devuan/waagent.conf index be80edbd4..53b0a85bf 100644 --- a/config/devuan/waagent.conf +++ b/config/devuan/waagent.conf @@ -104,7 +104,10 @@ OS.SshDir=/etc/ssh # OS.EnableRDMA=y # Enable or disable goal state processing auto-update, default is enabled -# AutoUpdate.Enabled=y +# When turned off, it remains on latest version installed on the vm 
+# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option +# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details +# AutoUpdate.UpdateToLatestVersion=y # Determine the update family, this should not be changed # AutoUpdate.GAFamily=Prod diff --git a/config/freebsd/waagent.conf b/config/freebsd/waagent.conf index c917d16c5..6774b8fdd 100644 --- a/config/freebsd/waagent.conf +++ b/config/freebsd/waagent.conf @@ -102,7 +102,10 @@ OS.SudoersDir=/usr/local/etc/sudoers.d # OS.EnableRDMA=y # Enable or disable goal state processing auto-update, default is enabled -# AutoUpdate.Enabled=y +# When turned off, it remains on latest version installed on the vm +# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option +# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details +# AutoUpdate.UpdateToLatestVersion=y # Determine the update family, this should not be changed # AutoUpdate.GAFamily=Prod diff --git a/config/gaia/waagent.conf b/config/gaia/waagent.conf index 0e171d28b..fa915f722 100644 --- a/config/gaia/waagent.conf +++ b/config/gaia/waagent.conf @@ -101,8 +101,15 @@ OS.SshDir=/etc/ssh OS.EnableRDMA=n # Enable or disable goal state processing auto-update, default is enabled +# When turned off, it reverts to the pre-installed agent that comes with image +# AutoUpdate.Enabled is a legacy parameter used only for backwards compatibility. 
We encourage users to transition to new option AutoUpdate.UpdateToLatestVersion
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
 AutoUpdate.Enabled=n
+# Enable or disable goal state processing auto-update, default is enabled
+# When turned off, it remains on latest version installed on the vm
+# AutoUpdate.UpdateToLatestVersion=y
+
 # Determine the update family, this should not be changed
 # AutoUpdate.GAFamily=Prod
diff --git a/config/iosxe/waagent.conf b/config/iosxe/waagent.conf
index 764058986..88ed14c47 100644
--- a/config/iosxe/waagent.conf
+++ b/config/iosxe/waagent.conf
@@ -100,7 +100,10 @@ OS.SshDir=/etc/ssh
 # OS.EnableRDMA=y
 # Enable or disable goal state processing auto-update, default is enabled
-AutoUpdate.Enabled=y
+# When turned off, it remains on latest version installed on the vm
+# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
+# AutoUpdate.UpdateToLatestVersion=y
 # Determine the update family, this should not be changed
 # AutoUpdate.GAFamily=Prod
diff --git a/config/mariner/waagent.conf b/config/mariner/waagent.conf
index dbd9e14a8..05eb129f0 100644
--- a/config/mariner/waagent.conf
+++ b/config/mariner/waagent.conf
@@ -75,8 +75,12 @@ OS.OpensslPath=None
 # Set the path to SSH keys and configuration files
 OS.SshDir=/etc/ssh
-# Enable or disable self-update, default is enabled
-AutoUpdate.Enabled=y
+# Enable or disable goal state processing auto-update, default is enabled
+# When turned off, it remains on latest version installed on the vm
+# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
+# AutoUpdate.UpdateToLatestVersion=y
+
 AutoUpdate.GAFamily=Prod
 # Determine if the overprovisioning feature is enabled. If yes, hold extension
diff --git a/config/nsbsd/waagent.conf b/config/nsbsd/waagent.conf
index 9d0ce74d8..8b04a410a 100644
--- a/config/nsbsd/waagent.conf
+++ b/config/nsbsd/waagent.conf
@@ -80,7 +80,7 @@ OS.SudoersDir=/usr/local/etc/sudoers.d
 # DetectScvmmEnv=n
 #
-Lib.Dir=/usr/Firewall/var/waagent
+Lib.Dir=/usr/Firewall/lib/waagent
 #
 # DVD.MountPoint=/mnt/cdrom/secure
@@ -98,8 +98,15 @@ Extension.LogDir=/log/azure
 # OS.EnableRDMA=y
 # Enable or disable goal state processing auto-update, default is enabled
+# When turned off, it reverts to the pre-installed agent that comes with image
+# AutoUpdate.Enabled is a legacy parameter used only for backwards compatibility. We encourage users to transition to new option AutoUpdate.UpdateToLatestVersion
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
 AutoUpdate.Enabled=n
+# Enable or disable goal state processing auto-update, default is enabled
+# When turned off, it remains on latest version installed on the vm
+# AutoUpdate.UpdateToLatestVersion=y
+
 # Determine the update family, this should not be changed
 # AutoUpdate.GAFamily=Prod
diff --git a/config/openbsd/waagent.conf b/config/openbsd/waagent.conf
index a644d5d69..c0bc8ed14 100644
--- a/config/openbsd/waagent.conf
+++ b/config/openbsd/waagent.conf
@@ -96,7 +96,10 @@ OS.PasswordPath=/etc/master.passwd
 # OS.EnableRDMA=y
 # Enable or disable goal state processing auto-update, default is enabled
-# AutoUpdate.Enabled=y
+# When turned off, it remains on latest version installed on the vm
+# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
+# AutoUpdate.UpdateToLatestVersion=y
 # Determine the update family, this should not be changed
 # AutoUpdate.GAFamily=Prod
diff --git a/config/photonos/waagent.conf b/config/photonos/waagent.conf
index 65da1313c..05227f6bc 100644
--- a/config/photonos/waagent.conf
+++ b/config/photonos/waagent.conf
@@ -70,8 +70,12 @@ OS.OpensslPath=None
 # Set the path to SSH keys and configuration files
 OS.SshDir=/etc/ssh
-# Enable or disable self-update, default is enabled
-AutoUpdate.Enabled=y
+# Enable or disable goal state processing auto-update, default is enabled
+# When turned off, it remains on latest version installed on the vm
+# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
+# AutoUpdate.UpdateToLatestVersion=y
+
 AutoUpdate.GAFamily=Prod
 # Determine if the overprovisioning feature is enabled. If yes, hold extension
diff --git a/config/suse/waagent.conf b/config/suse/waagent.conf
index c617f9af8..9e6369a87 100644
--- a/config/suse/waagent.conf
+++ b/config/suse/waagent.conf
@@ -113,7 +113,10 @@ OS.SshDir=/etc/ssh
 # OS.CheckRdmaDriver=y
 # Enable or disable goal state processing auto-update, default is enabled
-# AutoUpdate.Enabled=y
+# When turned off, it remains on latest version installed on the vm
+# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
+# AutoUpdate.UpdateToLatestVersion=y
 # Determine the update family, this should not be changed
 # AutoUpdate.GAFamily=Prod
diff --git a/config/ubuntu/waagent.conf b/config/ubuntu/waagent.conf
index 19b56bae4..286933ce5 100644
--- a/config/ubuntu/waagent.conf
+++ b/config/ubuntu/waagent.conf
@@ -101,7 +101,10 @@ OS.SshDir=/etc/ssh
 # OS.CheckRdmaDriver=y
 # Enable or disable goal state processing auto-update, default is enabled
-# AutoUpdate.Enabled=y
+# When turned off, it remains on latest version installed on the vm
+# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
+# AutoUpdate.UpdateToLatestVersion=y
 # Determine the update family, this should not be changed
 # AutoUpdate.GAFamily=Prod
diff --git a/config/waagent.conf b/config/waagent.conf
index 7316dc2da..3c9ad5d4c 100644
--- a/config/waagent.conf
+++ b/config/waagent.conf
@@ -122,7 +122,10 @@ OS.SshDir=/etc/ssh
 # OS.CheckRdmaDriver=y
 # Enable or disable goal state processing auto-update, default is enabled
-# AutoUpdate.Enabled=y
+# When turned off, it remains on latest version installed on the vm
+# Added this new option AutoUpdate.UpdateToLatestVersion in place of AutoUpdate.Enabled, and encourage users to transition to this new option
+# See wiki[https://github.com/Azure/WALinuxAgent/wiki/FAQ#autoupdateenabled-vs-autoupdateupdatetolatestversion] for more details
+# AutoUpdate.UpdateToLatestVersion=y
 # Determine the update family, this should not be changed
 # AutoUpdate.GAFamily=Prod
diff --git a/makepkg.py b/makepkg.py
index 5ec04d5d8..bc4aad4c3 100755
--- a/makepkg.py
+++ b/makepkg.py
@@ -8,8 +8,9 @@
 import subprocess
 import sys
-from azurelinuxagent.common.version import AGENT_NAME, AGENT_VERSION, AGENT_LONG_VERSION
-from azurelinuxagent.ga.update import AGENT_MANIFEST_FILE
+from azurelinuxagent.common.version import AGENT_NAME, AGENT_VERSION, \
+    AGENT_LONG_VERSION
+from azurelinuxagent.ga.guestagent import AGENT_MANIFEST_FILE
 MANIFEST = '''[{{
    "name": "{0}",
diff --git a/setup.py b/setup.py
index 8f5d92b42..6b54d09e7 100755
--- a/setup.py
+++ b/setup.py
@@ -319,7 +319,7 @@ def run(self):
 # implementation may be broken prior to Python 3.7 wher the functionality
 # will be removed from Python 3
 requires = []  # pylint: disable=invalid-name
-if float(sys.version[:3]) >= 3.7:
+if sys.version_info[0] >= 3 and sys.version_info[1] >= 7:
     requires = ['distro']  # pylint: disable=invalid-name
 modules = []  # pylint: disable=invalid-name
diff --git a/test-requirements.txt b/test-requirements.txt
index 3c54ab997..2b9467870 100644
--- a/test-requirements.txt
+++ b/test-requirements.txt
@@ -1,4 +1,3 @@
-codecov
 coverage
 mock==2.0.0; python_version == '2.6'
 mock==3.0.5; python_version >= '2.7' and python_version <= '3.5'
@@ -18,4 +17,7 @@ assertpy
 azure-core
 azure-identity
 azure-mgmt-compute>=22.1.0
+azure-mgmt-network>=19.3.0
 azure-mgmt-resource>=15.0.0
+msrestazure
+pytz
diff --git a/tests/common/dhcp/test_dhcp.py b/tests/common/dhcp/test_dhcp.py
index b4eece5c2..dda28985a 100644
--- a/tests/common/dhcp/test_dhcp.py
+++ b/tests/common/dhcp/test_dhcp.py
@@ -18,7 +18,7 @@
 import mock
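The setup.py hunk above replaces a string-based interpreter check with `sys.version_info` field comparisons. A standalone sketch (not WALinuxAgent code) of why `float(sys.version[:3])` is fragile: truncating the version string to three characters turns "3.10.4" into "3.1", which compares as less than 3.7.

```python
def old_check(version_string):
    # Fragile: "3.10.4"[:3] == "3.1", so float() yields 3.1 and the
    # comparison wrongly rejects Python 3.10+
    return float(version_string[:3]) >= 3.7

def new_check(version_info):
    # Field comparison, mirroring the patched setup.py condition
    return version_info[0] >= 3 and version_info[1] >= 7

print(old_check("3.7.0"))        # True
print(old_check("3.10.4"))       # False -- 3.10 misclassified as too old
print(new_check((3, 10, 4)))     # True
```

In practice `sys.version_info >= (3, 7)` expresses the same intent via tuple comparison; the patch keeps the explicit field form.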
 import azurelinuxagent.common.dhcp as dhcp
 import azurelinuxagent.common.osutil.default as osutil
-from tests.tools import AgentTestCase, open_patch, patch
+from tests.lib.tools import AgentTestCase, open_patch, patch
 class TestDHCP(AgentTestCase):
diff --git a/tests/common/osutil/test_alpine.py b/tests/common/osutil/test_alpine.py
index d2eb36114..ec669cf3e 100644
--- a/tests/common/osutil/test_alpine.py
+++ b/tests/common/osutil/test_alpine.py
@@ -17,7 +17,7 @@
 import unittest
 from azurelinuxagent.common.osutil.alpine import AlpineOSUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_arch.py b/tests/common/osutil/test_arch.py
index 1133eae27..67ada5e54 100644
--- a/tests/common/osutil/test_arch.py
+++ b/tests/common/osutil/test_arch.py
@@ -17,7 +17,7 @@
 import unittest
 from azurelinuxagent.common.osutil.arch import ArchUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_bigip.py b/tests/common/osutil/test_bigip.py
index 421d4d920..7312f3ae5 100644
--- a/tests/common/osutil/test_bigip.py
+++ b/tests/common/osutil/test_bigip.py
@@ -26,7 +26,7 @@
 import azurelinuxagent.common.utils.shellutil as shellutil
 from azurelinuxagent.common.exception import OSUtilError
 from azurelinuxagent.common.osutil.bigip import BigIpOSUtil
-from tests.tools import AgentTestCase, patch
+from tests.lib.tools import AgentTestCase, patch
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_clearlinux.py b/tests/common/osutil/test_clearlinux.py
index a7d75722f..4824c9551 100644
--- a/tests/common/osutil/test_clearlinux.py
+++ b/tests/common/osutil/test_clearlinux.py
@@ -17,7 +17,7 @@
 import unittest
 from azurelinuxagent.common.osutil.clearlinux import ClearLinuxUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_coreos.py b/tests/common/osutil/test_coreos.py
index dca06fa15..36e398738 100644
--- a/tests/common/osutil/test_coreos.py
+++ b/tests/common/osutil/test_coreos.py
@@ -17,7 +17,7 @@
 import unittest
 from azurelinuxagent.common.osutil.coreos import CoreOSUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_default.py b/tests/common/osutil/test_default.py
index ab4fa5c99..68bd282d7 100644
--- a/tests/common/osutil/test_default.py
+++ b/tests/common/osutil/test_default.py
@@ -34,8 +34,8 @@
 from azurelinuxagent.common.utils import fileutil
 from azurelinuxagent.common.utils.flexible_version import FlexibleVersion
 from azurelinuxagent.common.utils.networkutil import AddFirewallRules
-from tests.common.mock_environment import MockEnvironment
-from tests.tools import AgentTestCase, patch, open_patch, load_data, data_dir, is_python_version_26_or_34, skip_if_predicate_true
+from tests.lib.mock_environment import MockEnvironment
+from tests.lib.tools import AgentTestCase, patch, open_patch, load_data, data_dir, is_python_version_26_or_34, skip_if_predicate_true
 actual_get_proc_net_route = 'azurelinuxagent.common.osutil.default.DefaultOSUtil._get_proc_net_route'
@@ -298,7 +298,7 @@ def test_no_primary_does_not_throw(self):
     def test_dhcp_lease_default(self):
         self.assertTrue(osutil.DefaultOSUtil().get_dhcp_lease_endpoint() is None)
-    def test_dhcp_lease_ubuntu(self):
+    def test_dhcp_lease_older_ubuntu(self):
         with patch.object(glob, "glob", return_value=['/var/lib/dhcp/dhclient.eth0.leases']):
             with patch(open_patch(), mock.mock_open(read_data=load_data("dhcp.leases"))):
                 endpoint = get_osutil(distro_name='ubuntu', distro_version='12.04').get_dhcp_lease_endpoint()  # pylint: disable=assignment-from-none
@@ -313,6 +313,20 @@ def test_dhcp_lease_ubuntu(self):
         self.assertTrue(endpoint is not None)
         self.assertEqual(endpoint, "168.63.129.16")
+        endpoint = get_osutil(distro_name='ubuntu', distro_version='18.04').get_dhcp_lease_endpoint()  # pylint: disable=assignment-from-none
+        self.assertTrue(endpoint is None)
+
+    def test_dhcp_lease_newer_ubuntu(self):
+        with patch.object(glob, "glob", return_value=['/run/systemd/netif/leases/2']):
+            with patch(open_patch(), mock.mock_open(read_data=load_data("2"))):
+                endpoint = get_osutil(distro_name='ubuntu', distro_version='18.04').get_dhcp_lease_endpoint()  # pylint: disable=assignment-from-none
+                self.assertTrue(endpoint is not None)
+                self.assertEqual(endpoint, "168.63.129.16")
+
+                endpoint = get_osutil(distro_name='ubuntu', distro_version='20.04').get_dhcp_lease_endpoint()  # pylint: disable=assignment-from-none
+                self.assertTrue(endpoint is not None)
+                self.assertEqual(endpoint, "168.63.129.16")
+
     def test_dhcp_lease_custom_dns(self):
         """
         Validate that the wireserver address is coming from option 245
@@ -687,7 +701,7 @@ def mock_popen(command, *args, **kwargs):
             return mock_popen.original(command, *args, **kwargs)
         mock_popen.original = subprocess.Popen
-        with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", side_effect=mock_popen) as popen_patcher:
+        with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", side_effect=mock_popen) as popen_patcher:
             with patch('os.getuid', return_value=uid):
                 popen_patcher.wait = wait
                 popen_patcher.destination = destination
@@ -910,7 +924,7 @@ def mock_popen(command, *args, **kwargs):
             return mock_popen.original(command, *args, **kwargs)
         mock_popen.original = subprocess.Popen
-        with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", side_effect=mock_popen):
+        with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", side_effect=mock_popen):
             success = osutil.DefaultOSUtil().remove_firewall(mock_iptables.destination, mock_iptables.uid, mock_iptables.wait)
             delete_conntrack_accept_command = TestOSUtil._command_to_string(osutil.get_firewall_delete_conntrack_accept_command(mock_iptables.wait, mock_iptables.destination))
diff --git a/tests/common/osutil/test_default_osutil.py b/tests/common/osutil/test_default_osutil.py
index 070f1d653..1b94dd5ca 100644
--- a/tests/common/osutil/test_default_osutil.py
+++ b/tests/common/osutil/test_default_osutil.py
@@ -16,7 +16,7 @@
 #
 from azurelinuxagent.common.osutil.default import DefaultOSUtil, shellutil  # pylint: disable=unused-import
-from tests.tools import AgentTestCase, patch  # pylint: disable=unused-import
+from tests.lib.tools import AgentTestCase, patch  # pylint: disable=unused-import
 class DefaultOsUtilTestCase(AgentTestCase):
diff --git a/tests/common/osutil/test_factory.py b/tests/common/osutil/test_factory.py
index 7bd729c3b..46bf6a875 100644
--- a/tests/common/osutil/test_factory.py
+++ b/tests/common/osutil/test_factory.py
@@ -34,7 +34,7 @@
 from azurelinuxagent.common.osutil.suse import SUSEOSUtil, SUSE11OSUtil
 from azurelinuxagent.common.osutil.ubuntu import UbuntuOSUtil, Ubuntu12OSUtil, Ubuntu14OSUtil, \
     UbuntuSnappyOSUtil, Ubuntu16OSUtil, Ubuntu18OSUtil
-from tests.tools import AgentTestCase, patch
+from tests.lib.tools import AgentTestCase, patch
 class TestOsUtilFactory(AgentTestCase):
@@ -98,6 +98,13 @@ def test_get_osutil_it_should_return_ubuntu(self):
         self.assertTrue(isinstance(ret, Ubuntu18OSUtil))
         self.assertEqual(ret.get_service_name(), "walinuxagent")
+        ret = _get_osutil(distro_name="ubuntu",
+                          distro_code_name="focal",
+                          distro_version="24.04",
+                          distro_full_name="")
+        self.assertTrue(isinstance(ret, Ubuntu18OSUtil))
+        self.assertEqual(ret.get_service_name(), "walinuxagent")
+
         ret = _get_osutil(distro_name="ubuntu",
                           distro_code_name="",
                           distro_version="10.04",
diff --git a/tests/common/osutil/test_freebsd.py b/tests/common/osutil/test_freebsd.py
index 385183601..0236b4719 100644
--- a/tests/common/osutil/test_freebsd.py
+++ b/tests/common/osutil/test_freebsd.py
@@ -20,7 +20,7 @@
 import azurelinuxagent.common.utils.shellutil as shellutil
 from azurelinuxagent.common.osutil.freebsd import FreeBSDOSUtil
 from azurelinuxagent.common.utils import textutil
-from tests.tools import AgentTestCase, patch
+from tests.lib.tools import AgentTestCase, patch
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_nsbsd.py b/tests/common/osutil/test_nsbsd.py
index 4e97f7444..37d79e61a 100644
--- a/tests/common/osutil/test_nsbsd.py
+++ b/tests/common/osutil/test_nsbsd.py
@@ -19,7 +19,7 @@
 from azurelinuxagent.common.osutil.nsbsd import NSBSDOSUtil
 from azurelinuxagent.common.utils.fileutil import read_file
-from tests.tools import AgentTestCase, patch
+from tests.lib.tools import AgentTestCase, patch
 class TestNSBSDOSUtil(AgentTestCase):
diff --git a/tests/common/osutil/test_openbsd.py b/tests/common/osutil/test_openbsd.py
index e82a1d8e4..666e4efab 100644
--- a/tests/common/osutil/test_openbsd.py
+++ b/tests/common/osutil/test_openbsd.py
@@ -17,7 +17,7 @@
 import unittest
 from azurelinuxagent.common.osutil.openbsd import OpenBSDOSUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_openwrt.py b/tests/common/osutil/test_openwrt.py
index 05620ff4d..e204cae1f 100644
--- a/tests/common/osutil/test_openwrt.py
+++ b/tests/common/osutil/test_openwrt.py
@@ -17,7 +17,7 @@
 import unittest
 from azurelinuxagent.common.osutil.openwrt import OpenWRTOSUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_photonos.py b/tests/common/osutil/test_photonos.py
index f63e7c2f9..506025e2e 100644
--- a/tests/common/osutil/test_photonos.py
+++ b/tests/common/osutil/test_photonos.py
@@ -18,7 +18,7 @@
 import unittest
 from azurelinuxagent.common.osutil.photonos import PhotonOSUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_redhat.py b/tests/common/osutil/test_redhat.py
index dfd5e07a8..3c4787fc1 100644
--- a/tests/common/osutil/test_redhat.py
+++ b/tests/common/osutil/test_redhat.py
@@ -17,7 +17,7 @@
 import unittest
 from azurelinuxagent.common.osutil.redhat import Redhat6xOSUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_suse.py b/tests/common/osutil/test_suse.py
index 8fd6141be..1e752ca72 100644
--- a/tests/common/osutil/test_suse.py
+++ b/tests/common/osutil/test_suse.py
@@ -17,7 +17,7 @@
 import unittest
 from azurelinuxagent.common.osutil.suse import SUSE11OSUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/common/osutil/test_ubuntu.py b/tests/common/osutil/test_ubuntu.py
index f386fb7c7..24ce7b7f6 100644
--- a/tests/common/osutil/test_ubuntu.py
+++ b/tests/common/osutil/test_ubuntu.py
@@ -18,7 +18,7 @@
 import unittest
 from azurelinuxagent.common.osutil.ubuntu import Ubuntu12OSUtil, Ubuntu18OSUtil
-from tests.tools import AgentTestCase
+from tests.lib.tools import AgentTestCase
 from .test_default import osutil_get_dhcp_pid_should_return_a_list_of_pids
diff --git a/tests/distro/__init__.py b/tests/common/protocol/__init__.py
similarity index 100%
rename from tests/distro/__init__.py
rename to tests/common/protocol/__init__.py
diff --git a/tests/protocol/test_datacontract.py b/tests/common/protocol/test_datacontract.py
similarity index 100%
rename from tests/protocol/test_datacontract.py
rename to tests/common/protocol/test_datacontract.py
diff --git a/tests/protocol/test_extensions_goal_state_from_extensions_config.py b/tests/common/protocol/test_extensions_goal_state_from_extensions_config.py
similarity index 54%
rename from tests/protocol/test_extensions_goal_state_from_extensions_config.py
rename to tests/common/protocol/test_extensions_goal_state_from_extensions_config.py
index 5af0aa288..2a9acff65 100644
--- a/tests/protocol/test_extensions_goal_state_from_extensions_config.py
+++ b/tests/common/protocol/test_extensions_goal_state_from_extensions_config.py
@@ -2,20 +2,20 @@
 # Licensed under the Apache License.
 from azurelinuxagent.common.AgentGlobals import AgentGlobals
 from azurelinuxagent.common.protocol.extensions_goal_state import GoalStateChannel
-from tests.protocol.mocks import mockwiredata, mock_wire_protocol
-from tests.tools import AgentTestCase
+from tests.lib.mock_wire_protocol import wire_protocol_data, mock_wire_protocol
+from tests.lib.tools import AgentTestCase
 class ExtensionsGoalStateFromExtensionsConfigTestCase(AgentTestCase):
     def test_it_should_parse_in_vm_metadata(self):
-        with mock_wire_protocol(mockwiredata.DATA_FILE_IN_VM_META_DATA) as protocol:
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE_IN_VM_META_DATA) as protocol:
             extensions_goal_state = protocol.get_goal_state().extensions_goal_state
             self.assertEqual("555e551c-600e-4fb4-90ba-8ab8ec28eccc", extensions_goal_state.activity_id, "Incorrect activity Id")
             self.assertEqual("400de90b-522e-491f-9d89-ec944661f531", extensions_goal_state.correlation_id, "Incorrect correlation Id")
             self.assertEqual('2020-11-09T17:48:50.412125Z', extensions_goal_state.created_on_timestamp, "Incorrect GS Creation time")
     def test_it_should_use_default_values_when_in_vm_metadata_is_missing(self):
-        data_file = mockwiredata.DATA_FILE.copy()
+        data_file = wire_protocol_data.DATA_FILE.copy()
         data_file["ext_conf"] = "wire/ext_conf-no_gs_metadata.xml"
         with mock_wire_protocol(data_file) as protocol:
             extensions_goal_state = protocol.get_goal_state().extensions_goal_state
@@ -24,14 +24,14 @@ def test_it_should_use_default_values_when_in_vm_metadata_is_missing(self):
             self.assertEqual('1900-01-01T00:00:00.000000Z', extensions_goal_state.created_on_timestamp, "Incorrect GS Creation time")
     def test_it_should_use_default_values_when_in_vm_metadata_is_invalid(self):
-        with mock_wire_protocol(mockwiredata.DATA_FILE_INVALID_VM_META_DATA) as protocol:
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE_INVALID_VM_META_DATA) as protocol:
             extensions_goal_state = protocol.get_goal_state().extensions_goal_state
             self.assertEqual(AgentGlobals.GUID_ZERO, extensions_goal_state.activity_id, "Incorrect activity Id")
             self.assertEqual(AgentGlobals.GUID_ZERO, extensions_goal_state.correlation_id, "Incorrect correlation Id")
             self.assertEqual('1900-01-01T00:00:00.000000Z', extensions_goal_state.created_on_timestamp, "Incorrect GS Creation time")
     def test_it_should_parse_missing_status_upload_blob_as_none(self):
-        data_file = mockwiredata.DATA_FILE.copy()
+        data_file = wire_protocol_data.DATA_FILE.copy()
         data_file["ext_conf"] = "hostgaplugin/ext_conf-no_status_upload_blob.xml"
         with mock_wire_protocol(data_file) as protocol:
             extensions_goal_state = protocol.get_goal_state().extensions_goal_state
@@ -40,14 +40,14 @@ def test_it_should_parse_missing_status_upload_blob_as_none(self):
             self.assertEqual("BlockBlob", extensions_goal_state.status_upload_blob_type, "Expected status upload blob to be Block")
     def test_it_should_default_to_block_blob_when_the_status_blob_type_is_not_valid(self):
-        data_file = mockwiredata.DATA_FILE.copy()
+        data_file = wire_protocol_data.DATA_FILE.copy()
         data_file["ext_conf"] = "hostgaplugin/ext_conf-invalid_blob_type.xml"
         with mock_wire_protocol(data_file) as protocol:
             extensions_goal_state = protocol.get_goal_state().extensions_goal_state
             self.assertEqual("BlockBlob", extensions_goal_state.status_upload_blob_type, 'Expected BlockBlob for an invalid statusBlobType')
     def test_it_should_parse_empty_depends_on_as_dependency_level_0(self):
-        data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy()
+        data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy()
         data_file["vm_settings"] = "hostgaplugin/vm_settings-empty_depends_on.json"
         data_file["ext_conf"] = "hostgaplugin/ext_conf-empty_depends_on.xml"
         with mock_wire_protocol(data_file) as protocol:
@@ -56,7 +56,47 @@ def test_it_should_parse_empty_depends_on_as_dependency_level_0(self):
             self.assertEqual(0, extensions[0].settings[0].dependencyLevel, "Incorrect dependencyLevel")
     def test_its_source_channel_should_be_wire_server(self):
-        with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol:
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol:
             extensions_goal_state = protocol.get_goal_state().extensions_goal_state
             self.assertEqual(GoalStateChannel.WireServer, extensions_goal_state.channel, "The channel is incorrect")
+
+    def test_it_should_parse_is_version_from_rsm_properly(self):
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol:
+            agent_families = protocol.get_goal_state().extensions_goal_state.agent_families
+            for family in agent_families:
+                self.assertIsNone(family.is_version_from_rsm, "is_version_from_rsm should be None")
+
+        data_file = wire_protocol_data.DATA_FILE.copy()
+        data_file["ext_conf"] = "hostgaplugin/ext_conf-agent_family_version.xml"
+        with mock_wire_protocol(data_file) as protocol:
+            agent_families = protocol.get_goal_state().extensions_goal_state.agent_families
+            for family in agent_families:
+                self.assertTrue(family.is_version_from_rsm, "is_version_from_rsm should be True")
+
+        data_file = wire_protocol_data.DATA_FILE.copy()
+        data_file["ext_conf"] = "hostgaplugin/ext_conf-rsm_version_properties_false.xml"
+        with mock_wire_protocol(data_file) as protocol:
+            agent_families = protocol.get_goal_state().extensions_goal_state.agent_families
+            for family in agent_families:
+                self.assertFalse(family.is_version_from_rsm, "is_version_from_rsm should be False")
+
+    def test_it_should_parse_is_vm_enabled_for_rsm_upgrades(self):
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol:
+            agent_families = protocol.get_goal_state().extensions_goal_state.agent_families
+            for family in agent_families:
+                self.assertIsNone(family.is_vm_enabled_for_rsm_upgrades, "is_vm_enabled_for_rsm_upgrades should be None")
+
+        data_file = wire_protocol_data.DATA_FILE.copy()
+        data_file["ext_conf"] = "hostgaplugin/ext_conf-agent_family_version.xml"
+        with mock_wire_protocol(data_file) as protocol:
+            agent_families = protocol.get_goal_state().extensions_goal_state.agent_families
+            for family in agent_families:
+                self.assertTrue(family.is_vm_enabled_for_rsm_upgrades, "is_vm_enabled_for_rsm_upgrades should be True")
+
+        data_file = wire_protocol_data.DATA_FILE.copy()
+        data_file["ext_conf"] = "hostgaplugin/ext_conf-rsm_version_properties_false.xml"
+        with mock_wire_protocol(data_file) as protocol:
+            agent_families = protocol.get_goal_state().extensions_goal_state.agent_families
+            for family in agent_families:
+                self.assertFalse(family.is_vm_enabled_for_rsm_upgrades, "is_vm_enabled_for_rsm_upgrades should be False")
diff --git a/tests/protocol/test_extensions_goal_state_from_vm_settings.py b/tests/common/protocol/test_extensions_goal_state_from_vm_settings.py
similarity index 70%
rename from tests/protocol/test_extensions_goal_state_from_vm_settings.py
rename to tests/common/protocol/test_extensions_goal_state_from_vm_settings.py
index 1100b05bf..771fa2206 100644
--- a/tests/protocol/test_extensions_goal_state_from_vm_settings.py
+++ b/tests/common/protocol/test_extensions_goal_state_from_vm_settings.py
@@ -5,13 +5,13 @@
 from azurelinuxagent.common.protocol.goal_state import GoalState
 from azurelinuxagent.common.protocol.extensions_goal_state import GoalStateChannel
 from azurelinuxagent.common.protocol.extensions_goal_state_from_vm_settings import _CaseFoldedDict
-from tests.protocol.mocks import mockwiredata, mock_wire_protocol
-from tests.tools import AgentTestCase
+from tests.lib.mock_wire_protocol import wire_protocol_data, mock_wire_protocol
+from tests.lib.tools import AgentTestCase
 class ExtensionsGoalStateFromVmSettingsTestCase(AgentTestCase):
     def test_it_should_parse_vm_settings(self):
-        with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol:
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol:
             extensions_goal_state = protocol.get_goal_state().extensions_goal_state
             def assert_property(name, value):
@@ -49,23 +49,73 @@ def assert_property(name, value):
         self.assertEqual(1, extensions_goal_state.extensions[3].settings[1].dependencyLevel, "Incorrect dependency level (multi-config)")
     def test_it_should_parse_requested_version_properly(self):
-        with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol:
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol:
             goal_state = GoalState(protocol.client)
             families = goal_state.extensions_goal_state.agent_families
             for family in families:
-                self.assertEqual(family.requested_version_string, "0.0.0.0", "Version should be None")
+                self.assertIsNone(family.version, "Version should be None")
-        data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy()
-        data_file["vm_settings"] = "hostgaplugin/vm_settings-requested_version.json"
+        data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy()
+        data_file["vm_settings"] = "hostgaplugin/vm_settings-agent_family_version.json"
         with mock_wire_protocol(data_file) as protocol:
             protocol.mock_wire_data.set_etag(888)
             goal_state = GoalState(protocol.client)
             families = goal_state.extensions_goal_state.agent_families
             for family in families:
-                self.assertEqual(family.requested_version_string, "9.9.9.9", "Version should be 9.9.9.9")
+                self.assertEqual(family.version, "9.9.9.9", "Version should be 9.9.9.9")
+
+    def test_it_should_parse_is_version_from_rsm_properly(self):
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol:
+            goal_state = GoalState(protocol.client)
+            families = goal_state.extensions_goal_state.agent_families
+            for family in families:
+                self.assertIsNone(family.is_version_from_rsm, "is_version_from_rsm should be None")
+
+        data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy()
+        data_file["vm_settings"] = "hostgaplugin/vm_settings-agent_family_version.json"
+        with mock_wire_protocol(data_file) as protocol:
+            protocol.mock_wire_data.set_etag(888)
+            goal_state = GoalState(protocol.client)
+            families = goal_state.extensions_goal_state.agent_families
+            for family in families:
+                self.assertTrue(family.is_version_from_rsm, "is_version_from_rsm should be True")
+
+        data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy()
+        data_file["vm_settings"] = "hostgaplugin/vm_settings-requested_version_properties_false.json"
+        with mock_wire_protocol(data_file) as protocol:
+            protocol.mock_wire_data.set_etag(888)
+            goal_state = GoalState(protocol.client)
+            families = goal_state.extensions_goal_state.agent_families
+            for family in families:
+                self.assertFalse(family.is_version_from_rsm, "is_version_from_rsm should be False")
+
+    def test_it_should_parse_is_vm_enabled_for_rsm_upgrades_properly(self):
+        with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol:
+            goal_state = GoalState(protocol.client)
+            families = goal_state.extensions_goal_state.agent_families
+            for family in families:
+                self.assertIsNone(family.is_vm_enabled_for_rsm_upgrades, "is_vm_enabled_for_rsm_upgrades should be None")
+
+        data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy()
+        data_file["vm_settings"] = "hostgaplugin/vm_settings-agent_family_version.json"
+        with mock_wire_protocol(data_file) as protocol:
+            protocol.mock_wire_data.set_etag(888)
+            goal_state = GoalState(protocol.client)
+            families = goal_state.extensions_goal_state.agent_families
+            for family in families:
+                self.assertTrue(family.is_vm_enabled_for_rsm_upgrades, "is_vm_enabled_for_rsm_upgrades should be True")
+
+        data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy()
+        data_file["vm_settings"] = "hostgaplugin/vm_settings-requested_version_properties_false.json"
+        with mock_wire_protocol(data_file) as protocol:
+            protocol.mock_wire_data.set_etag(888)
+            goal_state = GoalState(protocol.client)
+            families = goal_state.extensions_goal_state.agent_families
+            for family in families:
+                self.assertFalse(family.is_vm_enabled_for_rsm_upgrades, "is_vm_enabled_for_rsm_upgrades should be False")
     def test_it_should_parse_missing_status_upload_blob_as_none(self):
-        data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy()
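The new tests above verify that the RSM flags parse as a tri-state: absent from the goal state yields None, while an explicit true/false is preserved. A hypothetical stand-alone sketch of that pattern (the JSON field names "gaFamilies", "isVersionFromRSM", and "isVMEnabledForRSMUpgrades" are assumptions here, not a claim about the agent's actual parser):

```python
import json

def parse_agent_families(vm_settings_json):
    # dict.get() returns None for absent keys, so "flag not sent" stays
    # distinguishable from an explicit false in the goal state.
    result = []
    for family in json.loads(vm_settings_json).get("gaFamilies", []):
        result.append({
            "name": family.get("name"),
            "is_version_from_rsm": family.get("isVersionFromRSM"),
            "is_vm_enabled_for_rsm_upgrades": family.get("isVMEnabledForRSMUpgrades"),
        })
    return result

families = parse_agent_families(
    '{"gaFamilies": [{"name": "Prod"},'
    ' {"name": "Test", "isVersionFromRSM": true, "isVMEnabledForRSMUpgrades": false}]}'
)
print(families[0]["is_version_from_rsm"])             # None (flag absent)
print(families[1]["is_vm_enabled_for_rsm_upgrades"])  # False (explicitly disabled)
```

Keeping the three states separate is what lets the tests assert `assertIsNone` for a default goal state but `assertFalse` only for `rsm_version_properties_false` data.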
+ data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() data_file["vm_settings"] = "hostgaplugin/vm_settings-no_manifests.json" with mock_wire_protocol(data_file) as protocol: extensions_goal_state = protocol.get_goal_state().extensions_goal_state @@ -98,7 +148,7 @@ def test_it_should_parse_missing_extension_manifests_as_empty(self): extensions_goal_state.extensions[2].manifest_uris, "Incorrect list of manifests for {0}".format(extensions_goal_state.extensions[2])) def test_it_should_default_to_block_blob_when_the_status_blob_type_is_not_valid(self): - data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy() + data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() data_file["vm_settings"] = "hostgaplugin/vm_settings-invalid_blob_type.json" with mock_wire_protocol(data_file) as protocol: extensions_goal_state = protocol.get_goal_state().extensions_goal_state @@ -106,7 +156,7 @@ def test_it_should_default_to_block_blob_when_the_status_blob_type_is_not_valid( self.assertEqual("BlockBlob", extensions_goal_state.status_upload_blob_type, 'Expected BlockBlob for an invalid statusBlobType') def test_its_source_channel_should_be_host_ga_plugin(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: extensions_goal_state = protocol.get_goal_state().extensions_goal_state self.assertEqual(GoalStateChannel.HostGAPlugin, extensions_goal_state.channel, "The channel is incorrect") diff --git a/tests/protocol/test_goal_state.py b/tests/common/protocol/test_goal_state.py similarity index 90% rename from tests/protocol/test_goal_state.py rename to tests/common/protocol/test_goal_state.py index 61653b2af..5b4a2948a 100644 --- a/tests/protocol/test_goal_state.py +++ b/tests/common/protocol/test_goal_state.py @@ -19,15 +19,15 @@ from azurelinuxagent.common.exception import ProtocolError from azurelinuxagent.common.utils import fileutil from 
azurelinuxagent.common.utils.archive import ARCHIVE_DIRECTORY_NAME -from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse -from tests.protocol import mockwiredata -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates -from tests.tools import AgentTestCase, patch, load_data +from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse +from tests.lib import wire_protocol_data +from tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.tools import AgentTestCase, patch, load_data class GoalStateTestCase(AgentTestCase, HttpRequestPredicates): def test_it_should_use_vm_settings_by_default(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: protocol.mock_wire_data.set_etag(888) extensions_goal_state = GoalState(protocol.client).extensions_goal_state self.assertTrue( @@ -41,7 +41,7 @@ def _assert_is_extensions_goal_state_from_extensions_config(self, extensions_goa def test_it_should_use_extensions_config_when_fast_track_is_disabled(self): with patch("azurelinuxagent.common.conf.get_enable_fast_track", return_value=False): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: self._assert_is_extensions_goal_state_from_extensions_config(GoalState(protocol.client).extensions_goal_state) def test_it_should_use_extensions_config_when_fast_track_is_not_supported(self): @@ -50,11 +50,11 @@ def http_get_handler(url, *_, **__): return MockHttpResponse(httpclient.NOT_FOUND) return None - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS, http_get_handler=http_get_handler) as protocol: 
self._assert_is_extensions_goal_state_from_extensions_config(GoalState(protocol.client).extensions_goal_state) def test_it_should_use_extensions_config_when_the_host_ga_plugin_version_is_not_supported(self): - data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy() + data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() data_file["vm_settings"] = "hostgaplugin/vm_settings-unsupported_version.json" with mock_wire_protocol(data_file) as protocol: @@ -63,7 +63,7 @@ def test_it_should_use_extensions_config_when_the_host_ga_plugin_version_is_not_ def test_it_should_retry_get_vm_settings_on_resource_gone_error(self): # Requests to the hostgaplugin include the Container ID and the RoleConfigName as headers; when the hostgaplugin returns GONE (HTTP status 410) the agent # needs to get a new goal state and retry the request with updated values for the Container ID and RoleConfigName headers. - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: # Do not mock the vmSettings request at the level of azurelinuxagent.common.utils.restutil.http_request. The GONE status is handled # in the internal _http_request, which we mock below.
protocol.do_not_mock = lambda method, url: method == "GET" and self.is_host_plugin_vm_settings_request(url) @@ -89,8 +89,8 @@ def http_get_vm_settings(_method, _host, _relative_url, _timeout, **kwargs): self.assertEqual("GET_VM_SETTINGS_TEST_ROLE_CONFIG_NAME", request_headers[1][hostplugin._HEADER_HOST_CONFIG_NAME], "The retry request did not include the expected header for the RoleConfigName") def test_fetch_goal_state_should_raise_on_incomplete_goal_state(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: - protocol.mock_wire_data.data_files = mockwiredata.DATA_FILE_NOOP_GS + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: + protocol.mock_wire_data.data_files = wire_protocol_data.DATA_FILE_NOOP_GS protocol.mock_wire_data.reload() protocol.mock_wire_data.set_incarnation(2) @@ -101,25 +101,29 @@ def test_fetch_goal_state_should_raise_on_incomplete_goal_state(self): def test_fetching_the_goal_state_should_save_the_shared_config(self): # SharedConfig.xml is used by other components (Azsec and Singularity/HPC Infiniband); verify that we do not delete it - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: _ = GoalState(protocol.client) shared_config = os.path.join(conf.get_lib_dir(), 'SharedConfig.xml') self.assertTrue(os.path.exists(shared_config), "{0} should have been created".format(shared_config)) def test_fetching_the_goal_state_should_save_the_goal_state_to_the_history_directory(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: protocol.mock_wire_data.set_incarnation(999) protocol.mock_wire_data.set_etag(888) - _ = GoalState(protocol.client) + _ = GoalState(protocol.client, save_to_history=True) self._assert_directory_contents( self._find_history_subdirectory("999-888"), ["GoalState.xml", "ExtensionsConfig.xml", 
"VmSettings.json", "Certificates.json", "SharedConfig.xml", "HostingEnvironmentConfig.xml"]) + @staticmethod + def _get_history_directory(): + return os.path.join(conf.get_lib_dir(), ARCHIVE_DIRECTORY_NAME) + def _find_history_subdirectory(self, tag): - matches = glob.glob(os.path.join(self.tmp_dir, ARCHIVE_DIRECTORY_NAME, "*_{0}".format(tag))) + matches = glob.glob(os.path.join(self._get_history_directory(), "*_{0}".format(tag))) self.assertTrue(len(matches) == 1, "Expected one history directory for tag {0}. Got: {1}".format(tag, matches)) return matches[0] @@ -132,11 +136,11 @@ def _assert_directory_contents(self, directory, expected_files): self.assertEqual(expected_files, actual_files, "The expected files were not saved to {0}".format(directory)) def test_update_should_create_new_history_subdirectories(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: protocol.mock_wire_data.set_incarnation(123) protocol.mock_wire_data.set_etag(654) - goal_state = GoalState(protocol.client) + goal_state = GoalState(protocol.client, save_to_history=True) self._assert_directory_contents( self._find_history_subdirectory("123-654"), ["GoalState.xml", "ExtensionsConfig.xml", "VmSettings.json", "Certificates.json", "SharedConfig.xml", "HostingEnvironmentConfig.xml"]) @@ -157,14 +161,14 @@ def http_get_handler(url, *_, **__): protocol.set_http_handlers(http_get_handler=None) goal_state.update() self._assert_directory_contents( - self._find_history_subdirectory("234-987"), ["VmSettings.json", "Certificates.json"]) + self._find_history_subdirectory("234-987"), ["VmSettings.json"]) def test_it_should_redact_the_protected_settings_when_saving_to_the_history_directory(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: protocol.mock_wire_data.set_incarnation(888) 
protocol.mock_wire_data.set_etag(888) - goal_state = GoalState(protocol.client) + goal_state = GoalState(protocol.client, save_to_history=True) extensions_goal_state = goal_state.extensions_goal_state protected_settings = [] @@ -195,11 +199,11 @@ def test_it_should_redact_the_protected_settings_when_saving_to_the_history_dire "Could not find the expected number of redacted settings in {0}.\nExpected {1}.\n{2}".format(file_name, len(protected_settings), file_contents)) def test_it_should_save_vm_settings_on_parse_errors(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: invalid_vm_settings_file = "hostgaplugin/vm_settings-parse_error.json" - data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy() + data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() data_file["vm_settings"] = invalid_vm_settings_file - protocol.mock_wire_data = mockwiredata.WireProtocolData(data_file) + protocol.mock_wire_data = wire_protocol_data.WireProtocolData(data_file) with self.assertRaises(ProtocolError): # the parsing error will cause an exception _ = GoalState(protocol.client) @@ -221,6 +225,12 @@ def test_it_should_save_vm_settings_on_parse_errors(self): self.assertEqual(expected, actual, "The vmSettings were not saved correctly") + def test_should_not_save_to_the_history_by_default(self): + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: + _ = GoalState(protocol.client) # omit the save_to_history parameter + history = self._get_history_directory() + self.assertFalse(os.path.exists(history), "The history directory should not have been created") + @staticmethod @contextlib.contextmanager def _create_protocol_ws_and_hgap_in_sync(): @@ -228,7 +238,7 @@ def _create_protocol_ws_and_hgap_in_sync(): """ Creates a mock protocol in which the HostGAPlugin and the WireServer are in sync, both of them returning the same Fabric goal state.
""" - data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy() + data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() with mock_wire_protocol(data_file) as protocol: timestamp = datetime.datetime.utcnow() @@ -372,7 +382,7 @@ def http_get_handler(url, *_, **__): self.assertTrue(goal_state.extensions_goal_state.is_outdated, "The updated goal state should be marked as outdated") def test_it_should_raise_when_the_tenant_certificate_is_missing(self): - data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy() + data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() with mock_wire_protocol(data_file) as protocol: data_file["vm_settings"] = "hostgaplugin/vm_settings-missing_cert.json" @@ -386,7 +396,7 @@ def test_it_should_raise_when_the_tenant_certificate_is_missing(self): self.assertIn(expected_message, str(context.exception)) def test_it_should_download_certs_on_a_new_fast_track_goal_state(self): - data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy() + data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() with mock_wire_protocol(data_file) as protocol: goal_state = GoalState(protocol.client) @@ -410,7 +420,7 @@ def test_it_should_download_certs_on_a_new_fast_track_goal_state(self): self.assertTrue(os.path.isfile(crt_path)) def test_it_should_download_certs_on_a_new_fabric_goal_state(self): - data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy() + data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() with mock_wire_protocol(data_file) as protocol: protocol.mock_wire_data.set_vm_settings_source(GoalStateSource.Fabric) @@ -457,14 +467,14 @@ def http_get_handler(url, *_, **__): return None http_get_handler.certificate_requests = 0 - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: protocol.set_http_handlers(http_get_handler=http_get_handler) protocol.mock_wire_data.reset_call_counts() goal_state = GoalState(protocol.client) self.assertEqual(2, 
protocol.mock_wire_data.call_counts['goalstate'], "There should have been exactly 2 requests for the goal state (original + refresh)") - self.assertEqual(4, http_get_handler.certificate_requests, "There should have been exactly 4 requests for the goal state certificates (2x original + 2x refresh)") + self.assertEqual(2, http_get_handler.certificate_requests, "There should have been exactly 2 requests for the goal state certificates (original + refresh)") thumbprints = [c.thumbprint for c in goal_state.certs.cert_list.certificates] diff --git a/tests/protocol/test_healthservice.py b/tests/common/protocol/test_healthservice.py similarity index 99% rename from tests/protocol/test_healthservice.py rename to tests/common/protocol/test_healthservice.py index cb523a78f..d9ba17755 100644 --- a/tests/protocol/test_healthservice.py +++ b/tests/common/protocol/test_healthservice.py @@ -18,8 +18,8 @@ from azurelinuxagent.common.exception import HttpError from azurelinuxagent.common.protocol.healthservice import Observation, HealthService from azurelinuxagent.common.utils import restutil -from tests.protocol.test_hostplugin import MockResponse -from tests.tools import AgentTestCase, patch +from tests.common.protocol.test_hostplugin import MockResponse +from tests.lib.tools import AgentTestCase, patch class TestHealthService(AgentTestCase): diff --git a/tests/protocol/test_hostplugin.py b/tests/common/protocol/test_hostplugin.py similarity index 98% rename from tests/protocol/test_hostplugin.py rename to tests/common/protocol/test_hostplugin.py index 47e6871be..4c97c73fd 100644 --- a/tests/protocol/test_hostplugin.py +++ b/tests/common/protocol/test_hostplugin.py @@ -34,10 +34,10 @@ from azurelinuxagent.common.protocol.goal_state import GoalState from azurelinuxagent.common.utils import restutil from azurelinuxagent.common.version import AGENT_VERSION, AGENT_NAME -from tests.protocol.mocks import mock_wire_protocol, mockwiredata, MockHttpResponse -from 
tests.protocol.HttpRequestPredicates import HttpRequestPredicates -from tests.protocol.mockwiredata import DATA_FILE, DATA_FILE_NO_EXT -from tests.tools import AgentTestCase, PY_VERSION_MAJOR, Mock, patch +from tests.lib.mock_wire_protocol import mock_wire_protocol, wire_protocol_data, MockHttpResponse +from tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.wire_protocol_data import DATA_FILE, DATA_FILE_NO_EXT +from tests.lib.tools import AgentTestCase, PY_VERSION_MAJOR, Mock, patch hostplugin_status_url = "http://168.63.129.16:32526/status" @@ -852,7 +852,7 @@ def http_get_handler(url, *_, **__): return MockHttpResponse(httpclient.INTERNAL_SERVER_ERROR, body="TEST ERROR") return None - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: protocol.set_http_handlers(http_get_handler=http_get_handler) with self.assertRaisesRegexCM(ProtocolError, r'GET vmSettings \[correlation ID: .* eTag: .*\]: \[HTTP Failed\] \[500: None].*TEST ERROR.*'): protocol.client.get_host_plugin().fetch_vm_settings() @@ -875,7 +875,7 @@ def http_get_handler(url, *_, **__): return mock_response return None - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS, http_get_handler=http_get_handler) as protocol: mock_response = MockHttpResponse(httpclient.INTERNAL_SERVER_ERROR) self._fetch_vm_settings_ignoring_errors(protocol) @@ -913,7 +913,7 @@ def http_get_handler(url, *_, **__): self.assertEqual(expected, summary, "The count of errors is incorrect") def test_it_should_limit_the_number_of_errors_it_reports(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: def http_get_handler(url, *_, **__): if 
self.is_host_plugin_vm_settings_request(url): return MockHttpResponse(httpclient.BAD_GATEWAY) # HostGAPlugin returns 502 for internal errors @@ -941,7 +941,7 @@ def get_telemetry_messages(): self.assertEqual(1, len(telemetry_messages), "Expected additional errors to be reported to telemetry in the next period (got: {0})".format(telemetry_messages)) def test_it_should_stop_issuing_vm_settings_requests_when_api_is_not_supported(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: def http_get_handler(url, *_, **__): if self.is_host_plugin_vm_settings_request(url): return MockHttpResponse(httpclient.NOT_FOUND) # HostGAPlugin returns 404 if the API is not supported @@ -969,7 +969,7 @@ def http_get_handler(url, *_, **__): return MockHttpResponse(httpclient.NOT_FOUND) # HostGAPlugin returns 404 if the API is not supported return None - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: host_ga_plugin = protocol.client.get_host_plugin() # Do an initial call to ensure the API is supported @@ -984,7 +984,7 @@ def http_get_handler(url, *_, **__): self.assertEqual(vm_settings.created_on_timestamp, cm.exception.timestamp) def test_it_should_save_the_timestamp_of_the_most_recent_fast_track_goal_state(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: host_ga_plugin = protocol.client.get_host_plugin() vm_settings, _ = host_ga_plugin.fetch_vm_settings() diff --git a/tests/protocol/test_image_info_matcher.py b/tests/common/protocol/test_image_info_matcher.py similarity index 100% rename from tests/protocol/test_image_info_matcher.py rename to tests/common/protocol/test_image_info_matcher.py diff --git a/tests/protocol/test_imds.py 
b/tests/common/protocol/test_imds.py similarity index 99% rename from tests/protocol/test_imds.py rename to tests/common/protocol/test_imds.py index 1f8e428c1..efc705ffa 100644 --- a/tests/protocol/test_imds.py +++ b/tests/common/protocol/test_imds.py @@ -26,8 +26,8 @@ from azurelinuxagent.common.exception import HttpError, ResourceGoneError from azurelinuxagent.common.future import ustr, httpclient from azurelinuxagent.common.utils import restutil -from tests.protocol.mocks import MockHttpResponse -from tests.tools import AgentTestCase, data_dir, MagicMock, Mock, patch +from tests.lib.mock_wire_protocol import MockHttpResponse +from tests.lib.tools import AgentTestCase, data_dir, MagicMock, Mock, patch def get_mock_compute_response(): diff --git a/tests/protocol/test_metadata_server_migration_util.py b/tests/common/protocol/test_metadata_server_migration_util.py similarity index 99% rename from tests/protocol/test_metadata_server_migration_util.py rename to tests/common/protocol/test_metadata_server_migration_util.py index 5950b43f1..70ef05333 100644 --- a/tests/protocol/test_metadata_server_migration_util.py +++ b/tests/common/protocol/test_metadata_server_migration_util.py @@ -27,7 +27,7 @@ _LEGACY_METADATA_SERVER_P7B_FILE_NAME, \ _KNOWN_METADATASERVER_IP from azurelinuxagent.common.utils.restutil import KNOWN_WIRESERVER_IP -from tests.tools import AgentTestCase, patch, MagicMock +from tests.lib.tools import AgentTestCase, patch, MagicMock class TestMetadataServerMigrationUtil(AgentTestCase): @patch('azurelinuxagent.common.conf.get_lib_dir') diff --git a/tests/protocol/test_protocol_util.py b/tests/common/protocol/test_protocol_util.py similarity index 98% rename from tests/protocol/test_protocol_util.py rename to tests/common/protocol/test_protocol_util.py index 3529e95d4..b60ca9af9 100644 --- a/tests/protocol/test_protocol_util.py +++ b/tests/common/protocol/test_protocol_util.py @@ -30,7 +30,7 @@ from azurelinuxagent.common.protocol.util import 
get_protocol_util, ProtocolUtil, PROTOCOL_FILE_NAME, \ WIRE_PROTOCOL_NAME, ENDPOINT_FILE_NAME from azurelinuxagent.common.utils.restutil import KNOWN_WIRESERVER_IP -from tests.tools import AgentTestCase, MagicMock, Mock, patch, clear_singleton_instances +from tests.lib.tools import AgentTestCase, MagicMock, Mock, patch, clear_singleton_instances @patch("time.sleep") @@ -127,14 +127,14 @@ def test_detect_protocol_no_dhcp(self, WireProtocol, mock_get_lib_dir, _): endpoint_file = protocol_util._get_wireserver_endpoint_file_path() # pylint: disable=unused-variable # Test wire protocol when no endpoint file has been written - protocol_util._detect_protocol() + protocol_util._detect_protocol(save_to_history=False) self.assertEqual(KNOWN_WIRESERVER_IP, protocol_util.get_wireserver_endpoint()) # Test wire protocol on dhcp failure protocol_util.osutil.is_dhcp_available.return_value = True protocol_util.dhcp_handler.run.side_effect = DhcpError() - self.assertRaises(ProtocolError, protocol_util._detect_protocol) + self.assertRaises(ProtocolError, lambda: protocol_util._detect_protocol(save_to_history=False)) @patch("azurelinuxagent.common.protocol.util.WireProtocol") def test_get_protocol(self, WireProtocol, _): diff --git a/tests/protocol/test_wire.py b/tests/common/protocol/test_wire.py similarity index 88% rename from tests/protocol/test_wire.py rename to tests/common/protocol/test_wire.py index 2a36fc291..9ce8339e9 100644 --- a/tests/protocol/test_wire.py +++ b/tests/common/protocol/test_wire.py @@ -40,12 +40,12 @@ from azurelinuxagent.common.version import CURRENT_VERSION, DISTRO_NAME, DISTRO_VERSION from azurelinuxagent.ga.exthandlers import get_exthandlers_handler from tests.ga.test_monitor import random_generator -from tests.protocol import mockwiredata -from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates -from tests.protocol.mockwiredata import DATA_FILE_NO_EXT, DATA_FILE -from 
tests.protocol.mockwiredata import WireProtocolData -from tests.tools import patch, AgentTestCase, load_bin_data +from tests.lib import wire_protocol_data +from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse +from tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.wire_protocol_data import DATA_FILE_NO_EXT, DATA_FILE +from tests.lib.wire_protocol_data import WireProtocolData +from tests.lib.tools import patch, AgentTestCase, load_bin_data data_with_bom = b'\xef\xbb\xbfhehe' testurl = 'http://foo' @@ -120,37 +120,37 @@ def _yield_events(): def test_getters(self, *args): """Normal case""" - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE) self._test_getters(test_data, True, *args) def test_getters_no_ext(self, *args): """Provision with agent is not checked""" - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_NO_EXT) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_NO_EXT) self._test_getters(test_data, True, *args) def test_getters_ext_no_settings(self, *args): """Extensions without any settings""" - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_NO_SETTINGS) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_NO_SETTINGS) self._test_getters(test_data, True, *args) def test_getters_ext_no_public(self, *args): """Extensions without any public settings""" - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_NO_PUBLIC) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_NO_PUBLIC) self._test_getters(test_data, True, *args) def test_getters_ext_no_cert_format(self, *args): """Certificate format not specified""" - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_NO_CERT_FORMAT) + test_data = 
wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_NO_CERT_FORMAT) self._test_getters(test_data, True, *args) def test_getters_ext_cert_format_not_pfx(self, *args): """Certificate format specified is not Pkcs7BlobWithPfxContents""" - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_CERT_FORMAT_NOT_PFX) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_CERT_FORMAT_NOT_PFX) self._test_getters(test_data, False, *args) @patch("azurelinuxagent.common.protocol.healthservice.HealthService.report_host_plugin_extension_artifact") def test_getters_with_stale_goal_state(self, patch_report, *args): - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE) test_data.emulate_stale_goal_state = True self._test_getters(test_data, True, *args) @@ -202,7 +202,7 @@ def test_call_storage_kwargs(self, *args): # pylint: disable=unused-argument self.assertTrue(c == (True if i != 3 else False)) def test_status_blob_parsing(self, *args): # pylint: disable=unused-argument - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: extensions_goal_state = protocol.get_goal_state().extensions_goal_state self.assertIsInstance(extensions_goal_state, ExtensionsGoalStateFromExtensionsConfig) self.assertEqual(extensions_goal_state.status_upload_blob, @@ -212,7 +212,7 @@ def test_status_blob_parsing(self, *args): # pylint: disable=unused-argument self.assertEqual(protocol.get_goal_state().extensions_goal_state.status_upload_blob_type, u'BlockBlob') def test_get_host_ga_plugin(self, *args): # pylint: disable=unused-argument - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: host_plugin = protocol.client.get_host_plugin() goal_state = protocol.client.get_goal_state()
self.assertEqual(goal_state.container_id, host_plugin.container_id) @@ -223,7 +223,7 @@ def http_put_handler(url, *_, **__): # pylint: disable=inconsistent-return-stat if protocol.get_endpoint() in url and url.endswith('/status'): return MockHttpResponse(200) - with mock_wire_protocol(mockwiredata.DATA_FILE, http_put_handler=http_put_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_put_handler=http_put_handler) as protocol: HostPluginProtocol.is_default_channel = False protocol.client.status_blob.vm_status = VMStatus(message="Ready", status="Ready") @@ -254,14 +254,14 @@ def test_upload_status_blob_reports_prepare_error(self, *_): self.assertEqual(1, mock_prepare.call_count) def test_get_in_vm_artifacts_profile_blob_not_available(self, *_): - data_file = mockwiredata.DATA_FILE.copy() + data_file = wire_protocol_data.DATA_FILE.copy() data_file["ext_conf"] = "wire/ext_conf_in_vm_empty_artifacts_profile.xml" with mock_wire_protocol(data_file) as protocol: self.assertFalse(protocol.get_goal_state().extensions_goal_state.on_hold) def test_it_should_set_on_hold_to_false_when_the_in_vm_artifacts_profile_is_not_valid(self, *_): - with mock_wire_protocol(mockwiredata.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: extensions_on_hold = protocol.get_goal_state().extensions_goal_state.on_hold self.assertTrue(extensions_on_hold, "Extensions should be on hold in the test data") @@ -360,41 +360,60 @@ def mock_http_put(url, *args, **__): exthandlers_handler = get_exthandlers_handler(protocol) with patch("azurelinuxagent.common.agent_supported_feature._MultiConfigFeature.is_supported", True): - exthandlers_handler.run() - exthandlers_handler.report_ext_handlers_status() - - self.assertIsNotNone(protocol.aggregate_status, "Aggregate status should not be None") - self.assertIn("supportedFeatures", protocol.aggregate_status, "supported features not reported") 
- multi_config_feature = get_supported_feature_by_name(SupportedFeatureNames.MultiConfig) - found = False - for feature in protocol.aggregate_status['supportedFeatures']: - if feature['Key'] == multi_config_feature.name and feature['Value'] == multi_config_feature.version: - found = True - break - self.assertTrue(found, "Multi-config name should be present in supportedFeatures") + with patch("azurelinuxagent.common.agent_supported_feature._GAVersioningGovernanceFeature.is_supported", True): + exthandlers_handler.run() + exthandlers_handler.report_ext_handlers_status() + + self.assertIsNotNone(protocol.aggregate_status, "Aggregate status should not be None") + self.assertIn("supportedFeatures", protocol.aggregate_status, "supported features not reported") + multi_config_feature = get_supported_feature_by_name(SupportedFeatureNames.MultiConfig) + found = False + for feature in protocol.aggregate_status['supportedFeatures']: + if feature['Key'] == multi_config_feature.name and feature['Value'] == multi_config_feature.version: + found = True + break + self.assertTrue(found, "Multi-config name should be present in supportedFeatures") + + ga_versioning_feature = get_supported_feature_by_name(SupportedFeatureNames.GAVersioningGovernance) + found = False + for feature in protocol.aggregate_status['supportedFeatures']: + if feature['Key'] == ga_versioning_feature.name and feature['Value'] == ga_versioning_feature.version: + found = True + break + self.assertTrue(found, "ga versioning name should be present in supportedFeatures") # Feature should not be reported if not present with patch("azurelinuxagent.common.agent_supported_feature._MultiConfigFeature.is_supported", False): - exthandlers_handler.run() - exthandlers_handler.report_ext_handlers_status() - - self.assertIsNotNone(protocol.aggregate_status, "Aggregate status should not be None") - if "supportedFeatures" not in protocol.aggregate_status: - # In the case Multi-config was the only feature available, 
'supportedFeatures' should not be - reported in the status blob as its not supported as of now. - Asserting no other feature was available to report back to crp - self.assertEqual(0, len(get_agent_supported_features_list_for_crp()), - "supportedFeatures should be available if there are more features") - return - - # If there are other features available, confirm MultiConfig was not reported - multi_config_feature = get_supported_feature_by_name(SupportedFeatureNames.MultiConfig) - found = False - for feature in protocol.aggregate_status['supportedFeatures']: - if feature['Key'] == multi_config_feature.name and feature['Value'] == multi_config_feature.version: - found = True - break - self.assertFalse(found, "Multi-config name should be present in supportedFeatures") + with patch("azurelinuxagent.common.agent_supported_feature._GAVersioningGovernanceFeature.is_supported", False): + + exthandlers_handler.run() + exthandlers_handler.report_ext_handlers_status() + + self.assertIsNotNone(protocol.aggregate_status, "Aggregate status should not be None") + if "supportedFeatures" not in protocol.aggregate_status: + # In the case Multi-config and GA Versioning were the only features available, 'supportedFeatures' should not be + # reported in the status blob as it's not supported as of now.
+ # Asserting no other feature was available to report back to crp + self.assertEqual(0, len(get_agent_supported_features_list_for_crp()), + "supportedFeatures should be available if there are more features") + return + + # If there are other features available, confirm MultiConfig and GA versioning were not reported + multi_config_feature = get_supported_feature_by_name(SupportedFeatureNames.MultiConfig) + found = False + for feature in protocol.aggregate_status['supportedFeatures']: + if feature['Key'] == multi_config_feature.name and feature['Value'] == multi_config_feature.version: + found = True + break + self.assertFalse(found, "Multi-config name should not be present in supportedFeatures") + + ga_versioning_feature = get_supported_feature_by_name(SupportedFeatureNames.GAVersioningGovernance) + found = False + for feature in protocol.aggregate_status['supportedFeatures']: + if feature['Key'] == ga_versioning_feature.name and feature['Value'] == ga_versioning_feature.version: + found = True + break + self.assertFalse(found, "ga versioning name should not be present in supportedFeatures") @patch("azurelinuxagent.common.utils.restutil.http_request") def test_send_encoded_event(self, mock_http_request, *args): @@ -466,7 +485,7 @@ def test_get_ext_conf_without_extensions_should_retrieve_vmagent_manifests_info( # Basic test for extensions_goal_state when extensions are not present in the config. The test verifies that # extensions_goal_state fetches the correct data by comparing the returned data with the test data provided by the # mock_wire_protocol.
- with mock_wire_protocol(mockwiredata.DATA_FILE_NO_EXT) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_NO_EXT) as protocol: extensions_goal_state = protocol.get_goal_state().extensions_goal_state ext_handlers_names = [ext_handler.name for ext_handler in extensions_goal_state.extensions] @@ -481,7 +500,7 @@ def test_get_ext_conf_without_extensions_should_retrieve_vmagent_manifests_info( def test_get_ext_conf_with_extensions_should_retrieve_ext_handlers_and_vmagent_manifests_info(self): # Basic test for extensions_goal_state when extensions are present in the config. The test verifies that extensions_goal_state # fetches the correct data by comparing the returned data with the test data provided by the mock_wire_protocol. - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: extensions_goal_state = protocol.get_goal_state().extensions_goal_state ext_handlers_names = [ext_handler.name for ext_handler in extensions_goal_state.extensions] @@ -508,7 +527,7 @@ def http_get_handler(url, *_, **__): return MockHttpResponse(200, body=load_bin_data("ga/fake_extension.zip")) return None - with mock_wire_protocol(mockwiredata.DATA_FILE, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_get_handler=http_get_handler) as protocol: protocol.client.download_zip_package("extension package", [extension_url], target_file, target_directory, use_verify_header=False) self.assertTrue(os.path.exists(target_directory), "The extension package was not downloaded") @@ -526,7 +545,7 @@ def http_get_handler(url, *_, **__): self.fail('The host channel should not have been used') return None - with mock_wire_protocol(mockwiredata.DATA_FILE, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_get_handler=http_get_handler) as protocol: HostPluginProtocol.is_default_channel = False
protocol.client.download_zip_package("extension package", [extension_url], target_file, target_directory, use_verify_header=False) @@ -549,7 +568,7 @@ def http_get_handler(url, *_, **kwargs): return MockHttpResponse(200, body=load_bin_data("ga/fake_extension.zip")) return None - with mock_wire_protocol(mockwiredata.DATA_FILE, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_get_handler=http_get_handler) as protocol: HostPluginProtocol.is_default_channel = False protocol.client.download_zip_package("extension package", [extension_url], target_file, target_directory, use_verify_header=False) @@ -580,7 +599,7 @@ def http_get_handler(url, *_, **kwargs): return None http_get_handler.goal_state_requests = 0 - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: HostPluginProtocol.is_default_channel = False try: @@ -614,7 +633,7 @@ def http_get_handler(url, *_, **kwargs): protocol.track_url(url) # keep track of goal state requests return None - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: HostPluginProtocol.is_default_channel = False # initialization of the host plugin triggers a request for the goal state; do it here before we start tracking those requests. 
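The supportedFeatures assertions earlier in this patch locate a feature in `protocol.aggregate_status['supportedFeatures']` with a manual loop-and-flag scan. The same membership check can be sketched as a small standalone helper; the helper name and the status payload below are illustrative, not part of the agent:

```python
def feature_reported(aggregate_status, name, version):
    """Return True if (name, version) appears in the reported supportedFeatures list."""
    return any(
        feature['Key'] == name and feature['Value'] == version
        for feature in aggregate_status.get('supportedFeatures') or []
    )

# Illustrative payload; in the tests the real payload comes from the mocked wire protocol.
status = {'supportedFeatures': [{'Key': 'MultipleExtensionsPerHandler', 'Value': '1.0'}]}
print(feature_reported(status, 'MultipleExtensionsPerHandler', '1.0'))  # True
print(feature_reported(status, 'VersioningGovernance', '1.0'))          # False
```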
@@ -642,7 +661,7 @@ def http_get_handler(url, *_, **kwargs): return MockHttpResponse(status=200, body=b"NOT A ZIP") return None - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: protocol.set_http_handlers(http_get_handler=http_get_handler) with self.assertRaises(ExtensionDownloadError): @@ -662,10 +681,10 @@ def http_get_handler(url, *_, **__): self.fail('The Host GA Plugin should not have been invoked') return None - with mock_wire_protocol(mockwiredata.DATA_FILE, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_get_handler=http_get_handler) as protocol: HostPluginProtocol.is_default_channel = False - manifest = protocol.client.fetch_manifest([manifest_url], use_verify_header=False) + manifest = protocol.client.fetch_manifest("test", [manifest_url], use_verify_header=False) urls = protocol.get_tracked_urls() self.assertEqual(manifest, manifest_xml, 'The expected manifest was not downloaded') @@ -684,11 +703,11 @@ def http_get_handler(url, *_, **kwargs): return MockHttpResponse(200, body=manifest_xml.encode('utf-8')) return None - with mock_wire_protocol(mockwiredata.DATA_FILE, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_get_handler=http_get_handler) as protocol: HostPluginProtocol.is_default_channel = False try: - manifest = protocol.client.fetch_manifest([manifest_url], use_verify_header=False) + manifest = protocol.client.fetch_manifest("test", [manifest_url], use_verify_header=False) urls = protocol.get_tracked_urls() self.assertEqual(manifest, manifest_xml, 'The expected manifest was not downloaded') @@ -717,7 +736,7 @@ def http_get_handler(url, *_, **kwargs): return None http_get_handler.goal_state_requests = 0 - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: 
HostPluginProtocol.is_default_channel = False try: @@ -725,7 +744,7 @@ def http_get_handler(url, *_, **kwargs): protocol.client.get_host_plugin() protocol.set_http_handlers(http_get_handler=http_get_handler) - manifest = protocol.client.fetch_manifest([manifest_url], use_verify_header=False) + manifest = protocol.client.fetch_manifest("test", [manifest_url], use_verify_header=False) urls = protocol.get_tracked_urls() self.assertEqual(manifest, manifest_xml) @@ -749,7 +768,7 @@ def http_get_handler(url, *_, **kwargs): return None # Everything fails. Goal state should have been updated and host channel should not have been set as default. - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: HostPluginProtocol.is_default_channel = False # initialization of the host plugin triggers a request for the goal state; do it here before we start @@ -759,7 +778,7 @@ def http_get_handler(url, *_, **kwargs): protocol.set_http_handlers(http_get_handler=http_get_handler) with self.assertRaises(ExtensionDownloadError): - protocol.client.fetch_manifest([manifest_url], use_verify_header=False) + protocol.client.fetch_manifest("test", [manifest_url], use_verify_header=False) urls = protocol.get_tracked_urls() self.assertEqual(len(urls), 4, "Unexpected number of HTTP requests: [{0}]".format(urls)) @@ -777,7 +796,7 @@ def http_get_handler(url, *_, **__): protocol.track_url(url) return None - with mock_wire_protocol(mockwiredata.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: protocol.set_http_handlers(http_get_handler=http_get_handler) HostPluginProtocol.is_default_channel = False @@ -795,7 +814,7 @@ def http_get_handler(url, *_, **kwargs): protocol.track_url(url) return None - with mock_wire_protocol(mockwiredata.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: + with 
mock_wire_protocol(wire_protocol_data.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: protocol.set_http_handlers(http_get_handler=http_get_handler) HostPluginProtocol.is_default_channel = False @@ -824,7 +843,7 @@ def http_get_handler(url, *_, **kwargs): return None http_get_handler.host_plugin_calls = 0 - with mock_wire_protocol(mockwiredata.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: HostPluginProtocol.is_default_channel = False try: @@ -857,7 +876,7 @@ def http_get_handler(url, *_, **kwargs): return None http_get_handler.host_plugin_calls = 0 - with mock_wire_protocol(mockwiredata.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: HostPluginProtocol.is_default_channel = False # initialization of the host plugin triggers a request for the goal state; do it here before we start tracking those requests. 
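The `download_using_appropriate_channel` tests that follow exercise a two-channel fallback: the agent tries its current default channel (direct or host plugin) first, and promotes the secondary channel to default only when the primary fails and the secondary succeeds; if both fail, the default is left unchanged. A minimal sketch of that selection logic, with illustrative names rather than the agent's actual implementation:

```python
def download_using_appropriate_channel(direct_func, host_func, host_is_default):
    """Try the default channel first; on failure fall back to the other one.

    Returns (result, new_host_is_default). The secondary channel becomes the
    default only when it succeeds after the primary failed; if both channels
    fail, the exception propagates and the default is unchanged.
    """
    primary, secondary = (host_func, direct_func) if host_is_default else (direct_func, host_func)
    try:
        return primary(), host_is_default
    except Exception:
        result = secondary()  # raises if both channels fail
        # the secondary channel that just succeeded becomes the new default
        return result, not host_is_default

def _failing_direct():
    raise IOError("direct channel failed")

# direct fails, host succeeds -> host plugin becomes the default channel
result, host_default = download_using_appropriate_channel(
    _failing_direct, lambda: "downloaded-via-host", host_is_default=False)
print(result, host_default)  # downloaded-via-host True
```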
@@ -898,7 +917,7 @@ def host_func(*_): return direct_func, host_func def test_download_using_appropriate_channel_should_not_invoke_secondary_when_primary_channel_succeeds(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: # Scenario #1: Direct channel default HostPluginProtocol.is_default_channel = False @@ -924,7 +943,7 @@ def test_download_using_appropriate_channel_should_not_invoke_secondary_when_pri self.assertTrue(HostPluginProtocol.is_default_channel) def test_download_using_appropriate_channel_should_not_change_default_channel_if_none_succeeds(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: # Scenario #1: Direct channel is default HostPluginProtocol.is_default_channel = False direct_func, host_func = self._set_and_fail_helper_channel_functions(fail_direct=True, fail_host=True) @@ -950,7 +969,7 @@ def test_download_using_appropriate_channel_should_not_change_default_channel_if self.assertTrue(HostPluginProtocol.is_default_channel) def test_download_using_appropriate_channel_should_change_default_channel_when_secondary_succeeds(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: # Scenario #1: Direct channel is default HostPluginProtocol.is_default_channel = False direct_func, host_func = self._set_and_fail_helper_channel_functions(fail_direct=True, fail_host=False) @@ -996,7 +1015,7 @@ class UpdateGoalStateTestCase(HttpRequestPredicates, AgentTestCase): """ def test_it_should_update_the_goal_state_and_the_host_plugin_when_the_incarnation_changes(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: protocol.client.get_host_plugin() # if the incarnation changes the behavior is the same for forced and non-forced 
updates @@ -1053,7 +1072,7 @@ def test_it_should_update_the_goal_state_and_the_host_plugin_when_the_incarnatio self.assertEqual(protocol.client.get_host_plugin().role_config_name, new_role_config_name) def test_non_forced_update_should_not_update_the_goal_state_but_should_update_the_host_plugin_when_the_incarnation_does_not_change(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: protocol.client.get_host_plugin() # The container id, role config name and shared config can change without the incarnation changing; capture the initial @@ -1077,7 +1096,7 @@ def test_non_forced_update_should_not_update_the_goal_state_but_should_update_th self.assertEqual(protocol.client.get_host_plugin().role_config_name, new_role_config_name) def test_forced_update_should_update_the_goal_state_and_the_host_plugin_when_the_incarnation_does_not_change(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: protocol.client.get_host_plugin() # The container id, role config name and shared config can change without the incarnation changing @@ -1100,7 +1119,7 @@ def test_forced_update_should_update_the_goal_state_and_the_host_plugin_when_the self.assertEqual(protocol.client.get_host_plugin().role_config_name, new_role_config_name) def test_reset_should_init_provided_goal_state_properties(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: protocol.client.reset_goal_state(goal_state_properties=GoalStateProperties.All & ~GoalStateProperties.Certificates) with self.assertRaises(ProtocolError) as context: @@ -1110,7 +1129,7 @@ def test_reset_should_init_provided_goal_state_properties(self): self.assertIn(expected_message, str(context.exception)) def test_reset_should_init_the_goal_state(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) 
as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: new_container_id = str(uuid.uuid4()) new_role_config_name = str(uuid.uuid4()) protocol.mock_wire_data.set_container_id(new_container_id) @@ -1127,7 +1146,7 @@ class UpdateHostPluginFromGoalStateTestCase(AgentTestCase): Tests for WireClient.update_host_plugin_from_goal_state() """ def test_it_should_update_the_host_plugin_with_or_without_incarnation_changes(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: protocol.client.get_host_plugin() # the behavior should be the same whether the incarnation changes or not diff --git a/tests/common/test_agent_supported_feature.py b/tests/common/test_agent_supported_feature.py index cf367f90b..6a49dd887 100644 --- a/tests/common/test_agent_supported_feature.py +++ b/tests/common/test_agent_supported_feature.py @@ -18,7 +18,7 @@ from azurelinuxagent.common.agent_supported_feature import SupportedFeatureNames, \ get_agent_supported_features_list_for_crp, get_supported_feature_by_name, \ get_agent_supported_features_list_for_extensions -from tests.tools import AgentTestCase, patch +from tests.lib.tools import AgentTestCase, patch class TestAgentSupportedFeature(AgentTestCase): @@ -53,3 +53,21 @@ def test_it_should_return_extension_supported_features_properly(self): self.assertEqual(SupportedFeatureNames.ExtensionTelemetryPipeline, get_supported_feature_by_name(SupportedFeatureNames.ExtensionTelemetryPipeline).name, "Invalid/Wrong feature returned") + + def test_it_should_return_ga_versioning_governance_feature_properly(self): + with patch("azurelinuxagent.common.agent_supported_feature._GAVersioningGovernanceFeature.is_supported", True): + self.assertIn(SupportedFeatureNames.GAVersioningGovernance, get_agent_supported_features_list_for_crp(), + "GAVersioningGovernance should be fetched in crp_supported_features") + + with 
patch("azurelinuxagent.common.agent_supported_feature._GAVersioningGovernanceFeature.is_supported", False):
+            self.assertNotIn(SupportedFeatureNames.GAVersioningGovernance, get_agent_supported_features_list_for_crp(),
+                             "GAVersioningGovernance should not be fetched in crp_supported_features as not supported")
+
+        self.assertEqual(SupportedFeatureNames.GAVersioningGovernance,
+                         get_supported_feature_by_name(SupportedFeatureNames.GAVersioningGovernance).name,
+                         "Invalid/Wrong feature returned")
+
+        # Raise error if feature name not found
+        with self.assertRaises(NotImplementedError):
+            get_supported_feature_by_name("ABC")
+
diff --git a/tests/common/test_conf.py b/tests/common/test_conf.py
index ebc57ffed..e6cc7de02 100644
--- a/tests/common/test_conf.py
+++ b/tests/common/test_conf.py
@@ -19,7 +19,7 @@
 import azurelinuxagent.common.conf as conf
 from azurelinuxagent.common.utils import fileutil
-from tests.tools import AgentTestCase, data_dir
+from tests.lib.tools import AgentTestCase, data_dir
 class TestConf(AgentTestCase):
@@ -27,6 +27,8 @@ class TestConf(AgentTestCase):
     # -- These values *MUST* match those from data/test_waagent.conf
     EXPECTED_CONFIGURATION = {
         "Extensions.Enabled": True,
+        "Extensions.WaitForCloudInit": False,
+        "Extensions.WaitForCloudInitTimeout": 3600,
         "Provisioning.Agent": "auto",
         "Provisioning.DeleteRootPassword": True,
         "Provisioning.RegenerateSshHostKeyPair": True,
@@ -63,6 +65,7 @@ class TestConf(AgentTestCase):
         "OS.CheckRdmaDriver": False,
         "AutoUpdate.Enabled": True,
         "AutoUpdate.GAFamily": "Prod",
+        "AutoUpdate.UpdateToLatestVersion": True,
         "EnableOverProvisioning": True,
         "OS.AllowHTTP": False,
         "OS.EnableFirewall": False
@@ -144,3 +147,56 @@ def test_write_agent_disabled(self):
     def test_get_extensions_enabled(self):
         self.assertTrue(conf.get_extensions_enabled(self.conf))
+
+    def test_get_get_auto_update_to_latest_version(self):
+        # update flags not set
+        self.assertTrue(conf.get_auto_update_to_latest_version(self.conf))
+
+        config = conf.ConfigurationProvider()
+        # AutoUpdate.Enabled is set to 'n'
+        conf.load_conf_from_file(
+            os.path.join(data_dir, "config/waagent_auto_update_disabled.conf"),
+            config)
+        self.assertFalse(conf.get_auto_update_to_latest_version(config), "AutoUpdate.UpdateToLatestVersion should be 'n'")
+
+        # AutoUpdate.Enabled is set to 'y'
+        conf.load_conf_from_file(
+            os.path.join(data_dir, "config/waagent_auto_update_enabled.conf"),
+            config)
+        self.assertTrue(conf.get_auto_update_to_latest_version(config), "AutoUpdate.UpdateToLatestVersion should be 'y'")
+
+        # AutoUpdate.UpdateToLatestVersion is set to 'n'
+        conf.load_conf_from_file(
+            os.path.join(data_dir, "config/waagent_update_to_latest_version_disabled.conf"),
+            config)
+        self.assertFalse(conf.get_auto_update_to_latest_version(config), "AutoUpdate.UpdateToLatestVersion should be 'n'")
+
+        # AutoUpdate.UpdateToLatestVersion is set to 'y'
+        conf.load_conf_from_file(
+            os.path.join(data_dir, "config/waagent_update_to_latest_version_enabled.conf"),
+            config)
+        self.assertTrue(conf.get_auto_update_to_latest_version(config), "AutoUpdate.UpdateToLatestVersion should be 'y'")
+
+        # AutoUpdate.Enabled is set to 'y' and AutoUpdate.UpdateToLatestVersion is set to 'n'
+        conf.load_conf_from_file(
+            os.path.join(data_dir, "config/waagent_auto_update_enabled_update_to_latest_version_disabled.conf"),
+            config)
+        self.assertFalse(conf.get_auto_update_to_latest_version(config), "AutoUpdate.UpdateToLatestVersion should be 'n'")
+
+        # AutoUpdate.Enabled is set to 'n' and AutoUpdate.UpdateToLatestVersion is set to 'y'
+        conf.load_conf_from_file(
+            os.path.join(data_dir, "config/waagent_auto_update_disabled_update_to_latest_version_enabled.conf"),
+            config)
+        self.assertTrue(conf.get_auto_update_to_latest_version(config), "AutoUpdate.UpdateToLatestVersion should be 'y'")
+
+        # AutoUpdate.Enabled is set to 'n' and AutoUpdate.UpdateToLatestVersion is set to 'n'
+        conf.load_conf_from_file(
+            os.path.join(data_dir, "config/waagent_auto_update_disabled_update_to_latest_version_disabled.conf"),
+            config)
+        self.assertFalse(conf.get_auto_update_to_latest_version(config), "AutoUpdate.UpdateToLatestVersion should be 'n'")
+
+        # AutoUpdate.Enabled is set to 'y' and AutoUpdate.UpdateToLatestVersion is set to 'y'
+        conf.load_conf_from_file(
+            os.path.join(data_dir, "config/waagent_auto_update_enabled_update_to_latest_version_enabled.conf"),
+            config)
+        self.assertTrue(conf.get_auto_update_to_latest_version(config), "AutoUpdate.UpdateToLatestVersion should be 'y'")
diff --git a/tests/common/test_errorstate.py b/tests/common/test_errorstate.py
index 263d95ed7..c51682b70 100644
--- a/tests/common/test_errorstate.py
+++ b/tests/common/test_errorstate.py
@@ -2,7 +2,7 @@
 from datetime import timedelta, datetime
 from azurelinuxagent.common.errorstate import ErrorState
-from tests.tools import Mock, patch
+from tests.lib.tools import Mock, patch
 class TestErrorState(unittest.TestCase):
diff --git a/tests/common/test_event.py b/tests/common/test_event.py
index de5ad7353..435ac2e80 100644
--- a/tests/common/test_event.py
+++ b/tests/common/test_event.py
@@ -20,6 +20,7 @@
 import json
 import os
+import platform
 import re
 import shutil
 import threading
@@ -41,11 +42,11 @@
     GuestAgentExtensionEventsSchema, GuestAgentPerfCounterEventsSchema
 from azurelinuxagent.common.version import CURRENT_AGENT, CURRENT_VERSION, AGENT_EXECUTION_MODE
 from azurelinuxagent.ga.collect_telemetry_events import _CollectAndEnqueueEvents
-from tests.protocol import mockwiredata
-from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse
-from tests.protocol.HttpRequestPredicates import HttpRequestPredicates
-from tests.tools import AgentTestCase, data_dir, load_data, patch, skip_if_predicate_true, is_python_version_26_or_34
-from tests.utils.event_logger_tools import EventLoggerTools
+from tests.lib import wire_protocol_data
+from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse
+from
tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.tools import AgentTestCase, data_dir, load_data, patch, skip_if_predicate_true, is_python_version_26_or_34 +from tests.lib.event_logger_tools import EventLoggerTools class TestEvent(HttpRequestPredicates, AgentTestCase): @@ -70,7 +71,7 @@ def setUp(self): CommonTelemetryEventSchema.EventTid: threading.current_thread().ident, CommonTelemetryEventSchema.EventPid: os.getpid(), CommonTelemetryEventSchema.TaskName: threading.current_thread().getName(), - CommonTelemetryEventSchema.KeywordName: '', + CommonTelemetryEventSchema.KeywordName: json.dumps({"CpuArchitecture": platform.machine()}), # common parameters computed from the OS platform CommonTelemetryEventSchema.OSVersion: EventLoggerTools.get_expected_os_version(), CommonTelemetryEventSchema.ExecutionMode: AGENT_EXECUTION_MODE, @@ -155,7 +156,7 @@ def create_event_and_return_container_id(): # pylint: disable=inconsistent-retu self.fail("Could not find Contained ID on event") - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: contained_id = create_event_and_return_container_id() # The expect value comes from DATA_FILE self.assertEqual(contained_id, 'c6d5526c-5ac2-4200-b6e2-56f2b70c5ab2', "Incorrect container ID") @@ -787,7 +788,7 @@ def http_post_handler(url, body, **__): return None http_post_handler.request_body = None - with mock_wire_protocol(mockwiredata.DATA_FILE, http_post_handler=http_post_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_post_handler=http_post_handler) as protocol: event_file_path = self._create_test_event_file("event_with_callstack.waagent.tld") expected_message = get_event_message_from_event_file(event_file_path) @@ -807,7 +808,7 @@ def http_post_handler(url, body, **__): return None http_post_handler.request_body = None - with mock_wire_protocol(mockwiredata.DATA_FILE, 
http_post_handler=http_post_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_post_handler=http_post_handler) as protocol: test_messages = [ 'Non-English message - 此文字不是英文的', "Ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία", diff --git a/tests/common/test_logger.py b/tests/common/test_logger.py index a10ea47c6..d792eb857 100644 --- a/tests/common/test_logger.py +++ b/tests/common/test_logger.py @@ -24,7 +24,7 @@ import azurelinuxagent.common.logger as logger from azurelinuxagent.common.utils import fileutil -from tests.tools import AgentTestCase, MagicMock, patch, skip_if_predicate_true +from tests.lib.tools import AgentTestCase, MagicMock, patch, skip_if_predicate_true _MSG_INFO = "This is our test info logging message {0} {1}" _MSG_WARN = "This is our test warn logging message {0} {1}" diff --git a/tests/common/test_singletonperthread.py b/tests/common/test_singletonperthread.py index 39d8c9917..7b1972635 100644 --- a/tests/common/test_singletonperthread.py +++ b/tests/common/test_singletonperthread.py @@ -3,7 +3,7 @@ from threading import Thread, currentThread from azurelinuxagent.common.singletonperthread import SingletonPerThread -from tests.tools import AgentTestCase, clear_singleton_instances +from tests.lib.tools import AgentTestCase, clear_singleton_instances class TestClassToTestSingletonPerThread(SingletonPerThread): diff --git a/tests/common/test_telemetryevent.py b/tests/common/test_telemetryevent.py index e23ab6844..ce232dab0 100644 --- a/tests/common/test_telemetryevent.py +++ b/tests/common/test_telemetryevent.py @@ -16,7 +16,7 @@ # from azurelinuxagent.common.telemetryevent import TelemetryEvent, TelemetryEventParam, GuestAgentExtensionEventsSchema, \ CommonTelemetryEventSchema -from tests.tools import AgentTestCase +from tests.lib.tools import AgentTestCase def get_test_event(name="DummyExtension", op="Unknown", is_success=True, duration=0, version="foo", evt_type="", is_internal=False, diff --git a/tests/common/test_version.py 
b/tests/common/test_version.py index ba1fb7672..156cdf1ab 100644 --- a/tests/common/test_version.py +++ b/tests/common/test_version.py @@ -30,7 +30,7 @@ get_f5_platform, get_distro, get_lis_version, PY_VERSION_MAJOR, \ PY_VERSION_MINOR, get_daemon_version, set_daemon_version, __DAEMON_VERSION_ENV_VARIABLE as DAEMON_VERSION_ENV_VARIABLE from azurelinuxagent.common.utils.flexible_version import FlexibleVersion -from tests.tools import AgentTestCase, open_patch, patch +from tests.lib.tools import AgentTestCase, open_patch, patch def freebsd_system(): diff --git a/tests/protocol/__init__.py b/tests/common/utils/__init__.py similarity index 100% rename from tests/protocol/__init__.py rename to tests/common/utils/__init__.py diff --git a/tests/utils/test_archive.py b/tests/common/utils/test_archive.py similarity index 99% rename from tests/utils/test_archive.py rename to tests/common/utils/test_archive.py index 54766862f..e65fef1e7 100644 --- a/tests/utils/test_archive.py +++ b/tests/common/utils/test_archive.py @@ -9,7 +9,7 @@ from azurelinuxagent.common import conf from azurelinuxagent.common.utils import fileutil, timeutil from azurelinuxagent.common.utils.archive import GoalStateHistory, StateArchiver, _MAX_ARCHIVED_STATES, ARCHIVE_DIRECTORY_NAME -from tests.tools import AgentTestCase, patch +from tests.lib.tools import AgentTestCase, patch debug = False if os.environ.get('DEBUG') == '1': diff --git a/tests/utils/test_crypt_util.py b/tests/common/utils/test_crypt_util.py similarity index 83% rename from tests/utils/test_crypt_util.py rename to tests/common/utils/test_crypt_util.py index 4c8ab2e37..4bd342976 100644 --- a/tests/utils/test_crypt_util.py +++ b/tests/common/utils/test_crypt_util.py @@ -21,7 +21,7 @@ import azurelinuxagent.common.conf as conf from azurelinuxagent.common.exception import CryptError from azurelinuxagent.common.utils.cryptutil import CryptUtil -from tests.tools import AgentTestCase, data_dir, load_data, is_python_version_26, 
skip_if_predicate_true +from tests.lib.tools import AgentTestCase, data_dir, load_data, is_python_version_26, skip_if_predicate_true class TestCryptoUtilOperations(AgentTestCase): @@ -67,6 +67,19 @@ def test_get_pubkey_from_crt(self): with open(expected_pub_key) as fh: self.assertEqual(fh.read(), crypto.get_pubkey_from_prv(prv_key)) + def test_get_pubkey_from_prv(self): + crypto = CryptUtil(conf.get_openssl_cmd()) + + def do_test(prv_key, expected_pub_key): + prv_key = os.path.join(data_dir, "wire", prv_key) + expected_pub_key = os.path.join(data_dir, "wire", expected_pub_key) + + with open(expected_pub_key) as fh: + self.assertEqual(fh.read(), crypto.get_pubkey_from_prv(prv_key)) + + do_test("rsa-key.pem", "rsa-key.pub.pem") + do_test("ec-key.pem", "ec-key.pub.pem") + def test_get_pubkey_from_crt_invalid_file(self): crypto = CryptUtil(conf.get_openssl_cmd()) prv_key = os.path.join(data_dir, "wire", "trans_prv_does_not_exist") diff --git a/tests/utils/test_extension_process_util.py b/tests/common/utils/test_extension_process_util.py similarity index 92% rename from tests/utils/test_extension_process_util.py rename to tests/common/utils/test_extension_process_util.py index a74c4ff73..316bad6a3 100644 --- a/tests/utils/test_extension_process_util.py +++ b/tests/common/utils/test_extension_process_util.py @@ -19,12 +19,12 @@ import subprocess import tempfile -from azurelinuxagent.common.cgroup import CpuCgroup +from azurelinuxagent.ga.cgroup import CpuCgroup from azurelinuxagent.common.exception import ExtensionError, ExtensionErrorCodes from azurelinuxagent.common.future import ustr -from azurelinuxagent.common.utils.extensionprocessutil import format_stdout_stderr, read_output, \ +from azurelinuxagent.ga.extensionprocessutil import format_stdout_stderr, read_output, \ wait_for_process_completion_or_timeout, handle_process_completion -from tests.tools import AgentTestCase, patch, data_dir +from tests.lib.tools import AgentTestCase, patch, data_dir class 
TestProcessUtils(AgentTestCase): @@ -68,7 +68,7 @@ def test_wait_for_process_completion_or_timeout_should_kill_process_on_timeout(s preexec_fn=os.setsid) # We don't actually mock the kill, just wrap it so we can assert its call count - with patch('azurelinuxagent.common.utils.extensionprocessutil.os.killpg', wraps=os.killpg) as patch_kill: + with patch('azurelinuxagent.ga.extensionprocessutil.os.killpg', wraps=os.killpg) as patch_kill: with patch('time.sleep') as mock_sleep: timed_out, ret, _ = wait_for_process_completion_or_timeout(process=process, timeout=timeout, cpu_cgroup=None) @@ -211,20 +211,20 @@ def test_handle_process_completion_should_raise_on_nonzero_exit_code(self): self.assertIn("Non-zero exit code:", ustr(context_manager.exception)) def test_read_output_should_return_no_content(self): - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 0): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 0): expected = "" actual = read_output(self.stdout, self.stderr) self.assertEqual(expected, actual) def test_read_output_should_truncate_the_content(self): - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 50): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 50): expected = "[stdout]\nr the lazy dog.\n\n" \ "[stderr]\ns jump quickly." actual = read_output(self.stdout, self.stderr) self.assertEqual(expected, actual) def test_read_output_should_not_truncate_the_content(self): - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 90): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 90): expected = "[stdout]\nThe quick brown fox jumps over the lazy dog.\n\n" \ "[stderr]\nThe five boxing wizards jump quickly." 
actual = read_output(self.stdout, self.stderr) @@ -240,7 +240,7 @@ def test_format_stdout_stderr00(self): stderr = "The five boxing wizards jump quickly." expected = "[stdout]\n{0}\n\n[stderr]\n{1}".format(stdout, stderr) - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 1000): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 1000): actual = format_stdout_stderr(stdout, stderr) self.assertEqual(expected, actual) @@ -254,7 +254,7 @@ def test_format_stdout_stderr01(self): # noinspection SpellCheckingInspection expected = '[stdout]\ns over the lazy dog.\n\n[stderr]\nizards jump quickly.' - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 60): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 60): actual = format_stdout_stderr(stdout, stderr) self.assertEqual(expected, actual) self.assertEqual(60, len(actual)) @@ -268,7 +268,7 @@ def test_format_stdout_stderr02(self): stderr = "The five boxing wizards jump quickly." expected = '[stdout]\nempty\n\n[stderr]\ns jump quickly.' - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 40): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 40): actual = format_stdout_stderr(stdout, stderr) self.assertEqual(expected, actual) self.assertEqual(40, len(actual)) @@ -282,7 +282,7 @@ def test_format_stdout_stderr03(self): stderr = "empty" expected = '[stdout]\nr the lazy dog.\n\n[stderr]\nempty' - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 40): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 40): actual = format_stdout_stderr(stdout, stderr) self.assertEqual(expected, actual) self.assertEqual(40, len(actual)) @@ -296,7 +296,7 @@ def test_format_stdout_stderr04(self): stderr = "The five boxing wizards jump quickly." 
expected = '' - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 4): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 4): actual = format_stdout_stderr(stdout, stderr) self.assertEqual(expected, actual) self.assertEqual(0, len(actual)) @@ -307,6 +307,6 @@ def test_format_stdout_stderr05(self): """ expected = '[stdout]\n\n\n[stderr]\n' - with patch('azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 1000): + with patch('azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN', 1000): actual = format_stdout_stderr('', '') self.assertEqual(expected, actual) diff --git a/tests/utils/test_file_util.py b/tests/common/utils/test_file_util.py similarity index 99% rename from tests/utils/test_file_util.py rename to tests/common/utils/test_file_util.py index 2dfa3bf96..f1514e5d0 100644 --- a/tests/utils/test_file_util.py +++ b/tests/common/utils/test_file_util.py @@ -27,7 +27,7 @@ import azurelinuxagent.common.utils.fileutil as fileutil from azurelinuxagent.common.future import ustr -from tests.tools import AgentTestCase, patch +from tests.lib.tools import AgentTestCase, patch class TestFileOperations(AgentTestCase): diff --git a/tests/utils/test_flexible_version.py b/tests/common/utils/test_flexible_version.py similarity index 100% rename from tests/utils/test_flexible_version.py rename to tests/common/utils/test_flexible_version.py diff --git a/tests/utils/test_network_util.py b/tests/common/utils/test_network_util.py similarity index 99% rename from tests/utils/test_network_util.py rename to tests/common/utils/test_network_util.py index 4c3f5d014..e08f1aab3 100644 --- a/tests/utils/test_network_util.py +++ b/tests/common/utils/test_network_util.py @@ -19,7 +19,7 @@ from mock.mock import patch import azurelinuxagent.common.utils.networkutil as networkutil -from tests.tools import AgentTestCase +from tests.lib.tools import AgentTestCase class 
TestNetworkOperations(AgentTestCase): diff --git a/tests/utils/test_passwords.txt b/tests/common/utils/test_passwords.txt similarity index 100% rename from tests/utils/test_passwords.txt rename to tests/common/utils/test_passwords.txt diff --git a/tests/utils/test_rest_util.py b/tests/common/utils/test_rest_util.py similarity index 99% rename from tests/utils/test_rest_util.py rename to tests/common/utils/test_rest_util.py index a0b00f6cd..efcebb082 100644 --- a/tests/utils/test_rest_util.py +++ b/tests/common/utils/test_rest_util.py @@ -22,7 +22,7 @@ import azurelinuxagent.common.utils.restutil as restutil from azurelinuxagent.common.utils.restutil import HTTP_USER_AGENT from azurelinuxagent.common.future import httpclient, ustr -from tests.tools import AgentTestCase, call, Mock, MagicMock, patch +from tests.lib.tools import AgentTestCase, call, Mock, MagicMock, patch class TestIOErrorCounter(AgentTestCase): diff --git a/tests/utils/test_shell_util.py b/tests/common/utils/test_shell_util.py similarity index 97% rename from tests/utils/test_shell_util.py rename to tests/common/utils/test_shell_util.py index 83082bf7e..5eb5a83a6 100644 --- a/tests/utils/test_shell_util.py +++ b/tests/common/utils/test_shell_util.py @@ -18,14 +18,15 @@ import os import signal import subprocess +import sys import tempfile import threading import unittest from azurelinuxagent.common.future import ustr import azurelinuxagent.common.utils.shellutil as shellutil -from tests.tools import AgentTestCase, patch -from tests.utils.miscellaneous_tools import wait_for, format_processes +from tests.lib.tools import AgentTestCase, patch, skip_if_predicate_true +from tests.lib.miscellaneous_tools import wait_for, format_processes class ShellQuoteTestCase(AgentTestCase): @@ -225,6 +226,12 @@ def test_run_command_should_raise_an_exception_when_it_cannot_execute_the_comman self.__it_should_raise_an_exception_when_it_cannot_execute_the_command( lambda: shellutil.run_command("nonexistent_command")) + 
@skip_if_predicate_true(lambda: sys.version_info[0] == 2, "Timeouts are not supported on Python 2") + def test_run_command_should_raise_an_exception_when_the_command_times_out(self): + with self.assertRaises(shellutil.CommandError) as context: + shellutil.run_command(["sleep", "5"], timeout=1) + self.assertIn("command timeout", context.exception.stderr, "The command did not time out") + def test_run_pipe_should_raise_an_exception_when_it_cannot_execute_the_pipe(self): self.__it_should_raise_an_exception_when_it_cannot_execute_the_command( lambda: shellutil.run_pipe([["ls", "-ld", "."], ["nonexistent_command"], ["wc", "-l"]])) diff --git a/tests/utils/test_text_util.py b/tests/common/utils/test_text_util.py similarity index 99% rename from tests/utils/test_text_util.py rename to tests/common/utils/test_text_util.py index ff129c40b..5029cfb92 100644 --- a/tests/utils/test_text_util.py +++ b/tests/common/utils/test_text_util.py @@ -22,7 +22,7 @@ import azurelinuxagent.common.utils.textutil as textutil from azurelinuxagent.common.future import ustr -from tests.tools import AgentTestCase +from tests.lib.tools import AgentTestCase class TestTextUtil(AgentTestCase): diff --git a/tests/daemon/test_daemon.py b/tests/daemon/test_daemon.py index b5a75902b..4b34ddec7 100644 --- a/tests/daemon/test_daemon.py +++ b/tests/daemon/test_daemon.py @@ -22,7 +22,7 @@ import azurelinuxagent.common.conf as conf from azurelinuxagent.daemon.main import OPENSSL_FIPS_ENVIRONMENT, get_daemon_handler from azurelinuxagent.pa.provision.default import ProvisionHandler -from tests.tools import AgentTestCase, Mock, patch +from tests.lib.tools import AgentTestCase, Mock, patch class MockDaemonCall(object): diff --git a/tests/daemon/test_resourcedisk.py b/tests/daemon/test_resourcedisk.py index 301ac695e..092741442 100644 --- a/tests/daemon/test_resourcedisk.py +++ b/tests/daemon/test_resourcedisk.py @@ -15,10 +15,15 @@ # Requires Python 2.6+ and Openssl 1.0+ # +import os +import stat +import sys 
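The timeout test above asserts that shellutil.run_command surfaces a "command timeout" message in stderr when the child process outlives its limit. That behavior can be approximated with the standard library alone; `run_with_timeout` below is a hypothetical helper for illustration, not the agent's implementation, and like the skipped test it assumes Python 3:

```python
import subprocess

def run_with_timeout(cmd, timeout):
    # Hypothetical stand-in for shellutil.run_command(cmd, timeout=...):
    # run the command, raising with a "command timeout" message when it
    # exceeds the limit (subprocess timeouts require Python 3, which is
    # why the test above is skipped on Python 2).
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        raise RuntimeError("command timeout: {0}".format(" ".join(cmd)))

# A long-running command should trip the timeout, mirroring the test.
try:
    run_with_timeout(["sleep", "5"], timeout=1)
    timed_out = False
except RuntimeError as error:
    timed_out = "command timeout" in str(error)
assert timed_out
```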
import unittest -from tests.tools import AgentTestCase, patch, DEFAULT +from tests.lib.tools import AgentTestCase, patch, DEFAULT +from azurelinuxagent.daemon.resourcedisk import get_resourcedisk_handler from azurelinuxagent.daemon.resourcedisk.default import ResourceDiskHandler +from azurelinuxagent.common.utils import shellutil class TestResourceDisk(AgentTestCase): @@ -80,6 +85,121 @@ def run_side_effect(*args, **kwargs): # pylint: disable=unused-argument size_mb=size_mb ) + def test_mkfile(self): + # setup + test_file = os.path.join(self.tmp_dir, 'test_file') + file_size = 1024 * 128 + if os.path.exists(test_file): + os.remove(test_file) + + # execute + get_resourcedisk_handler().mkfile(test_file, file_size) + + # assert + assert os.path.exists(test_file) + + # only the owner should have access + mode = os.stat(test_file).st_mode & ( + stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO) + assert mode == stat.S_IRUSR | stat.S_IWUSR + + # cleanup + os.remove(test_file) + + def test_mkfile_dd_fallback(self): + with patch.object(shellutil, "run") as run_patch: + # setup + run_patch.return_value = 1 + test_file = os.path.join(self.tmp_dir, 'test_file') + file_size = 1024 * 128 + + # execute + if sys.version_info >= (3, 3): + with patch("os.posix_fallocate", + side_effect=Exception('failure')): + get_resourcedisk_handler().mkfile(test_file, file_size) + else: + get_resourcedisk_handler().mkfile(test_file, file_size) + + # assert + assert run_patch.call_count > 1 + assert "fallocate" in run_patch.call_args_list[0][0][0] + assert "dd if" in run_patch.call_args_list[-1][0][0] + + def test_mkfile_xfs_fs(self): + # setup + test_file = os.path.join(self.tmp_dir, 'test_file') + file_size = 1024 * 128 + if os.path.exists(test_file): + os.remove(test_file) + + # execute + resource_disk_handler = get_resourcedisk_handler() + resource_disk_handler.fs = 'xfs' + + with patch.object(shellutil, "run") as run_patch: + resource_disk_handler.mkfile(test_file, file_size) + + # assert + if 
sys.version_info >= (3, 3): + with patch("os.posix_fallocate") as posix_fallocate: + self.assertEqual(0, posix_fallocate.call_count) + + assert run_patch.call_count == 1 + assert "dd if" in run_patch.call_args_list[0][0][0] + + def test_change_partition_type(self): + resource_handler = get_resourcedisk_handler() + # test when sfdisk --part-type does not exist + with patch.object(shellutil, "run_get_output", + side_effect=[[1, ''], [0, '']]) as run_patch: + resource_handler.change_partition_type( + suppress_message=True, option_str='') + + # assert + assert run_patch.call_count == 2 + assert "sfdisk --part-type" in run_patch.call_args_list[0][0][0] + assert "sfdisk -c" in run_patch.call_args_list[1][0][0] + + # test when sfdisk --part-type exists + with patch.object(shellutil, "run_get_output", + side_effect=[[0, '']]) as run_patch: + resource_handler.change_partition_type( + suppress_message=True, option_str='') + + # assert + assert run_patch.call_count == 1 + assert "sfdisk --part-type" in run_patch.call_args_list[0][0][0] + + def test_check_existing_swap_file(self): + test_file = os.path.join(self.tmp_dir, 'test_swap_file') + file_size = 1024 * 128 + if os.path.exists(test_file): + os.remove(test_file) + + with open(test_file, "wb") as file: # pylint: disable=redefined-builtin + file.write(bytearray(file_size)) + + os.chmod(test_file, stat.S_ISUID | stat.S_ISGID | stat.S_IRUSR | + stat.S_IWUSR | stat.S_IRWXG | stat.S_IRWXO) # 0o6677 + + def swap_on(_): # mimic the output of "swapon -s" + return [ + "Filename Type Size Used Priority", + "{0} partition 16498684 0 -2".format(test_file) + ] + + with patch.object(shellutil, "run_get_output", side_effect=swap_on): + get_resourcedisk_handler().check_existing_swap_file( + test_file, test_file, file_size) + + # it should remove access from group, others + mode = os.stat(test_file).st_mode & (stat.S_ISUID | stat.S_ISGID | + stat.S_IRWXU | stat.S_IWUSR | stat.S_IRWXG | stat.S_IRWXO) # 0o6777 + assert mode == stat.S_ISUID | 
stat.S_ISGID | stat.S_IRUSR | stat.S_IWUSR # 0o6600 + + os.remove(test_file) + if __name__ == '__main__': unittest.main() diff --git a/tests/distro/test_scvmm.py b/tests/daemon/test_scvmm.py similarity index 98% rename from tests/distro/test_scvmm.py rename to tests/daemon/test_scvmm.py index 109a96052..275f3f6e3 100644 --- a/tests/distro/test_scvmm.py +++ b/tests/daemon/test_scvmm.py @@ -26,7 +26,7 @@ from azurelinuxagent.common import conf from azurelinuxagent.common.osutil.default import DefaultOSUtil from azurelinuxagent.common.utils import fileutil -from tests.tools import AgentTestCase, Mock, patch +from tests.lib.tools import AgentTestCase, Mock, patch class TestSCVMM(AgentTestCase): diff --git a/tests/data/2 b/tests/data/2 new file mode 100644 index 000000000..38d819669 --- /dev/null +++ b/tests/data/2 @@ -0,0 +1,14 @@ +# This is private data. Do not parse. +ADDRESS=10.0.0.69 +NETMASK=255.255.255.0 +ROUTER=10.0.0.1 +SERVER_ADDRESS=168.63.129.16 +NEXT_SERVER=168.63.129.16 +T1=4294967295 +T2=4294967295 +LIFETIME=4294967295 +DNS=168.63.129.16 +DOMAINNAME=2rdlxelcdvjkok2emfc.bx.internal.cloudapp.net +ROUTES=0.0.0.0/0,10.0.0.1 168.63.129.16/32,10.0.0.1 169.254.169.254/32,10.0.0.1 +CLIENTID=ff0406a3a3000201120dc9092eccd2344 +OPTION_245=a83f8110 diff --git a/tests/data/config/waagent_auto_update_disabled.conf b/tests/data/config/waagent_auto_update_disabled.conf new file mode 100644 index 000000000..933c6b2b4 --- /dev/null +++ b/tests/data/config/waagent_auto_update_disabled.conf @@ -0,0 +1,11 @@ +# +# Microsoft Azure Linux Agent Configuration +# + +# Enable or disable goal state processing auto-update, default is enabled. 
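test_mkfile_dd_fallback above patches os.posix_fallocate to fail and verifies the handler falls back to shelling out to dd. A hedged sketch of that fallback pattern follows; `mkfile` here is an illustrative stand-in, not ResourceDiskHandler.mkfile itself:

```python
import os
import subprocess
import sys

def mkfile(path, size_bytes):
    # Prefer os.posix_fallocate (Python 3.3+); fall back to dd when it is
    # unavailable or fails, as the test above simulates.
    if sys.version_info >= (3, 3) and hasattr(os, "posix_fallocate"):
        try:
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
            try:
                os.posix_fallocate(fd, 0, size_bytes)
                return
            finally:
                os.close(fd)
        except OSError:
            pass  # fall through to dd
    # dd fallback: write the whole file in a single block.
    subprocess.check_call(
        ["dd", "if=/dev/zero", "of={0}".format(path),
         "bs={0}".format(size_bytes), "count=1"])
```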
+# Deprecated now but keep it for backward compatibility +AutoUpdate.Enabled=n + +# Enable or disable goal state processing auto-update, default is enabled +# AutoUpdate.UpdateToLatestVersion=y + diff --git a/tests/data/config/waagent_auto_update_disabled_update_to_latest_version_disabled.conf b/tests/data/config/waagent_auto_update_disabled_update_to_latest_version_disabled.conf new file mode 100644 index 000000000..484a3f222 --- /dev/null +++ b/tests/data/config/waagent_auto_update_disabled_update_to_latest_version_disabled.conf @@ -0,0 +1,11 @@ +# +# Microsoft Azure Linux Agent Configuration +# + +# Enable or disable goal state processing auto-update, default is enabled. +# Deprecated now but keep it for backward compatibility +AutoUpdate.Enabled=n + +# Enable or disable goal state processing auto-update, default is enabled +AutoUpdate.UpdateToLatestVersion=n + diff --git a/tests/data/config/waagent_auto_update_disabled_update_to_latest_version_enabled.conf b/tests/data/config/waagent_auto_update_disabled_update_to_latest_version_enabled.conf new file mode 100644 index 000000000..2e6b51ce4 --- /dev/null +++ b/tests/data/config/waagent_auto_update_disabled_update_to_latest_version_enabled.conf @@ -0,0 +1,11 @@ +# +# Microsoft Azure Linux Agent Configuration +# + +# Enable or disable goal state processing auto-update, default is enabled. +# Deprecated now but keep it for backward compatibility +AutoUpdate.Enabled=n + +# Enable or disable goal state processing auto-update, default is enabled +AutoUpdate.UpdateToLatestVersion=y + diff --git a/tests/data/config/waagent_auto_update_enabled.conf b/tests/data/config/waagent_auto_update_enabled.conf new file mode 100644 index 000000000..1f9070ba6 --- /dev/null +++ b/tests/data/config/waagent_auto_update_enabled.conf @@ -0,0 +1,11 @@ +# +# Microsoft Azure Linux Agent Configuration +# + +# Enable or disable goal state processing auto-update, default is enabled. 
+# Deprecated now but keep it for backward compatibility +AutoUpdate.Enabled=y + +# Enable or disable goal state processing auto-update, default is enabled +# AutoUpdate.UpdateToLatestVersion=y + diff --git a/tests/data/config/waagent_auto_update_enabled_update_to_latest_version_disabled.conf b/tests/data/config/waagent_auto_update_enabled_update_to_latest_version_disabled.conf new file mode 100644 index 000000000..86a21ec3a --- /dev/null +++ b/tests/data/config/waagent_auto_update_enabled_update_to_latest_version_disabled.conf @@ -0,0 +1,11 @@ +# +# Microsoft Azure Linux Agent Configuration +# + +# Enable or disable goal state processing auto-update, default is enabled. +# Deprecated now but keep it for backward compatibility +AutoUpdate.Enabled=y + +# Enable or disable goal state processing auto-update, default is enabled +AutoUpdate.UpdateToLatestVersion=n + diff --git a/tests/data/config/waagent_auto_update_enabled_update_to_latest_version_enabled.conf b/tests/data/config/waagent_auto_update_enabled_update_to_latest_version_enabled.conf new file mode 100644 index 000000000..497f03897 --- /dev/null +++ b/tests/data/config/waagent_auto_update_enabled_update_to_latest_version_enabled.conf @@ -0,0 +1,11 @@ +# +# Microsoft Azure Linux Agent Configuration +# + +# Enable or disable goal state processing auto-update, default is enabled. +# Deprecated now but keep it for backward compatibility +AutoUpdate.Enabled=y + +# Enable or disable goal state processing auto-update, default is enabled +AutoUpdate.UpdateToLatestVersion=y + diff --git a/tests/data/config/waagent_update_to_latest_version_disabled.conf b/tests/data/config/waagent_update_to_latest_version_disabled.conf new file mode 100644 index 000000000..a2c7f859f --- /dev/null +++ b/tests/data/config/waagent_update_to_latest_version_disabled.conf @@ -0,0 +1,11 @@ +# +# Microsoft Azure Linux Agent Configuration +# + +# Enable or disable goal state processing auto-update, default is enabled. 
+# Deprecated now but keep it for backward compatibility +# AutoUpdate.Enabled=n + +# Enable or disable goal state processing auto-update, default is enabled +AutoUpdate.UpdateToLatestVersion=n + diff --git a/tests/data/config/waagent_update_to_latest_version_enabled.conf b/tests/data/config/waagent_update_to_latest_version_enabled.conf new file mode 100644 index 000000000..48ed2e2de --- /dev/null +++ b/tests/data/config/waagent_update_to_latest_version_enabled.conf @@ -0,0 +1,11 @@ +# +# Microsoft Azure Linux Agent Configuration +# + +# Enable or disable goal state processing auto-update, default is enabled. +# Deprecated now but keep it for backward compatibility +# AutoUpdate.Enabled=n + +# Enable or disable goal state processing auto-update, default is enabled +AutoUpdate.UpdateToLatestVersion=y + diff --git a/tests/data/ga/WALinuxAgent-9.9.9.9-no_manifest.zip b/tests/data/ga/WALinuxAgent-9.9.9.10-no_manifest.zip similarity index 100% rename from tests/data/ga/WALinuxAgent-9.9.9.9-no_manifest.zip rename to tests/data/ga/WALinuxAgent-9.9.9.10-no_manifest.zip diff --git a/tests/data/hostgaplugin/ext_conf-requested_version.xml b/tests/data/hostgaplugin/ext_conf-agent_family_version.xml similarity index 97% rename from tests/data/hostgaplugin/ext_conf-requested_version.xml rename to tests/data/hostgaplugin/ext_conf-agent_family_version.xml index 48cc95cc9..5c9e0028f 100644 --- a/tests/data/hostgaplugin/ext_conf-requested_version.xml +++ b/tests/data/hostgaplugin/ext_conf-agent_family_version.xml @@ -4,6 +4,8 @@ Prod 9.9.9.10 + true + true https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml @@ -12,6 +14,8 @@ Test 9.9.9.10 + true + true 
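The .conf fixtures above cover the matrix of the deprecated AutoUpdate.Enabled flag against the newer AutoUpdate.UpdateToLatestVersion flag. A hedged sketch of how such a precedence could be resolved (assumption: the newer flag, when present, wins, and both default to enabled; `parse_conf` and `update_to_latest_version` are illustrative helpers, not the agent's conf module):

```python
def parse_conf(text):
    # Parse waagent.conf-style "Key=Value" lines, skipping comments/blanks.
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def update_to_latest_version(conf_values):
    # Assumption: the newer flag takes precedence over the deprecated one.
    if "AutoUpdate.UpdateToLatestVersion" in conf_values:
        return conf_values["AutoUpdate.UpdateToLatestVersion"] == "y"
    # Fall back to the deprecated flag; the documented default is enabled.
    return conf_values.get("AutoUpdate.Enabled", "y") == "y"
```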
https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_uscentraleuap_manifest.xml https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_uscentraleuap_manifest.xml diff --git a/tests/data/hostgaplugin/ext_conf-rsm_version_properties_false.xml b/tests/data/hostgaplugin/ext_conf-rsm_version_properties_false.xml new file mode 100644 index 000000000..e1f1d6ba8 --- /dev/null +++ b/tests/data/hostgaplugin/ext_conf-rsm_version_properties_false.xml @@ -0,0 +1,152 @@ + + + + + Prod + 9.9.9.10 + false + false + + https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml + https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml + + + + Test + 9.9.9.10 + false + false + + https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_uscentraleuap_manifest.xml + https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_uscentraleuap_manifest.xml + + + + CentralUSEUAP + CRP + + + + MultipleExtensionsPerHandler + + +https://dcrcl3a0xs.blob.core.windows.net/$system/edp0plkw2b.86f4ae0a-61f8-48ae-9199-40f402d56864.status?sv=2018-03-28&sr=b&sk=system-1&sig=KNWgC2%3d&se=9999-01-01T00%3a00%3a00Z&sp=w + + + + https://zrdfepirv2cbn09pr02a.blob.core.windows.net/a47f0806d764480a8d989d009c75007d/Microsoft.Azure.Monitor_AzureMonitorLinuxAgent_useast2euap_manifest.xml + + + + + https://zrdfepirv2cbn06prdstr01a.blob.core.windows.net/4ef06ad957494df49c807a5334f2b5d2/Microsoft.Azure.Security.Monitoring_AzureSecurityLinuxAgent_useast2euap_manifest.xml + + + + + 
https://umsanh4b5rfz0q0p4pwm.blob.core.windows.net/5237dd14-0aad-f051-0fad-1e33e1b63091/5237dd14-0aad-f051-0fad-1e33e1b63091_manifest.xml + + + + + https://umsawqtlsshtn5v2nfgh.blob.core.windows.net/f4086d41-69f9-3103-78e0-8a2c7e789d0f/f4086d41-69f9-3103-78e0-8a2c7e789d0f_manifest.xml + + + + + https://umsah3cwjlctnmhsvzqv.blob.core.windows.net/2bbece4f-0283-d415-b034-cc0adc6997a1/2bbece4f-0283-d415-b034-cc0adc6997a1_manifest.xml + + + + + + { + "runtimeSettings": [ + { + "handlerSettings": { + "protectedSettingsCertThumbprint": "BD447EF71C3ADDF7C837E84D630F3FAC22CCD22F", + "protectedSettings": "MIIBsAYJKoZIhvcNAQcDoIIBoTCCAZ0CAQAxggFpMIIBZQIBADBNMDkxNzA1BgoJkiaJk/IsZAEZFidXaW5kb3dzIEF6dXJlIENSUCBDZXJ0aWZpY2F0ZSBHZW5lcmF0b3ICEFpB/HKM/7evRk+DBz754wUwDQYJKoZIhvcNAQEBBQAEggEADPJwniDeIUXzxNrZCloitFdscQ59Bz1dj9DLBREAiM8jmxM0LLicTJDUv272Qm/4ZQgdqpFYBFjGab/9MX+Ih2x47FkVY1woBkckMaC/QOFv84gbboeQCmJYZC/rZJdh8rCMS+CEPq3uH1PVrvtSdZ9uxnaJ+E4exTPPviIiLIPtqWafNlzdbBt8HZjYaVw+SSe+CGzD2pAQeNttq3Rt/6NjCzrjG8ufKwvRoqnrInMs4x6nnN5/xvobKIBSv4/726usfk8Ug+9Q6Benvfpmre2+1M5PnGTfq78cO3o6mI3cPoBUjp5M0iJjAMGeMt81tyHkimZrEZm6pLa4NQMOEjArBgkqhkiG9w0BBwEwFAYIKoZIhvcNAwcECC5nVaiJaWt+gAhgeYvxUOYHXw==", + "publicSettings": {"GCS_AUTO_CONFIG":true} + } + } + ] +} + + + { + "runtimeSettings": [ + { + "handlerSettings": { + "protectedSettingsCertThumbprint": "BD447EF71C3ADDF7C837E84D630F3FAC22CCD22F", + "protectedSettings": "MIIBsAYJKoZIhvcNAQcDoIIBoTCCAZ0CAQAxggFpMIIBZQIBADBNMDkxNzA1BgoJkiaJk/IsZAEZFidXaW5kb3dzIEF6dXJlIENSUCBDZXJ0aWZpY2F0ZSBHZW5lcmF0b3ICEFpB/HKM/7evRk+DBz754wUwDQYJKoZIhvcNAQEBBQAEggEADPJwniDeIUXzxNrZCloitFdscQ59Bz1dj9DLBREAiM8jmxM0LLicTJDUv272Qm/4ZQgdqpFYBFjGab/9MX+Ih2x47FkVY1woBkckMaC/QOFv84gbboeQCmJYZC/rZJdh8rCMS+CEPq3uH1PVrvtSdZ9uxnaJ+E4exTPPviIiLIPtqWafNlzdbBt8HZjYaVw+SSe+CGzD2pAQeNttq3Rt/6NjCzrjG8ufKwvRoqnrInMs4x6nnN5/xvobKIBSv4/726usfk8Ug+9Q6Benvfpmre2+1M5PnGTfq78cO3o6mI3cPoBUjp5M0iJjAMGeMt81tyHkimZrEZm6pLa4NQMOEjArBgkqhkiG9w0BBwEwFAYIKoZIhvcNAwcECC5nVaiJaWt+gAhgeYvxUOYHXw==", 
+ "publicSettings": {"enableGenevaUpload":true} + } + } + ] +} + + + + + + { + "runtimeSettings": [ + { + "handlerSettings": { + "publicSettings": {"commandToExecute":"echo 'cee174d4-4daa-4b07-9958-53b9649445c2'"} + } + } + ] +} + + + + + + + + + + { + "runtimeSettings": [ + { + "handlerSettings": { + "publicSettings": {"source":{"script":"echo '4abb1e88-f349-41f8-8442-247d9fdfcac5'"}} + } + } + ] +} + { + "runtimeSettings": [ + { + "handlerSettings": { + "publicSettings": {"source":{"script":"echo 'e865c9bc-a7b3-42c6-9a79-cfa98a1ee8b3'"}} + } + } + ] +} + { + "runtimeSettings": [ + { + "handlerSettings": { + "publicSettings": {"source":{"script":"echo 'f923e416-0340-485c-9243-8b84fb9930c6'"}} + } + } + ] +} + + + { + "runtimeSettings": [ + { + "handlerSettings": { + "protectedSettingsCertThumbprint": "59A10F50FFE2A0408D3F03FE336C8FD5716CF25C", + "protectedSettings": "*** REDACTED ***" + } + } + ] +} + + +https://dcrcl3a0xs.blob.core.windows.net/$system/edp0plkw2b.86f4ae0a-61f8-48ae-9199-40f402d56864.vmSettings?sv=2018-03-28&sr=b&sk=system-1&sig=PaiLic%3d&se=9999-01-01T00%3a00%3a00Z&sp=r + diff --git a/tests/data/hostgaplugin/vm_settings-requested_version.json b/tests/data/hostgaplugin/vm_settings-agent_family_version.json similarity index 97% rename from tests/data/hostgaplugin/vm_settings-requested_version.json rename to tests/data/hostgaplugin/vm_settings-agent_family_version.json index 0f73cb255..734cc8147 100644 --- a/tests/data/hostgaplugin/vm_settings-requested_version.json +++ b/tests/data/hostgaplugin/vm_settings-agent_family_version.json @@ -29,6 +29,8 @@ { "name": "Prod", "version": "9.9.9.9", + "isVersionFromRSM": true, + "isVMEnabledForRSMUpgrades": true, "uris": [ "https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml", 
"https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml" @@ -37,6 +39,8 @@ { "name": "Test", "version": "9.9.9.9", + "isVersionFromRSM": true, + "isVMEnabledForRSMUpgrades": true, "uris": [ "https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_uscentraleuap_manifest.xml", "https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_uscentraleuap_manifest.xml" diff --git a/tests/data/hostgaplugin/vm_settings-requested_version_properties_false.json b/tests/data/hostgaplugin/vm_settings-requested_version_properties_false.json new file mode 100644 index 000000000..3a6eb8b1a --- /dev/null +++ b/tests/data/hostgaplugin/vm_settings-requested_version_properties_false.json @@ -0,0 +1,145 @@ +{ + "hostGAPluginVersion": "1.0.8.133", + "vmSettingsSchemaVersion": "0.0", + "activityId": "a33f6f53-43d6-4625-b322-1a39651a00c9", + "correlationId": "9a47a2a2-e740-4bfc-b11b-4f2f7cfe7d2e", + "inSvdSeqNo": 1, + "extensionsLastModifiedTickCount": 637726699999999999, + "extensionGoalStatesSource": "FastTrack", + "onHold": true, + "statusUploadBlob": { + "statusBlobType": "BlockBlob", + "value": "https://dcrcl3a0xs.blob.core.windows.net/$system/edp0plkw2b.86f4ae0a-61f8-48ae-9199-40f402d56864.status?sv=2018-03-28&sr=b&sk=system-1&sig=KNWgC2%3d&se=9999-01-01T00%3a00%3a00Z&sp=w" + }, + "inVMMetadata": { + "subscriptionId": "8e037ad4-618f-4466-8bc8-5099d41ac15b", + "resourceGroupName": "rg-dc-86fjzhp", + "vmName": "edp0plkw2b", + "location": "CentralUSEUAP", + "vmId": "86f4ae0a-61f8-48ae-9199-40f402d56864", + "vmSize": "Standard_B2s", + "osType": "Linux" + }, + "requiredFeatures": [ + { + "name": "MultipleExtensionsPerHandler" + } + ], + "gaFamilies": [ + { + "name": "Prod", + "version": "9.9.9.9", + "isVersionFromRSM": false, + "isVMEnabledForRSMUpgrades": false, + "uris": [ + 
"https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml", + "https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Prod_uscentraleuap_manifest.xml" + ] + }, + { + "name": "Test", + "version": "9.9.9.9", + "isVersionFromRSM": false, + "isVMEnabledForRSMUpgrades": false, + "uris": [ + "https://zrdfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_uscentraleuap_manifest.xml", + "https://ardfepirv2cdm03prdstr01a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_uscentraleuap_manifest.xml" + ] + } + ], + "extensionGoalStates": [ + { + "name": "Microsoft.Azure.Monitor.AzureMonitorLinuxAgent", + "version": "1.9.1", + "location": "https://zrdfepirv2cbn04prdstr01a.blob.core.windows.net/a47f0806d764480a8d989d009c75007d/Microsoft.Azure.Monitor_AzureMonitorLinuxAgent_useast2euap_manifest.xml", + "state": "enabled", + "autoUpgrade": true, + "runAsStartupTask": false, + "isJson": true, + "useExactVersion": true, + "settingsSeqNo": 0, + "settings": [ + { + "protectedSettingsCertThumbprint": "BD447EF71C3ADDF7C837E84D630F3FAC22CCD22F", + "protectedSettings": "MIIBsAYJKoZIhvcNAQcDoIIBoTCCAZ0CAQAxggFpMIIBZQIBADBNMDkxNzA1BgoJkiaJk/IsZAEZFidXaW5kb3dzIEF6dXJlIENSUCBDZXJ0aWZpY2F0ZSBHZW5lcmF0b3ICEFpB/HKM/7evRk+DBz754wUwDQYJKoZIhvcNAQEBBQAEggEADPJwniDeIUXzxNrZCloitFdscQ59Bz1dj9DLBREAiM8jmxM0LLicTJDUv272Qm/4ZQgdqpFYBFjGab/9MX+Ih2x47FkVY1woBkckMaC/QOFv84gbboeQCmJYZC/rZJdh8rCMS+CEPq3uH1PVrvtSdZ9uxnaJ+E4exTPPviIiLIPtqWafNlzdbBt8HZjYaVw+SSe+CGzD2pAQeNttq3Rt/6NjCzrjG8ufKwvRoqnrInMs4x6nnN5/xvobKIBSv4/726usfk8Ug+9Q6Benvfpmre2+1M5PnGTfq78cO3o6mI3cPoBUjp5M0iJjAMGeMt81tyHkimZrEZm6pLa4NQMOEjArBgkqhkiG9w0BBwEwFAYIKoZIhvcNAwcECC5nVaiJaWt+gAhgeYvxUOYHXw==", + "publicSettings": "{\"GCS_AUTO_CONFIG\":true}" + } + ] + }, + { + "name": 
"Microsoft.Azure.Security.Monitoring.AzureSecurityLinuxAgent", + "version": "2.15.112", + "location": "https://zrdfepirv2cbn04prdstr01a.blob.core.windows.net/4ef06ad957494df49c807a5334f2b5d2/Microsoft.Azure.Security.Monitoring_AzureSecurityLinuxAgent_useast2euap_manifest.xml", + "state": "enabled", + "autoUpgrade": true, + "runAsStartupTask": false, + "isJson": true, + "useExactVersion": true, + "settingsSeqNo": 0, + "settings": [ + { + "protectedSettingsCertThumbprint": "BD447EF71C3ADDF7C837E84D630F3FAC22CCD22F", + "protectedSettings": "MIIBsAYJKoZIhvcNAQcDoIIBoTCCAZ0CAQAxggFpMIIBZQIBADBNMDkxNzA1BgoJkiaJk/IsZAEZFidXaW5kb3dzIEF6dXJlIENSUCBDZXJ0aWZpY2F0ZSBHZW5lcmF0b3ICEFpB/HKM/7evRk+DBz754wUwDQYJKoZIhvcNAQEBBQAEggEADPJwniDeIUXzxNrZCloitFdscQ59Bz1dj9DLBREAiM8jmxM0LLicTJDUv272Qm/4ZQgdqpFYBFjGab/9MX+Ih2x47FkVY1woBkckMaC/QOFv84gbboeQCmJYZC/rZJdh8rCMS+CEPq3uH1PVrvtSdZ9uxnaJ+E4exTPPviIiLIPtqWafNlzdbBt8HZjYaVw+SSe+CGzD2pAQeNttq3Rt/6NjCzrjG8ufKwvRoqnrInMs4x6nnN5/xvobKIBSv4/726usfk8Ug+9Q6Benvfpmre2+1M5PnGTfq78cO3o6mI3cPoBUjp5M0iJjAMGeMt81tyHkimZrEZm6pLa4NQMOEjArBgkqhkiG9w0BBwEwFAYIKoZIhvcNAwcECC5nVaiJaWt+gAhgeYvxUOYHXw==", + "publicSettings": "{\"enableGenevaUpload\":true}" + } + ] + }, + { + "name": "Microsoft.Azure.Extensions.CustomScript", + "version": "2.1.6", + "location": "https://umsavwggj2v40kvqhc0w.blob.core.windows.net/5237dd14-0aad-f051-0fad-1e33e1b63091/5237dd14-0aad-f051-0fad-1e33e1b63091_manifest.xml", + "failoverlocation": "https://umsafwzhkbm1rfrhl0ws.blob.core.windows.net/5237dd14-0aad-f051-0fad-1e33e1b63091/5237dd14-0aad-f051-0fad-1e33e1b63091_manifest.xml", + "additionalLocations": [ + "https://umsanh4b5rfz0q0p4pwm.blob.core.windows.net/5237dd14-0aad-f051-0fad-1e33e1b63091/5237dd14-0aad-f051-0fad-1e33e1b63091_manifest.xml" + ], + "state": "enabled", + "autoUpgrade": true, + "runAsStartupTask": false, + "isJson": true, + "useExactVersion": true, + "settingsSeqNo": 0, + "isMultiConfig": false, + "settings": [ + { + "publicSettings": 
"{\"commandToExecute\":\"echo 'cee174d4-4daa-4b07-9958-53b9649445c2'\"}" + } + ] + }, + { + "name": "Microsoft.CPlat.Core.RunCommandHandlerLinux", + "version": "1.2.0", + "location": "https://umsavbvncrpzbnxmxzmr.blob.core.windows.net/f4086d41-69f9-3103-78e0-8a2c7e789d0f/f4086d41-69f9-3103-78e0-8a2c7e789d0f_manifest.xml", + "failoverlocation": "https://umsajbjtqrb3zqjvgb2z.blob.core.windows.net/f4086d41-69f9-3103-78e0-8a2c7e789d0f/f4086d41-69f9-3103-78e0-8a2c7e789d0f_manifest.xml", + "additionalLocations": [ + "https://umsawqtlsshtn5v2nfgh.blob.core.windows.net/f4086d41-69f9-3103-78e0-8a2c7e789d0f/f4086d41-69f9-3103-78e0-8a2c7e789d0f_manifest.xml" + ], + "state": "enabled", + "autoUpgrade": true, + "runAsStartupTask": false, + "isJson": true, + "useExactVersion": true, + "settingsSeqNo": 0, + "isMultiConfig": true, + "settings": [ + { + "publicSettings": "{\"source\":{\"script\":\"echo '4abb1e88-f349-41f8-8442-247d9fdfcac5'\"}}", + "seqNo": 0, + "extensionName": "MCExt1", + "extensionState": "enabled" + }, + { + "publicSettings": "{\"source\":{\"script\":\"echo 'e865c9bc-a7b3-42c6-9a79-cfa98a1ee8b3'\"}}", + "seqNo": 0, + "extensionName": "MCExt2", + "extensionState": "enabled" + }, + { + "publicSettings": "{\"source\":{\"script\":\"echo 'f923e416-0340-485c-9243-8b84fb9930c6'\"}}", + "seqNo": 0, + "extensionName": "MCExt3", + "extensionState": "enabled" + } + ] + } + ] +} \ No newline at end of file diff --git a/tests/data/test_waagent.conf b/tests/data/test_waagent.conf index cc60886e6..8fb051552 100644 --- a/tests/data/test_waagent.conf +++ b/tests/data/test_waagent.conf @@ -116,9 +116,13 @@ OS.SshDir=/notareal/path # OS.CheckRdmaDriver=n -# Enable or disable goal state processing auto-update, default is enabled +# Enable or disable goal state processing auto-update, default is enabled. 
+# Deprecated now but keep it for backward compatibility # AutoUpdate.Enabled=y +# Enable or disable goal state processing auto-update, default is enabled +# AutoUpdate.UpdateToLatestVersion=y + # Determine the update family, this should not be changed # AutoUpdate.GAFamily=Prod diff --git a/tests/data/wire/ec-key.pem b/tests/data/wire/ec-key.pem new file mode 100644 index 000000000..d157a12bb --- /dev/null +++ b/tests/data/wire/ec-key.pem @@ -0,0 +1,5 @@ +-----BEGIN EC PRIVATE KEY----- +MHcCAQEEIEydYXZkSbZjdKaNEurW6x2W3dEOC5+yDxM/Wkq1m6lUoAoGCCqGSM49 +AwEHoUQDQgAE8H1M+73QdzCyIDToTyU7OTMfi9cnIt8B4sz7e127ydNBVWjDwgGV +bKXPNtuQSWNgkfGW8A3tf9S8VcKNFxXaZg== +-----END EC PRIVATE KEY----- diff --git a/tests/data/wire/ec-key.pub.pem b/tests/data/wire/ec-key.pub.pem new file mode 100644 index 000000000..e29d8fb0b --- /dev/null +++ b/tests/data/wire/ec-key.pub.pem @@ -0,0 +1,4 @@ +-----BEGIN PUBLIC KEY----- +MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE8H1M+73QdzCyIDToTyU7OTMfi9cn +It8B4sz7e127ydNBVWjDwgGVbKXPNtuQSWNgkfGW8A3tf9S8VcKNFxXaZg== +-----END PUBLIC KEY----- diff --git a/tests/data/wire/ext_conf_missing_family.xml b/tests/data/wire/ext_conf_missing_family.xml index 058c40a88..10760a975 100644 --- a/tests/data/wire/ext_conf_missing_family.xml +++ b/tests/data/wire/ext_conf_missing_family.xml @@ -7,25 +7,6 @@ Prod - - Test - - https://mock-goal-state/rdfepirv2bl2prdstr01.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/rdfepirv2bl2prdstr02.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/rdfepirv2bl2prdstr03.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/rdfepirv2bl2prdstr04.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - 
https://mock-goal-state/rdfepirv2bl3prdstr01.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/rdfepirv2bl3prdstr02.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/rdfepirv2bl3prdstr03.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/zrdfepirv2bl4prdstr01.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/zrdfepirv2bl4prdstr03.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/zrdfepirv2bl5prdstr02.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/zrdfepirv2bl5prdstr04.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/zrdfepirv2bl5prdstr06.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/zrdfepirv2bl5prdstr09a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - https://mock-goal-state/zrdfepirv2bl6prdstr02a.blob.core.windows.net/7d89d439b79f4452950452399add2c90/Microsoft.OSTCLinuxAgent_Test_useast_manifest.xml - - eastus diff --git a/tests/data/wire/ext_conf_requested_version.xml b/tests/data/wire/ext_conf_rsm_version.xml similarity index 89% rename from tests/data/wire/ext_conf_requested_version.xml rename to tests/data/wire/ext_conf_rsm_version.xml index d12352c29..806063541 100644 --- a/tests/data/wire/ext_conf_requested_version.xml +++ b/tests/data/wire/ext_conf_rsm_version.xml @@ -3,6 +3,8 @@ Prod 9.9.9.10 + True + True http://mock-goal-state/manifest_of_ga.xml @@ 
-10,6 +12,8 @@ Test 9.9.9.10 + True + True http://mock-goal-state/manifest_of_ga.xml diff --git a/tests/data/wire/ext_conf_version_missing_in_agent_family.xml b/tests/data/wire/ext_conf_version_missing_in_agent_family.xml new file mode 100644 index 000000000..3f81ed119 --- /dev/null +++ b/tests/data/wire/ext_conf_version_missing_in_agent_family.xml @@ -0,0 +1,31 @@ + + + + Prod + True + True + + http://mock-goal-state/manifest_of_ga.xml + + + + Test + True + True + + http://mock-goal-state/manifest_of_ga.xml + + + + + + + + + + {"runtimeSettings":[{"handlerSettings":{"protectedSettingsCertThumbprint":"BD447EF71C3ADDF7C837E84D630F3FAC22CCD22F","protectedSettings":"MIICWgYJK","publicSettings":{"foo":"bar"}}}]} + + + +https://test.blob.core.windows.net/vhds/test-cs12.test-cs12.test-cs12.status?sr=b&sp=rw&se=9999-01-01&sk=key1&sv=2014-02-14&sig=hfRh7gzUE7sUtYwke78IOlZOrTRCYvkec4hGZ9zZzXo + diff --git a/tests/data/wire/ext_conf_missing_requested_version.xml b/tests/data/wire/ext_conf_version_missing_in_manifest.xml similarity index 89% rename from tests/data/wire/ext_conf_missing_requested_version.xml rename to tests/data/wire/ext_conf_version_missing_in_manifest.xml index 84043e2d7..c750d5d3a 100644 --- a/tests/data/wire/ext_conf_missing_requested_version.xml +++ b/tests/data/wire/ext_conf_version_missing_in_manifest.xml @@ -4,6 +4,8 @@ Prod 5.2.1.0 + True + True http://mock-goal-state/manifest_of_ga.xml @@ -11,6 +13,8 @@ Test 5.2.1.0 + True + True http://mock-goal-state/manifest_of_ga.xml diff --git a/tests/data/wire/ext_conf_version_not_from_rsm.xml b/tests/data/wire/ext_conf_version_not_from_rsm.xml new file mode 100644 index 000000000..9da8f5da7 --- /dev/null +++ b/tests/data/wire/ext_conf_version_not_from_rsm.xml @@ -0,0 +1,33 @@ + + + + Prod + 9.9.9.10 + False + True + + http://mock-goal-state/manifest_of_ga.xml + + + + Test + 9.9.9.10 + False + True + + http://mock-goal-state/manifest_of_ga.xml + + + + + + + + + + 
{"runtimeSettings":[{"handlerSettings":{"protectedSettingsCertThumbprint":"BD447EF71C3ADDF7C837E84D630F3FAC22CCD22F","protectedSettings":"MIICWgYJK","publicSettings":{"foo":"bar"}}}]} + + + +https://test.blob.core.windows.net/vhds/test-cs12.test-cs12.test-cs12.status?sr=b&sp=rw&se=9999-01-01&sk=key1&sv=2014-02-14&sig=hfRh7gzUE7sUtYwke78IOlZOrTRCYvkec4hGZ9zZzXo + diff --git a/tests/data/wire/ext_conf_vm_not_enabled_for_rsm_upgrades.xml b/tests/data/wire/ext_conf_vm_not_enabled_for_rsm_upgrades.xml new file mode 100644 index 000000000..384723f46 --- /dev/null +++ b/tests/data/wire/ext_conf_vm_not_enabled_for_rsm_upgrades.xml @@ -0,0 +1,33 @@ + + + + Prod + 9.9.9.10 + False + False + + http://mock-goal-state/manifest_of_ga.xml + + + + Test + 9.9.9.10 + False + False + + http://mock-goal-state/manifest_of_ga.xml + + + + + + + + + + {"runtimeSettings":[{"handlerSettings":{"protectedSettingsCertThumbprint":"BD447EF71C3ADDF7C837E84D630F3FAC22CCD22F","protectedSettings":"MIICWgYJK","publicSettings":{"foo":"bar"}}}]} + + + +https://test.blob.core.windows.net/vhds/test-cs12.test-cs12.test-cs12.status?sr=b&sp=rw&se=9999-01-01&sk=key1&sv=2014-02-14&sig=hfRh7gzUE7sUtYwke78IOlZOrTRCYvkec4hGZ9zZzXo + diff --git a/tests/data/wire/ga_manifest.xml b/tests/data/wire/ga_manifest.xml index e12f05491..c51bdbbc4 100644 --- a/tests/data/wire/ga_manifest.xml +++ b/tests/data/wire/ga_manifest.xml @@ -25,10 +25,13 @@ 2.1.0http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__2.1.0 + + 2.5.0http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__2.5.0 + 9.9.9.10 - http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__99999.0.0.0 + http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__9.9.9.10 diff --git a/tests/data/wire/ga_manifest_no_uris.xml b/tests/data/wire/ga_manifest_no_uris.xml new file mode 100644 index 000000000..89573ad63 --- /dev/null +++ b/tests/data/wire/ga_manifest_no_uris.xml @@ -0,0 +1,39 @@ + + + + + 1.0.0 + + 
http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__1.0.0 + + + + 1.1.0 + + http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__1.1.0 + + + + 1.2.0 + + http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__1.2.0 + + + + 2.0.0http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__2.0.0 + + + 2.1.0http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__2.1.0 + + + 9.9.9.10 + + http://mock-goal-state/ga-manifests/OSTCExtensions.WALinuxAgent__99999.0.0.0 + + + + 99999.0.0.0 + + + + diff --git a/tests/data/wire/rsa-key.pem b/tests/data/wire/rsa-key.pem new file mode 100644 index 000000000..d59f8391b --- /dev/null +++ b/tests/data/wire/rsa-key.pem @@ -0,0 +1,28 @@ +-----BEGIN PRIVATE KEY----- +MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQDe7cwx76yO+OjR +hWHJrKt0L1ih9F/Bctyq7Ddi/v3CitVBvkQUve4k+xeT538mHyeoOuGI3QFs5mLh +i535zbOFaHwfMMQI/CI4ZDtRrQh59XrJSsPytu0fXihsJ81IwNURuNDKwxYR0tKI +KUuUN4YxsDSBeqvP5vjSKT05f90gniscuGvPJ6Zgyynmg56KQtSXKaetbyNzPW/4 +QFmadyqsgdR7oZHEYj+1Tl6T9/tAPg/dgO55hT7WVdC8JxXeSiaDyRS1NRMFL0bC +fcnLNsO4tni2WJsfuju9a4GTrWe3NQ3+vsQV5s59MtuOhoObuYNVcETYiEjBVVsf ++shxRxL/AgMBAAECggEAfslt/eSbFoFYIHmkoQe0R5L57LpIj4QdHpTT91igyDkf +ipGEtOtEewHXagYaWXsUmehLBwTy35W0HSTDxyQHetNu7GpWw+lqKPpQhmZL0Nkd +aUg9Y1hISjPJ96E3bq5FQBwFm5wSfDaUCF68HmLpzm6xngY/mzF4yEYuDPq8r+RV +SDhVtrovSImpwLbKmPdn634PqC6bPDgO5htkT/lL/TVkR3Sla3U/YYMu90m7DiAA +46DEblx0yt+zBB+mKR3TU4zIPSFiTWYs/Srsm6nUnNqjf5rvupvXFZt0/eDZat7/ +L+/V5HPV0BxGIkCGt0Uv+qZYMGpC3eU+aEbByOr/wQKBgQDy+l4Rvgl0i+XzUPyw +N6UrDDpxBVsZ/w48DrBEBMQqTbZxVDK77E2CeMK/JlYMFYFdIT/c9W0U7eWPqe35 +kk9jVsPXc3xeoSiZvqK4CZeHEugE9OtJ4jJL1CfDXMcgPM+iSSj/QOJc5v7891QH +3gMOvmVk3Kk/I2MyBAEE6p6WHwKBgQDq4FvO77tsIZRkgmp3gPg4iImcTgwrgDxz +aHqlSVc98o4jzWsUShbZTwRgfcZm+kD3eas+gkux8CevYhwjafWiukrnwu3xvUaO +AKmgXU7ud/kS9bK/AT6ZpJsfoZzM/CQsConFbz0eXVb/tmipCBpyzi2yskLdk6SP +pEZYISknIQKBgHwE9PzjXdoiChYekUu0q1aEoFPN4wkq2W4oJSoisKnTDrtbuaWX 
+4Jwm3WhJvgPe+i+55+n1T18uakzg9Hm9h03yHHYdGS8H3TxURKPhKXmlWc4l4O7O +SNPRjxY1heHbiDOSWh2nVaMLuL0P1NFLLY5Z+lD4HF8AxgHib06+HoILAoGBALvg +oa+jNhGlvrSzWYSkJmnaVfEwwS1e03whe9GRG/cSeb6Lx3agWSyUt1ST50tiLOuI +aIGE6hW4m5X/7bAqRvFXASnoVDtFgxV91DHR0ZyRXSxcWxHMZg2yjN89gFa77hdI +irHibEpIsZm0iH2FXNqusAE79J6XRlAcQKSoSenhAoGARAP9q1WaftXdK4X7L1Ut +wnWJSVYMx6AsEo58SsJgNGqpbCl/vZMCwnSo6pdgO4xInu2tld3TKdPWZLoRCGCo +PDYVM1GXj5SS8QPmq+h/6fxS65Gl0h0oHUcKXoPD+AxHn2MWWqWzxMdRuthUQATE +MT+l5wgZPiEuiceY3Bp1hYk= +-----END PRIVATE KEY----- diff --git a/tests/data/wire/rsa-key.pub.pem b/tests/data/wire/rsa-key.pub.pem new file mode 100644 index 000000000..940785f40 --- /dev/null +++ b/tests/data/wire/rsa-key.pub.pem @@ -0,0 +1,9 @@ +-----BEGIN PUBLIC KEY----- +MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA3u3MMe+sjvjo0YVhyayr +dC9YofRfwXLcquw3Yv79worVQb5EFL3uJPsXk+d/Jh8nqDrhiN0BbOZi4Yud+c2z +hWh8HzDECPwiOGQ7Ua0IefV6yUrD8rbtH14obCfNSMDVEbjQysMWEdLSiClLlDeG +MbA0gXqrz+b40ik9OX/dIJ4rHLhrzyemYMsp5oOeikLUlymnrW8jcz1v+EBZmncq +rIHUe6GRxGI/tU5ek/f7QD4P3YDueYU+1lXQvCcV3komg8kUtTUTBS9Gwn3JyzbD +uLZ4tlibH7o7vWuBk61ntzUN/r7EFebOfTLbjoaDm7mDVXBE2IhIwVVbH/rIcUcS +/wIDAQAB +-----END PUBLIC KEY----- diff --git a/tests/distro/test_resourceDisk.py b/tests/distro/test_resourceDisk.py deleted file mode 100644 index 04acd3915..000000000 --- a/tests/distro/test_resourceDisk.py +++ /dev/null @@ -1,148 +0,0 @@ -# Copyright 2018 Microsoft Corporation -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# -# Requires Python 2.6+ and Openssl 1.0+ -# -# Implements parts of RFC 2131, 1541, 1497 and -# http://msdn.microsoft.com/en-us/library/cc227282%28PROT.10%29.aspx -# http://msdn.microsoft.com/en-us/library/cc227259%28PROT.13%29.aspx - -import os -import stat -import sys -import unittest -from azurelinuxagent.common.utils import shellutil -from azurelinuxagent.daemon.resourcedisk import get_resourcedisk_handler -from tests.tools import AgentTestCase, patch - - -class TestResourceDisk(AgentTestCase): - def test_mkfile(self): - # setup - test_file = os.path.join(self.tmp_dir, 'test_file') - file_size = 1024 * 128 - if os.path.exists(test_file): - os.remove(test_file) - - # execute - get_resourcedisk_handler().mkfile(test_file, file_size) - - # assert - assert os.path.exists(test_file) - - # only the owner should have access - mode = os.stat(test_file).st_mode & ( - stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO) - assert mode == stat.S_IRUSR | stat.S_IWUSR - - # cleanup - os.remove(test_file) - - def test_mkfile_dd_fallback(self): - with patch.object(shellutil, "run") as run_patch: - # setup - run_patch.return_value = 1 - test_file = os.path.join(self.tmp_dir, 'test_file') - file_size = 1024 * 128 - - # execute - if sys.version_info >= (3, 3): - with patch("os.posix_fallocate", - side_effect=Exception('failure')): - get_resourcedisk_handler().mkfile(test_file, file_size) - else: - get_resourcedisk_handler().mkfile(test_file, file_size) - - # assert - assert run_patch.call_count > 1 - assert "fallocate" in run_patch.call_args_list[0][0][0] - assert "dd if" in run_patch.call_args_list[-1][0][0] - - def test_mkfile_xfs_fs(self): - # setup - test_file = os.path.join(self.tmp_dir, 'test_file') - file_size = 1024 * 128 - if os.path.exists(test_file): - os.remove(test_file) - - # execute - resource_disk_handler = get_resourcedisk_handler() - resource_disk_handler.fs = 'xfs' - - with patch.object(shellutil, "run") as run_patch: - resource_disk_handler.mkfile(test_file, 
file_size) - - # assert - if sys.version_info >= (3, 3): - with patch("os.posix_fallocate") as posix_fallocate: - self.assertEqual(0, posix_fallocate.call_count) - - assert run_patch.call_count == 1 - assert "dd if" in run_patch.call_args_list[0][0][0] - - def test_change_partition_type(self): - resource_handler = get_resourcedisk_handler() - # test when sfdisk --part-type does not exist - with patch.object(shellutil, "run_get_output", - side_effect=[[1, ''], [0, '']]) as run_patch: - resource_handler.change_partition_type( - suppress_message=True, option_str='') - - # assert - assert run_patch.call_count == 2 - assert "sfdisk --part-type" in run_patch.call_args_list[0][0][0] - assert "sfdisk -c" in run_patch.call_args_list[1][0][0] - - # test when sfdisk --part-type exists - with patch.object(shellutil, "run_get_output", - side_effect=[[0, '']]) as run_patch: - resource_handler.change_partition_type( - suppress_message=True, option_str='') - - # assert - assert run_patch.call_count == 1 - assert "sfdisk --part-type" in run_patch.call_args_list[0][0][0] - - def test_check_existing_swap_file(self): - test_file = os.path.join(self.tmp_dir, 'test_swap_file') - file_size = 1024 * 128 - if os.path.exists(test_file): - os.remove(test_file) - - with open(test_file, "wb") as file: # pylint: disable=redefined-builtin - file.write(bytearray(file_size)) - - os.chmod(test_file, stat.S_ISUID | stat.S_ISGID | stat.S_IRUSR | - stat.S_IWUSR | stat.S_IRWXG | stat.S_IRWXO) # 0o6677 - - def swap_on(_): # mimic the output of "swapon -s" - return [ - "Filename Type Size Used Priority", - "{0} partition 16498684 0 -2".format(test_file) - ] - - with patch.object(shellutil, "run_get_output", side_effect=swap_on): - get_resourcedisk_handler().check_existing_swap_file( - test_file, test_file, file_size) - - # it should remove access from group, others - mode = os.stat(test_file).st_mode & (stat.S_ISUID | stat.S_ISGID | - stat.S_IRWXU | stat.S_IWUSR | stat.S_IRWXG | stat.S_IRWXO) # 0o6777 - 
assert mode == stat.S_ISUID | stat.S_ISGID | stat.S_IRUSR | stat.S_IWUSR # 0o6600 - - os.remove(test_file) - - -if __name__ == '__main__': - unittest.main() diff --git a/tests/ga/test_agent_update_handler.py b/tests/ga/test_agent_update_handler.py new file mode 100644 index 000000000..c6e41469f --- /dev/null +++ b/tests/ga/test_agent_update_handler.py @@ -0,0 +1,537 @@ +import contextlib +import json +import os + +from azurelinuxagent.common import conf +from azurelinuxagent.common.event import WALAEventOperation +from azurelinuxagent.common.exception import AgentUpgradeExitException +from azurelinuxagent.common.future import ustr, httpclient +from azurelinuxagent.common.protocol.restapi import VMAgentUpdateStatuses + +from azurelinuxagent.common.protocol.util import ProtocolUtil +from azurelinuxagent.common.version import CURRENT_VERSION, AGENT_NAME +from azurelinuxagent.ga.agent_update_handler import get_agent_update_handler +from azurelinuxagent.ga.guestagent import GuestAgent +from tests.ga.test_update import UpdateTestCase +from tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse +from tests.lib.wire_protocol_data import DATA_FILE +from tests.lib.tools import clear_singleton_instances, load_bin_data, patch + + +class TestAgentUpdate(UpdateTestCase): + + def setUp(self): + UpdateTestCase.setUp(self) + # Since ProtocolUtil is a singleton per thread, we need to clear it to ensure that the test cases do not + # reuse a previous state + clear_singleton_instances(ProtocolUtil) + + @contextlib.contextmanager + def _get_agent_update_handler(self, test_data=None, autoupdate_frequency=0.001, autoupdate_enabled=True, protocol_get_error=False, mock_get_header=None, mock_put_header=None): + # Defaulting the test_data parameter to DATA_FILE raises the pylint warning + # W0102: Dangerous default value DATA_FILE (builtins.dict) as argument (dangerous-default-value) + test_data = DATA_FILE if
test_data is None else test_data + + with mock_wire_protocol(test_data) as protocol: + + def get_handler(url, **kwargs): + if HttpRequestPredicates.is_agent_package_request(url): + if not protocol_get_error: + agent_pkg = load_bin_data(self._get_agent_file_name(), self._agent_zip_dir) + return MockHttpResponse(status=httpclient.OK, body=agent_pkg) + else: + return MockHttpResponse(status=httpclient.SERVICE_UNAVAILABLE) + + return protocol.mock_wire_data.mock_http_get(url, **kwargs) + + def put_handler(url, *args, **_): + if HttpRequestPredicates.is_host_plugin_status_request(url): + # Skip reading the HostGA request data as it's encoded + return MockHttpResponse(status=500) + protocol.aggregate_status = json.loads(args[0]) + return MockHttpResponse(status=201) + + http_get_handler = mock_get_header if mock_get_header else get_handler + http_put_handler = mock_put_header if mock_put_header else put_handler + + protocol.set_http_handlers(http_get_handler=http_get_handler, http_put_handler=http_put_handler) + + with patch("azurelinuxagent.common.conf.get_autoupdate_enabled", return_value=autoupdate_enabled): + with patch("azurelinuxagent.common.conf.get_autoupdate_frequency", return_value=autoupdate_frequency): + with patch("azurelinuxagent.common.conf.get_autoupdate_gafamily", return_value="Prod"): + with patch("azurelinuxagent.common.conf.get_enable_ga_versioning", return_value=True): + with patch("azurelinuxagent.common.event.EventLogger.add_event") as mock_telemetry: + agent_update_handler = get_agent_update_handler(protocol) + agent_update_handler._protocol = protocol + yield agent_update_handler, mock_telemetry + + def _assert_agent_directories_available(self, versions): + for version in versions: + self.assertTrue(os.path.exists(self.agent_dir(version)), "Agent directory {0} not found".format(version)) + + def _assert_agent_directories_exist_and_others_dont_exist(self, versions): + self._assert_agent_directories_available(versions=versions) + other_agents =
[agent_dir for agent_dir in self.agent_dirs() if + agent_dir not in [self.agent_dir(version) for version in versions]] + self.assertFalse(any(other_agents), + "All other agents should be purged from agent dir: {0}".format(other_agents)) + + def _assert_agent_rsm_version_in_goal_state(self, mock_telemetry, inc=1, version="9.9.9.10"): + upgrade_event_msgs = [kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + 'New agent version:{0} requested by RSM in Goal state incarnation_{1}'.format(version, inc) in kwarg['message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade] + self.assertEqual(1, len(upgrade_event_msgs), + "Did not find the event indicating that the requested agent version was found. Got: {0}".format( + mock_telemetry.call_args_list)) + + def _assert_update_discovered_from_agent_manifest(self, mock_telemetry, inc=1, version="9.9.9.10"): + upgrade_event_msgs = [kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + 'Self-update is ready to upgrade the new agent: {0} now before processing the goal state: incarnation_{1}'.format(version, inc) in kwarg['message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade] + self.assertEqual(1, len(upgrade_event_msgs), + "Did not find the event indicating that the new version was found. Got: {0}".format( + mock_telemetry.call_args_list)) + + def _assert_no_agent_package_telemetry_emitted(self, mock_telemetry, version="9.9.9.10"): + upgrade_event_msgs = [kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + 'No matching package found in the agent manifest for version: {0}'.format(version) in kwarg['message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade] + self.assertEqual(1, len(upgrade_event_msgs), + "Did not find the event indicating that the agent package was not found. Got: {0}".format( + mock_telemetry.call_args_list)) + + def _assert_agent_exit_process_telemetry_emitted(self, message): + self.assertIn("Current Agent {0} completed all update checks, exiting current process".format(CURRENT_VERSION), message) + + def test_it_should_not_update_when_autoupdate_disabled(self): + self.prepare_agents(count=1) + with self._get_agent_update_handler(autoupdate_enabled=False) as (agent_update_handler, mock_telemetry): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION)]) + self.assertEqual(0, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + "requesting a new agent version" in kwarg['message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade]), "should not check for rsm version") + + def test_it_should_update_to_largest_version_if_ga_versioning_disabled(self): + self.prepare_agents(count=1) + + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + with patch.object(conf, "get_enable_ga_versioning", return_value=False): + with self.assertRaises(AgentUpgradeExitException) as context: + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_update_discovered_from_agent_manifest(mock_telemetry, version="99999.0.0.0") + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION), "99999.0.0.0"]) + self._assert_agent_exit_process_telemetry_emitted(ustr(context.exception.reason)) + + def test_it_should_not_update_to_largest_version_if_time_window_not_elapsed(self): + self.prepare_agents(count=1) + + data_file = DATA_FILE.copy() + data_file["ga_manifest"] = "wire/ga_manifest_no_uris.xml" + with self._get_agent_update_handler(test_data=data_file, autoupdate_frequency=10) as (agent_update_handler,
_): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), + "New agent directory should not be found") + agent_update_handler._protocol.mock_wire_data.set_ga_manifest("wire/ga_manifest.xml") + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), + "New agent directory should not be found") + + def test_it_should_update_to_largest_version_if_time_window_elapsed(self): + self.prepare_agents(count=1) + + data_file = DATA_FILE.copy() + data_file["ga_manifest"] = "wire/ga_manifest_no_uris.xml" + with patch("azurelinuxagent.common.conf.get_self_update_hotfix_frequency", return_value=0.001): + with patch("azurelinuxagent.common.conf.get_self_update_regular_frequency", return_value=0.001): + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + with self.assertRaises(AgentUpgradeExitException) as context: + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), + "New agent directory should not be found") + agent_update_handler._protocol.mock_wire_data.set_ga_manifest("wire/ga_manifest.xml") + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_update_discovered_from_agent_manifest(mock_telemetry, inc=2, version="99999.0.0.0") + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION), "99999.0.0.0"]) + self._assert_agent_exit_process_telemetry_emitted(ustr(context.exception.reason)) + + def 
test_it_should_not_allow_update_if_largest_version_below_current_version(self): + self.prepare_agents(count=1) + data_file = DATA_FILE.copy() + data_file["ga_manifest"] = "wire/ga_manifest_no_upgrade.xml" + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, _): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION)]) + + def test_it_should_update_to_largest_version_if_rsm_version_not_available(self): + self.prepare_agents(count=1) + + data_file = DATA_FILE.copy() + data_file['ext_conf'] = "wire/ext_conf.xml" + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + with self.assertRaises(AgentUpgradeExitException) as context: + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_update_discovered_from_agent_manifest(mock_telemetry, version="99999.0.0.0") + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION), "99999.0.0.0"]) + self._assert_agent_exit_process_telemetry_emitted(ustr(context.exception.reason)) + + def test_it_should_not_download_manifest_again_if_last_attempted_download_time_not_elapsed(self): + self.prepare_agents(count=1) + data_file = DATA_FILE.copy() + data_file['ext_conf'] = "wire/ext_conf.xml" + with self._get_agent_update_handler(test_data=data_file, autoupdate_frequency=10, protocol_get_error=True) as (agent_update_handler, _): + # making multiple agent update attempts + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + mock_wire_data = agent_update_handler._protocol.mock_wire_data + self.assertEqual(1, mock_wire_data.call_counts['manifest_of_ga.xml'], "Agent manifest should 
not be downloaded again") + + def test_it_should_download_manifest_if_last_attempted_download_time_is_elapsed(self): + self.prepare_agents(count=1) + data_file = DATA_FILE.copy() + data_file['ext_conf'] = "wire/ext_conf.xml" + + with self._get_agent_update_handler(test_data=data_file, autoupdate_frequency=0.00001, protocol_get_error=True) as (agent_update_handler, _): + # making multiple agent update attempts + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + mock_wire_data = agent_update_handler._protocol.mock_wire_data + self.assertEqual(3, mock_wire_data.call_counts['manifest_of_ga.xml'], "Agent manifest should be downloaded in all attempts") + + def test_it_should_not_agent_update_if_rsm_version_is_same_as_current_version(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family( + str(CURRENT_VERSION)) + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self.assertEqual(0, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + "requesting a new agent version" in kwarg['message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade]), "rsm version should be same as current version") + self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), + "New agent directory should not be found") + + 
def test_it_should_upgrade_agent_if_rsm_version_is_available_greater_than_current_version(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + with self.assertRaises(AgentUpgradeExitException) as context: + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_agent_rsm_version_in_goal_state(mock_telemetry, version="9.9.9.10") + self._assert_agent_directories_exist_and_others_dont_exist(versions=["9.9.9.10", str(CURRENT_VERSION)]) + self._assert_agent_exit_process_telemetry_emitted(ustr(context.exception.reason)) + + def test_it_should_downgrade_agent_if_rsm_version_is_available_less_than_current_version(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + downgraded_version = "2.5.0" + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family(downgraded_version) + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + with self.assertRaises(AgentUpgradeExitException) as context: + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_agent_rsm_version_in_goal_state(mock_telemetry, inc=2, version=downgraded_version) + self._assert_agent_directories_exist_and_others_dont_exist( + versions=[downgraded_version, str(CURRENT_VERSION)]) + 
self._assert_agent_exit_process_telemetry_emitted(ustr(context.exception.reason)) + + def test_it_should_not_do_rsm_update_if_gs_not_updated_in_next_attempt(self): + self.prepare_agents(count=1) + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + version = "5.2.0.1" + with self._get_agent_update_handler(test_data=data_file, autoupdate_frequency=10) as (agent_update_handler, mock_telemetry): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family(version) + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + self._assert_agent_rsm_version_in_goal_state(mock_telemetry, inc=2, version=version) + self._assert_no_agent_package_telemetry_emitted(mock_telemetry, version=version) + # Now we shouldn't check for download if the update is not allowed (GS not updated). This run should not add new logs + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), False) + self._assert_agent_rsm_version_in_goal_state(mock_telemetry, inc=2, version=version) + self._assert_no_agent_package_telemetry_emitted(mock_telemetry, version=version) + + def test_it_should_not_downgrade_below_daemon_version(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + downgraded_version = "1.2.0" + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, _): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family(downgraded_version) + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() +
agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self.assertFalse(os.path.exists(self.agent_dir(downgraded_version)), + "New agent directory should not be found") + + def test_it_should_update_to_largest_version_if_vm_not_enabled_for_rsm_upgrades(self): + self.prepare_agents(count=1) + + data_file = DATA_FILE.copy() + data_file['ext_conf'] = "wire/ext_conf_vm_not_enabled_for_rsm_upgrades.xml" + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + with self.assertRaises(AgentUpgradeExitException) as context: + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_update_discovered_from_agent_manifest(mock_telemetry, version="99999.0.0.0") + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION), "99999.0.0.0"]) + self._assert_agent_exit_process_telemetry_emitted(ustr(context.exception.reason)) + + def test_it_should_not_update_to_version_if_version_not_from_rsm(self): + self.prepare_agents(count=1) + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_version_not_from_rsm.xml" + downgraded_version = "2.5.0" + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, _): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family(downgraded_version) + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_agent_directories_exist_and_others_dont_exist( + versions=[str(CURRENT_VERSION)]) + self.assertFalse(os.path.exists(self.agent_dir(downgraded_version)), + "New agent directory should not be found") + + def test_handles_if_rsm_version_not_found_in_pkgs_to_download(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + # Set the test environment 
by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + version = "5.2.0.4" + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family(version) + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + self._assert_agent_rsm_version_in_goal_state(mock_telemetry, inc=2, version=version) + self.assertFalse(os.path.exists(self.agent_dir(version)), + "New agent directory should not be found") + + self._assert_no_agent_package_telemetry_emitted(mock_telemetry, version=version) + + def test_handles_missing_agent_family(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_missing_family.xml" + + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), + "New agent directory should not be found") + + self.assertEqual(1, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + "No manifest links found for agent family" in kwarg[ + 'message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade]), "Agent manifest should not be in GS") + + # making multiple agent update attempts and assert only one time logged + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), False) + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), False) + + 
self.assertEqual(1, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + "No manifest links found for agent family" in kwarg[ + 'message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade]), + "Agent manifest error should be logged once if it's same goal state") + + def test_it_should_report_update_status_with_success(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, _): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family( + str(CURRENT_VERSION)) + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + vm_agent_update_status = agent_update_handler.get_vmagent_update_status() + self.assertEqual(VMAgentUpdateStatuses.Success, vm_agent_update_status.status) + self.assertEqual(0, vm_agent_update_status.code) + self.assertEqual(str(CURRENT_VERSION), vm_agent_update_status.expected_version) + + def test_it_should_report_update_status_with_error_on_download_fail(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + with self._get_agent_update_handler(test_data=data_file, protocol_get_error=True) as (agent_update_handler, _): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + vm_agent_update_status = agent_update_handler.get_vmagent_update_status() + self.assertEqual(VMAgentUpdateStatuses.Error, vm_agent_update_status.status) + self.assertEqual(1, vm_agent_update_status.code) + self.assertEqual("9.9.9.10", vm_agent_update_status.expected_version) + self.assertIn("Failed to download agent package from all URIs", vm_agent_update_status.message) + + def test_it_should_report_update_status_with_missing_rsm_version_error(self): + data_file = DATA_FILE.copy() + 
data_file['ext_conf'] = "wire/ext_conf_version_missing_in_agent_family.xml" + + with self._get_agent_update_handler(test_data=data_file, protocol_get_error=True) as (agent_update_handler, _): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + vm_agent_update_status = agent_update_handler.get_vmagent_update_status() + self.assertEqual(VMAgentUpdateStatuses.Error, vm_agent_update_status.status) + self.assertEqual(1, vm_agent_update_status.code) + self.assertIn("missing version property. So, skipping agent update", vm_agent_update_status.message) + + def test_it_should_not_log_same_error_next_hours(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_missing_family.xml" + + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), + "New agent directory should not be found") + + self.assertEqual(1, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + "No manifest links found for agent family" in kwarg[ + 'message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade]), "Agent manifest should not be in GS") + + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + self.assertEqual(1, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + "No manifest links found for agent family" in kwarg[ + 'message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade]), "Agent manifest should not be in GS") + + def test_it_should_save_rsm_state_of_the_most_recent_goal_state(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + with 
self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, _): + with self.assertRaises(AgentUpgradeExitException): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + state_file = os.path.join(conf.get_lib_dir(), "rsm_update.json") + self.assertTrue(os.path.exists(state_file), "The rsm state file was not saved (can't find {0})".format(state_file)) + + # check if state gets updated if most recent goal state has different values + agent_update_handler._protocol.mock_wire_data.set_extension_config_is_vm_enabled_for_rsm_upgrades("False") + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + with self.assertRaises(AgentUpgradeExitException): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + self.assertFalse(os.path.exists(state_file), "The rsm file should be removed (file: {0})".format(state_file)) + + def test_it_should_not_update_to_latest_if_flag_is_disabled(self): + self.prepare_agents(count=1) + + data_file = DATA_FILE.copy() + data_file['ext_conf'] = "wire/ext_conf.xml" + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, _): + with patch("azurelinuxagent.common.conf.get_auto_update_to_latest_version", return_value=False): + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION)]) + + def test_it_should_continue_with_update_if_number_of_update_attempts_less_than_3(self): + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + latest_version = self.prepare_agents(count=2) + self.expand_agents() + latest_path = os.path.join(self.tmp_dir, "{0}-{1}".format(AGENT_NAME, latest_version)) + agent = GuestAgent.from_installed_agent(latest_path) + # marking agent as bad agent on first attempt + agent.mark_failure(is_fatal=True) 
+ agent.inc_update_attempt_count() + self.assertTrue(agent.is_blacklisted, "Agent should be blacklisted") + self.assertEqual(1, agent.get_update_attempt_count(), "Agent update attempts should be 1") + with self._get_agent_update_handler(test_data=data_file) as (agent_update_handler, mock_telemetry): + # For the remaining 2 attempts, the update should continue even though the agent was marked bad on the first attempt + for i in range(2): + with self.assertRaises(AgentUpgradeExitException): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family( + str(latest_version)) + agent_update_handler._protocol.mock_wire_data.set_version_in_ga_manifest(str(latest_version)) + agent_update_handler._protocol.mock_wire_data.set_incarnation(i+2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION), str(latest_version)]) + agent = GuestAgent.from_installed_agent(latest_path) + self.assertFalse(agent.is_blacklisted, "Agent should not be blacklisted") + self.assertEqual(i+2, agent.get_update_attempt_count(), "Agent update attempts should be {0}".format(i+2)) + + # check that the next update is not attempted + agent.mark_failure(is_fatal=True) + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + agent = GuestAgent.from_installed_agent(latest_path) + self.assertTrue(agent.is_blacklisted, "Agent should be blacklisted") + self.assertEqual(3, agent.get_update_attempt_count(), "Agent update attempts should be 3") + self.assertEqual(1, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + "Attempted enough update retries for version: {0} but still agent not recovered from bad state".format(latest_version) in kwarg[ + 'message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade]), + "Update is not allowed after 3 attempts") + + def 
test_it_should_fail_the_update_if_agent_pkg_is_invalid(self): + agent_uri = 'https://foo.blob.core.windows.net/bar/OSTCExtensions.WALinuxAgent__9.9.9.10' + + def http_get_handler(uri, *_, **__): + if uri in (agent_uri, 'http://168.63.129.16:32526/extensionArtifact'): + response = load_bin_data("ga/WALinuxAgent-9.9.9.10-no_manifest.zip") + return MockHttpResponse(status=httpclient.OK, body=response) + return None + self.prepare_agents(count=1) + data_file = DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + with self._get_agent_update_handler(test_data=data_file, mock_get_header=http_get_handler) as (agent_update_handler, mock_telemetry): + agent_update_handler._protocol.mock_wire_data.set_version_in_agent_family("9.9.9.10") + agent_update_handler._protocol.mock_wire_data.set_incarnation(2) + agent_update_handler._protocol.client.update_goal_state() + agent_update_handler.run(agent_update_handler._protocol.get_goal_state(), True) + self._assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION)]) + self.assertEqual(1, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if + "Downloaded agent package: WALinuxAgent-9.9.9.10 is missing agent handler manifest file" in kwarg['message'] and kwarg[ + 'op'] == WALAEventOperation.AgentUpgrade]), "Agent update should fail") diff --git a/tests/common/test_cgroupapi.py b/tests/ga/test_cgroupapi.py similarity index 91% rename from tests/common/test_cgroupapi.py rename to tests/ga/test_cgroupapi.py index a31d57d72..ad8ef80c2 100644 --- a/tests/common/test_cgroupapi.py +++ b/tests/ga/test_cgroupapi.py @@ -22,13 +22,13 @@ import subprocess import tempfile -from azurelinuxagent.common.cgroupapi import CGroupsApi, SystemdCgroupsApi -from azurelinuxagent.common.cgroupstelemetry import CGroupsTelemetry +from azurelinuxagent.ga.cgroupapi import CGroupsApi, SystemdCgroupsApi +from azurelinuxagent.ga.cgroupstelemetry import CGroupsTelemetry from 
azurelinuxagent.common.osutil import systemd from azurelinuxagent.common.utils import fileutil -from tests.common.mock_cgroup_environment import mock_cgroup_environment -from tests.tools import AgentTestCase, patch, mock_sleep -from tests.utils.cgroups_tools import CGroupsTools +from tests.lib.mock_cgroup_environment import mock_cgroup_environment +from tests.lib.tools import AgentTestCase, patch, mock_sleep +from tests.lib.cgroups_tools import CGroupsTools class _MockedFileSystemTestCase(AgentTestCase): def setUp(self): @@ -39,7 +39,7 @@ def setUp(self): os.mkdir(os.path.join(self.cgroups_file_system_root, "cpu")) os.mkdir(os.path.join(self.cgroups_file_system_root, "memory")) - self.mock_cgroups_file_system_root = patch("azurelinuxagent.common.cgroupapi.CGROUPS_FILE_SYSTEM_ROOT", self.cgroups_file_system_root) + self.mock_cgroups_file_system_root = patch("azurelinuxagent.ga.cgroupapi.CGROUPS_FILE_SYSTEM_ROOT", self.cgroups_file_system_root) self.mock_cgroups_file_system_root.start() def tearDown(self): @@ -56,24 +56,26 @@ def test_cgroups_should_be_supported_only_on_ubuntu16_centos7dot4_redhat7dot4_an (['ubuntu', '18.10', 'cosmic'], True), (['ubuntu', '20.04', 'focal'], True), (['ubuntu', '20.10', 'groovy'], True), - (['centos', '7.8', 'Source'], False), - (['redhat', '7.8', 'Maipo'], False), - (['redhat', '7.9.1908', 'Core'], False), - (['centos', '8.1', 'Source'], False), - (['redhat', '8.2', 'Maipo'], False), - (['redhat', '8.2.2111', 'Core'], False), (['centos', '7.4', 'Source'], False), (['redhat', '7.4', 'Maipo'], False), (['centos', '7.5', 'Source'], False), (['centos', '7.3', 'Maipo'], False), (['redhat', '7.2', 'Maipo'], False), + (['centos', '7.8', 'Source'], False), + (['redhat', '7.8', 'Maipo'], False), + (['redhat', '7.9.1908', 'Core'], False), + (['centos', '8.1', 'Source'], True), + (['redhat', '8.2', 'Maipo'], True), + (['redhat', '8.2.2111', 'Core'], True), + (['redhat', '9.1', 'Core'], False), + (['centos', '9.1', 'Source'], False), (['bigip', 
'15.0.1', 'Final'], False), (['gaia', '273.562', 'R80.30'], False), (['debian', '9.1', ''], False), ] for (distro, supported) in test_cases: - with patch("azurelinuxagent.common.cgroupapi.get_distro", return_value=distro): + with patch("azurelinuxagent.ga.cgroupapi.get_distro", return_value=distro): self.assertEqual(CGroupsApi.cgroups_supported(), supported, "cgroups_supported() failed on {0}".format(distro)) @@ -150,7 +152,7 @@ def mock_popen(command, *args, **kwargs): with mock_cgroup_environment(self.tmp_dir): with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as output_file: - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", side_effect=mock_popen) as popen_patch: # pylint: disable=unused-variable + with patch("subprocess.Popen", side_effect=mock_popen) as popen_patch: # pylint: disable=unused-variable command_output = SystemdCgroupsApi().start_extension_command( extension_name="Microsoft.Compute.TestExtension-1.2.3", command="A_TEST_COMMAND", @@ -191,7 +193,7 @@ def test_start_extension_command_should_execute_the_command_in_a_cgroup(self, _) @patch('time.sleep', side_effect=lambda _: mock_sleep()) def test_start_extension_command_should_use_systemd_to_execute_the_command(self, _): with mock_cgroup_environment(self.tmp_dir): - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as popen_patch: + with patch("subprocess.Popen", wraps=subprocess.Popen) as popen_patch: SystemdCgroupsApi().start_extension_command( extension_name="Microsoft.Compute.TestExtension-1.2.3", command="the-test-extension-command", @@ -219,7 +221,7 @@ def test_cleanup_legacy_cgroups_should_remove_legacy_cgroups(self): legacy_cpu_cgroup = CGroupsTools.create_legacy_agent_cgroup(self.cgroups_file_system_root, "cpu", '') legacy_memory_cgroup = CGroupsTools.create_legacy_agent_cgroup(self.cgroups_file_system_root, "memory", '') - with patch("azurelinuxagent.common.cgroupapi.get_agent_pid_file_path", return_value=daemon_pid_file): + with 
patch("azurelinuxagent.ga.cgroupapi.get_agent_pid_file_path", return_value=daemon_pid_file): legacy_cgroups = SystemdCgroupsApi().cleanup_legacy_cgroups() self.assertEqual(legacy_cgroups, 2, "cleanup_legacy_cgroups() did not find all the expected cgroups") diff --git a/tests/common/test_cgroupconfigurator.py b/tests/ga/test_cgroupconfigurator.py similarity index 96% rename from tests/common/test_cgroupconfigurator.py rename to tests/ga/test_cgroupconfigurator.py index 7e2dc45b4..b5a9e0994 100644 --- a/tests/common/test_cgroupconfigurator.py +++ b/tests/ga/test_cgroupconfigurator.py @@ -29,18 +29,18 @@ from nose.plugins.attrib import attr from azurelinuxagent.common import conf -from azurelinuxagent.common.cgroup import AGENT_NAME_TELEMETRY, MetricsCounter, MetricValue, MetricsCategory, CpuCgroup -from azurelinuxagent.common.cgroupconfigurator import CGroupConfigurator, DisableCgroups -from azurelinuxagent.common.cgroupstelemetry import CGroupsTelemetry +from azurelinuxagent.ga.cgroup import AGENT_NAME_TELEMETRY, MetricsCounter, MetricValue, MetricsCategory, CpuCgroup +from azurelinuxagent.ga.cgroupconfigurator import CGroupConfigurator, DisableCgroups +from azurelinuxagent.ga.cgroupstelemetry import CGroupsTelemetry from azurelinuxagent.common.event import WALAEventOperation from azurelinuxagent.common.exception import CGroupsException, ExtensionError, ExtensionErrorCodes, \ AgentMemoryExceededException from azurelinuxagent.common.future import ustr from azurelinuxagent.common.utils import shellutil, fileutil -from tests.common.mock_environment import MockCommand -from tests.common.mock_cgroup_environment import mock_cgroup_environment, UnitFilePaths -from tests.tools import AgentTestCase, patch, mock_sleep, i_am_root, data_dir, is_python_version_26_or_34, skip_if_predicate_true -from tests.utils.miscellaneous_tools import format_processes, wait_for +from tests.lib.mock_environment import MockCommand +from tests.lib.mock_cgroup_environment import 
mock_cgroup_environment, UnitFilePaths +from tests.lib.tools import AgentTestCase, patch, mock_sleep, i_am_root, data_dir, is_python_version_26_or_34, skip_if_predicate_true +from tests.lib.miscellaneous_tools import format_processes, wait_for class CGroupConfiguratorSystemdTestCase(AgentTestCase): @@ -361,7 +361,7 @@ def test_start_extension_command_should_not_use_systemd_when_cgroups_are_not_ena with self._get_cgroup_configurator() as configurator: configurator.disable("UNIT TEST", DisableCgroups.ALL) - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as patcher: + with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as patcher: configurator.start_extension_command( extension_name="Microsoft.Compute.TestExtension-1.2.3", command="date", @@ -381,7 +381,7 @@ def test_start_extension_command_should_not_use_systemd_when_cgroups_are_not_ena @patch('time.sleep', side_effect=lambda _: mock_sleep()) def test_start_extension_command_should_use_systemd_run_when_cgroups_are_enabled(self, _): with self._get_cgroup_configurator() as configurator: - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as popen_patch: + with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as popen_patch: configurator.start_extension_command( extension_name="Microsoft.Compute.TestExtension-1.2.3", command="the-test-extension-command", @@ -432,7 +432,7 @@ def mock_popen(command_arg, *args, **kwargs): raise Exception("A TEST EXCEPTION") return original_popen(command_arg, *args, **kwargs) - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", side_effect=mock_popen): + with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", side_effect=mock_popen): with self.assertRaises(Exception) as context_manager: configurator.start_extension_command( extension_name="Microsoft.Compute.TestExtension-1.2.3", @@ -454,7 +454,7 @@ def 
test_start_extension_command_should_disable_cgroups_and_invoke_the_command_d configurator.mocks.add_command(MockCommand("systemd-run", return_value=1, stdout='', stderr='Failed to start transient scope unit: syntax error')) with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as output_file: - with patch("azurelinuxagent.common.cgroupconfigurator.add_event") as mock_add_event: + with patch("azurelinuxagent.ga.cgroupconfigurator.add_event") as mock_add_event: with patch("subprocess.Popen", wraps=subprocess.Popen) as popen_patch: CGroupsTelemetry.reset() @@ -539,7 +539,7 @@ def test_start_extension_command_should_not_use_fallback_option_if_extension_fai with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stdout: with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stderr: - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as popen_patch: + with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as popen_patch: with self.assertRaises(ExtensionError) as context_manager: configurator.start_extension_command( extension_name="Microsoft.Compute.TestExtension-1.2.3", @@ -567,7 +567,7 @@ def test_start_extension_command_should_not_use_fallback_option_if_extension_fai @skip_if_predicate_true(is_python_version_26_or_34, "Disabled on Python 2.6 and 3.4 for now. 
Need to revisit to fix it") @attr('requires_sudo') @patch('time.sleep', side_effect=lambda _: mock_sleep()) - @patch("azurelinuxagent.common.utils.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN", 5) + @patch("azurelinuxagent.ga.extensionprocessutil.TELEMETRY_MESSAGE_MAX_LEN", 5) def test_start_extension_command_should_not_use_fallback_option_if_extension_fails_with_long_output(self, *args): self.assertTrue(i_am_root(), "Test does not run when non-root") @@ -579,7 +579,7 @@ def test_start_extension_command_should_not_use_fallback_option_if_extension_fai with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stdout: with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stderr: - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as popen_patch: + with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", wraps=subprocess.Popen) as popen_patch: with self.assertRaises(ExtensionError) as context_manager: configurator.start_extension_command( extension_name="Microsoft.Compute.TestExtension-1.2.3", @@ -613,9 +613,9 @@ def test_start_extension_command_should_not_use_fallback_option_if_extension_tim with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stdout: with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stderr: - with patch("azurelinuxagent.common.utils.extensionprocessutil.wait_for_process_completion_or_timeout", + with patch("azurelinuxagent.ga.extensionprocessutil.wait_for_process_completion_or_timeout", return_value=[True, None, 0]): - with patch("azurelinuxagent.common.cgroupapi.SystemdCgroupsApi._is_systemd_failure", + with patch("azurelinuxagent.ga.cgroupapi.SystemdCgroupsApi._is_systemd_failure", return_value=False): with self.assertRaises(ExtensionError) as context_manager: configurator.start_extension_command( @@ -654,7 +654,7 @@ def mock_popen(command, *args, **kwargs): with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stdout: with tempfile.TemporaryFile(dir=self.tmp_dir, 
mode="w+b") as stderr: - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", side_effect=mock_popen): + with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", side_effect=mock_popen): # We expect this call to fail because of the syntax error process_output = configurator.start_extension_command( extension_name="Microsoft.Compute.TestExtension-1.2.3", @@ -896,7 +896,7 @@ def mock_popen(command, *args, **kwargs): return process with patch('time.sleep', side_effect=lambda _: original_sleep(0.1)): # start_extension_command has a small delay; skip it - with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", side_effect=mock_popen): + with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", side_effect=mock_popen): with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stdout: with tempfile.TemporaryFile(dir=self.tmp_dir, mode="w+b") as stderr: configurator.start_extension_command( @@ -943,7 +943,7 @@ def get_completed_process(): agent_processes = [os.getppid(), os.getpid()] + agent_command_processes + [start_extension.systemd_run_pid] other_processes = [1, get_completed_process()] + extension_processes - with patch("azurelinuxagent.common.cgroupconfigurator.CGroupsApi.get_processes_in_cgroup", return_value=agent_processes + other_processes): + with patch("azurelinuxagent.ga.cgroupconfigurator.CGroupsApi.get_processes_in_cgroup", return_value=agent_processes + other_processes): with self.assertRaises(CGroupsException) as context_manager: configurator._check_processes_in_agent_cgroup() @@ -987,7 +987,7 @@ def test_check_cgroups_should_disable_cgroups_when_a_check_fails(self): patchers.append(p) p.start() - with patch("azurelinuxagent.common.cgroupconfigurator.add_event") as add_event: + with patch("azurelinuxagent.ga.cgroupconfigurator.add_event") as add_event: configurator.enable() tracked_metrics = [ @@ -1017,7 +1017,7 @@ def test_check_agent_memory_usage_should_raise_a_cgroups_exception_when_the_limi with 
self.assertRaises(AgentMemoryExceededException) as context_manager: with self._get_cgroup_configurator() as configurator: - with patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_tracked_metrics") as tracked_metrics: + with patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_tracked_metrics") as tracked_metrics: tracked_metrics.return_value = metrics configurator.check_agent_memory_usage() diff --git a/tests/common/test_cgroups.py b/tests/ga/test_cgroups.py similarity index 98% rename from tests/common/test_cgroups.py rename to tests/ga/test_cgroups.py index 7f549e5b8..0ffcfed1b 100644 --- a/tests/common/test_cgroups.py +++ b/tests/ga/test_cgroups.py @@ -22,11 +22,11 @@ import random import shutil -from azurelinuxagent.common.cgroup import CpuCgroup, MemoryCgroup, MetricsCounter, CounterNotFound +from azurelinuxagent.ga.cgroup import CpuCgroup, MemoryCgroup, MetricsCounter, CounterNotFound from azurelinuxagent.common.exception import CGroupsException from azurelinuxagent.common.osutil import get_osutil from azurelinuxagent.common.utils import fileutil -from tests.tools import AgentTestCase, patch, data_dir +from tests.lib.tools import AgentTestCase, patch, data_dir def consume_cpu_time(): diff --git a/tests/common/test_cgroupstelemetry.py b/tests/ga/test_cgroupstelemetry.py similarity index 84% rename from tests/common/test_cgroupstelemetry.py rename to tests/ga/test_cgroupstelemetry.py index fe1ff299a..26fcecbf6 100644 --- a/tests/common/test_cgroupstelemetry.py +++ b/tests/ga/test_cgroupstelemetry.py @@ -19,10 +19,10 @@ import random import time -from azurelinuxagent.common.cgroup import CpuCgroup, MemoryCgroup -from azurelinuxagent.common.cgroupstelemetry import CGroupsTelemetry +from azurelinuxagent.ga.cgroup import CpuCgroup, MemoryCgroup +from azurelinuxagent.ga.cgroupstelemetry import CGroupsTelemetry from azurelinuxagent.common.utils import fileutil -from tests.tools import AgentTestCase, data_dir, patch +from tests.lib.tools import AgentTestCase, 
data_dir, patch def raise_ioerror(*_): @@ -136,12 +136,12 @@ def test_telemetry_polling_with_active_cgroups(self, *args): # pylint: disable= self._track_new_extension_cgroups(num_extensions) - with patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_max_memory_usage") as patch_get_memory_max_usage: - with patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage") as patch_get_memory_usage: - with patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage") as patch_get_memory_usage: - with patch("azurelinuxagent.common.cgroup.MemoryCgroup.try_swap_memory_usage") as patch_try_swap_memory_usage: - with patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage") as patch_get_cpu_usage: - with patch("azurelinuxagent.common.cgroup.CGroup.is_active") as patch_is_active: + with patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_max_memory_usage") as patch_get_memory_max_usage: + with patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage") as patch_get_memory_usage: + with patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage") as patch_get_memory_usage: + with patch("azurelinuxagent.ga.cgroup.MemoryCgroup.try_swap_memory_usage") as patch_try_swap_memory_usage: + with patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage") as patch_get_cpu_usage: + with patch("azurelinuxagent.ga.cgroup.CGroup.is_active") as patch_is_active: patch_is_active.return_value = True current_cpu = 30 @@ -163,10 +163,10 @@ def test_telemetry_polling_with_active_cgroups(self, *args): # pylint: disable= self.assertEqual(len(metrics), num_extensions * num_of_metrics_per_extn_expected) self._assert_polled_metrics_equal(metrics, current_cpu, current_memory, current_max_memory, current_swap_memory) - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_max_memory_usage", side_effect=raise_ioerror) - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage", side_effect=raise_ioerror) - @patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage", 
side_effect=raise_ioerror) - @patch("azurelinuxagent.common.cgroup.CGroup.is_active", return_value=False) + @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_max_memory_usage", side_effect=raise_ioerror) + @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage", side_effect=raise_ioerror) + @patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage", side_effect=raise_ioerror) + @patch("azurelinuxagent.ga.cgroup.CGroup.is_active", return_value=False) def test_telemetry_polling_with_inactive_cgroups(self, *_): num_extensions = 5 no_extensions_expected = 0 # pylint: disable=unused-variable @@ -182,10 +182,10 @@ def test_telemetry_polling_with_inactive_cgroups(self, *_): self.assertEqual(len(metrics), 0) - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_max_memory_usage") - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage") - @patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage") - @patch("azurelinuxagent.common.cgroup.CGroup.is_active") + @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_max_memory_usage") + @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage") + @patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage") + @patch("azurelinuxagent.ga.cgroup.CGroup.is_active") def test_telemetry_polling_with_changing_cgroups_state(self, patch_is_active, patch_get_cpu_usage, # pylint: disable=unused-argument patch_get_mem, patch_get_max_mem, *args): num_extensions = 5 @@ -274,11 +274,11 @@ def test_telemetry_polling_to_generate_transient_logs_index_error(self): CGroupsTelemetry.poll_all_tracked() self.assertEqual(expected_call_count, patch_periodic_warn.call_count) - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.try_swap_memory_usage") - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_max_memory_usage") - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage") - @patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage") - @patch("azurelinuxagent.common.cgroup.CGroup.is_active") + 
@patch("azurelinuxagent.ga.cgroup.MemoryCgroup.try_swap_memory_usage") + @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_max_memory_usage") + @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage") + @patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage") + @patch("azurelinuxagent.ga.cgroup.CGroup.is_active") def test_telemetry_calculations(self, patch_is_active, patch_get_cpu_usage, patch_get_memory_usage, patch_get_memory_max_usage, patch_try_memory_swap_usage, *args): # pylint: disable=unused-argument num_polls = 10 @@ -321,13 +321,13 @@ def test_cgroup_is_tracked(self, *args): # pylint: disable=unused-argument self.assertFalse(CGroupsTelemetry.is_tracked("not_present_cpu_dummy_path")) self.assertFalse(CGroupsTelemetry.is_tracked("not_present_memory_dummy_path")) - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage", side_effect=raise_ioerror) + @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage", side_effect=raise_ioerror) def test_process_cgroup_metric_with_no_memory_cgroup_mounted(self, *args): # pylint: disable=unused-argument num_extensions = 5 self._track_new_extension_cgroups(num_extensions) - with patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage") as patch_get_cpu_usage: - with patch("azurelinuxagent.common.cgroup.CGroup.is_active") as patch_is_active: + with patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage") as patch_get_cpu_usage: + with patch("azurelinuxagent.ga.cgroup.CGroup.is_active") as patch_is_active: patch_is_active.return_value = True current_cpu = 30 @@ -341,16 +341,16 @@ def test_process_cgroup_metric_with_no_memory_cgroup_mounted(self, *args): # py self.assertEqual(len(metrics), num_extensions * 1) # Only CPU populated self._assert_polled_metrics_equal(metrics, current_cpu, 0, 0, 0) - @patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage", side_effect=raise_ioerror) + @patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage", side_effect=raise_ioerror) def 
test_process_cgroup_metric_with_no_cpu_cgroup_mounted(self, *args):  # pylint: disable=unused-argument
         num_extensions = 5
         self._track_new_extension_cgroups(num_extensions)
-        with patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_max_memory_usage") as patch_get_memory_max_usage:
-            with patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage") as patch_get_memory_usage:
-                with patch("azurelinuxagent.common.cgroup.MemoryCgroup.try_swap_memory_usage") as patch_try_swap_memory_usage:
-                    with patch("azurelinuxagent.common.cgroup.CGroup.is_active") as patch_is_active:
+        with patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_max_memory_usage") as patch_get_memory_max_usage:
+            with patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage") as patch_get_memory_usage:
+                with patch("azurelinuxagent.ga.cgroup.MemoryCgroup.try_swap_memory_usage") as patch_try_swap_memory_usage:
+                    with patch("azurelinuxagent.ga.cgroup.CGroup.is_active") as patch_is_active:
                         patch_is_active.return_value = True
                         current_memory = 209715200
@@ -367,14 +367,14 @@ def test_process_cgroup_metric_with_no_cpu_cgroup_mounted(self, *args):  # pylin
         self.assertEqual(len(metrics), num_extensions * 3)
         self._assert_polled_metrics_equal(metrics, 0, current_memory, current_max_memory, current_swap_memory)

-    @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage", side_effect=raise_ioerror)
-    @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_max_memory_usage", side_effect=raise_ioerror)
-    @patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage", side_effect=raise_ioerror)
+    @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage", side_effect=raise_ioerror)
+    @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_max_memory_usage", side_effect=raise_ioerror)
+    @patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage", side_effect=raise_ioerror)
     def test_extension_telemetry_not_sent_for_empty_perf_metrics(self, *args):  # pylint: disable=unused-argument
         num_extensions = 5
         self._track_new_extension_cgroups(num_extensions)
-        with patch("azurelinuxagent.common.cgroup.CGroup.is_active") as patch_is_active:
+        with patch("azurelinuxagent.ga.cgroup.CGroup.is_active") as patch_is_active:
             patch_is_active.return_value = False
             poll_count = 1
@@ -383,9 +383,9 @@ def test_extension_telemetry_not_sent_for_empty_perf_metrics(self, *args):  # py
                 metrics = CGroupsTelemetry.poll_all_tracked()
                 self.assertEqual(0, len(metrics))

-    @patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage")
-    @patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_throttled_time")
-    @patch("azurelinuxagent.common.cgroup.CGroup.is_active")
+    @patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage")
+    @patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_throttled_time")
+    @patch("azurelinuxagent.ga.cgroup.CGroup.is_active")
     def test_cgroup_telemetry_should_not_report_cpu_negative_value(self, patch_is_active, path_get_throttled_time, patch_get_cpu_usage):
         num_polls = 5
diff --git a/tests/ga/test_collect_logs.py b/tests/ga/test_collect_logs.py
index 14593726d..4ac3f03fb 100644
--- a/tests/ga/test_collect_logs.py
+++ b/tests/ga/test_collect_logs.py
@@ -18,17 +18,17 @@
 import os

 from azurelinuxagent.common import logger, conf
-from azurelinuxagent.common.cgroup import CpuCgroup, MemoryCgroup, MetricValue
-from azurelinuxagent.common.cgroupconfigurator import CGroupConfigurator
+from azurelinuxagent.ga.cgroup import CpuCgroup, MemoryCgroup, MetricValue
+from azurelinuxagent.ga.cgroupconfigurator import CGroupConfigurator
 from azurelinuxagent.common.logger import Logger
 from azurelinuxagent.common.protocol.util import ProtocolUtil
 from azurelinuxagent.common.utils import fileutil
 from azurelinuxagent.ga.collect_logs import get_collect_logs_handler, is_log_collection_allowed, \
     get_log_collector_monitor_handler
-from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse
-from tests.protocol.HttpRequestPredicates import HttpRequestPredicates
-from tests.protocol.mockwiredata import DATA_FILE
-from tests.tools import Mock, MagicMock, patch, AgentTestCase, clear_singleton_instances, skip_if_predicate_true, \
+from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse
+from tests.lib.http_request_predicates import HttpRequestPredicates
+from tests.lib.wire_protocol_data import DATA_FILE
+from tests.lib.tools import Mock, MagicMock, patch, AgentTestCase, clear_singleton_instances, skip_if_predicate_true, \
     is_python_version_26, data_dir
@@ -225,7 +225,7 @@ def test_send_extension_metrics_telemetry(self, patch_poll_resource_usage, patch
     @patch("azurelinuxagent.ga.collect_logs.LogCollectorMonitorHandler._poll_resource_usage")
     def test_verify_log_collector_memory_limit_exceeded(self, patch_poll_resource_usage, mock_exit):
         with _create_log_collector_monitor_handler() as log_collector_monitor_handler:
-            with patch("azurelinuxagent.common.cgroupconfigurator.LOGCOLLECTOR_MEMORY_LIMIT", 8):
+            with patch("azurelinuxagent.ga.cgroupconfigurator.LOGCOLLECTOR_MEMORY_LIMIT", 8):
                 patch_poll_resource_usage.return_value = [MetricValue("Process", "% Processor Time", "service", 1),
                                                           MetricValue("Process", "Throttled Time", "service", 1),
                                                           MetricValue("Memory", "Total Memory Usage", "service", 9),
diff --git a/tests/ga/test_collect_telemetry_events.py b/tests/ga/test_collect_telemetry_events.py
index bdd763eff..509af2cef 100644
--- a/tests/ga/test_collect_telemetry_events.py
+++ b/tests/ga/test_collect_telemetry_events.py
@@ -36,8 +36,8 @@
     CommonTelemetryEventSchema
 from azurelinuxagent.common.utils import fileutil
 from azurelinuxagent.ga.collect_telemetry_events import ExtensionEventSchema, _ProcessExtensionEvents
-from tests.protocol.HttpRequestPredicates import HttpRequestPredicates
-from tests.tools import AgentTestCase, clear_singleton_instances, data_dir
+from tests.lib.http_request_predicates import HttpRequestPredicates
+from tests.lib.tools import AgentTestCase, clear_singleton_instances, data_dir


 class TestExtensionTelemetryHandler(AgentTestCase, HttpRequestPredicates):
diff --git a/tests/ga/test_env.py b/tests/ga/test_env.py
index aa4b74ab1..29ca6fec1 100644
--- a/tests/ga/test_env.py
+++ b/tests/ga/test_env.py
@@ -19,7 +19,7 @@
 from azurelinuxagent.common.osutil import get_osutil
 from azurelinuxagent.common.osutil.default import DefaultOSUtil, shellutil
 from azurelinuxagent.ga.env import MonitorDhcpClientRestart
-from tests.tools import AgentTestCase, patch
+from tests.lib.tools import AgentTestCase, patch


 class MonitorDhcpClientRestartTestCase(AgentTestCase):
diff --git a/tests/ga/test_extension.py b/tests/ga/test_extension.py
index 2272a1907..62bd11099 100644
--- a/tests/ga/test_extension.py
+++ b/tests/ga/test_extension.py
@@ -28,16 +28,14 @@
 import unittest

 from azurelinuxagent.common import conf
-from azurelinuxagent.common.agent_supported_feature import get_agent_supported_features_list_for_extensions, \
-    get_agent_supported_features_list_for_crp
-from azurelinuxagent.common.cgroupconfigurator import CGroupConfigurator
+from azurelinuxagent.common.agent_supported_feature import get_agent_supported_features_list_for_crp
+from azurelinuxagent.ga.cgroupconfigurator import CGroupConfigurator
 from azurelinuxagent.common.datacontract import get_properties
 from azurelinuxagent.common.event import WALAEventOperation
 from azurelinuxagent.common.utils import fileutil
 from azurelinuxagent.common.utils.fileutil import read_file
 from azurelinuxagent.common.utils.flexible_version import FlexibleVersion
-from azurelinuxagent.common.version import PY_VERSION_MAJOR, PY_VERSION_MINOR, PY_VERSION_MICRO, AGENT_NAME, \
-    AGENT_VERSION
+from azurelinuxagent.common.version import AGENT_VERSION
 from azurelinuxagent.common.exception import ResourceGoneError, ExtensionDownloadError, ProtocolError, \
     ExtensionErrorCodes, ExtensionError, GoalStateAggregateStatusCodes
 from azurelinuxagent.common.protocol.restapi import ExtensionSettings, Extension, ExtHandlerStatus, \
@@ -50,12 +48,12 @@
     get_exthandlers_handler, ExtCommandEnvVariable, HandlerManifest, NOT_RUN, \
     ExtensionStatusValue, HANDLER_COMPLETE_NAME_PATTERN, HandlerEnvironment, GoalStateStatus
-from tests.protocol import mockwiredata
-from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse
-from tests.protocol.HttpRequestPredicates import HttpRequestPredicates
-from tests.protocol.mockwiredata import DATA_FILE, DATA_FILE_EXT_ADDITIONAL_LOCATIONS
-from tests.tools import AgentTestCase, data_dir, MagicMock, Mock, patch, mock_sleep
-from tests.ga.extension_emulator import Actions, ExtensionCommandNames, extension_emulator, \
+from tests.lib import wire_protocol_data
+from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse
+from tests.lib.http_request_predicates import HttpRequestPredicates
+from tests.lib.wire_protocol_data import DATA_FILE, DATA_FILE_EXT_ADDITIONAL_LOCATIONS
+from tests.lib.tools import AgentTestCase, data_dir, MagicMock, Mock, patch, mock_sleep
+from tests.lib.extension_emulator import Actions, ExtensionCommandNames, extension_emulator, \
     enable_invocations, generate_put_handler

 # Mocking the original sleep to reduce test execution time
@@ -137,7 +135,7 @@ def mock_http_put(url, *args, **_):
         yield exthandlers_handler, protocol, no_of_extensions

     def test_cleanup_leaves_installed_extensions(self):
-        with self._setup_test_env(mockwiredata.DATA_FILE_MULTIPLE_EXT) as (exthandlers_handler, protocol, no_of_exts):
+        with self._setup_test_env(wire_protocol_data.DATA_FILE_MULTIPLE_EXT) as (exthandlers_handler, protocol, no_of_exts):
             exthandlers_handler.run()
             exthandlers_handler.report_ext_handlers_status()
@@ -147,7 +145,7 @@ def test_cleanup_leaves_installed_extensions(self):
                                            version="1.0.0")

     def test_cleanup_removes_uninstalled_extensions(self):
-        with self._setup_test_env(mockwiredata.DATA_FILE_MULTIPLE_EXT) as (exthandlers_handler, protocol, no_of_exts):
+        with self._setup_test_env(wire_protocol_data.DATA_FILE_MULTIPLE_EXT) as (exthandlers_handler, protocol, no_of_exts):
             exthandlers_handler.run()
             exthandlers_handler.report_ext_handlers_status()
             self._assert_ext_handler_status(protocol.aggregate_status, "Ready", expected_ext_handler_count=no_of_exts,
@@ -167,7 +165,7 @@ def test_cleanup_removes_uninstalled_extensions(self):
             self.assertEqual(0, TestExtensionCleanup._count_extension_directories(), "All extension directories should be removed")

     def test_cleanup_removes_orphaned_packages(self):
-        data_file = mockwiredata.DATA_FILE_NO_EXT.copy()
+        data_file = wire_protocol_data.DATA_FILE_NO_EXT.copy()
         data_file["ext_conf"] = "wire/ext_conf_no_extensions-no_status_blob.xml"

         no_of_orphaned_packages = 5
@@ -197,8 +195,8 @@ def test_cleanup_leaves_failed_extensions(self):
         def mock_fail_popen(*args, **kwargs):  # pylint: disable=unused-argument
             return original_popen("fail_this_command", **kwargs)

-        with self._setup_test_env(mockwiredata.DATA_FILE_EXT_SINGLE) as (exthandlers_handler, protocol, no_of_exts):
-            with patch("azurelinuxagent.common.cgroupapi.subprocess.Popen", mock_fail_popen):
+        with self._setup_test_env(wire_protocol_data.DATA_FILE_EXT_SINGLE) as (exthandlers_handler, protocol, no_of_exts):
+            with patch("azurelinuxagent.ga.cgroupapi.subprocess.Popen", mock_fail_popen):
                 exthandlers_handler.run()
                 exthandlers_handler.report_ext_handlers_status()
@@ -235,7 +233,7 @@ def assert_extension_seq_no(expected_seq_no):
                 self.assertEqual(expected_seq_no, handler_status['runtimeSettingsStatus']['sequenceNumber'], "Sequence number mismatch")

-        with self._setup_test_env(mockwiredata.DATA_FILE_MULTIPLE_EXT) as (exthandlers_handler, protocol, orig_no_of_exts):
+        with self._setup_test_env(wire_protocol_data.DATA_FILE_MULTIPLE_EXT) as (exthandlers_handler, protocol, orig_no_of_exts):
             # Run 1 - GS has no required features and contains 5 extensions
             exthandlers_handler.run()
             exthandlers_handler.report_ext_handlers_status()
@@ -249,7 +247,7 @@ def assert_extension_seq_no(expected_seq_no):
             # Run 2 - Change the GS to one with Required features not supported by the agent
             # This ExtensionConfig has 1 extension - ExampleHandlerLinuxWithRequiredFeatures
-            protocol.mock_wire_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_REQUIRED_FEATURES)
+            protocol.mock_wire_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_REQUIRED_FEATURES)
             protocol.mock_wire_data.set_incarnation(2)
             protocol.mock_wire_data.set_extensions_config_sequence_number(random.randint(10, 100))
             protocol.client.update_goal_state()
@@ -272,7 +270,7 @@ def assert_extension_seq_no(expected_seq_no):
             # Run 3 - Run a GS with no Required Features and ensure we execute all extensions properly
             # This ExtensionConfig has 1 extension - OSTCExtensions.ExampleHandlerLinux
-            protocol.mock_wire_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+            protocol.mock_wire_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
             protocol.mock_wire_data.set_incarnation(3)
             extension_seq_no = random.randint(10, 100)
             protocol.mock_wire_data.set_extensions_config_sequence_number(extension_seq_no)
@@ -495,7 +493,7 @@ def _set_up_update_test_and_update_gs(self, patch_command, *args):
         :param args: Any additional args passed to the function, needed for creating a mock for handler and protocol
         :return: test_data, exthandlers_handler, protocol
         """
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SINGLE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SINGLE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         # Ensure initial install and enable is successful
@@ -524,7 +522,7 @@ def _create_extension_handlers_handler(protocol):
     def test_ext_handler(self, *args):
         # Test enable scenario.
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         exthandlers_handler.run()
@@ -612,7 +610,7 @@ def _assert_handler_status_and_manifest_download_count(protocol, test_data, mani
             self.assertEqual(test_data.call_counts['manifest.xml'], manifest_count,
                              "We should have downloaded extension manifest {0} times".format(manifest_count))

-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter
         exthandlers_handler.run()
         exthandlers_handler.report_ext_handlers_status()
@@ -634,7 +632,7 @@ def test_it_should_fail_handler_on_bad_extension_config_and_report_error(self, m
         for bad_config_file_path in os.listdir(invalid_config_dir):
             bad_conf = DATA_FILE.copy()
             bad_conf["ext_conf"] = os.path.join(invalid_config_dir, bad_config_file_path)
-            test_data = mockwiredata.WireProtocolData(bad_conf)
+            test_data = wire_protocol_data.WireProtocolData(bad_conf)
             exthandlers_handler, protocol = self._create_mock(test_data, mock_get, mock_crypt_util, *args)

             with patch('azurelinuxagent.ga.exthandlers.add_event') as patch_add_event:
@@ -651,7 +649,7 @@ def test_it_should_process_valid_extensions_if_present(self, mock_get, mock_cryp
         bad_conf = DATA_FILE.copy()
         bad_conf["ext_conf"] = os.path.join("wire", "ext_conf_invalid_and_valid_handlers.xml")

-        test_data = mockwiredata.WireProtocolData(bad_conf)
+        test_data = wire_protocol_data.WireProtocolData(bad_conf)
         exthandlers_handler, protocol = self._create_mock(test_data, mock_get, mock_crypt_util, *args)
         exthandlers_handler.run()
@@ -675,7 +673,7 @@ def test_it_should_process_valid_extensions_if_present(self, mock_get, mock_cryp
     def test_it_should_ignore_case_when_parsing_plugin_settings(self, mock_get, mock_crypt_util, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_CASE_MISMATCH_EXT)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_CASE_MISMATCH_EXT)
         exthandlers_handler, protocol = self._create_mock(test_data, mock_get, mock_crypt_util, *args)
         exthandlers_handler.run()
@@ -704,7 +702,7 @@ def test_it_should_ignore_case_when_parsing_plugin_settings(self, mock
         self.assertEqual(0, len(expected_ext_handlers), "All handlers not reported")

     def test_ext_handler_no_settings(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_NO_SETTINGS)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_NO_SETTINGS)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         test_ext = extension_emulator(name="OSTCExtensions.ExampleHandlerLinux")
@@ -734,7 +732,7 @@ def test_ext_handler_no_settings(self, *args):
         )

     def test_ext_handler_no_public_settings(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_NO_PUBLIC)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_NO_PUBLIC)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         exthandlers_handler.run()
@@ -742,7 +740,7 @@ def test_ext_handler_no_public_settings(self, *args):
         self._assert_handler_status(protocol.report_vm_status, "Ready", 1, "1.0.0")

     def test_ext_handler_no_ext(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_NO_EXT)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_NO_EXT)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         # Assert no extension handler status
@@ -752,7 +750,7 @@ def test_ext_handler_no_ext(self, *args):
     def test_ext_handler_sequencing(self, *args):
         # Test enable scenario.
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SEQUENCING)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SEQUENCING)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         dep_ext_level_2 = extension_emulator(name="OSTCExtensions.ExampleHandlerLinux")
@@ -881,7 +879,7 @@ def test_it_should_process_sequencing_properly_even_if_no_settings_for_dependent
             self, mock_get, mock_crypt, *args):
         test_data_file = DATA_FILE.copy()
         test_data_file["ext_conf"] = "wire/ext_conf_dependencies_with_empty_settings.xml"
-        test_data = mockwiredata.WireProtocolData(test_data_file)
+        test_data = wire_protocol_data.WireProtocolData(test_data_file)
         exthandlers_handler, protocol = self._create_mock(test_data, mock_get, mock_crypt, *args)

         ext_1 = extension_emulator(name="OSTCExtensions.ExampleHandlerLinux")
@@ -910,7 +908,7 @@ def test_it_should_process_sequencing_properly_even_if_no_settings_for_dependent
         )

     def test_ext_handler_sequencing_should_fail_if_handler_failed(self, mock_get, mock_crypt, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SEQUENCING)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SEQUENCING)
         exthandlers_handler, protocol = self._create_mock(test_data, mock_get, mock_crypt, *args)

         original_popen = subprocess.Popen
@@ -986,7 +984,7 @@ def mock_fail_extension_commands(args, **kwargs):
         )

     def test_ext_handler_sequencing_default_dependency_level(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=unused-variable,no-value-for-parameter
         exthandlers_handler.run()
         exthandlers_handler.report_ext_handlers_status()
@@ -995,7 +993,7 @@ def test_ext_handler_sequencing_default_dependency_level(self, *args):
         self.assertEqual(exthandlers_handler.ext_handlers[0].settings[0].dependencyLevel, 0)

     def test_ext_handler_sequencing_invalid_dependency_level(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SEQUENCING)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SEQUENCING)
         test_data.set_incarnation(2)
         test_data.set_extensions_config_sequence_number(1)
         test_data.ext_conf = test_data.ext_conf.replace("dependencyLevel=\"1\"",
@@ -1012,7 +1010,7 @@ def test_ext_handler_sequencing_invalid_dependency_level(self, *args):
     def test_ext_handler_rollingupgrade(self, *args):
         # Test enable scenario.
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_ROLLINGUPGRADE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_ROLLINGUPGRADE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         exthandlers_handler.run()
@@ -1123,7 +1121,7 @@ def test_it_should_create_extension_events_dir_and_set_handler_environment_only_
             with patch("azurelinuxagent.common.agent_supported_feature._ETPFeature.is_supported", enable_extensions):
                 # Create new object for each run to force re-installation of extensions as we
                 # only create handler_environment on installation
-                test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_MULTIPLE_EXT)
+                test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_MULTIPLE_EXT)
                 exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

                 exthandlers_handler.run()
@@ -1154,7 +1152,7 @@ def test_it_should_create_extension_events_dir_and_set_handler_environment_only_
             shutil.rmtree(tmp_lib_dir, ignore_errors=True)

     def test_it_should_not_delete_extension_events_directory_on_extension_uninstall(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         with patch("azurelinuxagent.common.agent_supported_feature._ETPFeature.is_supported", True):
@@ -1178,7 +1176,7 @@ def test_it_should_not_delete_extension_events_directory_on_extension_uninstall(
             self.assertTrue(os.path.exists(ehi.get_extension_events_dir()), "Events directory should still exist")

     def test_it_should_uninstall_unregistered_extensions_properly(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter
         exthandlers_handler.run()
         exthandlers_handler.report_ext_handlers_status()
@@ -1205,7 +1203,7 @@ def test_it_should_uninstall_unregistered_extensions_properly(self, *args):
     @patch('azurelinuxagent.common.errorstate.ErrorState.is_triggered')
     @patch('azurelinuxagent.ga.exthandlers.add_event')
     def test_ext_handler_report_status_permanent(self, mock_add_event, mock_error_state, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         protocol.report_vm_status = Mock(side_effect=ProtocolError)
@@ -1221,7 +1219,7 @@ def test_ext_handler_report_status_permanent(self, mock_add_event, mock_error_st
     @patch('azurelinuxagent.ga.exthandlers.add_event')
     def test_ext_handler_report_status_resource_gone(self, mock_add_event, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         protocol.report_vm_status = Mock(side_effect=ResourceGoneError)
@@ -1236,7 +1234,7 @@ def test_ext_handler_report_status_resource_gone(self, mock_add_event, *args):
     @patch('azurelinuxagent.common.errorstate.ErrorState.is_triggered')
     @patch('azurelinuxagent.ga.exthandlers.add_event')
     def test_ext_handler_download_failure_permanent_ProtocolError(self, mock_add_event, mock_error_state, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         protocol.get_goal_state().fetch_extension_manifest = Mock(side_effect=ProtocolError)
@@ -1254,7 +1252,7 @@ def test_ext_handler_download_failure_permanent_ProtocolError(self, mock_add_eve
     @patch('azurelinuxagent.ga.exthandlers.fileutil')
     def test_ext_handler_io_error(self, mock_fileutil, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=unused-variable,no-value-for-parameter

         mock_fileutil.write_file.return_value = IOError("Mock IO Error")
@@ -1277,7 +1275,7 @@ def _assert_ext_status(self, vm_agent_status, expected_status,
         self.assertIn(expected_msg, ext_status.message)

     def test_it_should_initialise_and_use_command_execution_log_for_extensions(self, mock_get, mock_crypt_util, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, mock_get, mock_crypt_util, *args)
         exthandlers_handler.run()
         exthandlers_handler.report_ext_handlers_status()
@@ -1290,7 +1288,7 @@ def test_it_should_initialise_and_use_command_execution_log_for_extensions(self,
         self.assertGreater(os.path.getsize(command_execution_log), 0, "The file should not be empty")

     def test_ext_handler_no_reporting_status(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter
         exthandlers_handler.run()
         exthandlers_handler.report_ext_handlers_status()
@@ -1317,7 +1315,7 @@ def test_wait_for_handler_completion_no_status(self, mock_http_get, mock_crypt_u
         Expected to retry and eventually report failure for all dependent extensions.
         """
         exthandlers_handler, protocol = self._create_mock(
-            mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SEQUENCING), mock_http_get, mock_crypt_util, *args)
+            wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SEQUENCING), mock_http_get, mock_crypt_util, *args)

         original_popen = subprocess.Popen
@@ -1331,7 +1329,7 @@ def mock_popen(cmd, *args, **kwargs):
                 os.remove(status_path)
             return original_popen(["echo", "Yes"], *args, **kwargs)

-        with patch('azurelinuxagent.common.cgroupapi.subprocess.Popen', side_effect=mock_popen):
+        with patch('azurelinuxagent.ga.cgroupapi.subprocess.Popen', side_effect=mock_popen):
             with patch('azurelinuxagent.ga.exthandlers._DEFAULT_EXT_TIMEOUT_MINUTES', 0.01):
                 exthandlers_handler.run()
                 exthandlers_handler.report_ext_handlers_status()
@@ -1371,10 +1369,10 @@ def mock_popen(cmd, *_, **kwargs):
         aks_test_mock = DATA_FILE.copy()
         aks_test_mock["ext_conf"] = "wire/ext_conf_aks_extension.xml"

-        exthandlers_handler, protocol = self._create_mock(mockwiredata.WireProtocolData(aks_test_mock),
+        exthandlers_handler, protocol = self._create_mock(wire_protocol_data.WireProtocolData(aks_test_mock),
                                                           mock_http_get, mock_crypt_util, *args)

-        with patch('azurelinuxagent.common.cgroupapi.subprocess.Popen', side_effect=mock_popen):
+        with patch('azurelinuxagent.ga.cgroupapi.subprocess.Popen', side_effect=mock_popen):
             exthandlers_handler.run()
             exthandlers_handler.report_ext_handlers_status()
@@ -1405,7 +1403,7 @@ def test_it_should_include_part_of_status_in_ext_handler_message(self, mock_http
         debugging.
         """
         exthandlers_handler, protocol = self._create_mock(
-            mockwiredata.WireProtocolData(mockwiredata.DATA_FILE), mock_http_get, mock_crypt_util, *args)
+            wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE), mock_http_get, mock_crypt_util, *args)

         original_popen = subprocess.Popen
@@ -1422,7 +1420,7 @@ def mock_popen(cmd, *args, **kwargs):

             return original_popen(["echo", "Yes"], *args, **kwargs)

-        with patch('azurelinuxagent.common.cgroupapi.subprocess.Popen', side_effect=mock_popen):
+        with patch('azurelinuxagent.ga.cgroupapi.subprocess.Popen', side_effect=mock_popen):
             exthandlers_handler.run()
             exthandlers_handler.report_ext_handlers_status()
@@ -1440,7 +1438,7 @@ def test_wait_for_handler_completion_success_status(self, mock_http_get, mock_cr
         Testing depends-on scenario on a successful case. Expected to report the status for both extensions properly.
         """
         exthandlers_handler, protocol = self._create_mock(
-            mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SEQUENCING), mock_http_get, mock_crypt_util, *args)
+            wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SEQUENCING), mock_http_get, mock_crypt_util, *args)
         exthandlers_handler.run()
         exthandlers_handler.report_ext_handlers_status()
@@ -1463,7 +1461,7 @@ def test_wait_for_handler_completion_error_status(self, mock_http_get, mock_cryp
         Expected to return False.
         """
         exthandlers_handler, protocol = self._create_mock(
-            mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SEQUENCING), mock_http_get, mock_crypt_util, *args)
+            wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SEQUENCING), mock_http_get, mock_crypt_util, *args)

         original_popen = subprocess.Popen
@@ -1473,7 +1471,7 @@ def mock_popen(cmd, *args, **kwargs):
                 return original_popen(["/fail/this/command"], *args, **kwargs)
             return original_popen(cmd, *args, **kwargs)

-        with patch('azurelinuxagent.common.cgroupapi.subprocess.Popen', side_effect=mock_popen):
+        with patch('azurelinuxagent.ga.cgroupapi.subprocess.Popen', side_effect=mock_popen):
             exthandlers_handler.run()
             exthandlers_handler.report_ext_handlers_status()
@@ -1491,7 +1489,7 @@ def test_get_ext_handling_status(self, *args):
         Testing get_ext_handling_status() function with various cases and verifying against the expected values
         """
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=unused-variable,no-value-for-parameter

         handler_name = "Handler"
@@ -1534,7 +1532,7 @@ def test_is_ext_handling_complete(self, *args):
         Testing is_ext_handling_complete() with various input and verifying against the expected output values.
         """
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=unused-variable,no-value-for-parameter

         handler_name = "Handler"
@@ -1572,18 +1570,18 @@ def test_ext_handler_version_decide_autoupgrade_internalversion(self, *args):
                 config_version = '1.3.0'
                 decision_version = '1.3.0'
                 if autoupgrade:
-                    datafile = mockwiredata.DATA_FILE_EXT_AUTOUPGRADE_INTERNALVERSION
+                    datafile = wire_protocol_data.DATA_FILE_EXT_AUTOUPGRADE_INTERNALVERSION
                 else:
-                    datafile = mockwiredata.DATA_FILE_EXT_INTERNALVERSION
+                    datafile = wire_protocol_data.DATA_FILE_EXT_INTERNALVERSION
             else:
                 config_version = '1.0.0'
                 decision_version = '1.0.0'
                 if autoupgrade:
-                    datafile = mockwiredata.DATA_FILE_EXT_AUTOUPGRADE
+                    datafile = wire_protocol_data.DATA_FILE_EXT_AUTOUPGRADE
                 else:
-                    datafile = mockwiredata.DATA_FILE
+                    datafile = wire_protocol_data.DATA_FILE

-            _, protocol = self._create_mock(mockwiredata.WireProtocolData(datafile), *args)  # pylint: disable=no-value-for-parameter
+            _, protocol = self._create_mock(wire_protocol_data.WireProtocolData(datafile), *args)  # pylint: disable=no-value-for-parameter
             ext_handlers = protocol.get_goal_state().extensions_goal_state.extensions
             self.assertEqual(1, len(ext_handlers))
             ext_handler = ext_handlers[0]
@@ -1616,7 +1614,7 @@ def test_ext_handler_version_decide_between_minor_versions(self, *args):
             (None, '4.1', '4.1.0.0'),
         ]

-        _, protocol = self._create_mock(mockwiredata.WireProtocolData(mockwiredata.DATA_FILE), *args)  # pylint: disable=no-value-for-parameter
+        _, protocol = self._create_mock(wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE), *args)  # pylint: disable=no-value-for-parameter
         version_uri = 'http://mock-goal-state/Microsoft.OSTCExtensions_ExampleHandlerLinux_asiaeast_manifest.xml'

         for (installed_version, config_version, expected_version) in cases:
@@ -1635,7 +1633,7 @@ def test_ext_handler_version_decide_between_minor_versions(self, *args):
     @patch('azurelinuxagent.common.conf.get_extensions_enabled', return_value=False)
     def test_extensions_disabled(self, _, *args):
         # test status is reported for no extensions
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_NO_EXT)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_NO_EXT)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter
         exthandlers_handler.run()
@@ -1644,16 +1642,28 @@ def test_extensions_disabled(self, _, *args):
         self._assert_no_handler_status(protocol.report_vm_status)

         # test status is reported, but extensions are not processed
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter
         exthandlers_handler.run()
         exthandlers_handler.report_ext_handlers_status()
-        self._assert_no_handler_status(protocol.report_vm_status)
+        report_vm_status = protocol.report_vm_status
+        self.assertTrue(report_vm_status.called)
+        args, kw = report_vm_status.call_args  # pylint: disable=unused-variable
+        vm_status = args[0]
+        self.assertEqual(1, len(vm_status.vmAgent.extensionHandlers))
+        exthandler = vm_status.vmAgent.extensionHandlers[0]
+        self.assertEqual(-1, exthandler.code)
+        self.assertEqual('NotReady', exthandler.status)
+        self.assertEqual("Extension will not be processed since extension processing is disabled. To enable extension processing, set Extensions.Enabled=y in '/etc/waagent.conf'", exthandler.message)
+        ext_status = exthandler.extension_status
+        self.assertEqual(-1, ext_status.code)
+        self.assertEqual('error', ext_status.status)
+        self.assertEqual("Extension will not be processed since extension processing is disabled. To enable extension processing, set Extensions.Enabled=y in '/etc/waagent.conf'", ext_status.message)

     def test_extensions_deleted(self, *args):
         # Ensure initial enable is successful
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_DELETION)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_DELETION)
        exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         exthandlers_handler.run()
@@ -1682,7 +1692,7 @@ def test_install_failure(self, patch_get_install_command, patch_install, *args):
         """
         When extension install fails, the operation should not be retried.
         """
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SINGLE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SINGLE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         # Ensure initial install is unsuccessful
@@ -1700,7 +1710,7 @@ def test_install_failure_check_exception_handling(self, patch_get_install_comman
         """
         When extension install fails, the operation should be reported to our telemetry service.
         """
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SINGLE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SINGLE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         # Ensure install is unsuccessful
@@ -1717,7 +1727,7 @@ def test_enable_failure_check_exception_handling(self, patch_get_enable_command,
         """
         When extension enable fails, the operation should be reported.
         """
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SINGLE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SINGLE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         # Ensure initial install is successful, but enable fails
@@ -1737,7 +1747,7 @@ def test_disable_failure_with_exception_handling(self, patch_get_disable_command
         When extension disable fails, the operation should be reported.
         """
         # Ensure initial install and enable is successful, but disable fails
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SINGLE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SINGLE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter
         patch_get_disable_command.call_count = 0
         patch_get_disable_command.return_value = "exit 1"
@@ -1768,7 +1778,7 @@ def test_uninstall_failure(self, patch_get_uninstall_command, *args):
         When extension uninstall fails, the operation should not be retried.
         """
         # Ensure initial install and enable is successful, but uninstall fails
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SINGLE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SINGLE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter
         patch_get_uninstall_command.call_count = 0
         patch_get_uninstall_command.return_value = "exit 1"
@@ -1823,7 +1833,7 @@ def mock_popen(*args, **kwargs):
                 extension_calls.append(args[0])
             return original_popen(*args, **kwargs)

-        with patch('azurelinuxagent.common.cgroupapi.subprocess.Popen', side_effect=mock_popen):
+        with patch('azurelinuxagent.ga.cgroupapi.subprocess.Popen', side_effect=mock_popen):
             exthandlers_handler.run()
             exthandlers_handler.report_ext_handlers_status()
@@ -2131,7 +2141,7 @@ def test_uninstall_rc_env_var_should_report_not_run_for_non_update_calls_to_exth
         self._assert_ext_status(protocol.report_vm_status, "success", 0)

     def test_ext_path_and_version_env_variables_set_for_ever_operation(self, *args):
-        test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SINGLE)
+        test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SINGLE)
         exthandlers_handler, protocol = self._create_mock(test_data, *args)  # pylint: disable=no-value-for-parameter

         with patch.object(CGroupConfigurator.get_instance(), "start_extension_command") as patch_start_cmd:
@@ -2148,9 +2158,9 @@ def test_ext_path_and_version_env_variables_set_for_ever_operation(self, *args):
         self._assert_handler_status(protocol.report_vm_status, "Ready", expected_ext_count=1, version="1.0.0")

-    @patch("azurelinuxagent.common.cgroupconfigurator.handle_process_completion", side_effect="Process Successful")
+    @patch("azurelinuxagent.ga.cgroupconfigurator.handle_process_completion", side_effect="Process Successful")
     def test_ext_sequence_no_should_be_set_for_every_command_call(self, _, *args):
-        test_data =
mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_MULTIPLE_EXT) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_MULTIPLE_EXT) exthandlers_handler, protocol = self._create_mock(test_data, *args) # pylint: disable=no-value-for-parameter with patch("subprocess.Popen") as patch_popen: @@ -2205,7 +2215,7 @@ def test_ext_sequence_no_should_be_set_from_within_extension(self, *args): os.mkdir(base_dir) self.create_script(os.path.join(base_dir, test_file_name), test_file) - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_EXT_SINGLE) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_EXT_SINGLE) exthandlers_handler, protocol = self._create_mock(test_data, *args) # pylint: disable=unused-variable,no-value-for-parameter expected_seq_no = 0 @@ -2297,7 +2307,7 @@ def test_correct_exit_code_should_be_set_on_uninstall_cmd_failure(self, *args): self.assertIn("%s=%s" % (ExtCommandEnvVariable.UninstallReturnCode, exit_code), enable_kwargs['message']) def test_it_should_persist_goal_state_aggregate_status_until_new_incarnation(self, mock_get, mock_crypt_util, *args): - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE) exthandlers_handler, protocol = self._create_mock(test_data, mock_get, mock_crypt_util, *args) exthandlers_handler.run() @@ -2325,7 +2335,7 @@ def test_it_should_persist_goal_state_aggregate_status_until_new_incarnation(sel self.assertEqual(new_gs_aggregate_status.in_svd_seq_no, "2", "Incorrect seq no") def test_it_should_parse_required_features_properly(self, mock_get, mock_crypt_util, *args): - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_REQUIRED_FEATURES) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_REQUIRED_FEATURES) _, protocol = self._create_mock(test_data, mock_get, mock_crypt_util, *args) required_features = 
protocol.get_goal_state().extensions_goal_state.required_features @@ -2334,7 +2344,7 @@ def test_it_should_parse_required_features_properly(self, mock_get, mock_crypt_u self.assertEqual(feature, "TestRequiredFeature{0}".format(i+1), "Name mismatch") def test_it_should_fail_goal_state_if_required_features_not_supported(self, mock_get, mock_crypt_util, *args): - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE_REQUIRED_FEATURES) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE_REQUIRED_FEATURES) exthandlers_handler, protocol = self._create_mock(test_data, mock_get, mock_crypt_util, *args) exthandlers_handler.run() @@ -2355,7 +2365,7 @@ def test_it_should_fail_goal_state_if_required_features_not_supported(self, mock class TestExtensionSequencing(AgentTestCase): def _create_mock(self, mock_http_get, MockCryptUtil): - test_data = mockwiredata.WireProtocolData(mockwiredata.DATA_FILE) + test_data = wire_protocol_data.WireProtocolData(wire_protocol_data.DATA_FILE) # Mock protocol to return test data mock_http_get.side_effect = test_data.mock_http_get @@ -3185,7 +3195,7 @@ def manifest_location_handler(url, **kwargs): wire._DOWNLOAD_TIMEOUT = datetime.timedelta(minutes=0) try: with self.assertRaises(ExtensionDownloadError): - protocol.client.fetch_manifest(ext_handlers[0].manifest_uris, use_verify_header=False) + protocol.client.fetch_manifest("extension", ext_handlers[0].manifest_uris, use_verify_header=False) finally: wire._DOWNLOAD_TIMEOUT = download_timeout @@ -3209,8 +3219,9 @@ def tearDown(self): AgentTestCase.tearDown(self) @patch('time.gmtime', MagicMock(return_value=time.gmtime(0))) - def test_ext_handler_reporting_status_file(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + @patch("azurelinuxagent.common.version.get_daemon_version", return_value=FlexibleVersion("0.0.0.0")) + def test_ext_handler_reporting_status_file(self, _): + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: 
def mock_http_put(url, *args, **_): if HttpRequestPredicates.is_host_plugin_status_request(url): @@ -3234,91 +3245,75 @@ def mock_http_put(url, *args, **_): ) expected_status = { - "__comment__": "The __status__ property is the actual status reported to CRP", - "__status__": { - "version": "1.1", - "timestampUTC": "1970-01-01T00:00:00Z", - "aggregateStatus": { - "guestAgentStatus": { - "version": AGENT_VERSION, + "version": "1.1", + "timestampUTC": "1970-01-01T00:00:00Z", + "aggregateStatus": { + "guestAgentStatus": { + "version": AGENT_VERSION, + "status": "Ready", + "formattedMessage": { + "lang": "en-US", + "message": "Guest Agent is running" + } + }, + "handlerAggregateStatus": [ + { + "handlerVersion": "1.0.0", + "handlerName": "OSTCExtensions.ExampleHandlerLinux", "status": "Ready", + "code": 0, + "useExactVersion": True, "formattedMessage": { "lang": "en-US", - "message": "Guest Agent is running" - } - }, - "handlerAggregateStatus": [ - { - "handlerVersion": "1.0.0", - "handlerName": "OSTCExtensions.ExampleHandlerLinux", - "status": "Ready", - "code": 0, - "useExactVersion": True, - "formattedMessage": { - "lang": "en-US", - "message": "Plugin enabled" - }, - "runtimeSettingsStatus": { - "settingsStatus": { - "status": { - "name": "OSTCExtensions.ExampleHandlerLinux", - "configurationAppliedTime": None, - "operation": None, - "status": "success", - "code": 0, - "formattedMessage": { - "lang": "en-US", - "message": None - } - }, - "version": 1.0, - "timestampUTC": "1970-01-01T00:00:00Z" + "message": "Plugin enabled" + }, + "runtimeSettingsStatus": { + "settingsStatus": { + "status": { + "name": "OSTCExtensions.ExampleHandlerLinux", + "configurationAppliedTime": None, + "operation": None, + "status": "success", + "code": 0, + "formattedMessage": { + "lang": "en-US", + "message": None + } }, - "sequenceNumber": 0 - } - } - ], - "vmArtifactsAggregateStatus": { - "goalStateAggregateStatus": { - "formattedMessage": { - "lang": "en-US", - "message": "GoalState 
executed successfully" + "version": 1.0, + "timestampUTC": "1970-01-01T00:00:00Z" }, - "timestampUTC": "1970-01-01T00:00:00Z", - "inSvdSeqNo": "1", - "status": "Success", - "code": 0 + "sequenceNumber": 0 } } - }, - "guestOSInfo": None, - "supportedFeatures": supported_features - }, - "__debug__": { - "agentName": AGENT_NAME, - "daemonVersion": "0.0.0.0", - "pythonVersion": "Python: {0}.{1}.{2}".format(PY_VERSION_MAJOR, PY_VERSION_MINOR, PY_VERSION_MICRO), - "extensionSupportedFeatures": [name for name, _ in get_agent_supported_features_list_for_extensions().items()], - "supportsMultiConfig": { - "OSTCExtensions.ExampleHandlerLinux": False + ], + "vmArtifactsAggregateStatus": { + "goalStateAggregateStatus": { + "formattedMessage": { + "lang": "en-US", + "message": "GoalState executed successfully" + }, + "timestampUTC": "1970-01-01T00:00:00Z", + "inSvdSeqNo": "1", + "status": "Success", + "code": 0 + } } - } + }, + "guestOSInfo": None, + "supportedFeatures": supported_features } - exthandlers_handler.run() - vm_status = exthandlers_handler.report_ext_handlers_status() - actual_status_json = json.loads(exthandlers_handler.get_ext_handlers_status_debug_info(vm_status)) + exthandlers_handler.report_ext_handlers_status() - # Don't compare the guestOSInfo - status_property = actual_status_json.get("__status__") - self.assertIsNotNone(status_property, "The status file is missing the __status__ property") - self.assertIsNotNone(status_property.get("guestOSInfo"), "The status file is missing the guestOSInfo property") - status_property["guestOSInfo"] = None + actual_status = json.loads(protocol.get_status_blob_data()) - actual_status_json.pop('guestOSInfo', None) + # Don't compare the guestOSInfo + self.assertIsNotNone(actual_status.get("guestOSInfo"), "The status file is missing the guestOSInfo property") + actual_status["guestOSInfo"] = None - self.assertEqual(expected_status, actual_status_json) + self.assertEqual(expected_status, actual_status) def 
test_it_should_process_extensions_only_if_allowed(self): def assert_extensions_called(exthandlers_handler, expected_call_count=0): @@ -3345,7 +3340,7 @@ def http_get_handler(url, *_, **kwargs): mock_in_vm_artifacts_profile_response = MockHttpResponse(200, body='{ "onHold": false }'.encode('utf-8')) - with mock_wire_protocol(mockwiredata.DATA_FILE_IN_VM_ARTIFACTS_PROFILE, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_IN_VM_ARTIFACTS_PROFILE, http_get_handler=http_get_handler) as protocol: protocol.report_vm_status = MagicMock() exthandlers_handler = get_exthandlers_handler(protocol) @@ -3387,7 +3382,7 @@ def http_get_handler(url, *_, **kwargs): def test_it_should_process_extensions_appropriately_on_artifact_hold(self): with patch('time.sleep', side_effect=lambda _: mock_sleep(0.001)): with patch("azurelinuxagent.common.conf.get_enable_overprovisioning", return_value=True): - with mock_wire_protocol(mockwiredata.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_IN_VM_ARTIFACTS_PROFILE) as protocol: protocol.report_vm_status = MagicMock() exthandlers_handler = get_exthandlers_handler(protocol) # @@ -3418,6 +3413,33 @@ def http_get_handler(url, *_, **kwargs): self._assert_handler_status(protocol.report_vm_status, "Ready", 1, "1.0.0") self.assertEqual("1", protocol.report_vm_status.call_args[0][0].vmAgent.vm_artifacts_aggregate_status.goal_state_aggregate_status.in_svd_seq_no, "SVD sequence number mismatch") + def test_it_should_redact_access_tokens_in_extension_output(self): + original = r'''ONE https://foo.blob.core.windows.net/bar?sv=2000&ss=bfqt&srt=sco&sp=rw&se=2025&st=2022&spr=https&sig=SI%3D + TWO:HTTPS://bar.blob.core.com/foo/bar/foo.txt?sv=2018&sr=b&sig=Yx%3D&st=2023%3A52Z&se=9999%3A59%3A59Z&sp=r TWO + https://bar.com/foo?uid=2018&sr=b THREE''' + expected = r'''ONE https://foo.blob.core.windows.net/bar? + TWO:HTTPS://bar.blob.core.com/foo/bar/foo.txt? 
TWO + https://bar.com/foo?uid=2018&sr=b THREE''' + + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: + exthandlers_handler = get_exthandlers_handler(protocol) + + original_popen = subprocess.Popen + + def mock_popen(cmd, *args, **kwargs): + if cmd.endswith("sample.py -enable"): + cmd = "echo '{0}'; >&2 echo '{0}'; exit 1".format(original) + return original_popen(cmd, *args, **kwargs) + + with patch.object(subprocess, 'Popen', side_effect=mock_popen): + exthandlers_handler.run() + + status = exthandlers_handler.report_ext_handlers_status() + self.assertEqual(1, len(status.vmAgent.extensionHandlers), 'Expected exactly 1 extension status') + message = status.vmAgent.extensionHandlers[0].message + self.assertIn('[stdout]\n{0}'.format(expected), message, "The extension's stdout was not redacted correctly") + self.assertIn('[stderr]\n{0}'.format(expected), message, "The extension's stderr was not redacted correctly") + if __name__ == '__main__': unittest.main() diff --git a/tests/ga/test_exthandlers.py b/tests/ga/test_exthandlers.py index 67b077177..f56ebce14 100644 --- a/tests/ga/test_exthandlers.py +++ b/tests/ga/test_exthandlers.py @@ -28,12 +28,12 @@ from azurelinuxagent.common.protocol.util import ProtocolUtil from azurelinuxagent.common.protocol.wire import WireProtocol from azurelinuxagent.common.utils import fileutil -from azurelinuxagent.common.utils.extensionprocessutil import TELEMETRY_MESSAGE_MAX_LEN, format_stdout_stderr, \ +from azurelinuxagent.ga.extensionprocessutil import TELEMETRY_MESSAGE_MAX_LEN, format_stdout_stderr, \ read_output from azurelinuxagent.ga.exthandlers import parse_ext_status, ExtHandlerInstance, ExtCommandEnvVariable, \ ExtensionStatusError, _DEFAULT_SEQ_NO, get_exthandlers_handler, ExtHandlerState -from tests.protocol.mocks import mock_wire_protocol, mockwiredata -from tests.tools import AgentTestCase, patch, mock_sleep, clear_singleton_instances +from tests.lib.mock_wire_protocol import mock_wire_protocol, 
wire_protocol_data +from tests.lib.tools import AgentTestCase, patch, mock_sleep, clear_singleton_instances class TestExtHandlers(AgentTestCase): @@ -247,7 +247,7 @@ def test_extension_sequence_number(self): expected_sequence_number=-1) def test_it_should_report_error_if_plugin_settings_version_mismatch(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_PLUGIN_SETTINGS_MISMATCH) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_PLUGIN_SETTINGS_MISMATCH) as protocol: with patch("azurelinuxagent.common.protocol.extensions_goal_state_from_extensions_config.add_event") as mock_add_event: # Forcing update of GoalState to allow the ExtConfig to report an event protocol.mock_wire_data.set_incarnation(2) @@ -287,12 +287,34 @@ def test_command_extension_log_truncates_correctly(self, mock_log_dir): with open(log_file_path) as truncated_log_file: self.assertEqual(truncated_log_file.read(), "{second_line}\n".format(second_line=second_line)) + def test_set_logger_should_not_reset_the_mode_of_the_log_directory(self): + ext_log_dir = os.path.join(self.tmp_dir, "log_directory") + + with patch("azurelinuxagent.common.conf.get_ext_log_dir", return_value=ext_log_dir): + ext_handler = Extension(name='foo') + ext_handler.version = "1.2.3" + ext_handler_instance = ExtHandlerInstance(ext_handler=ext_handler, protocol=None) + ext_handler_log_dir = os.path.join(ext_log_dir, ext_handler.name) + + # Double-check the initial mode + get_mode = lambda f: os.stat(f).st_mode & 0o777 + mode = get_mode(ext_handler_log_dir) + if mode != 0o755: + raise Exception("The initial mode of the log directory should be 0o755, got 0{0:o}".format(mode)) + + new_mode = 0o700 + os.chmod(ext_handler_log_dir, new_mode) + ext_handler_instance.set_logger() + + mode = get_mode(ext_handler_log_dir) + self.assertEqual(new_mode, mode, "The mode of the log directory should not have changed") + def test_it_should_report_the_message_in_the_hearbeat(self): def heartbeat_with_message(): return {'code': 
0, 'formattedMessage': {'lang': 'en-US', 'message': 'This is a heartbeat message'}, 'status': 'ready'} - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: with patch("azurelinuxagent.common.protocol.wire.WireProtocol.report_vm_status", return_value=None): with patch("azurelinuxagent.ga.exthandlers.ExtHandlerInstance.collect_heartbeat", side_effect=heartbeat_with_message): @@ -653,7 +675,7 @@ def test_it_should_read_only_the_head_of_large_outputs(self): # Mocking the call to file.read() is difficult, so instead we mock the call to format_stdout_stderr, which takes the # return value of the calls to file.read(). The intention of the test is to verify we never read (and load in memory) # more than a few KB of data from the files used to capture stdout/stderr - with patch('azurelinuxagent.common.utils.extensionprocessutil.format_stdout_stderr', side_effect=format_stdout_stderr) as mock_format: + with patch('azurelinuxagent.ga.extensionprocessutil.format_stdout_stderr', side_effect=format_stdout_stderr) as mock_format: output = self.ext_handler_instance.launch_command(command) self.assertGreaterEqual(len(output), 1024) @@ -686,7 +708,7 @@ def test_it_should_handle_errors_while_reading_the_command_output(self): def capture_process_output(stdout_file, stderr_file): # pylint: disable=unused-argument return original_capture_process_output(None, None) - with patch('azurelinuxagent.common.utils.extensionprocessutil.read_output', side_effect=capture_process_output): + with patch('azurelinuxagent.ga.extensionprocessutil.read_output', side_effect=capture_process_output): output = self.ext_handler_instance.launch_command(command) self.assertIn("[stderr]\nCannot read stdout/stderr:", output) diff --git a/tests/ga/test_exthandlers_download_extension.py b/tests/ga/test_exthandlers_download_extension.py index 556254fa3..b3ed96a89 100644 --- a/tests/ga/test_exthandlers_download_extension.py +++ 
b/tests/ga/test_exthandlers_download_extension.py @@ -10,9 +10,9 @@ from azurelinuxagent.common.protocol.restapi import Extension, ExtHandlerPackage from azurelinuxagent.common.protocol.wire import WireProtocol from azurelinuxagent.ga.exthandlers import ExtHandlerInstance, ExtHandlerState -from tests.protocol import mockwiredata -from tests.protocol.mocks import mock_wire_protocol -from tests.tools import AgentTestCase, patch, Mock +from tests.lib import wire_protocol_data +from tests.lib.mock_wire_protocol import mock_wire_protocol +from tests.lib.tools import AgentTestCase, patch, Mock class DownloadExtensionTestCase(AgentTestCase): @@ -42,7 +42,7 @@ def setUp(self): protocol.client.get_artifact_request = Mock(return_value=(None, None)) # create a dummy goal state, since downloads are done via the GoalState class - with mock_wire_protocol(mockwiredata.DATA_FILE) as p: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as p: goal_state = p.get_goal_state() goal_state._wire_client = protocol.client protocol.client._goal_state = goal_state diff --git a/tests/ga/test_exthandlers_exthandlerinstance.py b/tests/ga/test_exthandlers_exthandlerinstance.py index 6295d68d2..846bb89e9 100644 --- a/tests/ga/test_exthandlers_exthandlerinstance.py +++ b/tests/ga/test_exthandlers_exthandlerinstance.py @@ -7,7 +7,7 @@ from azurelinuxagent.common.protocol.restapi import Extension, ExtHandlerPackage from azurelinuxagent.ga.exthandlers import ExtHandlerInstance -from tests.tools import AgentTestCase, patch +from tests.lib.tools import AgentTestCase, patch class ExtHandlerInstanceTestCase(AgentTestCase): diff --git a/tests/ga/test_guestagent.py b/tests/ga/test_guestagent.py new file mode 100644 index 000000000..972e603c2 --- /dev/null +++ b/tests/ga/test_guestagent.py @@ -0,0 +1,301 @@ +import contextlib +import json +import os +import tempfile + +from azurelinuxagent.common import conf +from azurelinuxagent.common.exception import UpdateError +from azurelinuxagent.ga.guestagent 
import GuestAgent, AGENT_MANIFEST_FILE, AGENT_ERROR_FILE, GuestAgentError, \ + MAX_FAILURE, GuestAgentUpdateAttempt +from azurelinuxagent.common.version import AGENT_NAME +from tests.ga.test_update import UpdateTestCase, EMPTY_MANIFEST, WITH_ERROR, NO_ERROR + + +class TestGuestAgent(UpdateTestCase): + def setUp(self): + UpdateTestCase.setUp(self) + self.copy_agents(self._get_agent_file_path()) + self.agent_path = os.path.join(self.tmp_dir, self._get_agent_name()) + + def test_creation(self): + with self.assertRaises(UpdateError): + GuestAgent.from_installed_agent("A very bad file name") + + with self.assertRaises(UpdateError): + GuestAgent.from_installed_agent("{0}-a.bad.version".format(AGENT_NAME)) + + self.expand_agents() + + agent = GuestAgent.from_installed_agent(self.agent_path) + self.assertNotEqual(None, agent) + self.assertEqual(self._get_agent_name(), agent.name) + self.assertEqual(self._get_agent_version(), agent.version) + + self.assertEqual(self.agent_path, agent.get_agent_dir()) + + path = os.path.join(self.agent_path, AGENT_MANIFEST_FILE) + self.assertEqual(path, agent.get_agent_manifest_path()) + + self.assertEqual( + os.path.join(self.agent_path, AGENT_ERROR_FILE), + agent.get_agent_error_file()) + + path = ".".join((os.path.join(conf.get_lib_dir(), self._get_agent_name()), "zip")) + self.assertEqual(path, agent.get_agent_pkg_path()) + + self.assertTrue(agent.is_downloaded) + self.assertFalse(agent.is_blacklisted) + self.assertTrue(agent.is_available) + + def test_clear_error(self): + self.expand_agents() + + agent = GuestAgent.from_installed_agent(self.agent_path) + agent.mark_failure(is_fatal=True) + + self.assertTrue(agent.error.last_failure > 0.0) + self.assertEqual(1, agent.error.failure_count) + self.assertTrue(agent.is_blacklisted) + self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted) + + agent.clear_error() + self.assertEqual(0.0, agent.error.last_failure) + self.assertEqual(0, agent.error.failure_count) + 
self.assertFalse(agent.is_blacklisted) + self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted) + + def test_is_available(self): + self.expand_agents() + + agent = GuestAgent.from_installed_agent(self.agent_path) + + self.assertTrue(agent.is_available) + agent.mark_failure(is_fatal=True) + self.assertFalse(agent.is_available) + + def test_is_blacklisted(self): + self.expand_agents() + + agent = GuestAgent.from_installed_agent(self.agent_path) + self.assertFalse(agent.is_blacklisted) + self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted) + + agent.mark_failure(is_fatal=True) + self.assertTrue(agent.is_blacklisted) + self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted) + + def test_is_downloaded(self): + self.expand_agents() + agent = GuestAgent.from_installed_agent(self.agent_path) + self.assertTrue(agent.is_downloaded) + + def test_mark_failure(self): + agent = GuestAgent.from_installed_agent(self.agent_path) + + agent.mark_failure() + self.assertEqual(1, agent.error.failure_count) + + agent.mark_failure(is_fatal=True) + self.assertEqual(2, agent.error.failure_count) + self.assertTrue(agent.is_blacklisted) + + def test_inc_update_attempt_count(self): + agent = GuestAgent.from_installed_agent(self.agent_path) + agent.inc_update_attempt_count() + self.assertEqual(1, agent.update_attempt_data.count) + + agent.inc_update_attempt_count() + self.assertEqual(2, agent.update_attempt_data.count) + + def test_get_update_count(self): + agent = GuestAgent.from_installed_agent(self.agent_path) + agent.inc_update_attempt_count() + self.assertEqual(1, agent.get_update_attempt_count()) + + agent.inc_update_attempt_count() + self.assertEqual(2, agent.get_update_attempt_count()) + + def test_load_manifest(self): + self.expand_agents() + agent = GuestAgent.from_installed_agent(self.agent_path) + agent._load_manifest() + self.assertEqual(agent.manifest.get_enable_command(), + agent.get_agent_cmd()) + + def test_load_manifest_missing(self): + 
self.expand_agents() + agent = GuestAgent.from_installed_agent(self.agent_path) + os.remove(agent.get_agent_manifest_path()) + self.assertRaises(UpdateError, agent._load_manifest) + + def test_load_manifest_is_empty(self): + self.expand_agents() + agent = GuestAgent.from_installed_agent(self.agent_path) + self.assertTrue(os.path.isfile(agent.get_agent_manifest_path())) + + with open(agent.get_agent_manifest_path(), "w") as file: # pylint: disable=redefined-builtin + json.dump(EMPTY_MANIFEST, file) + self.assertRaises(UpdateError, agent._load_manifest) + + def test_load_manifest_is_malformed(self): + self.expand_agents() + agent = GuestAgent.from_installed_agent(self.agent_path) + self.assertTrue(os.path.isfile(agent.get_agent_manifest_path())) + + with open(agent.get_agent_manifest_path(), "w") as file: # pylint: disable=redefined-builtin + file.write("This is not JSON data") + self.assertRaises(UpdateError, agent._load_manifest) + + def test_load_error(self): + agent = GuestAgent.from_installed_agent(self.agent_path) + agent.error = None + + agent._load_error() + self.assertTrue(agent.error is not None) + + +class TestGuestAgentError(UpdateTestCase): + def test_creation(self): + self.assertRaises(TypeError, GuestAgentError) + self.assertRaises(UpdateError, GuestAgentError, None) + + with self.get_error_file(error_data=WITH_ERROR) as path: + err = GuestAgentError(path.name) + err.load() + self.assertEqual(path.name, err.path) + self.assertNotEqual(None, err) + + self.assertEqual(WITH_ERROR["last_failure"], err.last_failure) + self.assertEqual(WITH_ERROR["failure_count"], err.failure_count) + self.assertEqual(WITH_ERROR["was_fatal"], err.was_fatal) + return + + def test_clear(self): + with self.get_error_file(error_data=WITH_ERROR) as path: + err = GuestAgentError(path.name) + err.load() + self.assertEqual(path.name, err.path) + self.assertNotEqual(None, err) + + err.clear() + self.assertEqual(NO_ERROR["last_failure"], err.last_failure) + 
self.assertEqual(NO_ERROR["failure_count"], err.failure_count) + self.assertEqual(NO_ERROR["was_fatal"], err.was_fatal) + return + + def test_save(self): + err1 = self.create_error() + err1.mark_failure() + err1.mark_failure(is_fatal=True) + + err2 = self.create_error(err1.to_json()) + self.assertEqual(err1.last_failure, err2.last_failure) + self.assertEqual(err1.failure_count, err2.failure_count) + self.assertEqual(err1.was_fatal, err2.was_fatal) + + def test_mark_failure(self): + err = self.create_error() + self.assertFalse(err.is_blacklisted) + + for i in range(0, MAX_FAILURE): # pylint: disable=unused-variable + err.mark_failure() + + # Agent failed >= MAX_FAILURE, it should be blacklisted + self.assertTrue(err.is_blacklisted) + self.assertEqual(MAX_FAILURE, err.failure_count) + return + + def test_mark_failure_permanent(self): + err = self.create_error() + + self.assertFalse(err.is_blacklisted) + + # Fatal errors immediately blacklist + err.mark_failure(is_fatal=True) + self.assertTrue(err.is_blacklisted) + self.assertTrue(err.failure_count < MAX_FAILURE) + return + + def test_str(self): + err = self.create_error(error_data=NO_ERROR) + s = "Last Failure: {0}, Total Failures: {1}, Fatal: {2}, Reason: {3}".format( + NO_ERROR["last_failure"], + NO_ERROR["failure_count"], + NO_ERROR["was_fatal"], + NO_ERROR["reason"]) + self.assertEqual(s, str(err)) + + err = self.create_error(error_data=WITH_ERROR) + s = "Last Failure: {0}, Total Failures: {1}, Fatal: {2}, Reason: {3}".format( + WITH_ERROR["last_failure"], + WITH_ERROR["failure_count"], + WITH_ERROR["was_fatal"], + WITH_ERROR["reason"]) + self.assertEqual(s, str(err)) + return + + +UPDATE_ATTEMPT = { + "count": 2 +} + +NO_ATTEMPT = { + "count": 0 +} + + +class TestGuestAgentUpdateAttempt(UpdateTestCase): + @contextlib.contextmanager + def get_attempt_count_file(self, attempt_count=None): + if attempt_count is None: + attempt_count = NO_ATTEMPT + with tempfile.NamedTemporaryFile(mode="w") as fp: + 
json.dump(attempt_count, fp) + fp.seek(0) + yield fp + + def test_creation(self): + self.assertRaises(TypeError, GuestAgentUpdateAttempt) + self.assertRaises(UpdateError, GuestAgentUpdateAttempt, None) + + with self.get_attempt_count_file(UPDATE_ATTEMPT) as path: + update_data = GuestAgentUpdateAttempt(path.name) + update_data.load() + self.assertEqual(path.name, update_data.path) + self.assertNotEqual(None, update_data) + + self.assertEqual(UPDATE_ATTEMPT["count"], update_data.count) + + def test_clear(self): + with self.get_attempt_count_file(UPDATE_ATTEMPT) as path: + update_data = GuestAgentUpdateAttempt(path.name) + update_data.load() + self.assertEqual(path.name, update_data.path) + self.assertNotEqual(None, update_data) + + update_data.clear() + self.assertEqual(NO_ATTEMPT["count"], update_data.count) + + def test_save(self): + with self.get_attempt_count_file(UPDATE_ATTEMPT) as path: + update_data = GuestAgentUpdateAttempt(path.name) + update_data.load() + update_data.inc_count() + update_data.save() + + with self.get_attempt_count_file(update_data.to_json()) as path: + new_data = GuestAgentUpdateAttempt(path.name) + new_data.load() + + self.assertEqual(update_data.count, new_data.count) + + def test_inc_count(self): + with self.get_attempt_count_file() as path: + update_data = GuestAgentUpdateAttempt(path.name) + update_data.load() + + self.assertEqual(0, update_data.count) + update_data.inc_count() + self.assertEqual(1, update_data.count) + update_data.inc_count() + self.assertEqual(2, update_data.count) diff --git a/tests/common/test_logcollector.py b/tests/ga/test_logcollector.py similarity index 85% rename from tests/common/test_logcollector.py rename to tests/ga/test_logcollector.py index 521e0f23e..cedf894b0 100644 --- a/tests/common/test_logcollector.py +++ b/tests/ga/test_logcollector.py @@ -22,10 +22,10 @@ import tempfile import zipfile -from azurelinuxagent.common.logcollector import LogCollector +from azurelinuxagent.ga.logcollector import 
LogCollector from azurelinuxagent.common.utils import fileutil from azurelinuxagent.common.utils.fileutil import rm_dirs, mkdir, rm_files -from tests.tools import AgentTestCase, is_python_version_26, patch, skip_if_predicate_true, data_dir +from tests.lib.tools import AgentTestCase, is_python_version_26, patch, skip_if_predicate_true, data_dir SMALL_FILE_SIZE = 1 * 1024 * 1024 # 1 MB LARGE_FILE_SIZE = 5 * 1024 * 1024 # 5 MB @@ -48,26 +48,26 @@ def setUpClass(cls): @classmethod def _mock_constants(cls): - cls.mock_manifest = patch("azurelinuxagent.common.logcollector.MANIFEST_NORMAL", cls._build_manifest()) + cls.mock_manifest = patch("azurelinuxagent.ga.logcollector.MANIFEST_NORMAL", cls._build_manifest()) cls.mock_manifest.start() cls.log_collector_dir = os.path.join(cls.tmp_dir, "logcollector") - cls.mock_log_collector_dir = patch("azurelinuxagent.common.logcollector._LOG_COLLECTOR_DIR", + cls.mock_log_collector_dir = patch("azurelinuxagent.ga.logcollector._LOG_COLLECTOR_DIR", cls.log_collector_dir) cls.mock_log_collector_dir.start() cls.truncated_files_dir = os.path.join(cls.tmp_dir, "truncated") - cls.mock_truncated_files_dir = patch("azurelinuxagent.common.logcollector._TRUNCATED_FILES_DIR", + cls.mock_truncated_files_dir = patch("azurelinuxagent.ga.logcollector._TRUNCATED_FILES_DIR", cls.truncated_files_dir) cls.mock_truncated_files_dir.start() cls.output_results_file_path = os.path.join(cls.log_collector_dir, "results.txt") - cls.mock_output_results_file_path = patch("azurelinuxagent.common.logcollector.OUTPUT_RESULTS_FILE_PATH", + cls.mock_output_results_file_path = patch("azurelinuxagent.ga.logcollector.OUTPUT_RESULTS_FILE_PATH", cls.output_results_file_path) cls.mock_output_results_file_path.start() cls.compressed_archive_path = os.path.join(cls.log_collector_dir, "logs.zip") - cls.mock_compressed_archive_path = patch("azurelinuxagent.common.logcollector.COMPRESSED_ARCHIVE_PATH", + cls.mock_compressed_archive_path = 
patch("azurelinuxagent.ga.logcollector.COMPRESSED_ARCHIVE_PATH", cls.compressed_archive_path) cls.mock_compressed_archive_path.start() @@ -210,9 +210,9 @@ def test_log_collector_parses_commands_in_manifest(self): copy,{1} diskinfo,""".format(folder_to_list, file_to_collect) - with patch("azurelinuxagent.common.logcollector.MANIFEST_NORMAL", manifest): - with patch('azurelinuxagent.common.logcollector.LogCollector._initialize_telemetry'): - log_collector = LogCollector(cpu_cgroup_path="dummy_cpu_path", memory_cgroup_path="dummy_memory_path") + with patch("azurelinuxagent.ga.logcollector.MANIFEST_NORMAL", manifest): + with patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'): + log_collector = LogCollector() archive = log_collector.collect_logs_and_get_archive() with open(self.output_results_file_path, "r") as fh: @@ -239,9 +239,9 @@ def test_log_collector_uses_full_manifest_when_full_mode_enabled(self): copy,{0} """.format(file_to_collect) - with patch("azurelinuxagent.common.logcollector.MANIFEST_FULL", manifest): - with patch('azurelinuxagent.common.logcollector.LogCollector._initialize_telemetry'): - log_collector = LogCollector(is_full_mode=True, cpu_cgroup_path="dummy_cpu_path", memory_cgroup_path="dummy_memory_path") + with patch("azurelinuxagent.ga.logcollector.MANIFEST_FULL", manifest): + with patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'): + log_collector = LogCollector(is_full_mode=True) archive = log_collector.collect_logs_and_get_archive() self._assert_archive_created(archive) @@ -254,8 +254,8 @@ def test_log_collector_should_collect_all_files(self): # All files in the manifest should be collected, since none of them are over the individual file size limit, # and combined they do not cross the archive size threshold. 
- with patch('azurelinuxagent.common.logcollector.LogCollector._initialize_telemetry'): - log_collector = LogCollector(cpu_cgroup_path="dummy_cpu_path", memory_cgroup_path="dummy_memory_path") + with patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'): + log_collector = LogCollector() archive = log_collector.collect_logs_and_get_archive() self._assert_archive_created(archive) @@ -275,9 +275,9 @@ def test_log_collector_should_collect_all_files(self): def test_log_collector_should_truncate_large_text_files_and_ignore_large_binary_files(self): # Set the size limit so that some files are too large to collect in full. - with patch("azurelinuxagent.common.logcollector._FILE_SIZE_LIMIT", SMALL_FILE_SIZE): - with patch('azurelinuxagent.common.logcollector.LogCollector._initialize_telemetry'): - log_collector = LogCollector(cpu_cgroup_path="dummy_cpu_path", memory_cgroup_path="dummy_memory_path") + with patch("azurelinuxagent.ga.logcollector._FILE_SIZE_LIMIT", SMALL_FILE_SIZE): + with patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'): + log_collector = LogCollector() archive = log_collector.collect_logs_and_get_archive() self._assert_archive_created(archive) @@ -308,10 +308,10 @@ def test_log_collector_should_prioritize_important_files_if_archive_too_big(self os.path.join(self.root_collect_dir, "less_important_file*") ] - with patch("azurelinuxagent.common.logcollector._UNCOMPRESSED_ARCHIVE_SIZE_LIMIT", 10 * 1024 * 1024): - with patch("azurelinuxagent.common.logcollector._MUST_COLLECT_FILES", must_collect_files): - with patch('azurelinuxagent.common.logcollector.LogCollector._initialize_telemetry'): - log_collector = LogCollector(cpu_cgroup_path="dummy_cpu_path", memory_cgroup_path="dummy_memory_path") + with patch("azurelinuxagent.ga.logcollector._UNCOMPRESSED_ARCHIVE_SIZE_LIMIT", 10 * 1024 * 1024): + with patch("azurelinuxagent.ga.logcollector._MUST_COLLECT_FILES", must_collect_files): + with 
patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'): + log_collector = LogCollector() archive = log_collector.collect_logs_and_get_archive() self._assert_archive_created(archive) @@ -336,8 +336,8 @@ def test_log_collector_should_prioritize_important_files_if_archive_too_big(self # if there is enough space. rm_files(os.path.join(self.root_collect_dir, "waagent.log.3.gz")) - with patch("azurelinuxagent.common.logcollector._UNCOMPRESSED_ARCHIVE_SIZE_LIMIT", 10 * 1024 * 1024): - with patch("azurelinuxagent.common.logcollector._MUST_COLLECT_FILES", must_collect_files): + with patch("azurelinuxagent.ga.logcollector._UNCOMPRESSED_ARCHIVE_SIZE_LIMIT", 10 * 1024 * 1024): + with patch("azurelinuxagent.ga.logcollector._MUST_COLLECT_FILES", must_collect_files): second_archive = log_collector.collect_logs_and_get_archive() expected_files = [ @@ -361,8 +361,8 @@ def test_log_collector_should_prioritize_important_files_if_archive_too_big(self def test_log_collector_should_update_archive_when_files_are_new_or_modified_or_deleted(self): # Ensure the archive reflects the state of files on the disk at collection time. If a file was updated, it # needs to be updated in the archive, deleted if removed from disk, and added if not previously seen. - with patch('azurelinuxagent.common.logcollector.LogCollector._initialize_telemetry'): - log_collector = LogCollector(cpu_cgroup_path="dummy_cpu_path", memory_cgroup_path="dummy_memory_path") + with patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'): + log_collector = LogCollector() first_archive = log_collector.collect_logs_and_get_archive() self._assert_archive_created(first_archive) @@ -429,11 +429,11 @@ def test_log_collector_should_clean_up_uncollected_truncated_files(self): # Set the archive size limit so that not all files can be collected. In that case, files will be added to the # archive according to their priority. 
# Set the size limit so that only two files can be collected, of which one needs to be truncated. - with patch("azurelinuxagent.common.logcollector._UNCOMPRESSED_ARCHIVE_SIZE_LIMIT", 2 * SMALL_FILE_SIZE): - with patch("azurelinuxagent.common.logcollector._MUST_COLLECT_FILES", must_collect_files): - with patch("azurelinuxagent.common.logcollector._FILE_SIZE_LIMIT", SMALL_FILE_SIZE): - with patch('azurelinuxagent.common.logcollector.LogCollector._initialize_telemetry'): - log_collector = LogCollector(cpu_cgroup_path="dummy_cpu_path", memory_cgroup_path="dummy_memory_path") + with patch("azurelinuxagent.ga.logcollector._UNCOMPRESSED_ARCHIVE_SIZE_LIMIT", 2 * SMALL_FILE_SIZE): + with patch("azurelinuxagent.ga.logcollector._MUST_COLLECT_FILES", must_collect_files): + with patch("azurelinuxagent.ga.logcollector._FILE_SIZE_LIMIT", SMALL_FILE_SIZE): + with patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'): + log_collector = LogCollector() archive = log_collector.collect_logs_and_get_archive() self._assert_archive_created(archive) @@ -451,11 +451,11 @@ def test_log_collector_should_clean_up_uncollected_truncated_files(self): # removed both from the archive and from the filesystem. 
rm_files(os.path.join(self.root_collect_dir, "waagent.log.1")) - with patch("azurelinuxagent.common.logcollector._UNCOMPRESSED_ARCHIVE_SIZE_LIMIT", 2 * SMALL_FILE_SIZE): - with patch("azurelinuxagent.common.logcollector._MUST_COLLECT_FILES", must_collect_files): - with patch("azurelinuxagent.common.logcollector._FILE_SIZE_LIMIT", SMALL_FILE_SIZE): - with patch('azurelinuxagent.common.logcollector.LogCollector._initialize_telemetry'): - log_collector = LogCollector(cpu_cgroup_path="dummy_cpu_path", memory_cgroup_path="dummy_memory_path") + with patch("azurelinuxagent.ga.logcollector._UNCOMPRESSED_ARCHIVE_SIZE_LIMIT", 2 * SMALL_FILE_SIZE): + with patch("azurelinuxagent.ga.logcollector._MUST_COLLECT_FILES", must_collect_files): + with patch("azurelinuxagent.ga.logcollector._FILE_SIZE_LIMIT", SMALL_FILE_SIZE): + with patch('azurelinuxagent.ga.logcollector.LogCollector._initialize_telemetry'): + log_collector = LogCollector() second_archive = log_collector.collect_logs_and_get_archive() expected_files = [ diff --git a/tests/ga/test_monitor.py b/tests/ga/test_monitor.py index 5853b23ef..1dbec27c3 100644 --- a/tests/ga/test_monitor.py +++ b/tests/ga/test_monitor.py @@ -21,8 +21,8 @@ import string from azurelinuxagent.common import event, logger -from azurelinuxagent.common.cgroup import CpuCgroup, MemoryCgroup, MetricValue, _REPORT_EVERY_HOUR -from azurelinuxagent.common.cgroupstelemetry import CGroupsTelemetry +from azurelinuxagent.ga.cgroup import CpuCgroup, MemoryCgroup, MetricValue, _REPORT_EVERY_HOUR +from azurelinuxagent.ga.cgroupstelemetry import CGroupsTelemetry from azurelinuxagent.common.event import EVENTS_DIRECTORY from azurelinuxagent.common.protocol.healthservice import HealthService from azurelinuxagent.common.protocol.util import ProtocolUtil @@ -30,10 +30,10 @@ from azurelinuxagent.ga.monitor import get_monitor_handler, PeriodicOperation, SendImdsHeartbeat, \ ResetPeriodicLogMessages, SendHostPluginHeartbeat, PollResourceUsage, \ ReportNetworkErrors, 
ReportNetworkConfigurationChanges, PollSystemWideResourceUsage -from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates -from tests.protocol.mockwiredata import DATA_FILE -from tests.tools import Mock, MagicMock, patch, AgentTestCase, clear_singleton_instances +from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse +from tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.wire_protocol_data import DATA_FILE +from tests.lib.tools import Mock, MagicMock, patch, AgentTestCase, clear_singleton_instances def random_generator(size=6, chars=string.ascii_uppercase + string.digits + string.ascii_lowercase): @@ -198,7 +198,7 @@ def tearDown(self): self.get_protocol.stop() @patch('azurelinuxagent.common.event.EventLogger.add_metric') - @patch("azurelinuxagent.common.cgroupstelemetry.CGroupsTelemetry.poll_all_tracked") + @patch("azurelinuxagent.ga.cgroupstelemetry.CGroupsTelemetry.poll_all_tracked") def test_send_extension_metrics_telemetry(self, patch_poll_all_tracked, # pylint: disable=unused-argument patch_add_metric, *args): patch_poll_all_tracked.return_value = [MetricValue("Process", "% Processor Time", "service", 1), @@ -212,7 +212,7 @@ def test_send_extension_metrics_telemetry(self, patch_poll_all_tracked, # pylin self.assertEqual(4, patch_add_metric.call_count) # Four metrics being sent. 
@patch('azurelinuxagent.common.event.EventLogger.add_metric') - @patch("azurelinuxagent.common.cgroupstelemetry.CGroupsTelemetry.poll_all_tracked") + @patch("azurelinuxagent.ga.cgroupstelemetry.CGroupsTelemetry.poll_all_tracked") def test_send_extension_metrics_telemetry_for_empty_cgroup(self, patch_poll_all_tracked, # pylint: disable=unused-argument patch_add_metric, *args): patch_poll_all_tracked.return_value = [] @@ -222,7 +222,7 @@ def test_send_extension_metrics_telemetry_for_empty_cgroup(self, patch_poll_all_ self.assertEqual(0, patch_add_metric.call_count) @patch('azurelinuxagent.common.event.EventLogger.add_metric') - @patch("azurelinuxagent.common.cgroup.MemoryCgroup.get_memory_usage") + @patch("azurelinuxagent.ga.cgroup.MemoryCgroup.get_memory_usage") @patch('azurelinuxagent.common.logger.Logger.periodic_warn') def test_send_extension_metrics_telemetry_handling_memory_cgroup_exceptions_errno2(self, patch_periodic_warn, # pylint: disable=unused-argument patch_get_memory_usage, @@ -238,7 +238,7 @@ def test_send_extension_metrics_telemetry_handling_memory_cgroup_exceptions_errn self.assertEqual(0, patch_add_metric.call_count) # No metrics should be sent. 
@patch('azurelinuxagent.common.event.EventLogger.add_metric') - @patch("azurelinuxagent.common.cgroup.CpuCgroup.get_cpu_usage") + @patch("azurelinuxagent.ga.cgroup.CpuCgroup.get_cpu_usage") @patch('azurelinuxagent.common.logger.Logger.periodic_warn') def test_send_extension_metrics_telemetry_handling_cpu_cgroup_exceptions_errno2(self, patch_periodic_warn, # pylint: disable=unused-argument patch_cpu_usage, patch_add_metric, diff --git a/tests/ga/test_multi_config_extension.py b/tests/ga/test_multi_config_extension.py index 365052f5d..0fe8dea5a 100644 --- a/tests/ga/test_multi_config_extension.py +++ b/tests/ga/test_multi_config_extension.py @@ -13,12 +13,12 @@ from azurelinuxagent.common.utils import fileutil from azurelinuxagent.ga.exthandlers import get_exthandlers_handler, ExtensionStatusValue, ExtCommandEnvVariable, \ GoalStateStatus, ExtHandlerInstance -from tests.ga.extension_emulator import enable_invocations, extension_emulator, ExtensionCommandNames, Actions, \ +from tests.lib.extension_emulator import enable_invocations, extension_emulator, ExtensionCommandNames, Actions, \ extract_extension_info_from_command -from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates -from tests.protocol.mockwiredata import DATA_FILE, WireProtocolData -from tests.tools import AgentTestCase, mock_sleep, patch +from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse +from tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.wire_protocol_data import DATA_FILE, WireProtocolData +from tests.lib.tools import AgentTestCase, mock_sleep, patch class TestMultiConfigExtensionsConfigParsing(AgentTestCase): @@ -761,7 +761,7 @@ def mock_popen(cmd, *_, **kwargs): self.test_data['ext_conf'] = os.path.join(self._MULTI_CONFIG_TEST_DATA, "ext_conf_multi_config_no_dependencies.xml") with self._setup_test_env(mock_manifest=True) as (exthandlers_handler, protocol, 
no_of_extensions): - with patch('azurelinuxagent.common.cgroupapi.subprocess.Popen', side_effect=mock_popen): + with patch('azurelinuxagent.ga.cgroupapi.subprocess.Popen', side_effect=mock_popen): # Case 1: Check normal scenario - Install/Enable mc_handlers, sc_handler = self.__run_and_assert_generic_case(exthandlers_handler, protocol, no_of_extensions) @@ -924,7 +924,7 @@ def mock_popen(cmd, *_, **kwargs): self.test_data['ext_conf'] = os.path.join(self._MULTI_CONFIG_TEST_DATA, "ext_conf_multi_config_no_dependencies.xml") with self._setup_test_env(mock_manifest=True) as (exthandlers_handler, protocol, no_of_extensions): - with patch('azurelinuxagent.common.cgroupapi.subprocess.Popen', side_effect=mock_popen): + with patch('azurelinuxagent.ga.cgroupapi.subprocess.Popen', side_effect=mock_popen): exthandlers_handler.run() exthandlers_handler.report_ext_handlers_status() self.assertEqual(no_of_extensions, @@ -1209,7 +1209,7 @@ def mock_popen(cmd, *_, **kwargs): return original_popen(cmd, *_, **kwargs) with self._setup_test_env(mock_manifest=True) as (exthandlers_handler, protocol, no_of_extensions): - with patch('azurelinuxagent.common.cgroupapi.subprocess.Popen', side_effect=mock_popen): + with patch('azurelinuxagent.ga.cgroupapi.subprocess.Popen', side_effect=mock_popen): exthandlers_handler.run() exthandlers_handler.report_ext_handlers_status() diff --git a/tests/ga/test_periodic_operation.py b/tests/ga/test_periodic_operation.py index 8fd8d32dc..65980a147 100644 --- a/tests/ga/test_periodic_operation.py +++ b/tests/ga/test_periodic_operation.py @@ -17,7 +17,7 @@ import datetime import time from azurelinuxagent.ga.monitor import PeriodicOperation -from tests.tools import AgentTestCase, patch, PropertyMock +from tests.lib.tools import AgentTestCase, patch, PropertyMock class TestPeriodicOperation(AgentTestCase): diff --git a/tests/common/test_persist_firewall_rules.py b/tests/ga/test_persist_firewall_rules.py similarity index 98% rename from 
tests/common/test_persist_firewall_rules.py rename to tests/ga/test_persist_firewall_rules.py index 307c8536e..5ee397baf 100644 --- a/tests/common/test_persist_firewall_rules.py +++ b/tests/ga/test_persist_firewall_rules.py @@ -25,10 +25,10 @@ import azurelinuxagent.common.conf as conf from azurelinuxagent.common.future import ustr from azurelinuxagent.common.osutil.default import DefaultOSUtil -from azurelinuxagent.common.persist_firewall_rules import PersistFirewallRulesHandler +from azurelinuxagent.ga.persist_firewall_rules import PersistFirewallRulesHandler from azurelinuxagent.common.utils import fileutil, shellutil from azurelinuxagent.common.utils.networkutil import AddFirewallRules, FirewallCmdDirectCommands -from tests.tools import AgentTestCase, MagicMock, patch +from tests.lib.tools import AgentTestCase, MagicMock, patch class TestPersistFirewallRulesHandler(AgentTestCase): @@ -90,9 +90,9 @@ def _get_persist_firewall_rules_handler(self, systemd=True): # Just for these tests, ignoring the mode of mkdir to allow non-sudo tests orig_mkdir = fileutil.mkdir - with patch("azurelinuxagent.common.persist_firewall_rules.fileutil.mkdir", + with patch("azurelinuxagent.ga.persist_firewall_rules.fileutil.mkdir", side_effect=lambda path, **mode: orig_mkdir(path)): - with patch("azurelinuxagent.common.persist_firewall_rules.get_osutil", return_value=osutil): + with patch("azurelinuxagent.ga.persist_firewall_rules.get_osutil", return_value=osutil): with patch('azurelinuxagent.common.osutil.systemd.is_systemd', return_value=systemd): with patch("azurelinuxagent.common.utils.shellutil.subprocess.Popen", side_effect=self.__mock_popen): yield PersistFirewallRulesHandler(self.__test_dst_ip, self.__test_uid) @@ -343,7 +343,7 @@ def mock_write_file(path, _, *__): test_files = [self._binary_file, self._network_service_unit_file] for file_to_fail in test_files: files_to_fail = [file_to_fail] - with patch("azurelinuxagent.common.persist_firewall_rules.fileutil.write_file", + with 
patch("azurelinuxagent.ga.persist_firewall_rules.fileutil.write_file", side_effect=mock_write_file): with self.assertRaises(Exception) as context_manager: handler.setup() diff --git a/tests/ga/test_remoteaccess.py b/tests/ga/test_remoteaccess.py index 069931a15..f0e2ff266 100644 --- a/tests/ga/test_remoteaccess.py +++ b/tests/ga/test_remoteaccess.py @@ -17,9 +17,9 @@ import xml from azurelinuxagent.common.protocol.goal_state import GoalState, RemoteAccess # pylint: disable=unused-import -from tests.tools import AgentTestCase, load_data, patch, Mock # pylint: disable=unused-import -from tests.protocol import mockwiredata -from tests.protocol.mocks import mock_wire_protocol +from tests.lib.tools import AgentTestCase, load_data, patch, Mock # pylint: disable=unused-import +from tests.lib import wire_protocol_data +from tests.lib.mock_wire_protocol import mock_wire_protocol class TestRemoteAccess(AgentTestCase): @@ -34,7 +34,7 @@ def test_parse_remote_access(self): self.assertEqual("2019-01-01", remote_access.user_list.users[0].expiration, "Expiration does not match.") def test_goal_state_with_no_remote_access(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: self.assertIsNone(protocol.client.get_remote_access()) def test_parse_two_remote_access_accounts(self): @@ -75,7 +75,7 @@ def test_parse_zero_remote_access_accounts(self): self.assertEqual(0, len(remote_access.user_list.users), "User count does not match.") def test_update_remote_access_conf_remote_access(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_REMOTE_ACCESS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_REMOTE_ACCESS) as protocol: self.assertIsNotNone(protocol.client.get_remote_access()) self.assertEqual(1, len(protocol.client.get_remote_access().user_list.users)) self.assertEqual('testAccount', protocol.client.get_remote_access().user_list.users[0].name) diff --git 
a/tests/ga/test_remoteaccess_handler.py b/tests/ga/test_remoteaccess_handler.py index 37187702e..d4f157926 100644 --- a/tests/ga/test_remoteaccess_handler.py +++ b/tests/ga/test_remoteaccess_handler.py @@ -22,9 +22,9 @@ from azurelinuxagent.common.protocol.util import ProtocolUtil from azurelinuxagent.common.protocol.wire import WireProtocol from azurelinuxagent.ga.remoteaccess import RemoteAccessHandler -from tests.tools import AgentTestCase, load_data, patch, clear_singleton_instances -from tests.protocol.mocks import mock_wire_protocol -from tests.protocol.mockwiredata import DATA_FILE, DATA_FILE_REMOTE_ACCESS +from tests.lib.tools import AgentTestCase, load_data, patch, clear_singleton_instances +from tests.lib.mock_wire_protocol import mock_wire_protocol +from tests.lib.wire_protocol_data import DATA_FILE, DATA_FILE_REMOTE_ACCESS class MockOSUtil(DefaultOSUtil): diff --git a/tests/ga/test_report_status.py b/tests/ga/test_report_status.py index c5a20b516..370bcb60f 100644 --- a/tests/ga/test_report_status.py +++ b/tests/ga/test_report_status.py @@ -3,13 +3,15 @@ import json +from azurelinuxagent.common.utils.flexible_version import FlexibleVersion +from azurelinuxagent.ga.agent_update_handler import get_agent_update_handler from azurelinuxagent.ga.exthandlers import ExtHandlersHandler from azurelinuxagent.ga.update import get_update_handler -from tests.ga.mocks import mock_update_handler -from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse -from tests.tools import AgentTestCase, patch -from tests.protocol import mockwiredata -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates +from tests.lib.mock_update_handler import mock_update_handler +from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse +from tests.lib.tools import AgentTestCase, patch +from tests.lib import wire_protocol_data +from tests.lib.http_request_predicates import HttpRequestPredicates class ReportStatusTestCase(AgentTestCase): @@ 
-30,73 +32,76 @@ def http_get_handler(url, *_, **__): def on_new_iteration(iteration): fail_goal_state_request[0] = iteration == 2 - with mock_wire_protocol(mockwiredata.DATA_FILE, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE, http_get_handler=http_get_handler) as protocol: exthandlers_handler = ExtHandlersHandler(protocol) with patch.object(exthandlers_handler, "run", wraps=exthandlers_handler.run) as exthandlers_handler_run: with mock_update_handler(protocol, iterations=2, on_new_iteration=on_new_iteration, exthandlers_handler=exthandlers_handler) as update_handler: - update_handler.run(debug=True) - - self.assertEqual(1, exthandlers_handler_run.call_count, "Extensions should have been executed only once.") - self.assertEqual(2, len(protocol.mock_wire_data.status_blobs), "Status should have been reported for the 2 iterations.") - - # - # Verify that we reported status for the extension in the test data - # - first_status = json.loads(protocol.mock_wire_data.status_blobs[0]) - - handler_aggregate_status = first_status.get('aggregateStatus', {}).get("handlerAggregateStatus") - self.assertIsNotNone(handler_aggregate_status, "Could not find the handlerAggregateStatus") - self.assertEqual(1, len(handler_aggregate_status), "Expected 1 extension status. 
Got: {0}".format(handler_aggregate_status)) - extension_status = handler_aggregate_status[0] - self.assertEqual("OSTCExtensions.ExampleHandlerLinux", extension_status["handlerName"], "The status does not correspond to the test data") - - # - # Verify that we reported the same status (minus timestamps) in the 2 iterations - # - second_status = json.loads(protocol.mock_wire_data.status_blobs[1]) - - def remove_timestamps(x): - if isinstance(x, list): - for v in x: - remove_timestamps(v) - elif isinstance(x, dict): - for k, v in x.items(): - if k == "timestampUTC": - x[k] = '' - else: + with patch("azurelinuxagent.common.version.get_daemon_version", return_value=FlexibleVersion("2.2.53")): + update_handler.run(debug=True) + + self.assertEqual(1, exthandlers_handler_run.call_count, "Extensions should have been executed only once.") + self.assertEqual(2, len(protocol.mock_wire_data.status_blobs), "Status should have been reported for the 2 iterations.") + + # + # Verify that we reported status for the extension in the test data + # + first_status = json.loads(protocol.mock_wire_data.status_blobs[0]) + + handler_aggregate_status = first_status.get('aggregateStatus', {}).get("handlerAggregateStatus") + self.assertIsNotNone(handler_aggregate_status, "Could not find the handlerAggregateStatus") + self.assertEqual(1, len(handler_aggregate_status), "Expected 1 extension status. 
Got: {0}".format(handler_aggregate_status)) + extension_status = handler_aggregate_status[0] + self.assertEqual("OSTCExtensions.ExampleHandlerLinux", extension_status["handlerName"], "The status does not correspond to the test data") + + # + # Verify that we reported the same status (minus timestamps) in the 2 iterations + # + second_status = json.loads(protocol.mock_wire_data.status_blobs[1]) + + def remove_timestamps(x): + if isinstance(x, list): + for v in x: remove_timestamps(v) + elif isinstance(x, dict): + for k, v in x.items(): + if k == "timestampUTC": + x[k] = '' + else: + remove_timestamps(v) - remove_timestamps(first_status) - remove_timestamps(second_status) + remove_timestamps(first_status) + remove_timestamps(second_status) - self.assertEqual(first_status, second_status) + self.assertEqual(first_status, second_status) def test_report_status_should_log_errors_only_once_per_goal_state(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: with patch("azurelinuxagent.common.conf.get_autoupdate_enabled", return_value=False): # skip agent update with patch("azurelinuxagent.ga.update.logger.warn") as logger_warn: - update_handler = get_update_handler() - update_handler._goal_state = protocol.get_goal_state() # these tests skip the initialization of the goal state. so do that here - exthandlers_handler = ExtHandlersHandler(protocol) - update_handler._report_status(exthandlers_handler) - self.assertEqual(0, logger_warn.call_count, "UpdateHandler._report_status() should not report WARNINGS when there are no errors") + with patch("azurelinuxagent.common.version.get_daemon_version", return_value=FlexibleVersion("2.2.53")): + update_handler = get_update_handler() + update_handler._goal_state = protocol.get_goal_state() # these tests skip the initialization of the goal state. 
so do that here + exthandlers_handler = ExtHandlersHandler(protocol) + agent_update_handler = get_agent_update_handler(protocol) + update_handler._report_status(exthandlers_handler, agent_update_handler) + self.assertEqual(0, logger_warn.call_count, "UpdateHandler._report_status() should not report WARNINGS when there are no errors") - with patch("azurelinuxagent.ga.update.ExtensionsSummary.__init__", side_effect=Exception("TEST EXCEPTION")): # simulate an error during _report_status() - get_warnings = lambda: [args[0] for args, _ in logger_warn.call_args_list if "TEST EXCEPTION" in args[0]] + with patch("azurelinuxagent.ga.update.ExtensionsSummary.__init__", side_effect=Exception("TEST EXCEPTION")): # simulate an error during _report_status() + get_warnings = lambda: [args[0] for args, _ in logger_warn.call_args_list if "TEST EXCEPTION" in args[0]] - update_handler._report_status(exthandlers_handler) - update_handler._report_status(exthandlers_handler) - update_handler._report_status(exthandlers_handler) + update_handler._report_status(exthandlers_handler, agent_update_handler) + update_handler._report_status(exthandlers_handler, agent_update_handler) + update_handler._report_status(exthandlers_handler, agent_update_handler) - self.assertEqual(1, len(get_warnings()), "UpdateHandler._report_status() should report only 1 WARNING when there are multiple errors within the same goal state") + self.assertEqual(1, len(get_warnings()), "UpdateHandler._report_status() should report only 1 WARNING when there are multiple errors within the same goal state") - exthandlers_handler.protocol.mock_wire_data.set_incarnation(999) - update_handler._try_update_goal_state(exthandlers_handler.protocol) - update_handler._report_status(exthandlers_handler) - self.assertEqual(2, len(get_warnings()), "UpdateHandler._report_status() should continue reporting errors after a new goal state") + exthandlers_handler.protocol.mock_wire_data.set_incarnation(999) + 
update_handler._try_update_goal_state(exthandlers_handler.protocol) + update_handler._report_status(exthandlers_handler, agent_update_handler) + self.assertEqual(2, len(get_warnings()), "UpdateHandler._report_status() should continue reporting errors after a new goal state") def test_update_handler_should_add_fast_track_to_supported_features_when_it_is_supported(self): - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS) as protocol: self._test_supported_features_includes_fast_track(protocol, True) def test_update_handler_should_not_add_fast_track_to_supported_features_when_it_is_not_supported(self): @@ -105,7 +110,7 @@ def http_get_handler(url, *_, **__): return MockHttpResponse(status=404) return None - with mock_wire_protocol(mockwiredata.DATA_FILE_VM_SETTINGS, http_get_handler=http_get_handler) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE_VM_SETTINGS, http_get_handler=http_get_handler) as protocol: self._test_supported_features_includes_fast_track(protocol, False) def _test_supported_features_includes_fast_track(self, protocol, expected): diff --git a/tests/ga/test_send_telemetry_events.py b/tests/ga/test_send_telemetry_events.py index 005a07b09..a9c87dde9 100644 --- a/tests/ga/test_send_telemetry_events.py +++ b/tests/ga/test_send_telemetry_events.py @@ -42,11 +42,11 @@ from azurelinuxagent.ga.collect_telemetry_events import _CollectAndEnqueueEvents from azurelinuxagent.ga.send_telemetry_events import get_send_telemetry_events_handler from tests.ga.test_monitor import random_generator -from tests.protocol.mocks import MockHttpResponse, mock_wire_protocol -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates -from tests.protocol.mockwiredata import DATA_FILE -from tests.tools import AgentTestCase, clear_singleton_instances, mock_sleep -from tests.utils.event_logger_tools import EventLoggerTools +from tests.lib.mock_wire_protocol 
import MockHttpResponse, mock_wire_protocol +from tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.wire_protocol_data import DATA_FILE +from tests.lib.tools import AgentTestCase, clear_singleton_instances, mock_sleep +from tests.lib.event_logger_tools import EventLoggerTools class TestSendTelemetryEventsHandler(AgentTestCase, HttpRequestPredicates): @@ -368,13 +368,13 @@ def test_it_should_enqueue_and_send_events_properly(self, mock_lib_dir, *_): '' \ '' \ '' \ - '' \ '' \ '' \ '' \ '' \ '' \ '' \ + '' \ '' \ '' \ '' \ @@ -385,7 +385,7 @@ def test_it_should_enqueue_and_send_events_properly(self, mock_lib_dir, *_): '' \ ']]>'.format(AGENT_VERSION, TestSendTelemetryEventsHandler._TEST_EVENT_OPERATION, CURRENT_AGENT, test_opcodename, test_eventtid, test_eventpid, test_taskname, osversion, int(osutil.get_total_mem()), - osutil.get_processor_cores()).encode('utf-8') + osutil.get_processor_cores(), json.dumps({"CpuArchitecture": platform.machine()})).encode('utf-8') self.assertIn(sample_message, collected_event) diff --git a/tests/ga/test_update.py b/tests/ga/test_update.py index e5f15fbd0..6caa21f3c 100644 --- a/tests/ga/test_update.py +++ b/tests/ga/test_update.py @@ -20,6 +20,8 @@ from datetime import datetime, timedelta from threading import current_thread +from azurelinuxagent.ga.guestagent import GuestAgent, GuestAgentError, \ + AGENT_ERROR_FILE from tests.common.osutil.test_default import TestOSUtil import azurelinuxagent.common.osutil.default as osutil @@ -27,34 +29,34 @@ from azurelinuxagent.common import conf from azurelinuxagent.common.event import EVENTS_DIRECTORY, WALAEventOperation -from azurelinuxagent.common.exception import ProtocolError, UpdateError, HttpError, \ +from azurelinuxagent.common.exception import HttpError, \ ExitException, AgentMemoryExceededException from azurelinuxagent.common.future import ustr, httpclient -from azurelinuxagent.common.persist_firewall_rules import PersistFirewallRulesHandler +from 
azurelinuxagent.ga.persist_firewall_rules import PersistFirewallRulesHandler from azurelinuxagent.common.protocol.hostplugin import HostPluginProtocol from azurelinuxagent.common.protocol.restapi import VMAgentFamily, \ ExtHandlerPackage, ExtHandlerPackageList, Extension, VMStatus, ExtHandlerStatus, ExtensionStatus, \ VMAgentUpdateStatuses from azurelinuxagent.common.protocol.util import ProtocolUtil -from azurelinuxagent.common.utils import fileutil, textutil, timeutil +from azurelinuxagent.common.utils import fileutil, textutil, timeutil, shellutil from azurelinuxagent.common.utils.archive import ARCHIVE_DIRECTORY_NAME, AGENT_STATUS_FILE from azurelinuxagent.common.utils.flexible_version import FlexibleVersion from azurelinuxagent.common.utils.networkutil import FirewallCmdDirectCommands, AddFirewallRules from azurelinuxagent.common.version import AGENT_PKG_GLOB, AGENT_DIR_GLOB, AGENT_NAME, AGENT_DIR_PATTERN, \ - AGENT_VERSION, CURRENT_AGENT, CURRENT_VERSION, set_daemon_version, \ - __DAEMON_VERSION_ENV_VARIABLE as DAEMON_VERSION_ENV_VARIABLE + AGENT_VERSION, CURRENT_AGENT, CURRENT_VERSION, set_daemon_version, __DAEMON_VERSION_ENV_VARIABLE as DAEMON_VERSION_ENV_VARIABLE from azurelinuxagent.ga.exthandlers import ExtHandlersHandler, ExtHandlerInstance, HandlerEnvironment, ExtensionStatusValue -from azurelinuxagent.ga.update import GuestAgent, GuestAgentError, MAX_FAILURE, AGENT_MANIFEST_FILE, \ - get_update_handler, ORPHAN_POLL_INTERVAL, AGENT_PARTITION_FILE, AGENT_ERROR_FILE, ORPHAN_WAIT_INTERVAL, \ +from azurelinuxagent.ga.update import \ + get_update_handler, ORPHAN_POLL_INTERVAL, AGENT_PARTITION_FILE, ORPHAN_WAIT_INTERVAL, \ CHILD_LAUNCH_RESTART_MAX, CHILD_HEALTH_INTERVAL, GOAL_STATE_PERIOD_EXTENSIONS_DISABLED, UpdateHandler, \ - READONLY_FILE_GLOBS, ExtensionsSummary, AgentUpgradeType -from tests.ga.mocks import mock_update_handler -from tests.protocol.mocks import mock_wire_protocol, MockHttpResponse -from tests.protocol.mockwiredata import DATA_FILE, 
DATA_FILE_MULTIPLE_EXT, DATA_FILE_VM_SETTINGS -from tests.tools import AgentTestCase, AgentTestCaseWithGetVmSizeMock, data_dir, DEFAULT, patch, load_bin_data, Mock, MagicMock, \ + READONLY_FILE_GLOBS, ExtensionsSummary +from tests.lib.mock_update_handler import mock_update_handler +from tests.lib.mock_wire_protocol import mock_wire_protocol, MockHttpResponse +from tests.lib.wire_protocol_data import DATA_FILE, DATA_FILE_MULTIPLE_EXT, DATA_FILE_VM_SETTINGS +from tests.lib.tools import AgentTestCase, AgentTestCaseWithGetVmSizeMock, data_dir, DEFAULT, patch, load_bin_data, Mock, MagicMock, \ clear_singleton_instances, is_python_version_26_or_34, skip_if_predicate_true -from tests.protocol import mockwiredata -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates +from tests.lib import wire_protocol_data +from tests.lib.http_request_predicates import HttpRequestPredicates + NO_ERROR = { "last_failure": 0.0, @@ -99,7 +101,7 @@ def faux_logger(): @contextlib.contextmanager -def _get_update_handler(iterations=1, test_data=None, protocol=None): +def _get_update_handler(iterations=1, test_data=None, protocol=None, autoupdate_enabled=True): """ This function returns a mocked version of the UpdateHandler object to be used for testing. It will only run the main loop [iterations] number of times.
@@ -110,10 +112,10 @@ def _get_update_handler(iterations=1, test_data=None, protocol=None): with patch.object(HostPluginProtocol, "is_default_channel", False): if protocol is None: with mock_wire_protocol(test_data) as mock_protocol: - with mock_update_handler(mock_protocol, iterations=iterations, autoupdate_enabled=True) as update_handler: + with mock_update_handler(mock_protocol, iterations=iterations, autoupdate_enabled=autoupdate_enabled) as update_handler: yield update_handler, mock_protocol else: - with mock_update_handler(protocol, iterations=iterations, autoupdate_enabled=True) as update_handler: + with mock_update_handler(protocol, iterations=iterations, autoupdate_enabled=autoupdate_enabled) as update_handler: yield update_handler, protocol @@ -133,11 +135,16 @@ def setUpClass(cls): source = os.path.join(data_dir, "ga", sample_agent_zip) target = os.path.join(UpdateTestCase._agent_zip_dir, test_agent_zip) shutil.copyfile(source, target) + # The update_handler inherently calls the agent update handler, which in turn reads the daemon version. The daemon version logic now has a fallback if the env variable is not set. + # The fallback calls popen, which is not mocked, so we set the env variable to avoid the fallback. + # This does not change any of the test validations. At the end of all update test validations, we reset the env variable.
+ set_daemon_version("1.2.3.4") @classmethod def tearDownClass(cls): super(UpdateTestCase, cls).tearDownClass() shutil.rmtree(UpdateTestCase._test_suite_tmp_dir) + os.environ.pop(DAEMON_VERSION_ENV_VARIABLE) @staticmethod def _get_agent_pkgs(in_dir=None): @@ -315,302 +322,6 @@ def replicate_agents(self, return dst_v -class TestGuestAgentError(UpdateTestCase): - def test_creation(self): - self.assertRaises(TypeError, GuestAgentError) - self.assertRaises(UpdateError, GuestAgentError, None) - - with self.get_error_file(error_data=WITH_ERROR) as path: - err = GuestAgentError(path.name) - err.load() - self.assertEqual(path.name, err.path) - self.assertNotEqual(None, err) - - self.assertEqual(WITH_ERROR["last_failure"], err.last_failure) - self.assertEqual(WITH_ERROR["failure_count"], err.failure_count) - self.assertEqual(WITH_ERROR["was_fatal"], err.was_fatal) - return - - def test_clear(self): - with self.get_error_file(error_data=WITH_ERROR) as path: - err = GuestAgentError(path.name) - err.load() - self.assertEqual(path.name, err.path) - self.assertNotEqual(None, err) - - err.clear() - self.assertEqual(NO_ERROR["last_failure"], err.last_failure) - self.assertEqual(NO_ERROR["failure_count"], err.failure_count) - self.assertEqual(NO_ERROR["was_fatal"], err.was_fatal) - return - - def test_save(self): - err1 = self.create_error() - err1.mark_failure() - err1.mark_failure(is_fatal=True) - - err2 = self.create_error(err1.to_json()) - self.assertEqual(err1.last_failure, err2.last_failure) - self.assertEqual(err1.failure_count, err2.failure_count) - self.assertEqual(err1.was_fatal, err2.was_fatal) - - def test_mark_failure(self): - err = self.create_error() - self.assertFalse(err.is_blacklisted) - - for i in range(0, MAX_FAILURE): # pylint: disable=unused-variable - err.mark_failure() - - # Agent failed >= MAX_FAILURE, it should be blacklisted - self.assertTrue(err.is_blacklisted) - self.assertEqual(MAX_FAILURE, err.failure_count) - return - - def 
test_mark_failure_permanent(self): - err = self.create_error() - - self.assertFalse(err.is_blacklisted) - - # Fatal errors immediately blacklist - err.mark_failure(is_fatal=True) - self.assertTrue(err.is_blacklisted) - self.assertTrue(err.failure_count < MAX_FAILURE) - return - - def test_str(self): - err = self.create_error(error_data=NO_ERROR) - s = "Last Failure: {0}, Total Failures: {1}, Fatal: {2}, Reason: {3}".format( - NO_ERROR["last_failure"], - NO_ERROR["failure_count"], - NO_ERROR["was_fatal"], - NO_ERROR["reason"]) - self.assertEqual(s, str(err)) - - err = self.create_error(error_data=WITH_ERROR) - s = "Last Failure: {0}, Total Failures: {1}, Fatal: {2}, Reason: {3}".format( - WITH_ERROR["last_failure"], - WITH_ERROR["failure_count"], - WITH_ERROR["was_fatal"], - WITH_ERROR["reason"]) - self.assertEqual(s, str(err)) - return - - -class TestGuestAgent(UpdateTestCase): - def setUp(self): - UpdateTestCase.setUp(self) - self.copy_agents(self._get_agent_file_path()) - self.agent_path = os.path.join(self.tmp_dir, self._get_agent_name()) - - def test_creation(self): - with self.assertRaises(UpdateError): - GuestAgent.from_installed_agent("A very bad file name") - - with self.assertRaises(UpdateError): - GuestAgent.from_installed_agent("{0}-a.bad.version".format(AGENT_NAME)) - - self.expand_agents() - - agent = GuestAgent.from_installed_agent(self.agent_path) - self.assertNotEqual(None, agent) - self.assertEqual(self._get_agent_name(), agent.name) - self.assertEqual(self._get_agent_version(), agent.version) - - self.assertEqual(self.agent_path, agent.get_agent_dir()) - - path = os.path.join(self.agent_path, AGENT_MANIFEST_FILE) - self.assertEqual(path, agent.get_agent_manifest_path()) - - self.assertEqual( - os.path.join(self.agent_path, AGENT_ERROR_FILE), - agent.get_agent_error_file()) - - path = ".".join((os.path.join(conf.get_lib_dir(), self._get_agent_name()), "zip")) - self.assertEqual(path, agent.get_agent_pkg_path()) - - 
self.assertTrue(agent.is_downloaded) - self.assertFalse(agent.is_blacklisted) - self.assertTrue(agent.is_available) - - def test_clear_error(self): - self.expand_agents() - - agent = GuestAgent.from_installed_agent(self.agent_path) - agent.mark_failure(is_fatal=True) - - self.assertTrue(agent.error.last_failure > 0.0) - self.assertEqual(1, agent.error.failure_count) - self.assertTrue(agent.is_blacklisted) - self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted) - - agent.clear_error() - self.assertEqual(0.0, agent.error.last_failure) - self.assertEqual(0, agent.error.failure_count) - self.assertFalse(agent.is_blacklisted) - self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted) - - def test_is_available(self): - self.expand_agents() - - agent = GuestAgent.from_installed_agent(self.agent_path) - - self.assertTrue(agent.is_available) - agent.mark_failure(is_fatal=True) - self.assertFalse(agent.is_available) - - def test_is_blacklisted(self): - self.expand_agents() - - agent = GuestAgent.from_installed_agent(self.agent_path) - self.assertFalse(agent.is_blacklisted) - self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted) - - agent.mark_failure(is_fatal=True) - self.assertTrue(agent.is_blacklisted) - self.assertEqual(agent.is_blacklisted, agent.error.is_blacklisted) - - def test_is_downloaded(self): - self.expand_agents() - agent = GuestAgent.from_installed_agent(self.agent_path) - self.assertTrue(agent.is_downloaded) - - def test_mark_failure(self): - agent = GuestAgent.from_installed_agent(self.agent_path) - - agent.mark_failure() - self.assertEqual(1, agent.error.failure_count) - - agent.mark_failure(is_fatal=True) - self.assertEqual(2, agent.error.failure_count) - self.assertTrue(agent.is_blacklisted) - - def test_load_manifest(self): - self.expand_agents() - agent = GuestAgent.from_installed_agent(self.agent_path) - agent._load_manifest() - self.assertEqual(agent.manifest.get_enable_command(), - agent.get_agent_cmd()) - - def 
test_load_manifest_missing(self): - self.expand_agents() - agent = GuestAgent.from_installed_agent(self.agent_path) - os.remove(agent.get_agent_manifest_path()) - self.assertRaises(UpdateError, agent._load_manifest) - - def test_load_manifest_is_empty(self): - self.expand_agents() - agent = GuestAgent.from_installed_agent(self.agent_path) - self.assertTrue(os.path.isfile(agent.get_agent_manifest_path())) - - with open(agent.get_agent_manifest_path(), "w") as file: # pylint: disable=redefined-builtin - json.dump(EMPTY_MANIFEST, file) - self.assertRaises(UpdateError, agent._load_manifest) - - def test_load_manifest_is_malformed(self): - self.expand_agents() - agent = GuestAgent.from_installed_agent(self.agent_path) - self.assertTrue(os.path.isfile(agent.get_agent_manifest_path())) - - with open(agent.get_agent_manifest_path(), "w") as file: # pylint: disable=redefined-builtin - file.write("This is not JSON data") - self.assertRaises(UpdateError, agent._load_manifest) - - def test_load_error(self): - agent = GuestAgent.from_installed_agent(self.agent_path) - agent.error = None - - agent._load_error() - self.assertTrue(agent.error is not None) - - def test_download(self): - self.remove_agents() - self.assertFalse(os.path.isdir(self.agent_path)) - - agent_uri = 'https://foo.blob.core.windows.net/bar/OSTCExtensions.WALinuxAgent__1.0.0' - - def http_get_handler(uri, *_, **__): - if uri == agent_uri: - response = load_bin_data(self._get_agent_file_name(), self._agent_zip_dir) - return MockHttpResponse(status=httpclient.OK, body=response) - return None - - pkg = ExtHandlerPackage(version=str(self._get_agent_version())) - pkg.uris.append(agent_uri) - - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: - protocol.set_http_handlers(http_get_handler=http_get_handler) - agent = GuestAgent.from_agent_package(pkg, protocol, False) - - self.assertTrue(os.path.isdir(agent.get_agent_dir())) - self.assertTrue(agent.is_downloaded) - - def test_download_fail(self): - 
self.remove_agents() - self.assertFalse(os.path.isdir(self.agent_path)) - - agent_uri = 'https://foo.blob.core.windows.net/bar/OSTCExtensions.WALinuxAgent__1.0.0' - - def http_get_handler(uri, *_, **__): - if uri in (agent_uri, 'http://168.63.129.16:32526/extensionArtifact'): - return MockHttpResponse(status=httpclient.SERVICE_UNAVAILABLE) - return None - - agent_version = self._get_agent_version() - pkg = ExtHandlerPackage(version=str(agent_version)) - pkg.uris.append(agent_uri) - - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: - protocol.set_http_handlers(http_get_handler=http_get_handler) - with patch("azurelinuxagent.ga.update.add_event") as add_event: - agent = GuestAgent.from_agent_package(pkg, protocol, False) - - self.assertFalse(os.path.isfile(self.agent_path)) - - messages = [kwargs['message'] for _, kwargs in add_event.call_args_list if kwargs['op'] == 'Install' and kwargs['is_success'] == False] - self.assertEqual(1, len(messages), "Expected exactly 1 install error/ Got: {0}".format(add_event.call_args_list)) - self.assertIn(str.format('[UpdateError] Unable to download Agent WALinuxAgent-{0}', agent_version), messages[0], "The install error does not include the expected message") - - self.assertFalse(agent.is_blacklisted, "Download failures should not blacklist the Agent") - - def test_invalid_agent_package_does_not_blacklist_the_agent(self): - agent_uri = 'https://foo.blob.core.windows.net/bar/OSTCExtensions.WALinuxAgent__9.9.9.9' - - def http_get_handler(uri, *_, **__): - if uri in (agent_uri, 'http://168.63.129.16:32526/extensionArtifact'): - response = load_bin_data("ga/WALinuxAgent-9.9.9.9-no_manifest.zip") - return MockHttpResponse(status=httpclient.OK, body=response) - return None - - pkg = ExtHandlerPackage(version="9.9.9.9") - pkg.uris.append(agent_uri) - - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: - protocol.set_http_handlers(http_get_handler=http_get_handler) - agent = GuestAgent.from_agent_package(pkg, 
protocol, False) - - self.assertFalse(agent.is_blacklisted, "The agent should not be blacklisted if unable to unpack/download") - self.assertFalse(os.path.exists(agent.get_agent_dir()), "Agent directory should be cleaned up") - - @patch("azurelinuxagent.ga.update.GuestAgent._download") - def test_ensure_download_skips_blacklisted(self, mock_download): - agent = GuestAgent.from_installed_agent(self.agent_path) - self.assertEqual(0, mock_download.call_count) - - agent.clear_error() - agent.mark_failure(is_fatal=True) - self.assertTrue(agent.is_blacklisted) - - pkg = ExtHandlerPackage(version=str(self._get_agent_version())) - pkg.uris.append(None) - # _download is mocked so there will be no http request; passing a None protocol - agent = GuestAgent.from_agent_package(pkg, None, False) - - self.assertEqual(1, agent.error.failure_count) - self.assertTrue(agent.error.was_fatal) - self.assertTrue(agent.is_blacklisted) - self.assertEqual(0, mock_download.call_count) - - class TestUpdate(UpdateTestCase): def setUp(self): UpdateTestCase.setUp(self) @@ -622,14 +333,11 @@ def setUp(self): self.update_handler._goal_state = Mock() self.update_handler._goal_state.extensions_goal_state = Mock() self.update_handler._goal_state.extensions_goal_state.source = "Fabric" - # Since ProtocolUtil is a singleton per thread, we need to clear it to ensure that the test cases do not reuse # a previous state clear_singleton_instances(ProtocolUtil) def test_creation(self): - self.assertEqual(None, self.update_handler.last_attempt_time) - self.assertEqual(0, len(self.update_handler.agents)) self.assertEqual(None, self.update_handler.child_agent) @@ -853,9 +561,6 @@ def test_get_latest_agent(self): def test_get_latest_agent_excluded(self): self.prepare_agent(AGENT_VERSION) - self.assertFalse(self._test_upgrade_available( - versions=self.agent_versions(), - count=1)) self.assertEqual(None, self.update_handler.get_latest_agent_greater_than_daemon()) def test_get_latest_agent_no_updates(self): @@ 
-981,12 +686,13 @@ def _test_run_latest(self, mock_child=None, mock_time=None, child_args=None): def test_run_latest(self): self.prepare_agents() - agent = self.update_handler.get_latest_agent_greater_than_daemon() - args, kwargs = self._test_run_latest() - args = args[0] - cmds = textutil.safe_shlex_split(agent.get_agent_cmd()) - if cmds[0].lower() == "python": - cmds[0] = sys.executable + with patch("azurelinuxagent.common.conf.get_autoupdate_enabled", return_value=True): + agent = self.update_handler.get_latest_agent_greater_than_daemon() + args, kwargs = self._test_run_latest() + args = args[0] + cmds = textutil.safe_shlex_split(agent.get_agent_cmd()) + if cmds[0].lower() == "python": + cmds[0] = sys.executable self.assertEqual(args, cmds) self.assertTrue(len(args) > 1) @@ -1096,7 +802,8 @@ def test_run_latest_exception_blacklists(self): verify_string = "Force blacklisting: {0}".format(str(uuid.uuid4())) with patch('azurelinuxagent.ga.update.UpdateHandler.get_latest_agent_greater_than_daemon', return_value=latest_agent): - self._test_run_latest(mock_child=ChildMock(side_effect=Exception(verify_string))) + with patch("azurelinuxagent.common.conf.get_autoupdate_enabled", return_value=True): + self._test_run_latest(mock_child=ChildMock(side_effect=Exception(verify_string))) self.assertFalse(latest_agent.is_available) self.assertTrue(latest_agent.error.is_blacklisted) @@ -1192,85 +899,6 @@ def test_shutdown_ignores_exceptions(self): except Exception as e: # pylint: disable=unused-variable self.assertTrue(False, "Unexpected exception") # pylint: disable=redundant-unittest-assert - def _test_upgrade_available( - self, - base_version=FlexibleVersion(AGENT_VERSION), - protocol=None, - versions=None, - count=20): - - if protocol is None: - protocol = self._create_protocol(count=count, versions=versions) - - self.update_handler.protocol_util = protocol - self.update_handler._goal_state = protocol.get_goal_state() - 
self.update_handler._goal_state.extensions_goal_state.is_outdated = False - conf.get_autoupdate_gafamily = Mock(return_value=protocol.family) - - return self.update_handler._download_agent_if_upgrade_available(protocol, base_version=base_version) - - def test_upgrade_available_returns_true_on_first_use(self): - self.assertTrue(self._test_upgrade_available()) - - def test_upgrade_available_handles_missing_family(self): - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_missing_family.xml" - - with mock_wire_protocol(data_file) as protocol: - self.update_handler.protocol_util = protocol - with patch('azurelinuxagent.common.logger.warn') as mock_logger: - with patch('azurelinuxagent.common.protocol.goal_state.GoalState.fetch_agent_manifest', side_effect=ProtocolError): - self.assertFalse(self.update_handler._download_agent_if_upgrade_available(protocol, base_version=CURRENT_VERSION)) - self.assertEqual(0, mock_logger.call_count) - - def test_upgrade_available_includes_old_agents(self): - self.prepare_agents() - - old_version = self.agent_versions()[-1] - old_count = old_version.version[-1] - - self.replicate_agents(src_v=old_version, count=old_count, increment=-1) - all_count = len(self.agent_versions()) - - self.assertTrue(self._test_upgrade_available(versions=self.agent_versions())) - self.assertEqual(all_count, len(self.update_handler.agents)) - - def test_upgrade_available_purges_old_agents(self): - self.prepare_agents() - agent_count = self.agent_count() - self.assertEqual(20, agent_count) - - agent_versions = self.agent_versions()[:3] - self.assertTrue(self._test_upgrade_available(versions=agent_versions)) - self.assertEqual(len(agent_versions), len(self.update_handler.agents)) - - # Purging always keeps the running agent - if CURRENT_VERSION not in agent_versions: - agent_versions.append(CURRENT_VERSION) - self.assertEqual(agent_versions, self.agent_versions()) - - def test_upgrade_available_skips_if_too_frequent(self): - 
conf.get_autoupdate_frequency = Mock(return_value=10000) - self.update_handler.last_attempt_time = time.time() - self.assertFalse(self._test_upgrade_available()) - - def test_upgrade_available_skips_when_no_new_versions(self): - self.prepare_agents() - base_version = self.agent_versions()[0] + 1 - self.assertFalse(self._test_upgrade_available(base_version=base_version)) - - def test_upgrade_available_skips_when_no_versions(self): - self.assertFalse(self._test_upgrade_available(protocol=ProtocolMock())) - - def test_upgrade_available_sorts(self): - self.prepare_agents() - self._test_upgrade_available() - - v = FlexibleVersion("100000") - for a in self.update_handler.agents: - self.assertTrue(v > a.version) - v = a.version - def test_write_pid_file(self): for n in range(1112): fileutil.write_file(os.path.join(self.tmp_dir, str(n) + "_waagent.pid"), ustr(n + 1)) @@ -1295,7 +923,7 @@ def test_update_happens_when_extensions_disabled(self): behavior never changes. """ with patch('azurelinuxagent.common.conf.get_extensions_enabled', return_value=False): - with patch('azurelinuxagent.ga.update.UpdateHandler._download_agent_if_upgrade_available', return_value=True) as download_agent: + with patch('azurelinuxagent.ga.agent_update_handler.AgentUpdateHandler.run') as download_agent: with mock_wire_protocol(DATA_FILE) as protocol: with mock_update_handler(protocol, autoupdate_enabled=True) as update_handler: update_handler.run() @@ -1354,11 +982,10 @@ def match_expected_info(): match_unexpected_errors() # Match on errors first, they can provide more info. 
match_expected_info() - def test_it_should_recreate_handler_env_on_service_startup(self): iterations = 5 - with _get_update_handler(iterations) as (update_handler, protocol): + with _get_update_handler(iterations, autoupdate_enabled=False) as (update_handler, protocol): update_handler.run(debug=True) expected_handler = self._get_test_ext_handler_instance(protocol) @@ -1375,7 +1002,7 @@ def test_it_should_recreate_handler_env_on_service_startup(self): # re-running the update handler. Then, ensure that the HandlerEnvironment file is recreated with eventsFolder # flag in HandlerEnvironment.json file. self._add_write_permission_to_goal_state_files() - with _get_update_handler(iterations=1) as (update_handler, protocol): + with _get_update_handler(iterations=1, autoupdate_enabled=False) as (update_handler, protocol): with patch("azurelinuxagent.common.agent_supported_feature._ETPFeature.is_supported", True): update_handler.run(debug=True) @@ -1573,7 +1200,7 @@ def test_it_should_not_set_dns_tcp_iptable_if_drop_and_accept_available(self): @contextlib.contextmanager def _setup_test_for_ext_event_dirs_retention(self): try: - with _get_update_handler(test_data=DATA_FILE_MULTIPLE_EXT) as (update_handler, protocol): + with _get_update_handler(test_data=DATA_FILE_MULTIPLE_EXT, autoupdate_enabled=False) as (update_handler, protocol): with patch("azurelinuxagent.common.agent_supported_feature._ETPFeature.is_supported", True): update_handler.run(debug=True) expected_events_dirs = glob.glob(os.path.join(conf.get_ext_log_dir(), "*", EVENTS_DIRECTORY)) @@ -1623,49 +1250,50 @@ def test_it_should_recreate_extension_event_directories_for_existing_extensions_ def test_it_should_report_update_status_in_status_blob(self): with mock_wire_protocol(DATA_FILE) as protocol: - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - with patch.object(conf, "get_autoupdate_gafamily", return_value="Prod"): + with patch.object(conf, "get_autoupdate_gafamily", return_value="Prod"):
+ with patch("azurelinuxagent.common.conf.get_enable_ga_versioning", return_value=True): with patch("azurelinuxagent.common.logger.warn") as patch_warn: protocol.aggregate_status = None protocol.incarnation = 1 - def mock_http_put(url, *args, **_): + def get_handler(url, **kwargs): + if HttpRequestPredicates.is_agent_package_request(url): + return MockHttpResponse(status=httpclient.SERVICE_UNAVAILABLE) + return protocol.mock_wire_data.mock_http_get(url, **kwargs) + + def put_handler(url, *args, **_): if HttpRequestPredicates.is_host_plugin_status_request(url): # Skip reading the HostGA request data as its encoded return MockHttpResponse(status=500) protocol.aggregate_status = json.loads(args[0]) return MockHttpResponse(status=201) - def update_goal_state_and_run_handler(): + def update_goal_state_and_run_handler(autoupdate_enabled=True): protocol.incarnation += 1 protocol.mock_wire_data.set_incarnation(protocol.incarnation) self._add_write_permission_to_goal_state_files() - with _get_update_handler(iterations=1, protocol=protocol) as (update_handler, _): + with _get_update_handler(iterations=1, protocol=protocol, autoupdate_enabled=autoupdate_enabled) as (update_handler, _): update_handler.run(debug=True) self.assertEqual(0, update_handler.get_exit_code(), "Exit code should be 0; List of all warnings logged by the agent: {0}".format( patch_warn.call_args_list)) - protocol.set_http_handlers(http_put_handler=mock_http_put) - - # Case 1: No requested version in GS; updateStatus should not be reported - update_goal_state_and_run_handler() - self.assertFalse("updateStatus" in protocol.aggregate_status['aggregateStatus']['guestAgentStatus'], - "updateStatus should not be reported if not asked in GS") + protocol.set_http_handlers(http_get_handler=get_handler, http_put_handler=put_handler) - # Case 2: Requested version in GS != Current Version; updateStatus should be error - protocol.mock_wire_data.set_extension_config("wire/ext_conf_requested_version.xml") + # Case 1: rsm 
version missing in GS when the VM opts in to rsm upgrades; report missing rsm version error + protocol.mock_wire_data.set_extension_config("wire/ext_conf_version_missing_in_agent_family.xml") update_goal_state_and_run_handler() self.assertTrue("updateStatus" in protocol.aggregate_status['aggregateStatus']['guestAgentStatus'], - "updateStatus should be in status blob. Warns: {0}".format(patch_warn.call_args_list)) + "updateStatus should be reported") update_status = protocol.aggregate_status['aggregateStatus']['guestAgentStatus']["updateStatus"] self.assertEqual(VMAgentUpdateStatuses.Error, update_status['status'], "Status should be an error") - self.assertEqual(update_status['expectedVersion'], "9.9.9.10", "incorrect version reported") self.assertEqual(update_status['code'], 1, "incorrect code reported") + self.assertIn("missing version property. So, skipping agent update", update_status['formattedMessage']['message'], "incorrect message reported") - # Case 3: Requested version in GS == Current Version; updateStatus should be Success - protocol.mock_wire_data.set_extension_config_requested_version(str(CURRENT_VERSION)) + # Case 2: rsm version in GS == Current Version; updateStatus should be Success + protocol.mock_wire_data.set_extension_config("wire/ext_conf_rsm_version.xml") + protocol.mock_wire_data.set_version_in_agent_family(str(CURRENT_VERSION)) update_goal_state_and_run_handler() self.assertTrue("updateStatus" in protocol.aggregate_status['aggregateStatus']['guestAgentStatus'], "updateStatus should be reported if asked in GS") @@ -1674,11 +1302,16 @@ def update_goal_state_and_run_handler(): self.assertEqual(update_status['expectedVersion'], str(CURRENT_VERSION), "incorrect version reported") self.assertEqual(update_status['code'], 0, "incorrect code reported") - # Case 4: Requested version removed in GS; no updateStatus should be reported - protocol.mock_wire_data.reload() + # Case 3: rsm version in GS != Current Version; update fails and reports an error + 
protocol.mock_wire_data.set_extension_config("wire/ext_conf_rsm_version.xml") + protocol.mock_wire_data.set_version_in_agent_family("5.2.0.1") update_goal_state_and_run_handler() - self.assertFalse("updateStatus" in protocol.aggregate_status['aggregateStatus']['guestAgentStatus'], - "updateStatus should not be reported if not asked in GS") + self.assertTrue("updateStatus" in protocol.aggregate_status['aggregateStatus']['guestAgentStatus'], + "updateStatus should be in status blob. Warns: {0}".format(patch_warn.call_args_list)) + update_status = protocol.aggregate_status['aggregateStatus']['guestAgentStatus']["updateStatus"] + self.assertEqual(VMAgentUpdateStatuses.Error, update_status['status'], "Status should be an error") + self.assertEqual(update_status['expectedVersion'], "5.2.0.1", "incorrect version reported") + self.assertEqual(update_status['code'], 1, "incorrect code reported") def test_it_should_wait_to_fetch_first_goal_state(self): with _get_update_handler() as (update_handler, protocol): @@ -1707,26 +1340,63 @@ def get_handler(url, **kwargs): "Fetching the goal state recovered from previous errors." 
in args[0]] self.assertTrue(len(info_msgs) > 0, "Agent should've logged a message when recovered from GS errors") - def test_it_should_reset_legacy_blacklisted_agents_on_process_start(self): - # Add some good agents - self.prepare_agents(count=10) - good_agents = [agent.name for agent in self.agents()] - - # Add a set of blacklisted agents - self.prepare_agents(count=20, is_available=False) - for agent in self.agents(): - # Assert the test environment is correctly set - if agent.name not in good_agents: - self.assertTrue(agent.is_blacklisted, "Agent {0} should be blacklisted".format(agent.name)) - else: - self.assertFalse(agent.is_blacklisted, "Agent {0} should not be blacklisted".format(agent.name)) - with _get_update_handler() as (update_handler, _): - update_handler.run(debug=True) - self.assertEqual(20, self.agent_count(), "All agents should be available on disk") - # Ensure none of the agents are blacklisted - for agent in self.agents(): - self.assertFalse(agent.is_blacklisted, "Legacy Agent should not be blacklisted") +class TestUpdateWaitForCloudInit(AgentTestCase): + @staticmethod + @contextlib.contextmanager + def create_mock_run_command(delay=None): + def run_command_mock(cmd, *args, **kwargs): + if cmd == ["cloud-init", "status", "--wait"]: + if delay is not None: + original_run_command(['sleep', str(delay)], *args, **kwargs) + return "cloud-init completed" + return original_run_command(cmd, *args, **kwargs) + original_run_command = shellutil.run_command + + with patch("azurelinuxagent.ga.update.shellutil.run_command", side_effect=run_command_mock) as run_command_patch: + yield run_command_patch + + def test_it_should_not_wait_for_cloud_init_by_default(self): + update_handler = UpdateHandler() + with self.create_mock_run_command() as run_command_patch: + update_handler._wait_for_cloud_init() + self.assertTrue(run_command_patch.call_count == 0, "'cloud-init status --wait' should not be called by default") + + def 
test_it_should_wait_for_cloud_init_when_requested(self): + update_handler = UpdateHandler() + with patch("azurelinuxagent.ga.update.conf.get_wait_for_cloud_init", return_value=True): + with self.create_mock_run_command() as run_command_patch: + update_handler._wait_for_cloud_init() + self.assertEqual(1, run_command_patch.call_count, "'cloud-init status --wait' should have been called once") + + @skip_if_predicate_true(lambda: sys.version_info[0] == 2, "Timeouts are not supported on Python 2") + def test_it_should_enforce_timeout_waiting_for_cloud_init(self): + update_handler = UpdateHandler() + with patch("azurelinuxagent.ga.update.conf.get_wait_for_cloud_init", return_value=True): + with patch("azurelinuxagent.ga.update.conf.get_wait_for_cloud_init_timeout", return_value=1): + with self.create_mock_run_command(delay=5): + with patch("azurelinuxagent.ga.update.logger.error") as mock_logger: + update_handler._wait_for_cloud_init() + call_args = [args for args, _ in mock_logger.call_args_list if "An error occurred while waiting for cloud-init" in args[0]] + self.assertTrue( + len(call_args) == 1 and len(call_args[0]) == 1 and "command timeout" in call_args[0][0], + "Expected a timeout waiting for cloud-init. 
Log calls: {0}".format(mock_logger.call_args_list)) + + def test_update_handler_should_wait_for_cloud_init_after_agent_update_and_before_extension_processing(self): + method_calls = [] + + agent_update_handler = Mock() + agent_update_handler.run = lambda *_, **__: method_calls.append("AgentUpdateHandler.run()") + + exthandlers_handler = Mock() + exthandlers_handler.run = lambda *_, **__: method_calls.append("ExtHandlersHandler.run()") + + with mock_wire_protocol(DATA_FILE) as protocol: + with mock_update_handler(protocol, iterations=1, agent_update_handler=agent_update_handler, exthandlers_handler=exthandlers_handler) as update_handler: + with patch('azurelinuxagent.ga.update.UpdateHandler._wait_for_cloud_init', side_effect=lambda *_, **__: method_calls.append("UpdateHandler._wait_for_cloud_init()")): + update_handler.run() + + self.assertListEqual(["AgentUpdateHandler.run()", "UpdateHandler._wait_for_cloud_init()", "ExtHandlersHandler.run()"], method_calls, "Wait for cloud-init should happen after agent update and before extension processing") class UpdateHandlerRunTestCase(AgentTestCase): @@ -1776,11 +1446,6 @@ def _test_run(self, autoupdate_enabled=False, check_daemon_running=False, expect def test_run(self): self._test_run() - def test_run_stops_if_update_available(self): - with patch('azurelinuxagent.ga.update.UpdateHandler._download_agent_if_upgrade_available', return_value=True): - update_handler = self._test_run(autoupdate_enabled=True) - self.assertEqual(0, update_handler.get_iterations_completed()) - def test_run_stops_if_orphaned(self): with patch('os.getppid', return_value=1): update_handler = self._test_run(check_daemon_running=True) @@ -1791,7 +1456,7 @@ def test_run_clears_sentinel_on_successful_exit(self): self.assertFalse(os.path.isfile(update_handler._sentinel_file_path())) def test_run_leaves_sentinel_on_unsuccessful_exit(self): - with patch('azurelinuxagent.ga.update.UpdateHandler._download_agent_if_upgrade_available', side_effect=Exception): + 
with patch('azurelinuxagent.ga.agent_update_handler.AgentUpdateHandler.run', side_effect=Exception): update_handler = self._test_run(autoupdate_enabled=True,expected_exit_code=1) self.assertTrue(os.path.isfile(update_handler._sentinel_file_path())) @@ -1803,20 +1468,19 @@ def test_run_emits_restart_event(self): class TestAgentUpgrade(UpdateTestCase): @contextlib.contextmanager - def create_conf_mocks(self, hotfix_frequency, normal_frequency): + def create_conf_mocks(self, autoupdate_frequency, hotfix_frequency, normal_frequency): # Disabling extension processing to speed up tests as this class deals with testing agent upgrades with patch("azurelinuxagent.common.conf.get_extensions_enabled", return_value=False): - with patch("azurelinuxagent.common.conf.get_autoupdate_frequency", return_value=0.001): - with patch("azurelinuxagent.common.conf.get_hotfix_upgrade_frequency", - return_value=hotfix_frequency): - with patch("azurelinuxagent.common.conf.get_normal_upgrade_frequency", - return_value=normal_frequency): + with patch("azurelinuxagent.common.conf.get_autoupdate_frequency", return_value=autoupdate_frequency): + with patch("azurelinuxagent.common.conf.get_self_update_hotfix_frequency", return_value=hotfix_frequency): + with patch("azurelinuxagent.common.conf.get_self_update_regular_frequency", return_value=normal_frequency): with patch("azurelinuxagent.common.conf.get_autoupdate_gafamily", return_value="Prod"): - yield + with patch("azurelinuxagent.common.conf.get_enable_ga_versioning", return_value=True): + yield @contextlib.contextmanager - def __get_update_handler(self, iterations=1, test_data=None, hotfix_frequency=1.0, normal_frequency=2.0, - reload_conf=None): + def __get_update_handler(self, iterations=1, test_data=None, + reload_conf=None, autoupdate_frequency=0.001, hotfix_frequency=1.0, normal_frequency=2.0): test_data = DATA_FILE if test_data is None else test_data @@ -1842,32 +1506,23 @@ def put_handler(url, *args, **_): return 
MockHttpResponse(status=201) protocol.set_http_handlers(http_get_handler=get_handler, http_put_handler=put_handler) - with self.create_conf_mocks(hotfix_frequency, normal_frequency): - with patch("azurelinuxagent.ga.update.add_event") as mock_telemetry: + with self.create_conf_mocks(autoupdate_frequency, hotfix_frequency, normal_frequency): + with patch("azurelinuxagent.common.event.EventLogger.add_event") as mock_telemetry: update_handler._protocol = protocol yield update_handler, mock_telemetry def __assert_exit_code_successful(self, update_handler): self.assertEqual(0, update_handler.get_exit_code(), "Exit code should be 0") - def __assert_upgrade_telemetry_emitted_for_requested_version(self, mock_telemetry, upgrade=True, version="99999.0.0.0"): + def __assert_upgrade_telemetry_emitted(self, mock_telemetry, upgrade=True, version="9.9.9.10"): upgrade_event_msgs = [kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if - 'Exiting current process to {0} to the request Agent version {1}'.format( + 'Current Agent {0} completed all update checks, exiting current process to {1} to the new Agent version {2}'.format(CURRENT_VERSION, "upgrade" if upgrade else "downgrade", version) in kwarg['message'] and kwarg[ 'op'] == WALAEventOperation.AgentUpgrade] self.assertEqual(1, len(upgrade_event_msgs), "Did not find the event indicating that the agent was upgraded. Got: {0}".format( mock_telemetry.call_args_list)) - def __assert_upgrade_telemetry_emitted(self, mock_telemetry, upgrade_type=AgentUpgradeType.Normal): - upgrade_event_msgs = [kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if - '{0} Agent upgrade discovered, updating to WALinuxAgent-99999.0.0.0 -- exiting'.format( - upgrade_type) in kwarg['message'] and kwarg[ - 'op'] == WALAEventOperation.AgentUpgrade] - self.assertEqual(1, len(upgrade_event_msgs), - "Did not find the event indicating that the agent was upgraded. 
Got: {0}".format( - mock_telemetry.call_args_list)) - def __assert_agent_directories_available(self, versions): for version in versions: self.assertTrue(os.path.exists(self.agent_dir(version)), "Agent directory {0} not found".format(version)) @@ -1879,11 +1534,6 @@ def __assert_agent_directories_exist_and_others_dont_exist(self, versions): self.assertFalse(any(other_agents), "All other agents should be purged from agent dir: {0}".format(other_agents)) - def __assert_no_agent_upgrade_telemetry(self, mock_telemetry): - self.assertEqual(0, len([kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if - "Agent upgrade discovered, updating to" in kwarg['message'] and kwarg[ - 'op'] == WALAEventOperation.AgentUpgrade]), "Unwanted upgrade") - def __assert_ga_version_in_status(self, aggregate_status, version=str(CURRENT_VERSION)): self.assertIsNotNone(aggregate_status, "Status should be reported") self.assertEqual(aggregate_status['aggregateStatus']['guestAgentStatus']['version'], version, @@ -1892,169 +1542,107 @@ def __assert_ga_version_in_status(self, aggregate_status, version=str(CURRENT_VE "Guest Agent should be reported as Ready") def test_it_should_upgrade_agent_on_process_start_if_auto_upgrade_enabled(self): - with self.__get_update_handler(iterations=10) as (update_handler, mock_telemetry): - + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + with self.__get_update_handler(test_data=data_file, iterations=10) as (update_handler, mock_telemetry): update_handler.run(debug=True) self.__assert_exit_code_successful(update_handler) self.assertEqual(1, update_handler.get_iterations(), "Update handler should've exited after the first run") - self.__assert_agent_directories_available(versions=["99999.0.0.0"]) + self.__assert_agent_directories_available(versions=["9.9.9.10"]) self.__assert_upgrade_telemetry_emitted(mock_telemetry) - def 
test_it_should_download_new_agents_and_not_auto_upgrade_if_not_permitted(self): + def test_it_should_not_update_agent_with_rsm_if_gs_not_updated_in_next_attempts(self): no_of_iterations = 10 data_file = DATA_FILE.copy() - data_file['ga_manifest'] = "wire/ga_manifest_no_upgrade.xml" - - def reload_conf(url, protocol): - mock_wire_data = protocol.mock_wire_data - # This function reloads the conf mid-run to mimic an actual customer scenario - if HttpRequestPredicates.is_ga_manifest_request(url) and mock_wire_data.call_counts["manifest_of_ga.xml"] >= no_of_iterations/2: - reload_conf.call_count += 1 - # Ensure the first set of versions were downloaded as part of the first manifest - self.__assert_agent_directories_available(versions=["1.0.0", "1.1.0", "1.2.0"]) - # As per our current agent upgrade model, we don't rely on an incarnation update to upgrade the agent. Mocking the same - mock_wire_data.data_files["ga_manifest"] = "wire/ga_manifest.xml" - mock_wire_data.reload() - - reload_conf.call_count = 0 - - with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, hotfix_frequency=10, - normal_frequency=10, reload_conf=reload_conf) as (update_handler, mock_telemetry): + data_file['ext_conf'] = "wire/ext_conf_rsm_version.xml" + + self.prepare_agents(1) + test_frequency = 10 + with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, + autoupdate_frequency=test_frequency) as (update_handler, _): + # Given a version which will fail on the first attempt, RSM shouldn't make any further attempts since the GS is not updated + update_handler._protocol.mock_wire_data.set_version_in_agent_family("5.2.1.0") + update_handler._protocol.mock_wire_data.set_incarnation(2) update_handler.run(debug=True) - self.assertGreater(reload_conf.call_count, 0, "Ensure the conf reload was called") self.__assert_exit_code_successful(update_handler) self.assertEqual(no_of_iterations, update_handler.get_iterations(), "Update handler should've run its course") 
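Reviewer note on the pattern above: `create_conf_mocks` stacks several `unittest.mock.patch` context managers inside a single `contextlib.contextmanager`, so every config override is undone (innermost first) when the test run finishes. A minimal, self-contained sketch of that pattern — `fake_conf` and its getters are hypothetical stand-ins, not the real `azurelinuxagent.common.conf` module:

```python
# Sketch of the nested-patch pattern used by create_conf_mocks() above.
# `fake_conf` is a hypothetical stand-in for the agent's conf module.
import contextlib
import types
from unittest.mock import patch

fake_conf = types.SimpleNamespace(
    get_autoupdate_frequency=lambda: 3600,
    get_enable_ga_versioning=lambda: False,
)

@contextlib.contextmanager
def create_conf_mocks(autoupdate_frequency):
    # Stacked with-blocks guarantee each patch is reverted when the
    # context exits, even if the body raises.
    with patch.object(fake_conf, "get_autoupdate_frequency", return_value=autoupdate_frequency):
        with patch.object(fake_conf, "get_enable_ga_versioning", return_value=True):
            yield

with create_conf_mocks(0.001):
    print(fake_conf.get_autoupdate_frequency())   # patched value
    print(fake_conf.get_enable_ga_versioning())   # patched value

print(fake_conf.get_autoupdate_frequency())       # original restored
```

The same shape scales to any number of overrides; the production test simply adds more `patch` layers before the `yield`.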
- # Ensure the new agent versions were also downloaded once the manifest was updated - self.__assert_agent_directories_available(versions=["2.0.0", "2.1.0", "99999.0.0.0"]) - self.__assert_no_agent_upgrade_telemetry(mock_telemetry) - - def test_it_should_upgrade_agent_in_given_time_window_if_permitted(self): - data_file = DATA_FILE.copy() - data_file['ga_manifest'] = "wire/ga_manifest_no_upgrade.xml" - - def reload_conf(url, protocol): - mock_wire_data = protocol.mock_wire_data - # This function reloads the conf mid-run to mimic an actual customer scenario - if HttpRequestPredicates.is_ga_manifest_request(url) and mock_wire_data.call_counts["manifest_of_ga.xml"] >= 2: - reload_conf.call_count += 1 - # Ensure no new agent available so far - self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), "New agent directory should not be found") - # As per our current agent upgrade model, we don't rely on an incarnation update to upgrade the agent. Mocking the same - mock_wire_data.data_files["ga_manifest"] = "wire/ga_manifest.xml" - mock_wire_data.reload() - - reload_conf.call_count = 0 - test_normal_frequency = 0.1 - with self.__get_update_handler(iterations=50, test_data=data_file, reload_conf=reload_conf, - normal_frequency=test_normal_frequency) as (update_handler, mock_telemetry): - start_time = time.time() - update_handler.run(debug=True) - diff = time.time() - start_time - - self.assertGreater(reload_conf.call_count, 0, "Ensure the conf reload was called") - self.__assert_exit_code_successful(update_handler) - self.assertGreaterEqual(update_handler.get_iterations(), 3, - "Update handler should've run at least until the new GA was available") - # A bare-bone check to ensure that the agent waited for the new agent at least for the preset frequency time - self.assertGreater(diff, test_normal_frequency, "The test run should be at least greater than the set frequency") - self.__assert_agent_directories_available(versions=["99999.0.0.0"]) - 
self.__assert_upgrade_telemetry_emitted(mock_telemetry) + self.assertFalse(os.path.exists(self.agent_dir("5.2.0.1")), + "New agent directory should not be found") + self.assertGreaterEqual(update_handler._protocol.mock_wire_data.call_counts["manifest_of_ga.xml"], 1, + "at least 1 agent manifest call should've been made") def test_it_should_not_auto_upgrade_if_auto_update_disabled(self): - with self.__get_update_handler(iterations=10) as (update_handler, mock_telemetry): + with self.__get_update_handler(iterations=10) as (update_handler, _): with patch("azurelinuxagent.common.conf.get_autoupdate_enabled", return_value=False): update_handler.run(debug=True) self.__assert_exit_code_successful(update_handler) self.assertGreaterEqual(update_handler.get_iterations(), 10, "Update handler should've run 10 times") - self.__assert_no_agent_upgrade_telemetry(mock_telemetry) self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), "New agent directory should not be found") - def test_it_should_not_auto_upgrade_if_corresponding_time_not_elapsed(self): - # On Normal upgrade, should not upgrade if Hotfix time elapsed - no_of_iterations = 10 - data_file = DATA_FILE.copy() - data_file['ga_manifest'] = "wire/ga_manifest_no_upgrade.xml" - - def reload_conf(url, protocol): - mock_wire_data = protocol.mock_wire_data - # This function reloads the conf mid-run to mimic an actual customer scenario - if HttpRequestPredicates.is_ga_manifest_request(url) and mock_wire_data.call_counts["manifest_of_ga.xml"] >= no_of_iterations / 2: - reload_conf.call_count += 1 - # As per our current agent upgrade model, we don't rely on an incarnation update to upgrade the agent. 
Mocking the same - mock_wire_data.data_files["ga_manifest"] = "wire/ga_manifest.xml" - mock_wire_data.reload() - - reload_conf.call_count = 0 - - with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, hotfix_frequency=0.01, - normal_frequency=10, reload_conf=reload_conf) as (update_handler, mock_telemetry): + def test_it_should_download_only_rsm_version_if_available(self): + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + with self.__get_update_handler(test_data=data_file) as (update_handler, mock_telemetry): update_handler.run(debug=True) - self.assertGreater(reload_conf.call_count, 0, "Ensure the conf reload was called") - self.__assert_exit_code_successful(update_handler) - self.assertEqual(no_of_iterations, update_handler.get_iterations(), "Update handler didn't run completely") - self.__assert_no_agent_upgrade_telemetry(mock_telemetry) - upgrade_event_msgs = [kwarg['message'] for _, kwarg in mock_telemetry.call_args_list if - kwarg['op'] == WALAEventOperation.AgentUpgrade] - self.assertGreater(len([msg for msg in upgrade_event_msgs if - 'Discovered new {0} upgrade WALinuxAgent-99999.0.0.0; Will upgrade on or after'.format( - AgentUpgradeType.Normal) in msg]), 0, "Error message not propagated properly") - - def test_it_should_download_only_requested_version_if_available(self): - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_requested_version.xml" + self.__assert_exit_code_successful(update_handler) + self.__assert_upgrade_telemetry_emitted(mock_telemetry, version="9.9.9.10") + self.__assert_agent_directories_exist_and_others_dont_exist(versions=["9.9.9.10"]) + + def test_it_should_download_largest_version_if_ga_versioning_disabled(self): + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" with self.__get_update_handler(test_data=data_file) as (update_handler, mock_telemetry): - with 
patch.object(conf, "get_enable_ga_versioning", return_value=True): + with patch.object(conf, "get_enable_ga_versioning", return_value=False): update_handler.run(debug=True) - self.__assert_exit_code_successful(update_handler) - self.__assert_upgrade_telemetry_emitted_for_requested_version(mock_telemetry, version="9.9.9.10") - self.__assert_agent_directories_exist_and_others_dont_exist(versions=["9.9.9.10"]) + self.__assert_exit_code_successful(update_handler) + self.__assert_upgrade_telemetry_emitted(mock_telemetry, version="99999.0.0.0") + self.__assert_agent_directories_exist_and_others_dont_exist(versions=["99999.0.0.0"]) - def test_it_should_cleanup_all_agents_except_requested_version_and_current_version(self): - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_requested_version.xml" + def test_it_should_cleanup_all_agents_except_rsm_version_and_current_version(self): + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" # Set the test environment by adding 20 random agents to the agent directory self.prepare_agents() self.assertEqual(20, self.agent_count(), "Agent directories not set properly") with self.__get_update_handler(test_data=data_file) as (update_handler, mock_telemetry): - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - update_handler.run(debug=True) + update_handler.run(debug=True) - self.__assert_exit_code_successful(update_handler) - self.__assert_upgrade_telemetry_emitted_for_requested_version(mock_telemetry, version="9.9.9.10") - self.__assert_agent_directories_exist_and_others_dont_exist(versions=["9.9.9.10", str(CURRENT_VERSION)]) + self.__assert_exit_code_successful(update_handler) + self.__assert_upgrade_telemetry_emitted(mock_telemetry, version="9.9.9.10") + self.__assert_agent_directories_exist_and_others_dont_exist(versions=["9.9.9.10", str(CURRENT_VERSION)]) - def 
test_it_should_not_update_if_requested_version_not_found_in_manifest(self): - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_missing_requested_version.xml" + def test_it_should_not_update_if_rsm_version_not_found_in_manifest(self): + self.prepare_agents(1) + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_version_missing_in_manifest.xml" with self.__get_update_handler(test_data=data_file) as (update_handler, mock_telemetry): - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - update_handler.run(debug=True) + update_handler.run(debug=True) - self.__assert_exit_code_successful(update_handler) - self.__assert_no_agent_upgrade_telemetry(mock_telemetry) - agent_msgs = [kwarg for _, kwarg in mock_telemetry.call_args_list if - kwarg['op'] in (WALAEventOperation.AgentUpgrade, WALAEventOperation.Download)] - # This will throw if corresponding message not found so not asserting on that - requested_version_found = next(kwarg for kwarg in agent_msgs if - "Found requested version in manifest: 5.2.1.0 for goal state incarnation_1" in kwarg['message']) - self.assertTrue(requested_version_found['is_success'], - "The requested version found op should be reported as a success") - - skipping_update = next(kwarg for kwarg in agent_msgs if - "No matching package found in the agent manifest for requested version: 5.2.1.0 in goal state incarnation_1, skipping agent update" in kwarg['message']) - self.assertEqual(skipping_update['version'], FlexibleVersion("5.2.1.0"), - "The not found message should be reported from requested agent version") - self.assertFalse(skipping_update['is_success'], "The not found op should be reported as a failure") - - def test_it_should_only_try_downloading_requested_version_on_new_incarnation(self): + self.__assert_exit_code_successful(update_handler) + self.__assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION)]) + agent_msgs = 
[kwarg for _, kwarg in mock_telemetry.call_args_list if + kwarg['op'] in (WALAEventOperation.AgentUpgrade, WALAEventOperation.Download)] + # This will throw if corresponding message not found so not asserting on that + rsm_version_found = next(kwarg for kwarg in agent_msgs if + "New agent version:5.2.1.0 requested by RSM in Goal state incarnation_1, will update the agent before processing the goal state" in kwarg['message']) + self.assertTrue(rsm_version_found['is_success'], + "The rsm version found op should be reported as a success") + + skipping_update = next(kwarg for kwarg in agent_msgs if + "No matching package found in the agent manifest for version: 5.2.1.0 in goal state incarnation: incarnation_1, skipping agent update" in kwarg['message']) + self.assertEqual(skipping_update['version'], str(CURRENT_VERSION), + "The not found message should be reported from current agent version") + self.assertFalse(skipping_update['is_success'], "The not found op should be reported as a failure") + + def test_it_should_try_downloading_rsm_version_on_new_incarnation(self): no_of_iterations = 1000 # Set the test environment by adding 20 random agents to the agent directory @@ -2069,10 +1657,10 @@ def reload_conf(url, protocol): "goalstate"] >= 10 and mock_wire_data.call_counts["goalstate"] < 15: # Ensure we didn't try to download any agents except during the incarnation change - self.__assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION)]) + self.__assert_agent_directories_available(versions=[str(CURRENT_VERSION)]) - # Update the requested version to "99999.0.0.0" - update_handler._protocol.mock_wire_data.set_extension_config_requested_version("99999.0.0.0") + # Update the rsm version to "99999.0.0.0" + update_handler._protocol.mock_wire_data.set_version_in_agent_family("99999.0.0.0") reload_conf.call_count += 1 self._add_write_permission_to_goal_state_files() reload_conf.incarnation += 1 @@ -2081,25 +1669,23 @@ def reload_conf(url, protocol): 
reload_conf.call_count = 0 reload_conf.incarnation = 2 - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_requested_version.xml" - with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, reload_conf=reload_conf, - normal_frequency=0.01, hotfix_frequency=0.01) as (update_handler, mock_telemetry): - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - update_handler._protocol.mock_wire_data.set_extension_config_requested_version(str(CURRENT_VERSION)) - update_handler._protocol.mock_wire_data.set_incarnation(2) - update_handler.run(debug=True) + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, reload_conf=reload_conf) as (update_handler, mock_telemetry): + update_handler._protocol.mock_wire_data.set_version_in_agent_family(str(CURRENT_VERSION)) + update_handler._protocol.mock_wire_data.set_incarnation(2) + update_handler.run(debug=True) self.assertGreaterEqual(reload_conf.call_count, 1, "Reload conf not updated as expected") self.__assert_exit_code_successful(update_handler) - self.__assert_upgrade_telemetry_emitted_for_requested_version(mock_telemetry) + self.__assert_upgrade_telemetry_emitted(mock_telemetry, version="99999.0.0.0") self.__assert_agent_directories_exist_and_others_dont_exist(versions=["99999.0.0.0", str(CURRENT_VERSION)]) self.assertEqual(update_handler._protocol.mock_wire_data.call_counts['agentArtifact'], 1, "only 1 agent should've been downloaded - 1 per incarnation") - self.assertEqual(update_handler._protocol.mock_wire_data.call_counts["manifest_of_ga.xml"], 1, + self.assertGreaterEqual(update_handler._protocol.mock_wire_data.call_counts["manifest_of_ga.xml"], 1, "only 1 agent manifest call should've been made - 1 per incarnation") - def test_it_should_fallback_to_old_update_logic_if_requested_version_not_available(self): + def 
test_it_should_update_to_largest_version_if_rsm_version_not_available(self): no_of_iterations = 100 # Set the test environment by adding 20 random agents to the agent directory @@ -2114,12 +1700,12 @@ def reload_conf(url, protocol): "goalstate"] >= 5: reload_conf.call_count += 1 - # By this point, the GS with requested version should've been executed. Verify that - self.__assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION)]) + # By this point, the GS with rsm version should've been executed. Verify that + self.__assert_agent_directories_available(versions=[str(CURRENT_VERSION)]) - # Update the ext-conf and incarnation and remove requested versions from GS, - # this should download all versions requested in config - mock_wire_data.data_files["ext_conf"] = "wire/ext_conf.xml" + # Update the ga_manifest and incarnation to send largest version manifest + # this should download largest version requested in config + mock_wire_data.data_files["ga_manifest"] = "wire/ga_manifest.xml" mock_wire_data.reload() self._add_write_permission_to_goal_state_files() reload_conf.incarnation += 1 @@ -2128,54 +1714,130 @@ def reload_conf(url, protocol): reload_conf.call_count = 0 reload_conf.incarnation = 2 - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_requested_version.xml" + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf.xml" + data_file["ga_manifest"] = "wire/ga_manifest_no_upgrade.xml" + with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, reload_conf=reload_conf) as (update_handler, mock_telemetry): + update_handler._protocol.mock_wire_data.set_incarnation(2) + update_handler.run(debug=True) + + self.assertGreater(reload_conf.call_count, 0, "Reload conf not updated") + self.__assert_exit_code_successful(update_handler) + self.__assert_upgrade_telemetry_emitted(mock_telemetry, version="99999.0.0.0") + 
self.__assert_agent_directories_exist_and_others_dont_exist(versions=["99999.0.0.0", str(CURRENT_VERSION)]) + + def test_it_should_not_update_largest_version_if_time_window_not_elapsed(self): + no_of_iterations = 20 + + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + def reload_conf(url, protocol): + mock_wire_data = protocol.mock_wire_data + + # This function reloads the conf mid-run to mimic an actual customer scenario + if HttpRequestPredicates.is_goal_state_request(url) and mock_wire_data.call_counts[ + "goalstate"] >= 5: + reload_conf.call_count += 1 + + self.__assert_agent_directories_available(versions=[str(CURRENT_VERSION)]) + + # Update the ga_manifest and incarnation to send largest version manifest + mock_wire_data.data_files["ga_manifest"] = "wire/ga_manifest.xml" + mock_wire_data.reload() + self._add_write_permission_to_goal_state_files() + reload_conf.incarnation += 1 + mock_wire_data.set_incarnation(reload_conf.incarnation) + + reload_conf.call_count = 0 + reload_conf.incarnation = 2 + + data_file = wire_protocol_data.DATA_FILE.copy() + # This is to fail the agent update at first attempt so that agent doesn't go through update + data_file["ga_manifest"] = "wire/ga_manifest_no_uris.xml" with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, reload_conf=reload_conf, - normal_frequency=0.001) as (update_handler, mock_telemetry): - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - update_handler._protocol.mock_wire_data.set_extension_config_requested_version(str(CURRENT_VERSION)) - update_handler._protocol.mock_wire_data.set_incarnation(2) - update_handler.run(debug=True) + hotfix_frequency=10, normal_frequency=10) as (update_handler, _): + update_handler._protocol.mock_wire_data.set_incarnation(2) + update_handler.run(debug=True) 
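Reviewer note: the `__assert_upgrade_telemetry_emitted` helper used throughout this class works by filtering the patched event logger's `call_args_list` for a specific `op` and message substring. A minimal sketch of that filtering pattern — the `add_event` mock and the message text here are illustrative, not the agent's real event API:

```python
# Sketch of scanning a mock's call_args_list, as __assert_upgrade_telemetry_emitted does.
from unittest.mock import Mock

add_event = Mock()
add_event(op="AgentUpgrade",
          message="exiting current process to upgrade to the new Agent version 9.9.9.10")
add_event(op="HeartBeat", message="routine heartbeat")

# Each call_args_list entry unpacks into (positional args, keyword args).
upgrade_msgs = [kwargs["message"] for _, kwargs in add_event.call_args_list
                if kwargs["op"] == "AgentUpgrade"
                and "exiting current process to upgrade" in kwargs["message"]]

print(len(upgrade_msgs))  # 1
```

Asserting on exactly one matching message (rather than "at least one") is what lets these tests catch duplicate telemetry as well as missing telemetry.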
self.assertGreater(reload_conf.call_count, 0, "Reload conf not updated") self.__assert_exit_code_successful(update_handler) - self.__assert_upgrade_telemetry_emitted(mock_telemetry) - self.__assert_agent_directories_exist_and_others_dont_exist( - versions=["1.0.0", "1.1.0", "1.2.0", "2.0.0", "2.1.0", "9.9.9.10", "99999.0.0.0", str(CURRENT_VERSION)]) + self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), + "New agent directory should not be found") - def test_it_should_not_download_anything_if_requested_version_is_current_version_and_delete_all_agents(self): - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_requested_version.xml" + def test_it_should_update_largest_version_if_time_window_elapsed(self): + no_of_iterations = 20 # Set the test environment by adding 20 random agents to the agent directory self.prepare_agents() self.assertEqual(20, self.agent_count(), "Agent directories not set properly") - with self.__get_update_handler(test_data=data_file) as (update_handler, mock_telemetry): - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - update_handler._protocol.mock_wire_data.set_extension_config_requested_version(str(CURRENT_VERSION)) - update_handler._protocol.mock_wire_data.set_incarnation(2) - update_handler.run(debug=True) + def reload_conf(url, protocol): + mock_wire_data = protocol.mock_wire_data + + # This function reloads the conf mid-run to mimic an actual customer scenario + if HttpRequestPredicates.is_goal_state_request(url) and mock_wire_data.call_counts[ + "goalstate"] >= 5: + reload_conf.call_count += 1 + + self.__assert_agent_directories_available(versions=[str(CURRENT_VERSION)]) + + # Update the ga_manifest and incarnation to send largest version manifest + mock_wire_data.data_files["ga_manifest"] = "wire/ga_manifest.xml" + mock_wire_data.reload() + self._add_write_permission_to_goal_state_files() + reload_conf.incarnation += 1 + 
mock_wire_data.set_incarnation(reload_conf.incarnation) + + reload_conf.call_count = 0 + reload_conf.incarnation = 2 + + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ga_manifest"] = "wire/ga_manifest_no_uris.xml" + with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, reload_conf=reload_conf, + hotfix_frequency=0.001, normal_frequency=0.001) as (update_handler, mock_telemetry): + update_handler._protocol.mock_wire_data.set_incarnation(2) + update_handler.run(debug=True) + + self.assertGreater(reload_conf.call_count, 0, "Reload conf not updated") + self.__assert_exit_code_successful(update_handler) + self.__assert_upgrade_telemetry_emitted(mock_telemetry, version="99999.0.0.0") + self.__assert_agent_directories_exist_and_others_dont_exist(versions=["99999.0.0.0", str(CURRENT_VERSION)]) + + def test_it_should_not_download_anything_if_rsm_version_is_current_version(self): + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" + + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + + with self.__get_update_handler(test_data=data_file) as (update_handler, _): + update_handler._protocol.mock_wire_data.set_version_in_agent_family(str(CURRENT_VERSION)) + update_handler._protocol.mock_wire_data.set_incarnation(2) + update_handler.run(debug=True) self.__assert_exit_code_successful(update_handler) - self.__assert_no_agent_upgrade_telemetry(mock_telemetry) - self.__assert_agent_directories_exist_and_others_dont_exist(versions=[str(CURRENT_VERSION)]) + self.assertFalse(os.path.exists(self.agent_dir("99999.0.0.0")), + "New agent directory should not be found") - def test_it_should_skip_wait_to_update_if_requested_version_available(self): + def test_it_should_skip_wait_to_update_immediately_if_rsm_version_available(self): no_of_iterations = 100 def 
reload_conf(url, protocol): mock_wire_data = protocol.mock_wire_data # This function reloads the conf mid-run to mimic an actual customer scenario + # Setting the rsm request to be sent after some iterations if HttpRequestPredicates.is_goal_state_request(url) and mock_wire_data.call_counts["goalstate"] >= 5: reload_conf.call_count += 1 # Assert GA version from status to ensure agent is running fine from the current version self.__assert_ga_version_in_status(protocol.aggregate_status) - # Update the ext-conf and incarnation and add requested version from GS - mock_wire_data.data_files["ext_conf"] = "wire/ext_conf_requested_version.xml" + # Update the ext-conf and incarnation and add rsm version from GS + mock_wire_data.data_files["ext_conf"] = "wire/ext_conf_rsm_version.xml" data_file['ga_manifest'] = "wire/ga_manifest.xml" mock_wire_data.reload() self._add_write_permission_to_goal_state_files() @@ -2183,77 +1845,81 @@ def reload_conf(url, protocol): reload_conf.call_count = 0 - data_file = mockwiredata.DATA_FILE.copy() + data_file = wire_protocol_data.DATA_FILE.copy() data_file['ga_manifest'] = "wire/ga_manifest_no_upgrade.xml" - with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, reload_conf=reload_conf, - normal_frequency=10, hotfix_frequency=10) as (update_handler, mock_telemetry): - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - update_handler.run(debug=True) + # Setting the prod frequency to mimic a real scenario + with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, reload_conf=reload_conf, autoupdate_frequency=6000) as (update_handler, mock_telemetry): + update_handler._protocol.mock_wire_data.set_version_in_ga_manifest(str(CURRENT_VERSION)) + update_handler._protocol.mock_wire_data.set_incarnation(20) + update_handler.run(debug=True) self.assertGreater(reload_conf.call_count, 0, "Reload conf not updated") self.assertLess(update_handler.get_iterations(), no_of_iterations, - 
"The code should've exited as soon as requested version was found") + "The code should've exited as soon as rsm version was found") self.__assert_exit_code_successful(update_handler) - self.__assert_upgrade_telemetry_emitted_for_requested_version(mock_telemetry, version="9.9.9.10") + self.__assert_upgrade_telemetry_emitted(mock_telemetry, version="9.9.9.10") - def test_it_should_blacklist_current_agent_on_downgrade(self): + def test_it_should_mark_current_agent_as_bad_version_on_downgrade(self): # Create Agent directory for current agent self.prepare_agents(count=1) self.assertTrue(os.path.exists(self.agent_dir(CURRENT_VERSION))) self.assertFalse(next(agent for agent in self.agents() if agent.version == CURRENT_VERSION).is_blacklisted, "The current agent should not be blacklisted") - downgraded_version = "1.2.0" + downgraded_version = "2.5.0" - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_requested_version.xml" + data_file = wire_protocol_data.DATA_FILE.copy() + data_file["ext_conf"] = "wire/ext_conf_rsm_version.xml" with self.__get_update_handler(test_data=data_file) as (update_handler, mock_telemetry): - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - update_handler._protocol.mock_wire_data.set_extension_config_requested_version(downgraded_version) - update_handler._protocol.mock_wire_data.set_incarnation(2) - try: - set_daemon_version("1.0.0.0") - update_handler.run(debug=True) - finally: - os.environ.pop(DAEMON_VERSION_ENV_VARIABLE) + update_handler._protocol.mock_wire_data.set_version_in_agent_family(downgraded_version) + update_handler._protocol.mock_wire_data.set_incarnation(2) + update_handler.run(debug=True) self.__assert_exit_code_successful(update_handler) - self.__assert_upgrade_telemetry_emitted_for_requested_version(mock_telemetry, upgrade=False, + self.__assert_upgrade_telemetry_emitted(mock_telemetry, upgrade=False, version=downgraded_version) current_agent = next(agent for agent in 
self.agents() if agent.version == CURRENT_VERSION) self.assertTrue(current_agent.is_blacklisted, "The current agent should be blacklisted") - self.assertEqual(current_agent.error.reason, "Blacklisting the agent {0} since a downgrade was requested in the GoalState, " + self.assertEqual(current_agent.error.reason, "Marking the agent {0} as bad version since a downgrade was requested in the GoalState, " "suggesting that we really don't want to execute any extensions using this version".format(CURRENT_VERSION), "Invalid reason specified for blacklisting agent") + self.__assert_agent_directories_exist_and_others_dont_exist(versions=[downgraded_version, str(CURRENT_VERSION)]) - def test_it_should_not_downgrade_below_daemon_version(self): - data_file = mockwiredata.DATA_FILE.copy() - data_file["ext_conf"] = "wire/ext_conf_requested_version.xml" - with self.__get_update_handler(test_data=data_file) as (update_handler, mock_telemetry): - with patch.object(conf, "get_enable_ga_versioning", return_value=True): - update_handler._protocol.mock_wire_data.set_extension_config_requested_version("1.0.0.0") - update_handler._protocol.mock_wire_data.set_incarnation(2) + def test_it_should_do_self_update_if_vm_opt_out_rsm_upgrades_later(self): + no_of_iterations = 100 - try: - set_daemon_version("1.2.3.4") - update_handler.run(debug=True) - finally: - os.environ.pop(DAEMON_VERSION_ENV_VARIABLE) + # Set the test environment by adding 20 random agents to the agent directory + self.prepare_agents() + self.assertEqual(20, self.agent_count(), "Agent directories not set properly") + def reload_conf(url, protocol): + mock_wire_data = protocol.mock_wire_data + + # This function reloads the conf mid-run to mimic an actual customer scenario + if HttpRequestPredicates.is_goal_state_request(url) and mock_wire_data.call_counts["goalstate"] >= 5: + reload_conf.call_count += 1 + + # Assert GA version from status to ensure agent is running fine from the current version + 
self.__assert_ga_version_in_status(protocol.aggregate_status) + + # Update is_vm_enabled_for_rsm_upgrades flag to False + update_handler._protocol.mock_wire_data.set_extension_config_is_vm_enabled_for_rsm_upgrades("False") + self._add_write_permission_to_goal_state_files() + mock_wire_data.set_incarnation(2) + + reload_conf.call_count = 0 + + data_file = wire_protocol_data.DATA_FILE.copy() + data_file['ext_conf'] = "wire/ext_conf_rsm_version.xml" + with self.__get_update_handler(iterations=no_of_iterations, test_data=data_file, reload_conf=reload_conf) as (update_handler, mock_telemetry): + update_handler._protocol.mock_wire_data.set_version_in_agent_family(str(CURRENT_VERSION)) + update_handler._protocol.mock_wire_data.set_incarnation(20) + update_handler.run(debug=True) + self.assertGreater(reload_conf.call_count, 0, "Reload conf not updated") + self.assertLess(update_handler.get_iterations(), no_of_iterations, + "The code should've exited as soon as version was found") self.__assert_exit_code_successful(update_handler) - upgrade_msgs = [kwarg for _, kwarg in mock_telemetry.call_args_list if - kwarg['op'] == WALAEventOperation.AgentUpgrade] - # This will throw if corresponding message not found so not asserting on that - requested_version_found = next(kwarg for kwarg in upgrade_msgs if - "Found requested version in manifest: 1.0.0.0 for goal state incarnation_2" in kwarg[ - 'message']) - self.assertTrue(requested_version_found['is_success'], - "The requested version found op should be reported as a success") - - skipping_update = next(kwarg for kwarg in upgrade_msgs if - "Can't process the upgrade as the requested version: 1.0.0.0 is < current daemon version: 1.2.3.4" in - kwarg['message']) - self.assertFalse(skipping_update['is_success'], "Failed Event should be reported as a failure") - self.__assert_ga_version_in_status(update_handler._protocol.aggregate_status) + self.__assert_upgrade_telemetry_emitted(mock_telemetry, version="99999.0.0.0") + 
self.__assert_agent_directories_exist_and_others_dont_exist(versions=["99999.0.0.0", str(CURRENT_VERSION)]) @patch('azurelinuxagent.ga.update.get_collect_telemetry_events_handler') @@ -2287,12 +1953,13 @@ def iterator(*_, **__): mock_is_running.__get__ = Mock(side_effect=iterator) with patch('azurelinuxagent.ga.exthandlers.get_exthandlers_handler'): with patch('azurelinuxagent.ga.remoteaccess.get_remote_access_handler'): - with patch('azurelinuxagent.ga.update.initialize_event_logger_vminfo_common_parameters'): - with patch('azurelinuxagent.common.cgroupapi.CGroupsApi.cgroups_supported', return_value=False): # skip all cgroup stuff - with patch('azurelinuxagent.ga.update.is_log_collection_allowed', return_value=True): - with patch('time.sleep'): - with patch('sys.exit'): - self.update_handler.run() + with patch('azurelinuxagent.ga.agent_update_handler.get_agent_update_handler'): + with patch('azurelinuxagent.ga.update.initialize_event_logger_vminfo_common_parameters'): + with patch('azurelinuxagent.ga.cgroupapi.CGroupsApi.cgroups_supported', return_value=False): # skip all cgroup stuff + with patch('azurelinuxagent.ga.update.is_log_collection_allowed', return_value=True): + with patch('time.sleep'): + with patch('sys.exit'): + self.update_handler.run() def _setup_mock_thread_and_start_test_run(self, mock_thread, is_alive=True, invocations=0): thread = MagicMock() @@ -2460,11 +2127,11 @@ class TryUpdateGoalStateTestCase(HttpRequestPredicates, AgentTestCase): """ def test_it_should_return_true_on_success(self): update_handler = get_update_handler() - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: self.assertTrue(update_handler._try_update_goal_state(protocol), "try_update_goal_state should have succeeded") def test_it_should_return_false_on_failure(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: 
def http_get_handler(url, *_, **__): if self.is_goal_state_request(url): return HttpError('Exception to fake an error retrieving the goal state') @@ -2476,7 +2143,7 @@ def http_get_handler(url, *_, **__): def test_it_should_update_the_goal_state(self): update_handler = get_update_handler() - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: protocol.mock_wire_data.set_incarnation(12345) # the first goal state should produce an update @@ -2493,7 +2160,7 @@ def test_it_should_update_the_goal_state(self): self.assertEqual(update_handler._goal_state.incarnation, '6789', "The goal state was not updated (received unexpected incarnation)") def test_it_should_log_errors_only_when_the_error_state_changes(self): - with mock_wire_protocol(mockwiredata.DATA_FILE) as protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as protocol: def http_get_handler(url, *_, **__): if self.is_goal_state_request(url): if fail_goal_state_request: @@ -2587,7 +2254,7 @@ def _create_update_handler(): @contextlib.contextmanager -def _mock_exthandlers_handler(extension_statuses=None): +def _mock_exthandlers_handler(extension_statuses=None, save_to_history=False): """ Creates an ExtHandlersHandler that doesn't actually handle any extensions, but that returns status for 1 extension. 
The returned ExtHandlersHandler uses a mock WireProtocol, and both the run() and report_ext_handlers_status() are @@ -2602,14 +2269,13 @@ def create_vm_status(extension_status): vm_status.vmAgent.extensionHandlers[0].extension_status.status = extension_status return vm_status - with mock_wire_protocol(DATA_FILE) as protocol: + with mock_wire_protocol(DATA_FILE, save_to_history=save_to_history) as protocol: exthandlers_handler = ExtHandlersHandler(protocol) exthandlers_handler.run = Mock() if extension_statuses is None: exthandlers_handler.report_ext_handlers_status = Mock(return_value=create_vm_status(ExtensionStatusValue.success)) else: exthandlers_handler.report_ext_handlers_status = Mock(side_effect=[create_vm_status(s) for s in extension_statuses]) - exthandlers_handler.get_ext_handlers_status_debug_info = Mock(return_value='') yield exthandlers_handler @@ -2622,34 +2288,41 @@ def test_it_should_process_goal_state_only_on_new_goal_state(self): update_handler = _create_update_handler() remote_access_handler = Mock() remote_access_handler.run = Mock() + agent_update_handler = Mock() + agent_update_handler.run = Mock() # process a goal state - update_handler._process_goal_state(exthandlers_handler, remote_access_handler) + update_handler._process_goal_state(exthandlers_handler, remote_access_handler, agent_update_handler) self.assertEqual(1, exthandlers_handler.run.call_count, "exthandlers_handler.run() should have been called on the first goal state") self.assertEqual(1, exthandlers_handler.report_ext_handlers_status.call_count, "exthandlers_handler.report_ext_handlers_status() should have been called on the first goal state") self.assertEqual(1, remote_access_handler.run.call_count, "remote_access_handler.run() should have been called on the first goal state") + self.assertEqual(1, agent_update_handler.run.call_count, "agent_update_handler.run() should have been called on the first goal state") # process the same goal state - 
update_handler._process_goal_state(exthandlers_handler, remote_access_handler) + update_handler._process_goal_state(exthandlers_handler, remote_access_handler, agent_update_handler) self.assertEqual(1, exthandlers_handler.run.call_count, "exthandlers_handler.run() should have not been called on the same goal state") self.assertEqual(2, exthandlers_handler.report_ext_handlers_status.call_count, "exthandlers_handler.report_ext_handlers_status() should have been called on the same goal state") self.assertEqual(1, remote_access_handler.run.call_count, "remote_access_handler.run() should not have been called on the same goal state") + self.assertEqual(2, agent_update_handler.run.call_count, "agent_update_handler.run() should have been called on the same goal state") # process a new goal state exthandlers_handler.protocol.mock_wire_data.set_incarnation(999) exthandlers_handler.protocol.client.update_goal_state() - update_handler._process_goal_state(exthandlers_handler, remote_access_handler) + update_handler._process_goal_state(exthandlers_handler, remote_access_handler, agent_update_handler) self.assertEqual(2, exthandlers_handler.run.call_count, "exthandlers_handler.run() should have been called on a new goal state") self.assertEqual(3, exthandlers_handler.report_ext_handlers_status.call_count, "exthandlers_handler.report_ext_handlers_status() should have been called on a new goal state") self.assertEqual(2, remote_access_handler.run.call_count, "remote_access_handler.run() should have been called on a new goal state") + self.assertEqual(3, agent_update_handler.run.call_count, "agent_update_handler.run() should have been called on the new goal state") def test_it_should_write_the_agent_status_to_the_history_folder(self): - with _mock_exthandlers_handler() as exthandlers_handler: + with _mock_exthandlers_handler(save_to_history=True) as exthandlers_handler: update_handler = _create_update_handler() remote_access_handler = Mock() remote_access_handler.run = Mock() + 
agent_update_handler = Mock() + agent_update_handler.run = Mock() - update_handler._process_goal_state(exthandlers_handler, remote_access_handler) + update_handler._process_goal_state(exthandlers_handler, remote_access_handler, agent_update_handler) incarnation = exthandlers_handler.protocol.get_goal_state().incarnation matches = glob.glob(os.path.join(conf.get_lib_dir(), ARCHIVE_DIRECTORY_NAME, "*_{0}".format(incarnation))) @@ -2665,7 +2338,7 @@ def _prepare_fast_track_goal_state(): invokes HostPluginProtocol.fetch_vm_settings() to save the Fast Track status to disk """ # Do a query for the vmSettings; this would retrieve a FastTrack goal state and keep track of its timestamp - mock_wire_data_file = mockwiredata.DATA_FILE_VM_SETTINGS.copy() + mock_wire_data_file = wire_protocol_data.DATA_FILE_VM_SETTINGS.copy() with mock_wire_protocol(mock_wire_data_file) as protocol: protocol.mock_wire_data.set_etag("0123456789") _ = protocol.client.get_host_plugin().fetch_vm_settings() @@ -2703,9 +2376,11 @@ def test_it_should_clear_the_timestamp_for_the_most_recent_fast_track_goal_state raise Exception("The test setup did not save the Fast Track state") with patch("azurelinuxagent.common.conf.get_enable_fast_track", return_value=False): - with mock_wire_protocol(data_file) as protocol: - with mock_update_handler(protocol) as update_handler: - update_handler.run() + with patch("azurelinuxagent.common.version.get_daemon_version", + return_value=FlexibleVersion("2.2.53")): + with mock_wire_protocol(data_file) as protocol: + with mock_update_handler(protocol) as update_handler: + update_handler.run() self.assertEqual(HostPluginProtocol.get_fast_track_timestamp(), timeutil.create_timestamp(datetime.min), "The Fast Track state was not cleared") @@ -2763,7 +2438,7 @@ class HeartbeatTestCase(AgentTestCase): @patch("azurelinuxagent.ga.update.add_event") def test_telemetry_heartbeat_creates_event(self, patch_add_event, patch_info, *_): - with mock_wire_protocol(mockwiredata.DATA_FILE) as 
mock_protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as mock_protocol: update_handler = get_update_handler() update_handler.last_telemetry_heartbeat = datetime.utcnow() - timedelta(hours=1) @@ -2778,11 +2453,11 @@ class AgentMemoryCheckTestCase(AgentTestCase): @patch("azurelinuxagent.common.logger.info") @patch("azurelinuxagent.ga.update.add_event") def test_check_agent_memory_usage_raises_exit_exception(self, patch_add_event, patch_info, *_): - with patch("azurelinuxagent.common.cgroupconfigurator.CGroupConfigurator._Impl.check_agent_memory_usage", side_effect=AgentMemoryExceededException()): + with patch("azurelinuxagent.ga.cgroupconfigurator.CGroupConfigurator._Impl.check_agent_memory_usage", side_effect=AgentMemoryExceededException()): with patch('azurelinuxagent.common.conf.get_enable_agent_memory_usage_check', return_value=True): with self.assertRaises(ExitException) as context_manager: update_handler = get_update_handler() - + update_handler._last_check_memory_usage_time = time.time() - 24 * 60 update_handler._check_agent_memory_usage() self.assertEqual(1, patch_add_event.call_count) self.assertTrue(any("Check on agent memory usage" in call_args[0] @@ -2794,10 +2469,10 @@ def test_check_agent_memory_usage_raises_exit_exception(self, patch_add_event, p @patch("azurelinuxagent.common.logger.warn") @patch("azurelinuxagent.ga.update.add_event") def test_check_agent_memory_usage_fails(self, patch_add_event, patch_warn, *_): - with patch("azurelinuxagent.common.cgroupconfigurator.CGroupConfigurator._Impl.check_agent_memory_usage", side_effect=Exception()): + with patch("azurelinuxagent.ga.cgroupconfigurator.CGroupConfigurator._Impl.check_agent_memory_usage", side_effect=Exception()): with patch('azurelinuxagent.common.conf.get_enable_agent_memory_usage_check', return_value=True): update_handler = get_update_handler() - + update_handler._last_check_memory_usage_time = time.time() - 24 * 60 update_handler._check_agent_memory_usage() 
self.assertTrue(any("Error checking the agent's memory usage" in call_args[0] for call_args in patch_warn.call_args), @@ -2813,6 +2488,15 @@ def test_check_agent_memory_usage_fails(self, patch_add_event, patch_warn, *_): add_events[0]["message"], "The error message is not correct when memory usage check failed") + @patch("azurelinuxagent.ga.cgroupconfigurator.CGroupConfigurator._Impl.check_agent_memory_usage") + @patch("azurelinuxagent.ga.update.add_event") + def test_check_agent_memory_usage_not_called(self, patch_add_event, patch_memory_usage, *_): + # This test ensures that the memory check is not invoked immediately on startup; instead it waits for CHILD_LAUNCH_INTERVAL + with patch('azurelinuxagent.common.conf.get_enable_agent_memory_usage_check', return_value=True): + update_handler = get_update_handler() + update_handler._check_agent_memory_usage() + self.assertEqual(0, patch_memory_usage.call_count) + self.assertEqual(0, patch_add_event.call_count) class GoalStateIntervalTestCase(AgentTestCase): def test_initial_goal_state_period_should_default_to_goal_state_period(self): @@ -2850,16 +2534,17 @@ def test_update_handler_should_use_the_initial_goal_state_period_until_the_goal_ with patch('azurelinuxagent.common.conf.get_goal_state_period', return_value=goal_state_period): with _mock_exthandlers_handler([ExtensionStatusValue.transitioning, ExtensionStatusValue.success]) as exthandlers_handler: remote_access_handler = Mock() + agent_update_handler = Mock() update_handler = _create_update_handler() self.assertEqual(initial_goal_state_period, update_handler._goal_state_period, "Expected the initial goal state period") # the extension is transitioning, so we should still be using the initial goal state period - update_handler._process_goal_state(exthandlers_handler, remote_access_handler) + update_handler._process_goal_state(exthandlers_handler, remote_access_handler, agent_update_handler) self.assertEqual(initial_goal_state_period, update_handler._goal_state_period, "Expected the
initial goal state period when the extension is transitioning") # the goal state converged (the extension succeeded), so we should switch to the regular goal state period - update_handler._process_goal_state(exthandlers_handler, remote_access_handler) + update_handler._process_goal_state(exthandlers_handler, remote_access_handler, agent_update_handler) self.assertEqual(goal_state_period, update_handler._goal_state_period, "Expected the regular goal state period after the goal state converged") def test_update_handler_should_switch_to_the_regular_goal_state_period_when_the_goal_state_does_not_converges(self): @@ -2868,17 +2553,18 @@ def test_update_handler_should_switch_to_the_regular_goal_state_period_when_the_ with patch('azurelinuxagent.common.conf.get_goal_state_period', return_value=goal_state_period): with _mock_exthandlers_handler([ExtensionStatusValue.transitioning, ExtensionStatusValue.transitioning]) as exthandlers_handler: remote_access_handler = Mock() + agent_update_handler = Mock() update_handler = _create_update_handler() self.assertEqual(initial_goal_state_period, update_handler._goal_state_period, "Expected the initial goal state period") # the extension is transitioning, so we should still be using the initial goal state period - update_handler._process_goal_state(exthandlers_handler, remote_access_handler) + update_handler._process_goal_state(exthandlers_handler, remote_access_handler,
agent_update_handler) self.assertEqual(goal_state_period, update_handler._goal_state_period, "Expected the regular goal state period when the goal state does not converge") @@ -2931,4 +2617,4 @@ def test_inequality_operator_should_return_false_on_items_with_same_value(self): if __name__ == '__main__': - unittest.main() + unittest.main() \ No newline at end of file diff --git a/tests/utils/__init__.py b/tests/lib/__init__.py similarity index 100% rename from tests/utils/__init__.py rename to tests/lib/__init__.py diff --git a/tests/utils/cgroups_tools.py b/tests/lib/cgroups_tools.py similarity index 100% rename from tests/utils/cgroups_tools.py rename to tests/lib/cgroups_tools.py diff --git a/tests/utils/event_logger_tools.py b/tests/lib/event_logger_tools.py similarity index 89% rename from tests/utils/event_logger_tools.py rename to tests/lib/event_logger_tools.py index 626d71d9e..5150cebd5 100644 --- a/tests/utils/event_logger_tools.py +++ b/tests/lib/event_logger_tools.py @@ -19,9 +19,9 @@ import platform import azurelinuxagent.common.event as event from azurelinuxagent.common.version import DISTRO_NAME, DISTRO_VERSION, DISTRO_CODE_NAME -import tests.tools as tools -from tests.protocol import mockwiredata -from tests.protocol.mocks import mock_wire_protocol +import tests.lib.tools as tools +from tests.lib import wire_protocol_data +from tests.lib.mock_wire_protocol import mock_wire_protocol class EventLoggerTools(object): @@ -37,7 +37,7 @@ class EventLoggerTools(object): def initialize_event_logger(event_dir): """ Initializes the event logger using mock data for the common parameters; the goal state fields are taken - from mockwiredata.DATA_FILE and the IMDS fields from mock_imds_data. + from wire_protocol_data.DATA_FILE and the IMDS fields from mock_imds_data. 
""" if not os.path.exists(event_dir): os.mkdir(event_dir) @@ -53,7 +53,7 @@ def initialize_event_logger(event_dir): mock_imds_client = tools.Mock() mock_imds_client.get_compute = tools.Mock(return_value=mock_imds_info) - with mock_wire_protocol(mockwiredata.DATA_FILE) as mock_protocol: + with mock_wire_protocol(wire_protocol_data.DATA_FILE) as mock_protocol: with tools.patch("azurelinuxagent.common.event.get_imds_client", return_value=mock_imds_client): event.initialize_event_logger_vminfo_common_parameters(mock_protocol) diff --git a/tests/ga/extension_emulator.py b/tests/lib/extension_emulator.py similarity index 98% rename from tests/ga/extension_emulator.py rename to tests/lib/extension_emulator.py index dafd365df..5b4f69d14 100644 --- a/tests/ga/extension_emulator.py +++ b/tests/lib/extension_emulator.py @@ -27,10 +27,10 @@ from azurelinuxagent.common.utils import fileutil from azurelinuxagent.ga.exthandlers import ExtHandlerInstance, ExtCommandEnvVariable -from tests.tools import Mock, patch -from tests.protocol.mockwiredata import WireProtocolData -from tests.protocol.mocks import MockHttpResponse -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates +from tests.lib.tools import Mock, patch +from tests.lib.wire_protocol_data import WireProtocolData +from tests.lib.mock_wire_protocol import MockHttpResponse +from tests.lib.http_request_predicates import HttpRequestPredicates class ExtensionCommandNames(object): @@ -107,7 +107,7 @@ def enable_invocations(*emulators): def generate_put_handler(*emulators): """ Create a HTTP handler to store status blobs for each provided emulator. - For use with tests.protocol.mocks.mock_wire_protocol. + For use with tests.lib.mock_wire_protocol.mock_wire_protocol.
""" def mock_put_handler(url, *args, **_): diff --git a/tests/protocol/HttpRequestPredicates.py b/tests/lib/http_request_predicates.py similarity index 100% rename from tests/protocol/HttpRequestPredicates.py rename to tests/lib/http_request_predicates.py diff --git a/tests/utils/miscellaneous_tools.py b/tests/lib/miscellaneous_tools.py similarity index 100% rename from tests/utils/miscellaneous_tools.py rename to tests/lib/miscellaneous_tools.py diff --git a/tests/common/mock_cgroup_environment.py b/tests/lib/mock_cgroup_environment.py similarity index 96% rename from tests/common/mock_cgroup_environment.py rename to tests/lib/mock_cgroup_environment.py index e38471060..3b51dce8f 100644 --- a/tests/common/mock_cgroup_environment.py +++ b/tests/lib/mock_cgroup_environment.py @@ -17,8 +17,8 @@ # import contextlib import os -from tests.tools import patch, data_dir -from tests.common.mock_environment import MockEnvironment, MockCommand +from tests.lib.tools import patch, data_dir +from tests.lib.mock_environment import MockEnvironment, MockCommand _MOCKED_COMMANDS = [ MockCommand(r"^systemctl --version$", @@ -104,6 +104,7 @@ class UnitFilePaths: extension_service_memory_accounting = '/lib/systemd/system/extension.service.d/13-MemoryAccounting.conf' extension_service_memory_limit = '/lib/systemd/system/extension.service.d/14-MemoryLimit.conf' + @contextlib.contextmanager def mock_cgroup_environment(tmp_dir): """ @@ -116,7 +117,7 @@ def mock_cgroup_environment(tmp_dir): (os.path.join(data_dir, 'init', 'azure-vmextensions.slice'), UnitFilePaths.vmextensions) ] - with patch('azurelinuxagent.common.cgroupapi.CGroupsApi.cgroups_supported', return_value=True): + with patch('azurelinuxagent.ga.cgroupapi.CGroupsApi.cgroups_supported', return_value=True): with patch('azurelinuxagent.common.osutil.systemd.is_systemd', return_value=True): with MockEnvironment(tmp_dir, commands=_MOCKED_COMMANDS, paths=_MOCKED_PATHS, files=_MOCKED_FILES, data_files=data_files) as mock: yield mock 
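The `mock_cgroup_environment` fixture above fakes shell commands by matching them against a list of regex/output pairs (`_MOCKED_COMMANDS`). A minimal, self-contained sketch of that pattern — names and canned outputs here are illustrative, not the agent's actual implementation:

```python
import re

class MockCommand:
    """Pairs a command-line regex with the canned stdout to return for it."""
    def __init__(self, pattern, stdout=""):
        self.pattern = pattern
        self.stdout = stdout

# A couple of mocked commands, in the spirit of _MOCKED_COMMANDS above
MOCKED_COMMANDS = [
    MockCommand(r"^systemctl --version$", "systemd 237\n"),
    MockCommand(r"^mount -t cgroup$", "cgroup on /sys/fs/cgroup/systemd type cgroup\n"),
]

def run_command(command_line, mocks=MOCKED_COMMANDS):
    # Return the canned output of the first mock whose regex matches;
    # a real environment would fall back to executing the command.
    for mock in mocks:
        if re.match(mock.pattern, command_line):
            return mock.stdout
    raise ValueError("command not mocked: {0}".format(command_line))
```

Keeping the match list ordered means more specific patterns can shadow general ones, which is why the fixture consults the mocks first and only then the real system.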
diff --git a/tests/common/mock_command.py b/tests/lib/mock_command.py similarity index 100% rename from tests/common/mock_command.py rename to tests/lib/mock_command.py diff --git a/tests/common/mock_environment.py b/tests/lib/mock_environment.py similarity index 99% rename from tests/common/mock_environment.py rename to tests/lib/mock_environment.py index bedce0900..8f5682cf8 100644 --- a/tests/common/mock_environment.py +++ b/tests/lib/mock_environment.py @@ -22,7 +22,7 @@ from azurelinuxagent.common.future import ustr from azurelinuxagent.common.utils import fileutil -from tests.tools import patch, patch_builtin +from tests.lib.tools import patch, patch_builtin class MockCommand: diff --git a/tests/ga/mocks.py b/tests/lib/mock_update_handler.py similarity index 58% rename from tests/ga/mocks.py rename to tests/lib/mock_update_handler.py index 6fbc63d7d..03d7a4452 100644 --- a/tests/ga/mocks.py +++ b/tests/lib/mock_update_handler.py @@ -18,10 +18,12 @@ import contextlib from mock import PropertyMock + +from azurelinuxagent.ga.agent_update_handler import AgentUpdateHandler from azurelinuxagent.ga.exthandlers import ExtHandlersHandler from azurelinuxagent.ga.remoteaccess import RemoteAccessHandler from azurelinuxagent.ga.update import UpdateHandler, get_update_handler -from tests.tools import patch, Mock, mock_sleep +from tests.lib.tools import patch, Mock, mock_sleep @contextlib.contextmanager @@ -30,6 +32,7 @@ def mock_update_handler(protocol, on_new_iteration=lambda _: None, exthandlers_handler=None, remote_access_handler=None, + agent_update_handler=None, autoupdate_enabled=False, check_daemon_running=False, start_background_threads=False, @@ -71,6 +74,9 @@ def is_running(*args): # mock for property UpdateHandler.is_running, which cont if remote_access_handler is None: remote_access_handler = RemoteAccessHandler(protocol) + if agent_update_handler is None: + agent_update_handler = AgentUpdateHandler(protocol) + cleanup_functions = [] def patch_object(target, 
attribute): @@ -80,39 +86,40 @@ def patch_object(target, attribute): try: with patch("azurelinuxagent.ga.exthandlers.get_exthandlers_handler", return_value=exthandlers_handler): - with patch("azurelinuxagent.ga.remoteaccess.get_remote_access_handler", return_value=remote_access_handler): - with patch("azurelinuxagent.common.conf.get_autoupdate_enabled", return_value=autoupdate_enabled): - with patch.object(UpdateHandler, "is_running", PropertyMock(side_effect=is_running)): - with patch('azurelinuxagent.ga.update.time.sleep', side_effect=lambda _: mock_sleep(0.001)) as sleep: - with patch('sys.exit', side_effect=lambda _: 0) as mock_exit: - if not check_daemon_running: - patch_object(UpdateHandler, "_check_daemon_running") - if not start_background_threads: - patch_object(UpdateHandler, "_start_threads") - if not check_background_threads: - patch_object(UpdateHandler, "_check_threads_running") - - def get_exit_code(): - if mock_exit.call_count == 0: - raise Exception("The UpdateHandler did not exit") - if mock_exit.call_count != 1: - raise Exception("The UpdateHandler exited multiple times ({0})".format(mock_exit.call_count)) - args, _ = mock_exit.call_args - return args[0] - - def get_iterations(): - return iteration_count[0] - - def get_iterations_completed(): - return sleep.call_count - - update_handler = get_update_handler() - update_handler.protocol_util.get_protocol = Mock(return_value=protocol) - update_handler.get_exit_code = get_exit_code - update_handler.get_iterations = get_iterations - update_handler.get_iterations_completed = get_iterations_completed - - yield update_handler + with patch("azurelinuxagent.ga.update.get_agent_update_handler", return_value=agent_update_handler): + with patch("azurelinuxagent.ga.remoteaccess.get_remote_access_handler", return_value=remote_access_handler): + with patch("azurelinuxagent.ga.update.conf.get_autoupdate_enabled", return_value=autoupdate_enabled): + with patch.object(UpdateHandler, "is_running", 
PropertyMock(side_effect=is_running)): + with patch('azurelinuxagent.ga.update.time.sleep', side_effect=lambda _: mock_sleep(0.001)) as sleep: + with patch('sys.exit', side_effect=lambda _: 0) as mock_exit: + if not check_daemon_running: + patch_object(UpdateHandler, "_check_daemon_running") + if not start_background_threads: + patch_object(UpdateHandler, "_start_threads") + if not check_background_threads: + patch_object(UpdateHandler, "_check_threads_running") + + def get_exit_code(): + if mock_exit.call_count == 0: + raise Exception("The UpdateHandler did not exit") + if mock_exit.call_count != 1: + raise Exception("The UpdateHandler exited multiple times ({0})".format(mock_exit.call_count)) + args, _ = mock_exit.call_args + return args[0] + + def get_iterations(): + return iteration_count[0] + + def get_iterations_completed(): + return sleep.call_count + + update_handler = get_update_handler() + update_handler.protocol_util.get_protocol = Mock(return_value=protocol) + update_handler.get_exit_code = get_exit_code + update_handler.get_iterations = get_iterations + update_handler.get_iterations_completed = get_iterations_completed + + yield update_handler finally: for f in cleanup_functions: f() diff --git a/tests/protocol/mocks.py b/tests/lib/mock_wire_protocol.py similarity index 93% rename from tests/protocol/mocks.py rename to tests/lib/mock_wire_protocol.py index b74138888..78cbc59e2 100644 --- a/tests/protocol/mocks.py +++ b/tests/lib/mock_wire_protocol.py @@ -17,18 +17,18 @@ import contextlib from azurelinuxagent.common.protocol.wire import WireProtocol from azurelinuxagent.common.utils import restutil -from tests.tools import patch -from tests.protocol import mockwiredata +from tests.lib.tools import patch +from tests.lib import wire_protocol_data @contextlib.contextmanager -def mock_wire_protocol(mock_wire_data_file, http_get_handler=None, http_post_handler=None, http_put_handler=None, do_not_mock=lambda method, url: False, fail_on_unknown_request=True): 
+def mock_wire_protocol(mock_wire_data_file, http_get_handler=None, http_post_handler=None, http_put_handler=None, do_not_mock=lambda method, url: False, fail_on_unknown_request=True, save_to_history=False): """ Creates a WireProtocol object that handles requests to the WireServer, the Host GA Plugin, and some requests to storage (requests that provide mock data - in mockwiredata.py). + in wire_protocol_data.py). The data returned by those requests is read from the files specified by 'mock_wire_data_file' (which must follow the structure of the data - files defined in tests/protocol/mockwiredata.py). + files defined in tests/lib/wire_protocol_data.py). The caller can also provide handler functions for specific HTTP methods using the http_*_handler arguments. The return value of the handler function is interpreted similarly to the "return_value" argument of patch(): if it is an exception the exception is raised or, if it is @@ -38,6 +38,8 @@ def mock_wire_protocol(mock_wire_data_file, http_get_handler=None, http_post_han The 'do_not_mock' lambda can be used to skip the mocks for specific requests; if the lambda returns True, the mocks won't be applied and the original common.utils.restutil.http_request will be invoked instead. + The 'save_to_history' parameter is passed through in the call to WireProtocol.detect(). + The returned protocol object maintains a list of "tracked" urls. When a handler function returns a value that is not None the url for the request is automatically added to the tracked list. The handler function can add other items to this list using the track_url() method on the mock.
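The handler-return convention this docstring describes (an exception is raised, a non-None value is used as the response and its url is tracked, None falls through to the canned mock data) can be sketched independently of the agent test library; `MockResponse` and `dispatch` below are hypothetical stand-ins for illustration only, not part of the mocks module:

```python
class MockResponse:
    """Hypothetical stand-in for the response objects the mocks return."""
    def __init__(self, body):
        self.body = body


def dispatch(url, handler, default_response, tracked_urls):
    # Mirrors the convention described above: a handler returning an
    # exception causes it to be raised; a non-None return value is used as
    # the response and the url is added to the tracked list; None falls
    # back to the default mock data.
    if handler is not None:
        result = handler(url)
        if isinstance(result, Exception):
            raise result
        if result is not None:
            tracked_urls.append(url)
            return result
    return default_response


tracked = []
handler = lambda url: MockResponse("mocked") if "extensionArtifact" in url else None
mocked = dispatch("http://hostplugin/extensionArtifact", handler, MockResponse("default"), tracked)
passthrough = dispatch("http://wireserver/goalstate", handler, MockResponse("default"), tracked)
```

Note that only the handled url lands in `tracked`; the goal-state request falls through to the default data without being tracked, which is why tests can assert on the tracked list to verify which requests their handlers intercepted.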
@@ -135,7 +137,7 @@ def stop(): # create the protocol object # protocol = WireProtocol(restutil.KNOWN_WIRESERVER_IP) - protocol.mock_wire_data = mockwiredata.WireProtocolData(mock_wire_data_file) + protocol.mock_wire_data = wire_protocol_data.WireProtocolData(mock_wire_data_file) protocol.start = start protocol.stop = stop protocol.track_url = lambda url: tracked_urls.append(url) # pylint: disable=unnecessary-lambda @@ -147,7 +149,7 @@ def stop(): # go do it try: protocol.start() - protocol.detect() + protocol.detect(save_to_history=save_to_history) yield protocol finally: protocol.stop() diff --git a/tests/tools.py b/tests/lib/tools.py similarity index 99% rename from tests/tools.py rename to tests/lib/tools.py index 85d460d37..11bd80191 100644 --- a/tests/tools.py +++ b/tests/lib/tools.py @@ -38,6 +38,8 @@ from azurelinuxagent.common.utils import fileutil from azurelinuxagent.common.version import PY_VERSION_MAJOR +import tests + try: from unittest.mock import Mock, patch, MagicMock, ANY, DEFAULT, call, PropertyMock # pylint: disable=unused-import @@ -46,7 +48,7 @@ except ImportError: from mock import Mock, patch, MagicMock, ANY, DEFAULT, call, PropertyMock -test_dir = os.path.dirname(os.path.abspath(__file__)) +test_dir = tests.__path__[0] data_dir = os.path.join(test_dir, "data") debug = False @@ -179,7 +181,6 @@ def setUp(self): self.tmp_dir = tempfile.mkdtemp(prefix=prefix) self.test_file = 'test_file' - conf.get_autoupdate_enabled = Mock(return_value=True) conf.get_lib_dir = Mock(return_value=self.tmp_dir) ext_log_dir = os.path.join(self.tmp_dir, "azure") diff --git a/tests/protocol/mockwiredata.py b/tests/lib/wire_protocol_data.py similarity index 96% rename from tests/protocol/mockwiredata.py rename to tests/lib/wire_protocol_data.py index 196ed32db..6854bdcc5 100644 --- a/tests/protocol/mockwiredata.py +++ b/tests/lib/wire_protocol_data.py @@ -21,8 +21,8 @@ from azurelinuxagent.common.utils import timeutil from azurelinuxagent.common.utils.textutil import 
parse_doc, find, findall -from tests.protocol.HttpRequestPredicates import HttpRequestPredicates -from tests.tools import load_bin_data, load_data, MagicMock, Mock +from tests.lib.http_request_predicates import HttpRequestPredicates +from tests.lib.tools import load_bin_data, load_data, MagicMock, Mock from azurelinuxagent.common.protocol.imds import IMDS_ENDPOINT from azurelinuxagent.common.exception import HttpError, ResourceGoneError from azurelinuxagent.common.future import httpclient @@ -460,5 +460,14 @@ def set_manifest_version(self, version): def set_extension_config(self, ext_conf_file): self.ext_conf = load_data(ext_conf_file) - def set_extension_config_requested_version(self, version): + def set_ga_manifest(self, ga_manifest): + self.ga_manifest = load_data(ga_manifest) + + def set_version_in_agent_family(self, version): self.ext_conf = WireProtocolData.replace_xml_element_value(self.ext_conf, "Version", version) + + def set_extension_config_is_vm_enabled_for_rsm_upgrades(self, is_vm_enabled_for_rsm_upgrades): + self.ext_conf = WireProtocolData.replace_xml_element_value(self.ext_conf, "IsVMEnabledForRSMUpgrades", is_vm_enabled_for_rsm_upgrades) + + def set_version_in_ga_manifest(self, version): + self.ga_manifest = WireProtocolData.replace_xml_element_value(self.ga_manifest, "Version", version) diff --git a/tests/pa/test_deprovision.py b/tests/pa/test_deprovision.py index 8680366a3..9970a249e 100644 --- a/tests/pa/test_deprovision.py +++ b/tests/pa/test_deprovision.py @@ -23,7 +23,7 @@ from azurelinuxagent.pa.deprovision import get_deprovision_handler from azurelinuxagent.pa.deprovision.default import DeprovisionHandler -from tests.tools import AgentTestCase, distros, Mock, patch +from tests.lib.tools import AgentTestCase, distros, Mock, patch class TestDeprovision(AgentTestCase): diff --git a/tests/pa/test_provision.py b/tests/pa/test_provision.py index 59de0e97b..66c525dff 100644 --- a/tests/pa/test_provision.py +++ b/tests/pa/test_provision.py @@ -28,7 
+28,7 @@ from azurelinuxagent.pa.provision.cloudinit import CloudInitProvisionHandler from azurelinuxagent.pa.provision.default import ProvisionHandler from azurelinuxagent.common.utils import fileutil -from tests.tools import AgentTestCase, distros, load_data, MagicMock, Mock, patch +from tests.lib.tools import AgentTestCase, distros, load_data, MagicMock, Mock, patch class TestProvision(AgentTestCase): diff --git a/tests/test_agent.py b/tests/test_agent.py index f0f773f05..f892f090e 100644 --- a/tests/test_agent.py +++ b/tests/test_agent.py @@ -18,15 +18,17 @@ import os.path from azurelinuxagent.agent import parse_args, Agent, usage, AgentCommands -from azurelinuxagent.common import cgroupconfigurator, conf, logcollector -from azurelinuxagent.common.cgroupapi import SystemdCgroupsApi +from azurelinuxagent.common import conf +from azurelinuxagent.ga import logcollector, cgroupconfigurator +from azurelinuxagent.ga.cgroupapi import SystemdCgroupsApi from azurelinuxagent.common.utils import fileutil from azurelinuxagent.ga.collect_logs import CollectLogsHandler -from tests.tools import AgentTestCase, data_dir, Mock, patch +from tests.lib.tools import AgentTestCase, data_dir, Mock, patch EXPECTED_CONFIGURATION = \ """AutoUpdate.Enabled = True AutoUpdate.GAFamily = Prod +AutoUpdate.UpdateToLatestVersion = True Autoupdate.Frequency = 3600 DVD.MountPoint = /mnt/cdrom/secure Debug.AgentCpuQuota = 50 @@ -42,7 +44,7 @@ Debug.CgroupMonitorExtensionName = Microsoft.Azure.Monitor.AzureMonitorLinuxAgent Debug.EnableAgentMemoryUsageCheck = False Debug.EnableFastTrack = True -Debug.EnableGAVersioning = False +Debug.EnableGAVersioning = True Debug.EtpCollectionPeriod = 300 Debug.FirewallRulesLogPeriod = 86400 DetectScvmmEnv = False @@ -51,6 +53,8 @@ Extensions.Enabled = True Extensions.GoalStatePeriod = 6 Extensions.InitialGoalStatePeriod = 6 +Extensions.WaitForCloudInit = False +Extensions.WaitForCloudInitTimeout = 3600 HttpProxy.Host = None HttpProxy.Port = None Lib.Dir = 
/var/lib/waagent @@ -231,7 +235,7 @@ def test_calls_collect_logs_with_proper_mode(self, mock_log_collector, *args): @patch("azurelinuxagent.agent.LogCollector") def test_calls_collect_logs_on_valid_cgroups(self, mock_log_collector): try: - CollectLogsHandler.enable_cgroups_validation() + CollectLogsHandler.enable_monitor_cgroups_check() mock_log_collector.run = Mock() def mock_cgroup_paths(*args, **kwargs): @@ -246,12 +250,12 @@ def mock_cgroup_paths(*args, **kwargs): mock_log_collector.assert_called_once() finally: - CollectLogsHandler.disable_cgroups_validation() + CollectLogsHandler.disable_monitor_cgroups_check() @patch("azurelinuxagent.agent.LogCollector") def test_doesnt_call_collect_logs_on_invalid_cgroups(self, mock_log_collector): try: - CollectLogsHandler.enable_cgroups_validation() + CollectLogsHandler.enable_monitor_cgroups_check() mock_log_collector.run = Mock() def mock_cgroup_paths(*args, **kwargs): @@ -270,7 +274,7 @@ def mock_cgroup_paths(*args, **kwargs): mock_exit.assert_called_once_with(logcollector.INVALID_CGROUPS_ERRCODE) self.assertEqual(exit_error, re) finally: - CollectLogsHandler.disable_cgroups_validation() + CollectLogsHandler.disable_monitor_cgroups_check() def test_it_should_parse_setup_firewall_properly(self): diff --git a/tests_e2e/GuestAgentDcrTestExtension/GuestAgentDcrTest.py b/tests_e2e/GuestAgentDcrTestExtension/GuestAgentDcrTest.py new file mode 100644 index 000000000..df6c1b517 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/GuestAgentDcrTest.py @@ -0,0 +1,123 @@ +#!/usr/bin/env python +# pylint: disable=all +from __future__ import print_function + +from Utils.WAAgentUtil import waagent +import Utils.HandlerUtil as Util +import sys +import re +import traceback +import os +import datetime + +ExtensionShortName = "GADcrTestExt" +OperationFileName = "operations-{0}.log" + + +def install(): + operation = "install" + status = "success" + msg = "Installed successfully" + + hutil = parse_context(operation) + hutil.log("Start 
to install.") + hutil.log(msg) + hutil.do_exit(0, operation, status, '0', msg) + + +def enable(): + # Global Variables definition + operation = "enable" + status = "success" + msg = "Enabled successfully." + + # Operations.append(operation) + hutil = parse_context(operation) + hutil.log("Start to enable.") + public_settings = hutil.get_public_settings() + name = public_settings.get("name") + if name: + name = "Name: {0}".format(name) + hutil.log(name) + msg = "{0} {1}".format(msg, name) + print(name) + else: + hutil.error("The name in public settings is not provided.") + # msg = msg % ','.join(Operations) + hutil.log(msg) + hutil.do_exit(0, operation, status, '0', msg) + + +def disable(): + operation = "disable" + status = "success" + msg = "Disabled successfully." + + # Operations.append(operation) + hutil = parse_context(operation) + hutil.log("Start to disable.") + # msg % ','.join(Operations) + hutil.log(msg) + hutil.do_exit(0, operation, status, '0', msg) + + +def uninstall(): + operation = "uninstall" + status = "success" + msg = "Uninstalled successfully." + + # Operations.append(operation) + hutil = parse_context(operation) + hutil.log("Start to uninstall.") + # msg % ','.join(Operations) + hutil.log(msg) + hutil.do_exit(0, operation, status, '0', msg) + + +def update(): + operation = "update" + status = "success" + msg = "Updated successfully." 
+ + # Operations.append(operation) + hutil = parse_context(operation) + hutil.log("Start to update.") + # msg % ','.join(Operations) + hutil.log(msg) + hutil.do_exit(0, operation, status, '0', msg) + + +def parse_context(operation): + hutil = Util.HandlerUtility(waagent.Log, waagent.Error) + hutil.do_parse_context(operation) + op_log = os.path.join(hutil.get_log_dir(), OperationFileName.format(hutil.get_extension_version())) + with open(op_log, 'a+') as oplog_handler: + oplog_handler.write("Date:{0}; Operation:{1}; SeqNo:{2}\n" + .format(datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"), + operation, hutil.get_seq_no())) + return hutil + + +def main(): + waagent.LoggerInit('/var/log/waagent.log', '/dev/stdout') + waagent.Log("%s started to handle." % (ExtensionShortName)) + + try: + for a in sys.argv[1:]: + if re.match("^([-/]*)(disable)", a): + disable() + elif re.match("^([-/]*)(uninstall)", a): + uninstall() + elif re.match("^([-/]*)(install)", a): + install() + elif re.match("^([-/]*)(enable)", a): + enable() + elif re.match("^([-/]*)(update)", a): + update() + except Exception as e: + err_msg = "Failed with error: {0}, {1}".format(e, traceback.format_exc()) + waagent.Error(err_msg) + + +if __name__ == '__main__': + main() diff --git a/tests_e2e/GuestAgentDcrTestExtension/HandlerManifest.json b/tests_e2e/GuestAgentDcrTestExtension/HandlerManifest.json new file mode 100644 index 000000000..398aab864 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/HandlerManifest.json @@ -0,0 +1,14 @@ +[{ + "name": "GuestAgentDcrTestExtension", + "version": 1.0, + "handlerManifest": { + "installCommand": "./GuestAgentDcrTest.py --install", + "uninstallCommand": "./GuestAgentDcrTest.py --uninstall", + "updateCommand": "./GuestAgentDcrTest.py --update", + "enableCommand": "./GuestAgentDcrTest.py --enable", + "disableCommand": "./GuestAgentDcrTest.py --disable", + "updateMode": "UpdateWithoutInstall", + "rebootAfterInstall": false, + "reportHeartbeat": false + } +}] 
diff --git a/tests_e2e/GuestAgentDcrTestExtension/Makefile b/tests_e2e/GuestAgentDcrTestExtension/Makefile new file mode 100644 index 000000000..d766ef63a --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Makefile @@ -0,0 +1,8 @@ +default: build + +build: + $(eval NAME = $(shell grep -Pom1 "(?<=<Type>)[^<]+" manifest.xml)) + $(eval VERSION = $(shell grep -Pom1 "(?<=<Version>)[^<]+" manifest.xml)) + + @echo "Building '$(NAME)-$(VERSION).zip' ..." + zip -r9 $(NAME)-$(VERSION).zip * diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/HandlerUtil.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/HandlerUtil.py new file mode 100755 index 000000000..56343f2e5 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/HandlerUtil.py @@ -0,0 +1,387 @@ +# +# Handler library for Linux IaaS +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
+# pylint: disable=all + +""" +JSON def: +HandlerEnvironment.json +[{ + "name": "ExampleHandlerLinux", + "seqNo": "seqNo", + "version": "1.0", + "handlerEnvironment": { + "logFolder": "", + "configFolder": "", + "statusFolder": "", + "heartbeatFile": "", + + } +}] + +Example ./config/1.settings +"{"runtimeSettings":[{"handlerSettings":{"protectedSettingsCertThumbprint":"1BE9A13AA1321C7C515EF109746998BAB6D86FD1","protectedSettings": +"MIIByAYJKoZIhvcNAQcDoIIBuTCCAbUCAQAxggFxMIIBbQIBADBVMEExPzA9BgoJkiaJk/IsZAEZFi9XaW5kb3dzIEF6dXJlIFNlcnZpY2UgTWFuYWdlbWVudCBmb3IgR+nhc6VHQTQpCiiV2zANBgkqhkiG9w0BAQEFAASCAQCKr09QKMGhwYe+O4/a8td+vpB4eTR+BQso84cV5KCAnD6iUIMcSYTrn9aveY6v6ykRLEw8GRKfri2d6tvVDggUrBqDwIgzejGTlCstcMJItWa8Je8gHZVSDfoN80AEOTws9Fp+wNXAbSuMJNb8EnpkpvigAWU2v6pGLEFvSKC0MCjDTkjpjqciGMcbe/r85RG3Zo21HLl0xNOpjDs/qqikc/ri43Y76E/Xv1vBSHEGMFprPy/Hwo3PqZCnulcbVzNnaXN3qi/kxV897xGMPPC3IrO7Nc++AT9qRLFI0841JLcLTlnoVG1okPzK9w6ttksDQmKBSHt3mfYV+skqs+EOMDsGCSqGSIb3DQEHATAUBggqhkiG9w0DBwQITgu0Nu3iFPuAGD6/QzKdtrnCI5425fIUy7LtpXJGmpWDUA==","publicSettings":{"port":"3000"}}}]}" + + +Example HeartBeat +{ +"version": 1.0, + "heartbeat" : { + "status": "ready", + "code": 0, + "Message": "Sample Handler running. Waiting for a new configuration from user." 
+ } +} +Example Status Report: +[{"version":"1.0","timestampUTC":"2014-05-29T04:20:13Z","status":{"name":"Chef Extension Handler","operation":"chef-client-run","status":"success","code":0,"formattedMessage":{"lang":"en-US","message":"Chef-client run success"}}}] + +""" + +import os +import os.path +import sys +import imp +import base64 +import json +import time +import re + +from xml.etree import ElementTree +from os.path import join +from Utils.WAAgentUtil import waagent +from waagent import LoggerInit + +DateTimeFormat = "%Y-%m-%dT%H:%M:%SZ" + +MANIFEST_XML = "manifest.xml" + + +class HandlerContext: + def __init__(self, name): + self._name = name + self._version = '0.0' + self._config_dir = None + self._log_dir = None + self._log_file = None + self._status_dir = None + self._heartbeat_file = None + self._seq_no = -1 + self._status_file = None + self._settings_file = None + self._config = None + return + + +class HandlerUtility: + def __init__(self, log, error, s_name=None, l_name=None, extension_version=None, logFileName='extension.log', + console_logger=None, file_logger=None): + self._log = log + self._log_to_con = console_logger + self._log_to_file = file_logger + self._error = error + self._logFileName = logFileName + if s_name is None or l_name is None or extension_version is None: + (l_name, s_name, extension_version) = self._get_extension_info() + + self._short_name = s_name + self._extension_version = extension_version + self._log_prefix = '[%s-%s] ' % (l_name, extension_version) + + def get_extension_version(self): + return self._extension_version + + def _get_log_prefix(self): + return self._log_prefix + + def _get_extension_info(self): + if os.path.isfile(MANIFEST_XML): + return self._get_extension_info_manifest() + + ext_dir = os.path.basename(os.getcwd()) + (long_name, version) = ext_dir.split('-') + short_name = long_name.split('.')[-1] + + return long_name, short_name, version + + def _get_extension_info_manifest(self): + with open(MANIFEST_XML) 
as fh: + doc = ElementTree.parse(fh) + namespace = doc.find('{http://schemas.microsoft.com/windowsazure}ProviderNameSpace').text + short_name = doc.find('{http://schemas.microsoft.com/windowsazure}Type').text + version = doc.find('{http://schemas.microsoft.com/windowsazure}Version').text + + long_name = "%s.%s" % (namespace, short_name) + return (long_name, short_name, version) + + def _get_current_seq_no(self, config_folder): + seq_no = -1 + cur_seq_no = -1 + freshest_time = None + for subdir, dirs, files in os.walk(config_folder): + for file in files: + try: + cur_seq_no = int(os.path.basename(file).split('.')[0]) + if (freshest_time == None): + freshest_time = os.path.getmtime(join(config_folder, file)) + seq_no = cur_seq_no + else: + current_file_m_time = os.path.getmtime(join(config_folder, file)) + if (current_file_m_time > freshest_time): + freshest_time = current_file_m_time + seq_no = cur_seq_no + except ValueError: + continue + return seq_no + + def log(self, message): + self._log(self._get_log_prefix() + message) + + def log_to_console(self, message): + if self._log_to_con is not None: + self._log_to_con(self._get_log_prefix() + message) + else: + self.error("Unable to log to console, console log method not set") + + def log_to_file(self, message): + if self._log_to_file is not None: + self._log_to_file(self._get_log_prefix() + message) + else: + self.error("Unable to log to file, file log method not set") + + def error(self, message): + self._error(self._get_log_prefix() + message) + + @staticmethod + def redact_protected_settings(content): + redacted_tmp = re.sub('"protectedSettings":\s*"[^"]+=="', '"protectedSettings": "*** REDACTED ***"', content) + redacted = re.sub('"protectedSettingsCertThumbprint":\s*"[^"]+"', '"protectedSettingsCertThumbprint": "*** REDACTED ***"', redacted_tmp) + return redacted + + def _parse_config(self, ctxt): + config = None + try: + config = json.loads(ctxt) + except: + self.error('JSON exception decoding ' + 
HandlerUtility.redact_protected_settings(ctxt)) + + if config is None: + self.error("JSON error processing settings file:" + HandlerUtility.redact_protected_settings(ctxt)) + else: + handlerSettings = config['runtimeSettings'][0]['handlerSettings'] + if 'protectedSettings' in handlerSettings and \ + 'protectedSettingsCertThumbprint' in handlerSettings and \ + handlerSettings['protectedSettings'] is not None and \ + handlerSettings["protectedSettingsCertThumbprint"] is not None: + protectedSettings = handlerSettings['protectedSettings'] + thumb = handlerSettings['protectedSettingsCertThumbprint'] + cert = waagent.LibDir + '/' + thumb + '.crt' + pkey = waagent.LibDir + '/' + thumb + '.prv' + unencodedSettings = base64.standard_b64decode(protectedSettings) + openSSLcmd = "openssl smime -inform DER -decrypt -recip {0} -inkey {1}" + cleartxt = waagent.RunSendStdin(openSSLcmd.format(cert, pkey), unencodedSettings)[1] + if cleartxt is None: + self.error("OpenSSL decode error using thumbprint " + thumb) + self.do_exit(1, "Enable", 'error', '1', 'Failed to decrypt protectedSettings') + jctxt = '' + try: + jctxt = json.loads(cleartxt) + except: + self.error('JSON exception decoding ' + HandlerUtility.redact_protected_settings(cleartxt)) + handlerSettings['protectedSettings']=jctxt + self.log('Config decoded correctly.') + return config + + def do_parse_context(self, operation): + _context = self.try_parse_context() + if not _context: + self.do_exit(1, operation, 'error', '1', operation + ' Failed') + return _context + + def try_parse_context(self): + self._context = HandlerContext(self._short_name) + handler_env = None + config = None + ctxt = None + code = 0 + # get the HandlerEnvironment.json. 
According to the extension handler spec, it is always in the ./ directory + self.log('cwd is ' + os.path.realpath(os.path.curdir)) + handler_env_file = './HandlerEnvironment.json' + if not os.path.isfile(handler_env_file): + self.error("Unable to locate " + handler_env_file) + return None + ctxt = waagent.GetFileContents(handler_env_file) + if ctxt == None: + self.error("Unable to read " + handler_env_file) + try: + handler_env = json.loads(ctxt) + except: + pass + if handler_env == None: + self.log("JSON error processing " + handler_env_file) + return None + if type(handler_env) == list: + handler_env = handler_env[0] + + self._context._name = handler_env['name'] + self._context._version = str(handler_env['version']) + self._context._config_dir = handler_env['handlerEnvironment']['configFolder'] + self._context._log_dir = handler_env['handlerEnvironment']['logFolder'] + + self._context._log_file = os.path.join(handler_env['handlerEnvironment']['logFolder'], self._logFileName) + self._change_log_file() + self._context._status_dir = handler_env['handlerEnvironment']['statusFolder'] + self._context._heartbeat_file = handler_env['handlerEnvironment']['heartbeatFile'] + self._context._seq_no = self._get_current_seq_no(self._context._config_dir) + if self._context._seq_no < 0: + self.error("Unable to locate a .settings file!") + return None + self._context._seq_no = str(self._context._seq_no) + self.log('sequence number is ' + self._context._seq_no) + self._context._status_file = os.path.join(self._context._status_dir, self._context._seq_no + '.status') + self._context._settings_file = os.path.join(self._context._config_dir, self._context._seq_no + '.settings') + self.log("setting file path is" + self._context._settings_file) + ctxt = None + ctxt = waagent.GetFileContents(self._context._settings_file) + if ctxt == None: + error_msg = 'Unable to read ' + self._context._settings_file + '. 
' + self.error(error_msg) + return None + + self.log("JSON config: " + HandlerUtility.redact_protected_settings(ctxt)) + self._context._config = self._parse_config(ctxt) + return self._context + + def _change_log_file(self): + self.log("Change log file to " + self._context._log_file) + LoggerInit(self._context._log_file, '/dev/stdout') + self._log = waagent.Log + self._error = waagent.Error + + def set_verbose_log(self, verbose): + if (verbose == "1" or verbose == 1): + self.log("Enable verbose log") + LoggerInit(self._context._log_file, '/dev/stdout', verbose=True) + else: + self.log("Disable verbose log") + LoggerInit(self._context._log_file, '/dev/stdout', verbose=False) + + def is_seq_smaller(self): + return int(self._context._seq_no) <= self._get_most_recent_seq() + + def save_seq(self): + self._set_most_recent_seq(self._context._seq_no) + self.log("set most recent sequence number to " + self._context._seq_no) + + def exit_if_enabled(self, remove_protected_settings=False): + self.exit_if_seq_smaller(remove_protected_settings) + + def exit_if_seq_smaller(self, remove_protected_settings): + if(self.is_seq_smaller()): + self.log("Current sequence number, " + self._context._seq_no + ", is not greater than the sequence number of the most recently executed configuration. 
Exiting...") + sys.exit(0) + self.save_seq() + + if remove_protected_settings: + self.scrub_settings_file() + + def _get_most_recent_seq(self): + if (os.path.isfile('mrseq')): + seq = waagent.GetFileContents('mrseq') + if (seq): + return int(seq) + + return -1 + + def is_current_config_seq_greater_inused(self): + return int(self._context._seq_no) > self._get_most_recent_seq() + + def get_inused_config_seq(self): + return self._get_most_recent_seq() + + def set_inused_config_seq(self, seq): + self._set_most_recent_seq(seq) + + def _set_most_recent_seq(self, seq): + waagent.SetFileContents('mrseq', str(seq)) + + def do_status_report(self, operation, status, status_code, message): + self.log("{0},{1},{2},{3}".format(operation, status, status_code, message)) + tstamp = time.strftime(DateTimeFormat, time.gmtime()) + stat = [{ + "version": self._context._version, + "timestampUTC": tstamp, + "status": { + "name": self._context._name, + "operation": operation, + "status": status, + "code": status_code, + "formattedMessage": { + "lang": "en-US", + "message": message + } + } + }] + stat_rept = json.dumps(stat) + if self._context._status_file: + tmp = "%s.tmp" % (self._context._status_file) + with open(tmp, 'w+') as f: + f.write(stat_rept) + os.rename(tmp, self._context._status_file) + + def do_heartbeat_report(self, heartbeat_file, status, code, message): + # heartbeat + health_report = '[{"version":"1.0","heartbeat":{"status":"' + status + '","code":"' + code + '","Message":"' + message + '"}}]' + if waagent.SetFileContents(heartbeat_file, health_report) == None: + self.error('Unable to write heartbeat info to ' + heartbeat_file) + + def do_exit(self, exit_code, operation, status, code, message): + try: + self.do_status_report(operation, status, code, message) + except Exception as e: + self.log("Can't update status: " + str(e)) + sys.exit(exit_code) + + def get_name(self): + return self._context._name + + def get_seq_no(self): + return self._context._seq_no + + def 
get_log_dir(self): + return self._context._log_dir + + def get_handler_settings(self): + if (self._context._config != None): + return self._context._config['runtimeSettings'][0]['handlerSettings'] + return None + + def get_protected_settings(self): + if (self._context._config != None): + return self.get_handler_settings().get('protectedSettings') + return None + + def get_public_settings(self): + handlerSettings = self.get_handler_settings() + if (handlerSettings != None): + return self.get_handler_settings().get('publicSettings') + return None + + def scrub_settings_file(self): + content = waagent.GetFileContents(self._context._settings_file) + redacted = HandlerUtility.redact_protected_settings(content) + + waagent.SetFileContents(self._context._settings_file, redacted) \ No newline at end of file diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/LogUtil.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/LogUtil.py new file mode 100755 index 000000000..71c200cec --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/LogUtil.py @@ -0,0 +1,50 @@ +# Logging utilities +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# pylint: disable=all + +import os +import os.path +import string +import sys + +OutputSize = 4 * 1024 + + +def tail(log_file, output_size = OutputSize): + pos = min(output_size, os.path.getsize(log_file)) + with open(log_file, "r") as log: + log.seek(0, os.SEEK_END) + log.seek(log.tell() - pos, os.SEEK_SET) + buf = log.read(output_size) + buf = filter(lambda x: x in string.printable, buf) + + # encoding works differently between interpreter versions; we keep separate implementations to ensure + # backward compatibility + if sys.version_info[0] == 3: + buf = ''.join(list(buf)).encode('ascii', 'ignore').decode("ascii", "ignore") + elif sys.version_info[0] == 2: + buf = buf.decode("ascii", "ignore") + + return buf + + +def get_formatted_log(summary, stdout, stderr): + msg_format = ("{0}\n" + "---stdout---\n" + "{1}\n" + "---errout---\n" + "{2}\n") + return msg_format.format(summary, stdout, stderr) \ No newline at end of file diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/ScriptUtil.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/ScriptUtil.py new file mode 100755 index 000000000..3987cc04c --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/ScriptUtil.py @@ -0,0 +1,140 @@ +# Script utilities +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
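The end-relative read that `LogUtil.tail` performs can be exercised with a standalone sketch; opening in binary mode sidesteps Python 3's restriction on non-zero end-relative seeks in text mode, which the original works around with `tell()` plus an absolute seek. This is an illustration under that assumption, not the shipped module:

```python
import os
import string
import tempfile

OUTPUT_SIZE = 4 * 1024


def tail(log_file, output_size=OUTPUT_SIZE):
    # Read at most output_size bytes from the end of the file, then keep
    # only printable characters -- the same behavior LogUtil.tail aims for.
    pos = min(output_size, os.path.getsize(log_file))
    with open(log_file, "rb") as log:
        log.seek(-pos, os.SEEK_END)          # binary mode allows end-relative seeks
        buf = log.read(output_size).decode("ascii", "ignore")
    return ''.join(ch for ch in buf if ch in string.printable)


# Write a file larger than the window and confirm only the tail comes back.
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("x" * 5000 + "END")
    path = f.name
snippet = tail(path)
os.unlink(path)
```

Capping the read at 4 KiB matters because the extension embeds this tail in status reports, which have their own size limits on the wire.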
+# pylint: disable=all + +import os +import os.path +import time +import subprocess +import traceback +import string +import shlex +import sys + +from Utils import LogUtil +from Utils.WAAgentUtil import waagent + +DefaultStdoutFile = "stdout" +DefaultErroutFile = "errout" + + +def run_command(hutil, args, cwd, operation, extension_short_name, version, exit_after_run=True, interval=30, + std_out_file_name=DefaultStdoutFile, std_err_file_name=DefaultErroutFile): + std_out_file = os.path.join(cwd, std_out_file_name) + err_out_file = os.path.join(cwd, std_err_file_name) + std_out = None + err_out = None + try: + std_out = open(std_out_file, "w") + err_out = open(err_out_file, "w") + start_time = time.time() + child = subprocess.Popen(args, + cwd=cwd, + stdout=std_out, + stderr=err_out) + time.sleep(1) + while child.poll() is None: + msg = "Command is running..." + msg_with_cmd_output = LogUtil.get_formatted_log(msg, LogUtil.tail(std_out_file), LogUtil.tail(err_out_file)) + msg_without_cmd_output = msg + " Stdout/Stderr omitted from output." + + hutil.log_to_file(msg_with_cmd_output) + hutil.log_to_console(msg_without_cmd_output) + hutil.do_status_report(operation, 'transitioning', '0', msg_without_cmd_output) + time.sleep(interval) + + exit_code = child.returncode + if child.returncode and child.returncode != 0: + msg = "Command returned an error." + msg_with_cmd_output = LogUtil.get_formatted_log(msg, LogUtil.tail(std_out_file), LogUtil.tail(err_out_file)) + msg_without_cmd_output = msg + " Stdout/Stderr omitted from output." + + hutil.error(msg_without_cmd_output) + waagent.AddExtensionEvent(name=extension_short_name, + op=operation, + isSuccess=False, + version=version, + message="(01302)" + msg_without_cmd_output) + else: + msg = "Command is finished." + msg_with_cmd_output = LogUtil.get_formatted_log(msg, LogUtil.tail(std_out_file), LogUtil.tail(err_out_file)) + msg_without_cmd_output = msg + " Stdout/Stderr omitted from output." 
+
+            hutil.log_to_file(msg_with_cmd_output)
+            hutil.log_to_console(msg_without_cmd_output)
+            waagent.AddExtensionEvent(name=extension_short_name,
+                                      op=operation,
+                                      isSuccess=True,
+                                      version=version,
+                                      message="(01302)" + msg_without_cmd_output)
+        end_time = time.time()
+        waagent.AddExtensionEvent(name=extension_short_name,
+                                  op=operation,
+                                  isSuccess=True,
+                                  version=version,
+                                  message=("(01304)Command execution time: "
+                                           "{0}s").format(str(end_time - start_time)))
+
+        log_or_exit(hutil, exit_after_run, exit_code, operation, msg_with_cmd_output)
+    except Exception as e:
+        error_msg = ("Failed to launch command with error: {0},"
+                     "stacktrace: {1}").format(e, traceback.format_exc())
+        hutil.error(error_msg)
+        waagent.AddExtensionEvent(name=extension_short_name,
+                                  op=operation,
+                                  isSuccess=False,
+                                  version=version,
+                                  message="(01101)" + error_msg)
+        exit_code = 1
+        msg = 'Launch command failed: {0}'.format(e)
+
+        log_or_exit(hutil, exit_after_run, exit_code, operation, msg)
+    finally:
+        if std_out:
+            std_out.close()
+        if err_out:
+            err_out.close()
+    return exit_code
+
+
+# do_exit calls sys.exit which raises an exception so we do not call it from the finally block
+def log_or_exit(hutil, exit_after_run, exit_code, operation, msg):
+    status = 'success' if exit_code == 0 else 'failed'
+    if exit_after_run:
+        hutil.do_exit(exit_code, operation, status, str(exit_code), msg)
+    else:
+        hutil.do_status_report(operation, status, str(exit_code), msg)
+
+
+def parse_args(cmd):
+    cmd = filter(lambda x: x in string.printable, cmd)
+
+    # encoding works differently between interpreter versions; we keep separate implementations to ensure
+    # backward compatibility
+    if sys.version_info[0] == 3:
+        cmd = ''.join(list(cmd)).encode('ascii', 'ignore').decode("ascii", "ignore")
+    elif sys.version_info[0] == 2:
+        cmd = cmd.decode("ascii", "ignore")
+
+    args = shlex.split(cmd)
+    # From python 2.6 to python 2.7.2, shlex.split outputs UCS-4 results like
+    # '\x00\x00a'.
Temp workaround is to replace \x00 + for idx, val in enumerate(args): + if '\x00' in args[idx]: + args[idx] = args[idx].replace('\x00', '') + return args + + diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/WAAgentUtil.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/WAAgentUtil.py new file mode 100755 index 000000000..41ef3bb11 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/WAAgentUtil.py @@ -0,0 +1,140 @@ +# Wrapper module for waagent +# +# waagent is not written as a module. This wrapper module is created +# to use the waagent code as a module. +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
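`ScriptUtil.parse_args` above filters non-printable characters, tokenizes with `shlex.split`, and then strips the stray NUL characters produced by old CPython 2.6-2.7.2 builds. A Python 3-only sketch of the same pipeline (function name is hypothetical):

```python
import shlex
import string


def parse_command(cmd):
    # Drop non-printable characters before tokenizing.
    cmd = ''.join(c for c in cmd if c in string.printable)
    # Tokenize with shell-like rules (quoting, escaping).
    args = shlex.split(cmd)
    # Defensively strip stray NUL characters, mirroring the old-CPython
    # workaround in parse_args above (harmless on modern interpreters).
    return [a.replace('\x00', '') for a in args]
```

For example, `parse_command('sh foo.sh -af bar "a b"')` yields `['sh', 'foo.sh', '-af', 'bar', 'a b']`, keeping the quoted argument as a single token.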
+# pylint: disable=all
+
+import imp
+import os
+import os.path
+
+
+#
+# The following code will search and load waagent code and expose
+# it as a submodule of current module
+#
+def searchWAAgent():
+    # If the extension ships waagent in its package, default to that copy first
+    pkg_agent_path = os.path.join(os.getcwd(), 'waagent')
+    if os.path.isfile(pkg_agent_path):
+        return pkg_agent_path
+
+    agentPath = '/usr/sbin/waagent'
+    if os.path.isfile(agentPath):
+        return agentPath
+
+    user_paths = os.environ['PYTHONPATH'].split(os.pathsep)
+    for user_path in user_paths:
+        agentPath = os.path.join(user_path, 'waagent')
+        if os.path.isfile(agentPath):
+            return agentPath
+    return None
+
+
+waagent = None
+agentPath = searchWAAgent()
+if agentPath:
+    waagent = imp.load_source('waagent', agentPath)
+else:
+    raise Exception("Can't load waagent.")
+
+if not hasattr(waagent, "AddExtensionEvent"):
+    """
+    If AddExtensionEvent is not defined, provide a dummy impl.
+    """
+
+
+    def _AddExtensionEvent(*args, **kwargs):
+        pass
+
+
+    waagent.AddExtensionEvent = _AddExtensionEvent
+
+if not hasattr(waagent, "WALAEventOperation"):
+    class _WALAEventOperation:
+        HeartBeat = "HeartBeat"
+        Provision = "Provision"
+        Install = "Install"
+        UnIsntall = "UnInstall"
+        Disable = "Disable"
+        Enable = "Enable"
+        Download = "Download"
+        Upgrade = "Upgrade"
+        Update = "Update"
+
+
+    waagent.WALAEventOperation = _WALAEventOperation
+
+# Better deal with the silly waagent typo, in anticipation of a proper fix of the typo later in waagent
+if not hasattr(waagent.WALAEventOperation, 'Uninstall'):
+    if hasattr(waagent.WALAEventOperation, 'UnIsntall'):
+        waagent.WALAEventOperation.Uninstall = waagent.WALAEventOperation.UnIsntall
+    else:  # This shouldn't happen, but just in case...
+        waagent.WALAEventOperation.Uninstall = 'Uninstall'
+
+
+def GetWaagentHttpProxyConfigString():
+    """
+    Get http_proxy and https_proxy from waagent config.
+    Username and password are not supported yet.
+ This code is adopted from /usr/sbin/waagent + """ + host = None + port = None + try: + waagent.Config = waagent.ConfigurationProvider( + None) # Use default waagent conf file (most likely /etc/waagent.conf) + + host = waagent.Config.get("HttpProxy.Host") + port = waagent.Config.get("HttpProxy.Port") + except Exception as e: + # waagent.ConfigurationProvider(None) will throw an exception on an old waagent + # Has to silently swallow because logging is not yet available here + # and we don't want to bring that in here. Also if the call fails, then there's + # no proxy config in waagent.conf anyway, so it's safe to silently swallow. + pass + + result = '' + if host is not None: + result = "http://" + host + if port is not None: + result += ":" + port + + return result + + +waagent.HttpProxyConfigString = GetWaagentHttpProxyConfigString() + +# end: waagent http proxy config stuff + +__ExtensionName__ = None + + +def InitExtensionEventLog(name): + global __ExtensionName__ + __ExtensionName__ = name + + +def AddExtensionEvent(name=__ExtensionName__, + op=waagent.WALAEventOperation.Enable, + isSuccess=False, + message=None): + if name is not None: + waagent.AddExtensionEvent(name=name, + op=op, + isSuccess=isSuccess, + message=message) diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/test/MockUtil.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/MockUtil.py new file mode 100755 index 000000000..8c8c24271 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/MockUtil.py @@ -0,0 +1,44 @@ +#!/usr/bin/env python +# +# Sample Extension +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=all + +# TODO: These tests were copied as reference - they are not currently running + +class MockUtil(): + def __init__(self, test): + self.test = test + + def get_log_dir(self): + return "/tmp" + + def log(self, msg): + print(msg) + + def error(self, msg): + print(msg) + + def get_seq_no(self): + return "0" + + def do_status_report(self, operation, status, status_code, message): + self.test.assertNotEqual(None, message) + self.last = "do_status_report" + + def do_exit(self,exit_code,operation,status,code,message): + self.test.assertNotEqual(None, message) + self.last = "do_exit" diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/test/env.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/env.py new file mode 100755 index 000000000..fa447fcc6 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/env.py @@ -0,0 +1,24 @@ +#!/usr/bin/env python +# +# Sample Extension +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
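`MockUtil` above is a duck-typed test double: it has the same method signatures as the real handler utility, but simply records which reporting call was made last so tests can assert on it. The pattern can be sketched independently of the extension (names here are illustrative, not the extension's API):

```python
class RecordingHandler:
    """Test double that records status calls instead of reporting to the agent."""

    def __init__(self):
        self.last = None
        self.calls = []

    def do_status_report(self, operation, status, code, message):
        self.calls.append(("status", operation, status, code, message))
        self.last = "do_status_report"

    def do_exit(self, exit_code, operation, status, code, message):
        self.calls.append(("exit", exit_code, operation, status, code, message))
        self.last = "do_exit"


def report(handler, exit_after_run, exit_code, operation, msg):
    # Same shape as ScriptUtil.log_or_exit: either exit or just report status.
    status = 'success' if exit_code == 0 else 'failed'
    if exit_after_run:
        handler.do_exit(exit_code, operation, status, str(exit_code), msg)
    else:
        handler.do_status_report(operation, status, str(exit_code), msg)
```

Because `report` only calls methods the handler exposes, the production handler and the recording double are interchangeable, which is exactly what lets the copied tests run without a live waagent.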
+ +import sys +import os + +#append installer directory to sys.path +root = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) +sys.path.append(root) diff --git a/azurelinuxagent/distro/__init__.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/mock.sh old mode 100644 new mode 100755 similarity index 79% rename from azurelinuxagent/distro/__init__.py rename to tests_e2e/GuestAgentDcrTestExtension/Utils/test/mock.sh index de7be3364..da2fec539 --- a/azurelinuxagent/distro/__init__.py +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/mock.sh @@ -1,4 +1,6 @@ -# Copyright 2018 Microsoft Corporation +#!/bin/bash +# +# Copyright 2014 Microsoft Corporation # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -11,7 +13,11 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. -# -# Requires Python 2.6+ and Openssl 1.0+ -# +echo "Start..." +sleep 0.1 +echo "Running" +>&2 echo "Warning" +sleep 0.1 +echo "Finished" +exit $1 diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_logutil.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_logutil.py new file mode 100755 index 000000000..163ad7a91 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_logutil.py @@ -0,0 +1,35 @@ +#!/usr/bin/env python +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=all + +# TODO: These tests were copied as reference - they are not currently running + +import unittest +import LogUtil as lu + + +class TestLogUtil(unittest.TestCase): + def test_tail(self): + with open("/tmp/testtail", "w+") as F: + F.write(u"abcdefghijklmnopqrstu\u6211vwxyz".encode("utf-8")) + tail = lu.tail("/tmp/testtail", 2) + self.assertEquals("yz", tail) + + tail = lu.tail("/tmp/testtail") + self.assertEquals("abcdefghijklmnopqrstuvwxyz", tail) + +if __name__ == '__main__': + unittest.main() diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_null_protected_settings.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_null_protected_settings.py new file mode 100755 index 000000000..bbb6dbbd6 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_null_protected_settings.py @@ -0,0 +1,48 @@ +#!/usr/bin/env python +# +# Sample Extension +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
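Note that the copied `test_tail` above writes UTF-8 *bytes* through a text-mode (`"w+"`) handle, which raises `TypeError` on Python 3, and `assertEquals` is a deprecated alias. A corrected sketch of the same test, using a simplified Python 3-only `tail` (the helper here is illustrative, not the extension's `LogUtil`):

```python
import os
import string


def tail(path, size=4096):
    # Simplified, Python 3-only variant of LogUtil.tail for illustration.
    pos = min(size, os.path.getsize(path))
    with open(path, "rb") as f:
        if pos:
            f.seek(-pos, os.SEEK_END)
        text = f.read(size).decode("ascii", "ignore")
    return ''.join(c for c in text if c in string.printable)


def test_tail():
    with open("/tmp/testtail", "wb") as f:  # binary mode: write() takes bytes
        f.write(u"abcdefghijklmnopqrstu\u6211vwxyz".encode("utf-8"))
    assert tail("/tmp/testtail", 2) == "yz"
    # The non-ASCII character is dropped by the ascii/"ignore" decode.
    assert tail("/tmp/testtail") == "abcdefghijklmnopqrstuvwxyz"
```

The second assertion only holds because the decode silently discards the three UTF-8 bytes of `\u6211`; that behavior is what the original test was exercising.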
+# pylint: disable=all + +# TODO: These tests were copied as reference - they are not currently running + +import unittest +import HandlerUtil as Util + +def mock_log(*args, **kwargs): + pass + +class TestNullProtectedSettings(unittest.TestCase): + def test_null_protected_settings(self): + hutil = Util.HandlerUtility(mock_log, mock_log, "UnitTest", "HandlerUtil.UnitTest", "0.0.1") + config = hutil._parse_config(Settings) + handlerSettings = config['runtimeSettings'][0]['handlerSettings'] + self.assertEquals(handlerSettings["protectedSettings"], None) + +Settings="""\ +{ + "runtimeSettings":[{ + "handlerSettings":{ + "protectedSettingsCertThumbprint":null, + "protectedSettings":null, + "publicSettings":{} + } + }] +} +""" + +if __name__ == '__main__': + unittest.main() diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_redacted_settings.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_redacted_settings.py new file mode 100644 index 000000000..d3ed63ba7 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_redacted_settings.py @@ -0,0 +1,47 @@ +#!/usr/bin/env python +# +# Tests for redacted settings +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
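The null-protected-settings test above feeds the handler a runtime-settings document in which `protectedSettings` is JSON `null` and checks that it parses to `None`. The core of that check needs nothing beyond the standard library, as this sketch shows (it bypasses `HandlerUtil` entirely):

```python
import json

SETTINGS = """\
{
    "runtimeSettings": [{
        "handlerSettings": {
            "protectedSettingsCertThumbprint": null,
            "protectedSettings": null,
            "publicSettings": {}
        }
    }]
}
"""

config = json.loads(SETTINGS)
handler_settings = config['runtimeSettings'][0]['handlerSettings']
# JSON null maps to Python None, so "no protected settings" is an `is None` check.
assert handler_settings['protectedSettings'] is None
```

This is why the original test asserts equality with `None` rather than, say, an empty string: the JSON parser never produces `""` for `null`.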
+# pylint: disable=all + +# TODO: These tests were copied as reference - they are not currently running + +import unittest +import Utils.HandlerUtil as Util + + +class TestRedactedProtectedSettings(unittest.TestCase): + + def test_redacted_protected_settings(self): + redacted = Util.HandlerUtility.redact_protected_settings(settings_original) + self.assertIn('"protectedSettings": "*** REDACTED ***"', redacted) + self.assertIn('"protectedSettingsCertThumbprint": "*** REDACTED ***"', redacted) + + +settings_original = """\ +{ + "runtimeSettings": [{ + "handlerSettings": { + "protectedSettingsCertThumbprint": "9310D2O49D7216D4A1CEDCE9D8A7CE5DBD7FB7BF", + "protectedSettings": "MIIC4AYJKoZIhvcNAQcWoIIB0TCDEc0CAQAxggFpMIIBZQIBADBNMDkxNzA1BgoJkiaJk/IsZAEZFidXaW5kb3dzIEF6dXJlIENSUCBDZXJ0aWZpY2F0ZSBHZW5lcmF0b3ICEB8f7DyzHLGjSDLnEWd4YeAwDQYJKoZIhvcNAQEBBQAEggEAiZj2gQtT4MpdTaEH8rUVFB/8Ucc8OxGFWu8VKbIdoHLKp1WcDb7Vlzv6fHLBIccgXGuR1XHTvtlD4QiKpSet341tPPug/R5ZtLSRz1pqtXZdrFcuuSxOa6ib/+la5ukdygcVwkEnmNSQaiipPKyqPH2JsuhmGCdXFiKwCSTrgGE6GyCBtaK9KOf48V/tYXHnDGrS9q5a1gRF5KVI2B26UYSO7V7pXjzYCd/Sp9yGj7Rw3Kqf9Lpix/sPuqWjV6e2XFlD3YxaHSeHVnLI/Bkz2E6Ri8yfPYus52r/mECXPL2YXqY9dGyrlKKIaD9AuzMyvvy1A74a9VBq7zxQQ4adEzBbBgkqhkiG9w0BBwEwFAYIKoZIhvcNAwcECDyEf4mRrmWJgDhW4j2nRNTJU4yXxocQm/PhAr39Um7n0pgI2Cn28AabYtsHWjKqr8Al9LX6bKm8cnmnLjqTntphCw==", + "publicSettings": {} + } + }] +} +""" + +if __name__ == '__main__': + unittest.main() diff --git a/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_scriptutil.py b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_scriptutil.py new file mode 100755 index 000000000..4f84cefb2 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/Utils/test/test_scriptutil.py @@ -0,0 +1,55 @@ +#!/usr/bin/env python +# +# Copyright 2014 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=all + +# TODO: These tests were copied as reference - they are not currently running + +import os +import os.path +import env +import ScriptUtil as su +import unittest +from MockUtil import MockUtil + +class TestScriptUtil(unittest.TestCase): + def test_parse_args(self): + print(__file__) + cmd = u'sh foo.bar.sh -af bar --foo=bar | more \u6211' + args = su.parse_args(cmd.encode('utf-8')) + self.assertNotEquals(None, args) + self.assertNotEquals(0, len(args)) + print(args) + + def test_run_command(self): + hutil = MockUtil(self) + test_script = "mock.sh" + os.chdir(os.path.join(env.root, "test")) + exit_code = su.run_command(hutil, ["sh", test_script, "0"], os.getcwd(), 'RunScript-0', 'TestExtension', '1.0', True, 0.1) + self.assertEquals(0, exit_code) + self.assertEquals("do_exit", hutil.last) + exit_code = su.run_command(hutil, ["sh", test_script, "75"], os.getcwd(), 'RunScript-1', 'TestExtension', '1.0', False, 0.1) + self.assertEquals(75, exit_code) + self.assertEquals("do_status_report", hutil.last) + + def test_log_or_exit(self): + hutil = MockUtil(self) + su.log_or_exit(hutil, True, 0, 'LogOrExit-0', 'Message1') + self.assertEquals("do_exit", hutil.last) + su.log_or_exit(hutil, False, 0, 'LogOrExit-1', 'Message2') + self.assertEquals("do_status_report", hutil.last) + +if __name__ == '__main__': + unittest.main() diff --git a/tests_e2e/GuestAgentDcrTestExtension/manifest.xml b/tests_e2e/GuestAgentDcrTestExtension/manifest.xml new file mode 100644 index 000000000..a4b0c755f --- /dev/null +++ 
b/tests_e2e/GuestAgentDcrTestExtension/manifest.xml @@ -0,0 +1,17 @@ + + + Microsoft.Azure.TestExtensions + GuestAgentDcrTest + 1.4.1 + + VmRole + + Microsoft Azure Guest Agent test Extension for testing Linux Virtual Machines in DCR + true + https://github.com/larohra/GuestAgentDcrTestExtension/blob/master/LICENSE + http://www.microsoft.com/privacystatement/en-us/OnlineServices/Default.aspx + https://github.com/larohra/GuestAgentDcrTestExtension + true + Linux + Microsoft + diff --git a/tests_e2e/GuestAgentDcrTestExtension/references b/tests_e2e/GuestAgentDcrTestExtension/references new file mode 100644 index 000000000..442153ec8 --- /dev/null +++ b/tests_e2e/GuestAgentDcrTestExtension/references @@ -0,0 +1,2 @@ +# TODO: Investigate the use of this file +Utils/ diff --git a/tests_e2e/orchestrator/docker/Dockerfile b/tests_e2e/orchestrator/docker/Dockerfile index a748ff0b8..597e57418 100644 --- a/tests_e2e/orchestrator/docker/Dockerfile +++ b/tests_e2e/orchestrator/docker/Dockerfile @@ -7,7 +7,7 @@ # # docker run --rm -it -v /home/nam/src/WALinuxAgent:/home/waagent/WALinuxAgent waagenttests bash --login # -FROM ubuntu:latest +FROM mcr.microsoft.com/cbl-mariner/base/core:2.0 LABEL description="Test environment for WALinuxAgent" SHELL ["/bin/bash", "-c"] @@ -18,34 +18,39 @@ SHELL ["/bin/bash", "-c"] USER root RUN \ - apt-get update && \ - \ + tdnf -y update && \ + # mariner packages can be found in this repository https://cvedashboard.azurewebsites.net/#/packages \ # \ # Install basic dependencies \ # \ - apt-get install -y git python3.10 python3.10-dev wget bzip2 && \ - ln /usr/bin/python3.10 /usr/bin/python3 && \ + tdnf -y install git python3 python3-devel wget bzip2 ca-certificates && \ \ # \ # Install LISA dependencies \ # \ - apt-get install -y git gcc libgirepository1.0-dev libcairo2-dev qemu-utils libvirt-dev \ - python3-pip python3-venv && \ + tdnf install -y git gcc gobject-introspection-devel cairo-devel pkg-config python3-devel libvirt-devel \ + 
cairo-gobject binutils kernel-headers glibc-devel python3-pip python3-virtualenv && \ \ # \ # Install test dependencies \ # \ - apt-get install -y zip && \ + tdnf -y install zip tar && \ \ # \ # Create user waagent, which is used to execute the tests \ # \ groupadd waagent && \ useradd --shell /bin/bash --create-home -g waagent waagent && \ + \ + # \ + # Install the Azure CLI \ + # \ + tdnf -y install azure-cli && \ + tdnf clean all && \ : # -# Do the Poetry and LISA setup as waagent +# Install LISA as user waagent # USER waagent @@ -53,11 +58,16 @@ RUN \ export PATH="$HOME/.local/bin:$PATH" && \ \ # \ - # Install LISA \ + # Install LISA. \ + # \ + # (note that we use a specific commit, which is the version of LISA that has been verified to work with our \ + # tests; when taking a new LISA version, make sure to verify that the tests work OK before pushing the \ + # Docker image to our registry) \ # \ cd $HOME && \ git clone https://github.com/microsoft/lisa.git && \ cd lisa && \ + git checkout 2c16e32001fdefb9572dff61241451b648259dbf && \ \ python3 -m pip install --upgrade pip && \ python3 -m pip install --editable .[azure,libvirt] --config-settings editable_mode=compat && \ @@ -65,15 +75,23 @@ RUN \ # \ # Install additional test dependencies \ # \ - python3 -m pip install distro msrestazure && \ - python3 -m pip install azure-mgmt-compute --upgrade && \ + # (note that we update azure-mgmt-compute to 29.1.0 - LISA installs 26.1; this is needed in order to access \ + # osProfile.linuxConfiguration.enableVMAgentPlatformUpdates in the VM model - that property is used by some \ + # tests, such as Agent versioning) \ + # \ + python3 -m pip install distro msrestazure pytz && \ + python3 -m pip install azure-mgmt-compute==29.1.0 --upgrade && \ \ # \ # Download Pypy to a known location, from which it will be installed to the test VMs. 
\ # \ - mkdir $HOME/bin && \ - wget https://downloads.python.org/pypy/pypy3.7-v7.3.5-linux64.tar.bz2 -O /tmp/pypy3.7-x64.tar.bz2 && \ - wget https://downloads.python.org/pypy/pypy3.7-v7.3.5-aarch64.tar.bz2 -O /tmp/pypy3.7-arm64.tar.bz2 && \ + wget https://dcrdata.blob.core.windows.net/python/pypy3.7-x64.tar.bz2 -O /tmp/pypy3.7-x64.tar.bz2 && \ + wget https://dcrdata.blob.core.windows.net/python/pypy3.7-arm64.tar.bz2 -O /tmp/pypy3.7-arm64.tar.bz2 && \ + \ + # \ + # Install pudb, which can be useful to debug issues in the image \ + # \ + python3 -m pip install pudb && \ \ # \ # The setup for the tests depends on a few paths; add those to the profile \ diff --git a/tests_e2e/orchestrator/lib/agent_junit.py b/tests_e2e/orchestrator/lib/agent_junit.py index a8ff8eb6c..2e09c73d7 100644 --- a/tests_e2e/orchestrator/lib/agent_junit.py +++ b/tests_e2e/orchestrator/lib/agent_junit.py @@ -29,6 +29,7 @@ from lisa.messages import ( # pylint: disable=E0401 MessageBase, TestResultMessage, + TestStatus ) @@ -36,6 +37,7 @@ @dataclass class AgentJUnitSchema(schema.Notifier): path: str = "agent.junit.xml" + include_subtest: bool = True class AgentJUnit(JUnit): @@ -48,19 +50,27 @@ def type_schema(cls) -> Type[schema.TypedSchema]: return AgentJUnitSchema def _received_message(self, message: MessageBase) -> None: - # The Agent sends its own TestResultMessage and marks them as "AgentTestResultMessage"; for the - # test results sent by LISA itself, we change the suite name to "_Runbook_" in order to separate them - # from actual test results. + # The Agent sends its own TestResultMessages setting their type as "AgentTestResultMessage". + # Any other message types are sent by LISA. 
if isinstance(message, TestResultMessage) and message.type != "AgentTestResultMessage": if "Unexpected error in AgentTestSuite" in message.message: # Ignore these errors, they are already reported as AgentTestResultMessages return + if "TestFailedException" in message.message: + # Ignore these errors, they are already reported as test failures + return + # Change the suite name to "_Runbook_" for LISA messages in order to separate them + # from actual test results. message.suite_full_name = "_Runbook_" message.suite_name = message.suite_full_name image = message.information.get('image') if image is not None: - # NOTE: message.information['environment'] is similar to "[generated_2]" and can be correlated + # NOTE: The value of message.information['environment'] is similar to "[generated_2]" and can be correlated # with the main LISA log to find the specific VM for the message. message.full_name = f"{image} [{message.information['environment']}]" message.name = message.full_name + # LISA silently skips tests on situations that should be errors (e.g. trying to create a test VM using an image that is not available). + # Mark these messages as failed so that the JUnit report shows them as errors. + if message.status == TestStatus.SKIPPED: + message.status = TestStatus.FAILED super()._received_message(message) diff --git a/tests_e2e/orchestrator/lib/agent_test_loader.py b/tests_e2e/orchestrator/lib/agent_test_loader.py index a0f0bfaaf..11e665c13 100644 --- a/tests_e2e/orchestrator/lib/agent_test_loader.py +++ b/tests_e2e/orchestrator/lib/agent_test_loader.py @@ -15,6 +15,7 @@ # limitations under the License. 
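The `AgentJUnit._received_message` change above applies three rules: agent-generated results pass through untouched, LISA's own runbook messages are renamed to the `_Runbook_` suite, and silent `SKIPPED` results are promoted to failures so they surface in the JUnit report. A sketch of that decision logic with a hypothetical stand-in for LISA's message type:

```python
from dataclasses import dataclass


@dataclass
class ResultMessage:
    # Hypothetical stand-in for LISA's TestResultMessage, for illustration only.
    type: str
    status: str
    suite_name: str


def relabel(message):
    # Agent-generated results are already correct; leave them alone.
    if message.type == "AgentTestResultMessage":
        return message
    # Everything else came from LISA itself: separate it from real test results.
    message.suite_name = "_Runbook_"
    # Surface silent skips (e.g. unavailable images) as failures.
    if message.status == "SKIPPED":
        message.status = "FAILED"
    return message
```

The real notifier additionally drops messages that duplicate errors already reported as `AgentTestResultMessage`s; this sketch only covers the relabeling rules.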
# import importlib.util +import re # E0401: Unable to import 'yaml' (import-error) import yaml # pylint: disable=E0401 @@ -22,7 +23,24 @@ from typing import Any, Dict, List, Type import tests_e2e -from tests_e2e.tests.lib.agent_test import AgentTest +from tests_e2e.tests.lib.agent_test import AgentTest, AgentVmTest, AgentVmssTest + + +class TestInfo(object): + """ + Description of a test + """ + # The class that implements the test + test_class: Type[AgentVmTest] + # If True, an error in the test blocks the execution of the test suite (defaults to False) + blocks_suite: bool + + @property + def name(self) -> str: + return self.test_class.__name__ + + def __str__(self): + return self.name class TestSuiteInfo(object): @@ -32,13 +50,23 @@ class TestSuiteInfo(object): # The name of the test suite name: str # The tests that comprise the suite - tests: List[Type[AgentTest]] + tests: List[TestInfo] # Images or image sets (as defined in images.yml) on which the suite must run. images: List[str] - # The location (region) on which the suite must run; if empty, the suite can run on any location - location: str + # The locations (regions) on which the suite must run; if empty, the suite can run on any location + locations: List[str] # Whether this suite must run on its own test VM owns_vm: bool + # If True, the suite must run on a scale set (instead of a single VM) + executes_on_scale_set: bool + # Whether to install the test Agent on the test VM + install_test_agent: bool + # Customization for the ARM template used when creating the test VM + template: str + # skip test suite if the test not supposed to run on specific clouds + skip_on_clouds: List[str] + # skip test suite if test suite not suppose to run on specific images + skip_on_images: List[str] def __str__(self): return self.name @@ -48,7 +76,7 @@ class VmImageInfo(object): # The URN of the image (publisher, offer, version separated by spaces) urn: str # Indicates that the image is available only on those locations. 
If empty, the image should be available in all locations - locations: List[str] + locations: Dict[str, List[str]] # Indicates that the image is available only for those VM sizes. If empty, the image should be available for all VM sizes vm_sizes: List[str] @@ -60,11 +88,14 @@ class AgentTestLoader(object): """ Loads a given set of test suites from the YAML configuration files. """ - def __init__(self, test_suites: str): + def __init__(self, test_suites: str, cloud: str): """ Loads the specified 'test_suites', which are given as a string of comma-separated suite names or a YAML description of a single test_suite. + The 'cloud' parameter indicates the cloud on which the tests will run. It is used to validate any restrictions on the test suite and/or + images location. + When given as a comma-separated list, each item must correspond to the name of the YAML files describing s suite (those files are located under the .../WALinuxAgent/tests_e2e/test_suites directory). For example, if test_suites == "agent_bvt, fast_track" then this method will load files agent_bvt.yml and fast_track.yml. @@ -78,6 +109,7 @@ def __init__(self, test_suites: str): - "bvts/vm_access.py" """ self.__test_suites: List[TestSuiteInfo] = self._load_test_suites(test_suites) + self.__cloud: str = cloud self.__images: Dict[str, List[VmImageInfo]] = self._load_images() self._validate() @@ -95,23 +127,54 @@ def images(self) -> Dict[str, List[VmImageInfo]]: """ return self.__images + # Matches a reference to a random subset of images within a set with an optional count: random(, []), e.g. 
random(endorsed, 3), random(endorsed) + RANDOM_IMAGES_RE = re.compile(r"random\((?P[^,]+)(\s*,\s*(?P\d+))?\)") + def _validate(self): """ Performs some basic validations on the data loaded from the YAML description files """ + def _parse_image(image: str) -> str: + """ + Parses a reference to an image or image set and returns the name of the image or image set + """ + match = AgentTestLoader.RANDOM_IMAGES_RE.match(image) + if match is not None: + return match.group('image_set') + return image + for suite in self.test_suites: # Validate that the images the suite must run on are in images.yml for image in suite.images: + image = _parse_image(image) if image not in self.images: raise Exception(f"Invalid image reference in test suite {suite.name}: Can't find {image} in images.yml") - # If the suite specifies a location, validate that the images it uses are available in that location - if suite.location != '': + # If the suite specifies a cloud and it's location, validate that location string is start with and then validate that the images it uses are available in that location + for suite_location in suite.locations: + if suite_location.startswith(self.__cloud + ":"): + suite_location = suite_location.split(":")[1] + else: + continue for suite_image in suite.images: + suite_image = _parse_image(suite_image) for image in self.images[suite_image]: - if len(image.locations) > 0: - if suite.location not in image.locations: - raise Exception(f"Test suite {suite.name} must be executed in {suite.location}, but <{image.urn}> is not available in that location") + # If the image has a location restriction, validate that it is available on the location the suite must run on + if image.locations: + locations = image.locations.get(self.__cloud) + if locations is not None and not any(suite_location in l for l in locations): + raise Exception(f"Test suite {suite.name} must be executed in {suite_location}, but <{image.urn}> is not available in that location") + + # if the suite 
specifies skip clouds, validate that cloud used in our tests + for suite_skip_cloud in suite.skip_on_clouds: + if suite_skip_cloud not in ["AzureCloud", "AzureChinaCloud", "AzureUSGovernment"]: + raise Exception(f"Invalid cloud {suite_skip_cloud} for in {suite.name}") + + # if the suite specifies skip images, validate that images used in our tests + for suite_skip_image in suite.skip_on_images: + if suite_skip_image not in self.images: + raise Exception(f"Invalid image reference in test suite {suite.name}: Can't find {suite_skip_image} in images.yml") + @staticmethod def _load_test_suites(test_suites: str) -> List[TestSuiteInfo]: @@ -138,7 +201,7 @@ def _load_test_suite(description_file: Path) -> TestSuiteInfo: """ Loads the description of a TestSuite from its YAML file. - A test suite has 5 properties: name, tests, images, location, and owns-vm. For example: + A test suite is described by the properties listed below. Sample test suite: name: "AgentBvt" tests: @@ -146,24 +209,41 @@ def _load_test_suite(description_file: Path) -> TestSuiteInfo: - "bvts/run_command.py" - "bvts/vm_access.py" images: "endorsed" - location: "eastuseaup" - owns-vm: true + locations: "AzureCloud:eastuseaup" + owns_vm: true + install_test_agent: true + template: "bvts/template.py" + skip_on_clouds: "AzureChinaCloud" + skip_on_images: "ubuntu_2004" * name - A string used to identify the test suite - * tests - A list of the tests in the suite. Each test is specified by the path for its source code relative to - WALinuxAgent/tests_e2e/tests. + * tests - A list of the tests in the suite. Each test can be specified by a string (the path for its source code relative to + WALinuxAgent/tests_e2e/tests), or a dictionary with two items: + * source: the path for its source code relative to WALinuxAgent/tests_e2e/tests + * blocks_suite: [Optional; boolean] If True, a failure on the test will stop execution of the test suite (i.e. the + rest of the tests in the suite will not be executed). 
By default, a failure on a test does not stop execution of + the test suite. * images - A string, or a list of strings, specifying the images on which the test suite must be executed. Each value can be the name of a single image (e.g. "ubuntu_2004"), or the name of an image set (e.g. "endorsed"). The names for images and image sets are defined in WALinuxAgent/tests_e2e/tests_suites/images.yml. - * location - [Optional; string] If given, the test suite must be executed on that location. If not specified, - or set to an empty string, the test suite will be executed in the default location. This is useful + * locations - [Optional; string or list of strings] If given, the test suite must be executed on that cloud location (e.g. "AzureCloud:eastus2euap"). + If not specified, or set to an empty string, the test suite will be executed in the default location. This is useful for test suites that exercise a feature that is enabled only in certain regions. - * owns-vm - [Optional; boolean] By default all suites in a test run are executed on the same test VMs; if this + * owns_vm - [Optional; boolean] By default all suites in a test run are executed on the same test VMs; if this value is set to True, new test VMs will be created and will be used exclusively for this test suite. This is useful for suites that modify the test VMs in such a way that the setup may cause problems in other test suites (for example, some tests targeted at the HGAP block internet access in order to force the agent to use the HGAP). - + * executes_on_scale_set - [Optional; boolean] True indicates that the test runs on a scale set. + * install_test_agent - [Optional; boolean] By default the setup process installs the test Agent on the test VMs; set this property + to False to skip the installation. + * template - [Optional; string] If given, the ARM template for the test VM is customized using the given Python module. 
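The `locations` matching described above (an entry like "AzureCloud:eastus2euap" is only honored when its cloud prefix matches the cloud under test) can be sketched as a small helper. The function name and return convention here are illustrative, not part of the loader:

```python
from typing import List, Optional


def effective_location(cloud: str, suite_locations: List[str]) -> Optional[str]:
    """Returns the location the suite must run on in the given cloud, if any.

    Each entry is formatted as "<cloud>:<location>"; entries for other clouds are ignored,
    mirroring the startswith/split logic in the loader's _validate method.
    """
    for entry in suite_locations:
        if entry.startswith(cloud + ":"):
            return entry.split(":")[1]
    return None


print(effective_location("AzureCloud", ["AzureChinaCloud:chinanorth2", "AzureCloud:eastus2euap"]))  # eastus2euap
print(effective_location("AzureUSGovernment", ["AzureCloud:eastus2euap"]))  # None
```

A suite with no entry for the current cloud simply runs in the default location, which is why the validation loop `continue`s past non-matching entries.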
+ * skip_on_clouds - [Optional; string or list of strings] If given, the test suite will be skipped in the specified cloud (e.g. "AzureCloud"). + If not specified, the test suite will be executed in all the clouds that we use. This is useful + if you want to skip a test suite in a particular cloud when a certain feature is not available in that cloud. + * skip_on_images - [Optional; string or list of strings] If given, the test suite will be skipped on the specified images or image sets (e.g. "ubuntu_2004"). + If not specified, the test suite will be executed on all the images that we use. This is useful + if you want to skip a test suite on particular images or image sets when a certain feature is not available on those images. """ test_suite: Dict[str, Any] = AgentTestLoader._load_file(description_file) @@ -175,9 +255,15 @@ def _load_test_suite(description_file: Path) -> TestSuiteInfo: test_suite_info.name = test_suite["name"] test_suite_info.tests = [] - source_files = [AgentTestLoader._SOURCE_CODE_ROOT/"tests"/t for t in test_suite["tests"]] - for f in source_files: - test_suite_info.tests.extend(AgentTestLoader._load_test_classes(f)) + for test in test_suite["tests"]: + test_info = TestInfo() + if isinstance(test, str): + test_info.test_class = AgentTestLoader._load_test_class(test) + test_info.blocks_suite = False + else: + test_info.test_class = AgentTestLoader._load_test_class(test["source"]) + test_info.blocks_suite = test.get("blocks_suite", False) + test_suite_info.tests.append(test_info) images = test_suite["images"] if isinstance(images, str): @@ -185,24 +271,58 @@ def _load_test_suite(description_file: Path) -> TestSuiteInfo: else: test_suite_info.images = images - test_suite_info.location = test_suite.get("location") - if test_suite_info.location is None: - test_suite_info.location = "" + locations = test_suite.get("locations") + if locations is None: + test_suite_info.locations = [] + else: + if isinstance(locations, str): + 
test_suite_info.locations = [locations] + else: + test_suite_info.locations = locations + + test_suite_info.owns_vm = "owns_vm" in test_suite and test_suite["owns_vm"] + test_suite_info.install_test_agent = "install_test_agent" not in test_suite or test_suite["install_test_agent"] + test_suite_info.executes_on_scale_set = "executes_on_scale_set" in test_suite and test_suite["executes_on_scale_set"] + test_suite_info.template = test_suite.get("template", "") - test_suite_info.owns_vm = "owns-vm" in test_suite and test_suite["owns-vm"] + # TODO: Add support for custom templates + if test_suite_info.executes_on_scale_set and test_suite_info.template != '': + raise Exception(f"Currently custom templates are not supported on scale sets. [Test suite: {test_suite_info.name}]") + + skip_on_clouds = test_suite.get("skip_on_clouds") + if skip_on_clouds is not None: + if isinstance(skip_on_clouds, str): + test_suite_info.skip_on_clouds = [skip_on_clouds] + else: + test_suite_info.skip_on_clouds = skip_on_clouds + else: + test_suite_info.skip_on_clouds = [] + + skip_on_images = test_suite.get("skip_on_images") + if skip_on_images is not None: + if isinstance(skip_on_images, str): + test_suite_info.skip_on_images = [skip_on_images] + else: + test_suite_info.skip_on_images = skip_on_images + else: + test_suite_info.skip_on_images = [] return test_suite_info @staticmethod - def _load_test_classes(source_file: Path) -> List[Type[AgentTest]]: + def _load_test_class(relative_path: str) -> Type[AgentVmTest]: """ - Takes a 'source_file', which must be a Python module, and returns a list of all the classes derived from AgentTest. + Loads an AgentTest from its source code file, which is given as a path relative to WALinuxAgent/tests_e2e/tests. 
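The module-name derivation in `_load_test_class` can be sketched on its own; this mirrors the `replace('/', '.').replace('.py', '')` logic shown in the diff (the helper name is illustrative):

```python
def module_name(relative_path: str) -> str:
    """Converts a test path relative to tests_e2e/tests into a dotted module name.

    Note the loader uses plain substring replacement, so this assumes the
    path ends in ".py" and contains no other ".py" substring.
    """
    return "tests_e2e.tests." + relative_path.replace('/', '.').replace('.py', '')


print(module_name("bvts/run_command.py"))  # tests_e2e.tests.bvts.run_command
```

The dotted name is what gets passed to `importlib.util.spec_from_file_location`, so each test file loads under a unique, path-derived module name instead of colliding on the bare file name.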
""" - spec = importlib.util.spec_from_file_location(f"tests_e2e.tests.{source_file.name}", str(source_file)) + full_path: Path = AgentTestLoader._SOURCE_CODE_ROOT/"tests"/relative_path + spec = importlib.util.spec_from_file_location(f"tests_e2e.tests.{relative_path.replace('/', '.').replace('.py', '')}", str(full_path)) module = importlib.util.module_from_spec(spec) spec.loader.exec_module(module) - # return all the classes in the module that are subclasses of AgentTest but are not AgentTest itself. - return [v for v in module.__dict__.values() if isinstance(v, type) and issubclass(v, AgentTest) and v != AgentTest] + # return all the classes in the module that are subclasses of AgentTest but are not AgentVmTest or AgentVmssTest themselves. + matches = [v for v in module.__dict__.values() if isinstance(v, type) and issubclass(v, AgentTest) and v != AgentVmTest and v != AgentVmssTest] + if len(matches) != 1: + raise Exception(f"Error in {full_path} (each test file must contain exactly one class derived from AgentTest)") + return matches[0] @staticmethod def _load_images() -> Dict[str, List[VmImageInfo]]: @@ -223,14 +343,18 @@ def _load_images() -> Dict[str, List[VmImageInfo]]: i = VmImageInfo() if isinstance(description, str): i.urn = description - i.locations = [] + i.locations = {} i.vm_sizes = [] else: if "urn" not in description: raise Exception(f"Image {name} is missing the 'urn' property: {description}") i.urn = description["urn"] - i.locations = description["locations"] if "locations" in description else [] + i.locations = description["locations"] if "locations" in description else {} i.vm_sizes = description["vm_sizes"] if "vm_sizes" in description else [] + for cloud in i.locations.keys(): + if cloud not in ["AzureCloud", "AzureChinaCloud", "AzureUSGovernment"]: + raise Exception(f"Invalid cloud {cloud} for image {name} in images.yml") + images[name] = [i] # now load the image-sets, mapping them to the images that we just computed diff --git 
a/tests_e2e/orchestrator/lib/agent_test_suite.py b/tests_e2e/orchestrator/lib/agent_test_suite.py index 0c95daf60..f432c2d4c 100644 --- a/tests_e2e/orchestrator/lib/agent_test_suite.py +++ b/tests_e2e/orchestrator/lib/agent_test_suite.py @@ -14,16 +14,16 @@ # See the License for the specific language governing permissions and # limitations under the License. # -import contextlib import datetime import json import logging +import time import traceback import uuid from pathlib import Path -from threading import current_thread, RLock -from typing import Any, Dict, List +from threading import RLock +from typing import Any, Dict, List, Tuple # Disable those warnings, since 'lisa' is an external, non-standard, dependency # E0401: Unable to import 'lisa' (import-error) @@ -31,7 +31,6 @@ from lisa import ( # pylint: disable=E0401 Environment, Logger, - Node, notifier, simple_requirement, TestCaseMetadata, @@ -40,20 +39,26 @@ ) from lisa.environment import EnvironmentStatus # pylint: disable=E0401 from lisa.messages import TestStatus, TestResultMessage # pylint: disable=E0401 -from lisa.sut_orchestrator import AZURE # pylint: disable=E0401 -from lisa.sut_orchestrator.azure.common import get_node_context, AzureNodeSchema # pylint: disable=E0401 +from lisa.node import LocalNode # pylint: disable=E0401 +from lisa.util.constants import RUN_ID # pylint: disable=E0401 +from lisa.sut_orchestrator.azure.common import get_node_context # pylint: disable=E0401 +from lisa.sut_orchestrator.azure.platform_ import AzurePlatform # pylint: disable=E0401 import makepkg from azurelinuxagent.common.version import AGENT_VERSION + +from tests_e2e.tests.lib.virtual_machine_client import VirtualMachineClient +from tests_e2e.tests.lib.virtual_machine_scale_set_client import VirtualMachineScaleSetClient + +import tests_e2e from tests_e2e.orchestrator.lib.agent_test_loader import TestSuiteInfo -from tests_e2e.tests.lib.agent_log import AgentLog -from tests_e2e.tests.lib.agent_test import TestSkipped 
-from tests_e2e.tests.lib.agent_test_context import AgentTestContext -from tests_e2e.tests.lib.identifiers import VmIdentifier -from tests_e2e.tests.lib.logging import log -from tests_e2e.tests.lib.logging import set_current_thread_log -from tests_e2e.tests.lib.agent_log import AgentLogRecord -from tests_e2e.tests.lib.shell import run_command +from tests_e2e.tests.lib.agent_log import AgentLog, AgentLogRecord +from tests_e2e.tests.lib.agent_test import TestSkipped, RemoteTestError +from tests_e2e.tests.lib.agent_test_context import AgentTestContext, AgentVmTestContext, AgentVmssTestContext +from tests_e2e.tests.lib.logging import log, set_thread_name, set_current_thread_log +from tests_e2e.tests.lib.network_security_rule import NetworkSecurityRule +from tests_e2e.tests.lib.resource_group_client import ResourceGroupClient +from tests_e2e.tests.lib.shell import run_command, CommandError from tests_e2e.tests.lib.ssh_client import SshClient @@ -81,19 +86,6 @@ def _initialize_lisa_logger(): _initialize_lisa_logger() -# -# Helper to change the current thread name temporarily -# -@contextlib.contextmanager -def _set_thread_name(name: str): - initial_name = current_thread().name - current_thread().name = name - try: - yield - finally: - current_thread().name = initial_name - - # # Possible values for the collect_logs parameter # @@ -103,76 +95,195 @@ class CollectLogs(object): No = 'no' # Never collect logs +# +# Possible values for the keep_environment parameter +# +class KeepEnvironment(object): + Always = 'always' # Do not delete resources created by the test suite + Failed = 'failed' # Skip delete only on test failures + No = 'no' # Always delete resources created by the test suite + + +class TestFailedException(Exception): + def __init__(self, env_name: str, test_cases: List[str]): + msg = "Test suite {0} failed.".format(env_name) + if test_cases: + msg += " Failed tests: " + ','.join(test_cases) + super().__init__(msg) + + +class _TestNode(object): + """ + Name and 
IP address of a test VM + """ + def __init__(self, name: str, ip_address: str): + self.name = name + self.ip_address = ip_address + + def __str__(self): + return f"{self.name}:{self.ip_address}" + + @TestSuiteMetadata(area="waagent", category="", description="") class AgentTestSuite(LisaTestSuite): """ Manages the setup of test VMs and execution of Agent test suites. This class acts as the interface with the LISA framework, which will invoke the execute() method when a runbook is executed. """ - - class _Context(AgentTestContext): - def __init__(self, vm: VmIdentifier, paths: AgentTestContext.Paths, connection: AgentTestContext.Connection): - super().__init__(vm=vm, paths=paths, connection=connection) - # These are initialized by AgentTestSuite._set_context(). - self.log_path: Path = None - self.lisa_log: Logger = None - self.node: Node = None - self.runbook_name: str = None - self.environment_name: str = None - self.is_vhd: bool = None - self.test_suites: List[AgentTestSuite] = None - self.collect_logs: str = None - self.skip_setup: bool = None - self.ssh_client: SshClient = None - def __init__(self, metadata: TestSuiteMetadata) -> None: super().__init__(metadata) - # The context is initialized by _set_context() via the call to execute() - self.__context: AgentTestSuite._Context = None - - def _initialize(self, node: Node, variables: Dict[str, Any], lisa_working_path: str, lisa_log_path: str, lisa_log: Logger): - connection_info = node.connection_info - node_context = get_node_context(node) - runbook = node.capability.get_extended_runbook(AzureNodeSchema, AZURE) - - self.__context = self._Context( - vm=VmIdentifier( - location=runbook.location, - subscription=node.features._platform.subscription_id, - resource_group=node_context.resource_group_name, - name=node_context.vm_name), - paths=AgentTestContext.Paths( - working_directory=self._get_working_directory(lisa_working_path), - remote_working_directory=Path('/home')/connection_info['username']), - 
connection=AgentTestContext.Connection( - ip_address=connection_info['address'], - username=connection_info['username'], - private_key_file=connection_info['private_key_file'], - ssh_port=connection_info['port'])) - - self.__context.log_path = self._get_log_path(variables, lisa_log_path) - self.__context.lisa_log = lisa_log - self.__context.node = node - self.__context.is_vhd = self._get_optional_parameter(variables, "c_vhd") != "" - self.__context.environment_name = f"{node.os.name}-vhd" if self.__context.is_vhd else self._get_required_parameter(variables, "c_env_name") - self.__context.test_suites = self._get_required_parameter(variables, "c_test_suites") - self.__context.collect_logs = self._get_required_parameter(variables, "collect_logs") - self.__context.skip_setup = self._get_required_parameter(variables, "skip_setup") - self.__context.ssh_client = SshClient(ip_address=self.__context.vm_ip_address, username=self.__context.username, private_key_file=self.__context.private_key_file) + self._working_directory: Path # Root directory for temporary files + self._log_path: Path # Root directory for log files + self._pypy_x64_path: Path # Path to the Pypy x64 download + self._pypy_arm64_path: Path # Path to the Pypy ARM64 download + self._test_agent_package_path: Path # Path to the package for the test Agent + self._test_source_directory: Path # Root directory of the source code for the end-to-end tests + self._test_tools_tarball_path: Path # Path to the tarball with the tools needed on the test node - @staticmethod - def _get_required_parameter(variables: Dict[str, Any], name: str) -> Any: - value = variables.get(name) - if value is None: - raise Exception(f"The runbook is missing required parameter '{name}'") - return value + self._runbook_name: str # name of the runbook execution, used as prefix on ARM resources created by the AgentTestSuite - @staticmethod - def _get_optional_parameter(variables: Dict[str, Any], name: str, default_value: Any = "") -> Any: - 
value = variables.get(name) - if value is None: - return default_value - return value + self._lisa_log: Logger # Main log for the LISA run + + self._lisa_environment_name: str # Name assigned by LISA to the test environment, useful for correlation with LISA logs + self._environment_name: str # Name assigned by the AgentTestSuiteCombinator to the test environment + + self._test_suites: List[AgentTestSuite] # Test suites to execute in the environment + + self._cloud: str # Azure cloud where test VMs are located + self._subscription_id: str # Azure subscription where test VMs are located + self._location: str # Azure location (region) where test VMs are located + self._image: str # Image used to create the test VMs; it can be empty if LISA chose the size, or when using an existing VM + + self._is_vhd: bool # True when the test VMs were created by LISA from a VHD; this is usually used to validate a new VHD and the test Agent is not installed + + # username and public SSH key for the admin account used to connect to the test VMs + self._user: str + self._identity_file: str + + # If not empty, adds a Network Security Rule allowing SSH access from the specified IP address to any test VMs created by the test suite. + self._allow_ssh: str + + self._skip_setup: bool # If True, skip the setup of the test VMs + self._collect_logs: str # Whether to collect logs from the test VMs (one of 'always', 'failed', or 'no') + self._keep_environment: str # Whether to skip deletion of the resources created by the test suite (one of 'always', 'failed', or 'no') + + # Resource group and VM/VMSS for the test machines. self._vm_name and self._vmss_name are mutually exclusive, only one of them will be set. + self._resource_group_name: str + self._vm_name: str + self._vm_ip_address: str + self._vmss_name: str + + self._test_nodes: List[_TestNode] # VMs or scale set instances the tests will run on + + # Whether to create and delete a scale set. 
+ self._create_scale_set: bool + self._delete_scale_set: bool + + # + # Test suites within the same runbook may be executed concurrently, and we need to keep track of how many resource + # groups are being created. We use this lock and counter to allow only 1 thread to increment the resource group + # count. + # + _rg_count_lock = RLock() + _rg_count = 0 + + def _initialize(self, environment: Environment, variables: Dict[str, Any], lisa_working_path: str, lisa_log_path: str, lisa_log: Logger): + """ + Initializes the AgentTestSuite from the data passed as arguments by LISA. + + NOTE: All the interface with LISA should be confined to this method. The rest of the test code should not have any dependencies on LISA. + """ + self._working_directory = self._get_working_directory(lisa_working_path) + self._log_path = self._get_log_path(variables, lisa_log_path) + self._test_agent_package_path = self._working_directory/"eggs"/f"WALinuxAgent-{AGENT_VERSION}.zip" + self._test_source_directory = Path(tests_e2e.__path__[0]) + self._test_tools_tarball_path = self._working_directory/"waagent-tools.tar" + self._pypy_x64_path = Path("/tmp/pypy3.7-x64.tar.bz2") + self._pypy_arm64_path = Path("/tmp/pypy3.7-arm64.tar.bz2") + + self._runbook_name = variables["name"] + + self._lisa_log = lisa_log + + self._lisa_environment_name = environment.name + self._environment_name = variables["c_env_name"] + + self._test_suites = variables["c_test_suites"] + + self._cloud = variables["cloud"] + self._subscription_id = variables["subscription_id"] + self._location = variables["c_location"] + self._image = variables["c_image"] + + self._is_vhd = variables["c_is_vhd"] + + self._user = variables["user"] + self._identity_file = variables["identity_file"] + + self._allow_ssh = variables["allow_ssh"] + + self._skip_setup = variables["skip_setup"] + self._keep_environment = variables["keep_environment"] + self._collect_logs = variables["collect_logs"] + + # The AgentTestSuiteCombinator can create 4 
kinds of platform/environment combinations: + # + # * New VM + # The VM is created by LISA. The platform will be 'azure' and the environment will contain a single 'remote' node. + # + # * Existing VM + # The VM was passed as argument to the runbook. The platform will be 'ready' and the environment will contain a single 'remote' node. + # + # * New VMSS + # The AgentTestSuite will create the scale set before executing the tests. The platform will be 'ready' and the environment will contain a single 'local' node. + # + # * Existing VMSS + # The VMSS was passed as argument to the runbook. The platform will be 'ready' and the environment will contain a list of 'remote' nodes, + # one for each instance of the scale set. + # + + # Note that _vm_name and _vmss_name are mutually exclusive, only one of them will be set. + self._vm_name = None + self._vm_ip_address = None + self._vmss_name = None + self._create_scale_set = False + self._delete_scale_set = False + + if isinstance(environment.nodes[0], LocalNode): + # We need to create a new VMSS. + # Use the same naming convention as LISA for the scale set name: lisa-<runbook name>-<run id>-w<count>-n0, + # except that, for the "rg_name", LISA uses "e" as prefix (e.g. "e0", "e1", etc.), while we use "w" (for + # WALinuxAgent, e.g. "w0", "w1", etc.) to avoid name collisions. Also, note that we hardcode the scale set name + # to "n0" since we are creating a single scale set. Lastly, the resource group name cannot have any uppercase + # characters, because the publicIP cannot have uppercase characters in its domain name label. 
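The naming convention and the counter-under-lock pattern used for resource groups can be sketched together. This is an illustrative restatement of the logic in `_initialize` (the `RgNamer` class and `next_name` function are hypothetical names; the format string is assumed from the code: `lisa-{runbook}-{run_id}-w{count}`, lowercased):

```python
from threading import RLock


class RgNamer:
    """A class-level lock and counter guarantee unique resource group names
    even when several test-suite threads create scale sets concurrently."""
    _lock = RLock()
    _count = 0

    @staticmethod
    def next_name(runbook_name: str, run_id: str) -> str:
        # Resource group names must be lowercase because the publicIP domain
        # name label cannot contain uppercase characters.
        with RgNamer._lock:
            name = f"lisa-{runbook_name.lower()}-{run_id}-w{RgNamer._count}"
            RgNamer._count += 1
        return name


print(RgNamer.next_name("DailyRun", "20240327-1"))  # lisa-dailyrun-20240327-1-w0
print(RgNamer.next_name("DailyRun", "20240327-1"))  # lisa-dailyrun-20240327-1-w1
```

The scale set itself then reuses the resource group name with an `-n0` suffix, since only one scale set is created per group.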
+ AgentTestSuite._rg_count_lock.acquire() + try: + self._resource_group_name = f"lisa-{self._runbook_name.lower()}-{RUN_ID}-w{AgentTestSuite._rg_count}" + AgentTestSuite._rg_count += 1 + finally: + AgentTestSuite._rg_count_lock.release() + self._vmss_name = f"{self._resource_group_name}-n0" + self._test_nodes = [] # we'll fill this up when the scale set is created + self._create_scale_set = True + self._delete_scale_set = False # we set it to True once we create the scale set + else: + # Else we are using a VM that was created by LISA, or an existing VM/VMSS + node_context = get_node_context(environment.nodes[0]) + + if isinstance(environment.nodes[0].features._platform, AzurePlatform): # The test VM was created by LISA + self._resource_group_name = node_context.resource_group_name + self._vm_name = node_context.vm_name + self._vm_ip_address = environment.nodes[0].connection_info['address'] + self._test_nodes = [_TestNode(self._vm_name, self._vm_ip_address)] + else: # An existing VM/VMSS was passed as argument to the runbook + self._resource_group_name = variables["resource_group_name"] + if variables["vm_name"] != "": + self._vm_name = variables["vm_name"] + self._vm_ip_address = environment.nodes[0].connection_info['address'] + self._test_nodes = [_TestNode(self._vm_name, self._vm_ip_address)] + else: + self._vmss_name = variables["vmss_name"] + self._test_nodes = [_TestNode(node.name, node.connection_info['address']) for node in environment.nodes.list()] @staticmethod def _get_log_path(variables: Dict[str, Any], lisa_log_path: str) -> Path: @@ -189,28 +300,36 @@ def _get_log_path(variables: Dict[str, Any], lisa_log_path: str) -> Path: def _get_working_directory(lisa_working_path: str) -> Path: # LISA's "working_path" has a value similar to # "<--working_path>/20230322/20230322-194430-287/tests/20230322-194451-333-agent_test_suite - # where "<--working_path>" is the value given to the --working_path command line argument. 
Create the working for + # where "<--working_path>" is the value given to the --working_path command line argument. Create the working directory for # the AgentTestSuite as # "<--working_path>/20230322/20230322-194430-287/waagent # This directory will be unique for each execution of the runbook ("20230322-194430" is the timestamp and "287" is a # unique ID per execution) - return Path(lisa_working_path).parent.parent / "waagent" - - @property - def context(self): - if self.__context is None: - raise Exception("The context for the AgentTestSuite has not been initialized") - return self.__context + return Path(lisa_working_path).parent.parent/"waagent" # # Test suites within the same runbook may be executed concurrently, and setup needs to be done only once. - # We use this lock to allow only 1 thread to do the setup. Setup completion is marked using the 'completed' + # We use these locks to allow only 1 thread to do the setup. Setup completion is marked using the 'completed' # file: the thread doing the setup creates the file and threads that find that the file already exists # simply skip setup. # + _working_directory_lock = RLock() _setup_lock = RLock() - def _setup(self) -> None: + def _create_working_directory(self) -> None: + """ + Creates the working directory for the test suite. 
+ """ + self._working_directory_lock.acquire() + + try: + if not self._working_directory.exists(): + log.info("Creating working directory: %s", self._working_directory) + self._working_directory.mkdir(parents=True) + finally: + self._working_directory_lock.release() + + def _setup_test_run(self) -> None: """ Prepares the test suite for execution (currently, it just builds the agent package) @@ -219,20 +338,56 @@ def _setup(self) -> None: self._setup_lock.acquire() try: - log.info("") - log.info("**************************************** [Build] ****************************************") - log.info("") - completed: Path = self.context.working_directory/"completed" + completed: Path = self._working_directory / "completed" if completed.exists(): log.info("Found %s. Build has already been done, skipping.", completed) return - self.context.lisa_log.info("Building test agent") - log.info("Creating working directory: %s", self.context.working_directory) - self.context.working_directory.mkdir(parents=True) + log.info("") + log.info("********************************** [Preparing Test Run] **********************************") + log.info("") - self._build_agent_package() + self._lisa_log.info("Building agent package to %s", self._test_agent_package_path) + log.info("Building agent package to %s", self._test_agent_package_path) + makepkg.run(agent_family="Test", output_directory=str(self._working_directory), log=log) + if not self._test_agent_package_path.exists(): # the target path is created by makepkg, ensure we are using the correct value + raise Exception(f"The test Agent package was not created at the expected path {self._test_agent_package_path}") + + # + # Ensure that Pypy (both x64 and ARM) has been downloaded to the local machine; it is pre-downloaded to /tmp on + # the container image used for Azure Pipelines runs, but for developer runs it may need to be downloaded. 
+ # + for pypy in [self._pypy_x64_path, self._pypy_arm64_path]: + if pypy.exists(): + log.info("Found Pypy at %s", pypy) + else: + pypy_download = f"https://dcrdata.blob.core.windows.net/python/{pypy.name}" + self._lisa_log.info("Downloading %s to %s", pypy_download, pypy) + log.info("Downloading %s to %s", pypy_download, pypy) + run_command(["wget", pypy_download, "-O", pypy]) + + # + # Create a tarball with the tools we need to copy to the test node. The tarball includes two directories: + # + # * bin - Executable files (Bash and Python scripts) + # * lib - Library files (Python modules) + # + self._lisa_log.info("Creating %s with the tools needed on the test node", self._test_tools_tarball_path) + log.info("Creating %s with the tools needed on the test node", self._test_tools_tarball_path) + log.info("Adding orchestrator/scripts") + command = "cd {0} ; tar cf {1} --transform='s,^,bin/,' *".format(self._test_source_directory/"orchestrator"/"scripts", self._test_tools_tarball_path) + log.info("%s", command) + run_command(command, shell=True) + log.info("Adding tests/scripts") + command = "cd {0} ; tar rf {1} --transform='s,^,bin/,' *".format(self._test_source_directory/"tests"/"scripts", self._test_tools_tarball_path) + log.info("%s", command) + run_command(command, shell=True) + log.info("Adding tests/lib") + command = "cd {0} ; tar rf {1} --transform='s,^,lib/,' --exclude=__pycache__ tests_e2e/tests/lib".format(self._test_source_directory.parent, self._test_tools_tarball_path) + log.info("%s", command) + run_command(command, shell=True) + log.info("Contents of %s:\n%s", self._test_tools_tarball_path, run_command(['tar', 'tvf', str(self._test_tools_tarball_path)])) log.info("Completed setup, creating %s", completed) completed.touch() @@ -240,135 +395,135 @@ def _setup(self) -> None: finally: self._setup_lock.release() - def _build_agent_package(self) -> None: + def _clean_up(self, success: bool) -> None: """ - Builds the agent package and returns the path to the 
package. + Cleans up any items created by the test suite run. """ - log.info("Building agent package to %s", self.context.working_directory) - - makepkg.run(agent_family="Test", output_directory=str(self.context.working_directory), log=log) - - package_path: Path = self._get_agent_package_path() - if not package_path.exists(): - raise Exception(f"Can't find the agent package at {package_path}") - - log.info("Built agent package as %s", package_path) - - def _get_agent_package_path(self) -> Path: - """ - Returns the path to the agent package. - """ - return self.context.working_directory/"eggs"/f"WALinuxAgent-{AGENT_VERSION}.zip" + if self._delete_scale_set: + if self._keep_environment == KeepEnvironment.Always: + log.info("Won't delete the scale set %s, per the test suite configuration.", self._vmss_name) + elif self._keep_environment == KeepEnvironment.No or self._keep_environment == KeepEnvironment.Failed and success: + try: + self._lisa_log.info("Deleting resource group containing the test VMSS: %s", self._resource_group_name) + resource_group = ResourceGroupClient(cloud=self._cloud, location=self._location, subscription=self._subscription_id, name=self._resource_group_name) + resource_group.delete() + except Exception as error: # pylint: disable=broad-except + log.warning("Error deleting resource group %s: %s", self._resource_group_name, error) - def _clean_up(self) -> None: + def _setup_test_nodes(self) -> None: """ - Cleans up any leftovers from the test suite run. Currently just an empty placeholder for future use. 
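The delete decision in `_clean_up` above relies on Python operator precedence: `No or Failed and success` parses as `No or (Failed and success)`. Written out as a small predicate for illustration (this restates, rather than replaces, the test suite's logic):

```python
class KeepEnvironment(object):
    Always = 'always'  # Do not delete resources created by the test suite
    Failed = 'failed'  # Skip delete only on test failures
    No = 'no'          # Always delete resources created by the test suite


def should_delete(keep_environment: str, success: bool) -> bool:
    """True when the resource group created for the test VMSS should be deleted."""
    return keep_environment == KeepEnvironment.No or \
        (keep_environment == KeepEnvironment.Failed and success)


print(should_delete(KeepEnvironment.Failed, success=False))  # False (kept for debugging)
print(should_delete(KeepEnvironment.No, success=False))      # True
```

Making the parenthesization explicit documents the intent: 'failed' keeps resources only when something went wrong, while 'always' keeps them unconditionally.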
+        Prepares the test nodes for execution of the test suite (installs tools and the test agent, etc)
+        """
+        install_test_agent = self._test_suites[0].install_test_agent  # All suites in the environment have the same value for install_test_agent
-    def _setup_node(self) -> None:
-        """
-        Prepares the remote node for executing the test suite (installs tools and the test agent, etc)
-        """
-        self.context.lisa_log.info("Setting up test node")
-        log.info("")
-        log.info("************************************** [Node Setup] **************************************")
         log.info("")
-        log.info("Test Node: %s", self.context.vm.name)
-        log.info("IP Address: %s", self.context.vm_ip_address)
-        log.info("Resource Group: %s", self.context.vm.resource_group)
+        log.info("************************************ [Test Nodes Setup] ************************************")
         log.info("")
+        for node in self._test_nodes:
+            self._lisa_log.info(f"Setting up test node {node}")
+            log.info("Test Node: %s", node.name)
+            log.info("IP Address: %s", node.ip_address)
+            log.info("")
-        #
-        # Ensure that the correct version (x84 vs ARM64) Pypy has been downloaded; it is pre-downloaded to /tmp on the container image
-        # used for Azure Pipelines runs, but for developer runs it may need to be downloaded.
-        #
-        if self.context.ssh_client.get_architecture() == "aarch64":
-            pypy_path = Path("/tmp/pypy3.7-arm64.tar.bz2")
-            pypy_download = "https://downloads.python.org/pypy/pypy3.7-v7.3.5-aarch64.tar.bz2"
-        else:
-            pypy_path = Path("/tmp/pypy3.7-x64.tar.bz2")
-            pypy_download = "https://downloads.python.org/pypy/pypy3.7-v7.3.5-linux64.tar.bz2"
-        if pypy_path.exists():
-            log.info("Found Pypy at %s", pypy_path)
-        else:
-            log.info("Downloading %s to %s", pypy_download, pypy_path)
-            run_command(["wget", pypy_download, "-O", pypy_path])
-
-        #
-        # Create a tarball with the files we need to copy to the test node. The tarball includes two directories:
-        #
-        #     * bin - Executables file (Bash and Python scripts)
-        #     * lib - Library files (Python modules)
-        #
-        # After extracting the tarball on the test node, 'bin' will be added to PATH and PYTHONPATH will be set to 'lib'.
-        #
-        # Note that executables are placed directly under 'bin', while the path for Python modules is preserved under 'lib.
-        #
-        tarball_path: Path = Path("/tmp/waagent.tar")
-        log.info("Creating %s with the files need on the test node", tarball_path)
-        log.info("Adding orchestrator/scripts")
-        run_command(['tar', 'cvf', str(tarball_path), '--transform=s,.*/,bin/,', '-C', str(self.context.test_source_directory/"orchestrator"/"scripts"), '.'])
-        # log.info("Adding tests/scripts")
-        # run_command(['tar', 'rvf', str(tarball_path), '--transform=s,.*/,bin/,', '-C', str(self.context.test_source_directory/"tests"/"scripts"), '.'])
-        log.info("Adding tests/lib")
-        run_command(['tar', 'rvf', str(tarball_path), '--transform=s,^,lib/,', '-C', str(self.context.test_source_directory.parent), '--exclude=__pycache__', 'tests_e2e/tests/lib'])
-        log.info("Contents of %s:\n\n%s", tarball_path, run_command(['tar', 'tvf', str(tarball_path)]))
-
-        #
-        # Cleanup the test node (useful for developer runs)
-        #
-        log.info('Preparing the test node for setup')
-        # Note that removing lib requires sudo, since a Python cache may have been created by tests using sudo
-        self.context.ssh_client.run_command("rm -rvf ~/{bin,lib,tmp}", use_sudo=True)
-
-        #
-        # Copy the tarball, Pypy and the test Agent to the test node
-        #
-        target_path = Path("~")/"tmp"
-        self.context.ssh_client.run_command(f"mkdir {target_path}")
-        log.info("Copying %s to %s:%s", tarball_path, self.context.node.name, target_path)
-        self.context.ssh_client.copy_to_node(tarball_path, target_path)
-        log.info("Copying %s to %s:%s", pypy_path, self.context.node.name, target_path)
-        self.context.ssh_client.copy_to_node(pypy_path, target_path)
-        agent_package_path: Path = self._get_agent_package_path()
-        log.info("Copying %s to %s:%s", agent_package_path, self.context.node.name, target_path)
-        self.context.ssh_client.copy_to_node(agent_package_path, target_path)
-
-        #
-        # Extract the tarball and execute the install scripts
-        #
-        log.info('Installing tools on the test node')
-        command = f"tar xf {target_path/tarball_path.name} && ~/bin/install-tools"
-        log.info("%s\n%s", command, self.context.ssh_client.run_command(command))
-
-        if self.context.is_vhd:
-            log.info("Using a VHD; will not install the Test Agent.")
-        else:
-            log.info("Installing the Test Agent on the test node")
-            command = f"install-agent --package ~/tmp/{agent_package_path.name} --version {AGENT_VERSION}"
-            log.info("%s\n%s", command, self.context.ssh_client.run_command(command, use_sudo=True))
-
-        log.info("Completed test node setup")
+            ssh_client = SshClient(ip_address=node.ip_address, username=self._user, identity_file=Path(self._identity_file))
+
+            self._check_ssh_connectivity(ssh_client)
+
+            #
+            # Cleanup the test node (useful for developer runs)
+            #
+            log.info('Preparing the test node for setup')
+            # Note that removing lib requires sudo, since a Python cache may have been created by tests using sudo
+            ssh_client.run_command("rm -rvf ~/{bin,lib,tmp}", use_sudo=True)
+
+            #
+            # Copy Pypy, the test Agent, and the test tools to the test node
+            #
+            ssh_client = SshClient(ip_address=node.ip_address, username=self._user, identity_file=Path(self._identity_file))
+            if ssh_client.get_architecture() == "aarch64":
+                pypy_path = self._pypy_arm64_path
+            else:
+                pypy_path = self._pypy_x64_path
+            target_path = Path("~")/"tmp"
+            ssh_client.run_command(f"mkdir {target_path}")
+            log.info("Copying %s to %s:%s", pypy_path, node.name, target_path)
+            ssh_client.copy_to_node(pypy_path, target_path)
+            log.info("Copying %s to %s:%s", self._test_agent_package_path, node.name, target_path)
+            ssh_client.copy_to_node(self._test_agent_package_path, target_path)
+            log.info("Copying %s to %s:%s", self._test_tools_tarball_path, node.name, target_path)
+            ssh_client.copy_to_node(self._test_tools_tarball_path, target_path)
+
+            #
+            # Extract the tarball with the test tools. The tarball includes two directories:
+            #
+            #     * bin - Executables file (Bash and Python scripts)
+            #     * lib - Library files (Python modules)
+            #
+            # After extracting the tarball on the test node, 'bin' will be added to PATH and PYTHONPATH will be set to 'lib'.
+            #
+            # Note that executables are placed directly under 'bin', while the path for Python modules is preserved under 'lib.
+            #
+            log.info('Installing tools on the test node')
+            command = f"tar xvf {target_path/self._test_tools_tarball_path.name} && ~/bin/install-tools"
+            log.info("Remote command [%s] completed:\n%s", command, ssh_client.run_command(command))
+
+            if self._is_vhd:
+                log.info("Using a VHD; will not install the Test Agent.")
+            elif not install_test_agent:
+                log.info("Will not install the Test Agent per the test suite configuration.")
+            else:
+                log.info("Installing the Test Agent on the test node")
+                command = f"install-agent --package ~/tmp/{self._test_agent_package_path.name} --version {AGENT_VERSION}"
+                log.info("%s\n%s", command, ssh_client.run_command(command, use_sudo=True))
+
+        log.info("Completed test node setup")
 
-    def _collect_node_logs(self) -> None:
+    @staticmethod
+    def _check_ssh_connectivity(ssh_client: SshClient) -> None:
+        # We may be trying to connect to the test node while it is still booting. Execute a simple command to check that SSH is ready,
+        # and raise an exception if it is not after a few attempts.
+        max_attempts = 5
+        for attempt in range(max_attempts):
+            try:
+                log.info("Checking SSH connectivity to the test node...")
+                ssh_client.run_command("echo 'SSH connectivity check'")
+                log.info("SSH is ready.")
+                break
+            except CommandError as error:
+                # Check for "System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8)."
+                if not any(m in error.stderr for m in ["Unprivileged users are not permitted to log in yet", "Permission denied"]):
+                    raise
+                if attempt >= max_attempts - 1:
+                    raise Exception(f"SSH connectivity check failed after {max_attempts} attempts, giving up [{error}]")
+                log.info("SSH is not ready [%s], will retry after a short delay.", error)
+                time.sleep(15)
+
+    def _collect_logs_from_test_nodes(self) -> None:
         """
-        Collects the test logs from the remote machine and copies them to the local machine
+        Collects the test logs from the test nodes and copies them to the local machine
         """
-        try:
-            # Collect the logs on the test machine into a compressed tarball
-            self.context.lisa_log.info("Collecting logs on test node")
-            log.info("Collecting logs on test node")
-            stdout = self.context.ssh_client.run_command("collect-logs", use_sudo=True)
-            log.info(stdout)
-
-            # Copy the tarball to the local logs directory
-            remote_path = "/tmp/waagent-logs.tgz"
-            local_path = self.context.log_path/'{0}.tgz'.format(self.context.environment_name)
-            log.info("Copying %s:%s to %s", self.context.node.name, remote_path, local_path)
-            self.context.ssh_client.copy_from_node(remote_path, local_path)
-
-        except:  # pylint: disable=bare-except
-            log.exception("Failed to collect logs from the test machine")
+        for node in self._test_nodes:
+            node_name = node.name
+            ssh_client = SshClient(ip_address=node.ip_address, username=self._user, identity_file=Path(self._identity_file))
+            try:
+                # Collect the logs on the test machine into a compressed tarball
+                self._lisa_log.info("Collecting logs on test node %s", node_name)
+                log.info("Collecting logs on test node %s", node_name)
+                stdout = ssh_client.run_command("collect-logs", use_sudo=True)
+                log.info(stdout)
+
+                # Copy the tarball to the local logs directory
+                tgz_name = self._environment_name
+                if len(self._test_nodes) > 1:
+                    # Append instance of scale set to the end of tarball name
+                    tgz_name += '_' + node_name.split('_')[-1]
+                remote_path = "/tmp/waagent-logs.tgz"
+                local_path = self._log_path / '{0}.tgz'.format(tgz_name)
+                log.info("Copying %s:%s to %s", node_name, remote_path, local_path)
+                ssh_client.copy_from_node(remote_path, local_path)
+
+            except:  # pylint: disable=bare-except
+                log.exception("Failed to collect logs from the test machine")
 
     # NOTES:
     #
@@ -380,157 +535,198 @@ def _collect_node_logs(self) -> None:
     #
     # W0621: Redefining name 'log' from outer scope (line 53) (redefined-outer-name)
     @TestCaseMetadata(description="", priority=0, requirement=simple_requirement(environment_status=EnvironmentStatus.Deployed))
-    def main(self, node: Node, environment: Environment, variables: Dict[str, Any], working_path: str, log_path: str, log: Logger):  # pylint: disable=redefined-outer-name
+    def main(self, environment: Environment, variables: Dict[str, Any], working_path: str, log_path: str, log: Logger):  # pylint: disable=redefined-outer-name
         """
         Entry point from LISA
         """
-        self._initialize(node, variables, working_path, log_path, log)
-        self._execute(environment, variables)
+        self._initialize(environment, variables, working_path, log_path, log)
+        self._execute()
+
+    def _execute(self) -> None:
+        unexpected_error = False
+        test_suite_success = True
 
-    def _execute(self, environment: Environment, variables: Dict[str, Any]):
-        """
-        Executes each of the AgentTests included in the "c_test_suites" variable (which is generated by the AgentTestSuitesCombinator).
-        """
         # Set the thread name to the name of the environment. The thread name is added to each item in LISA's log.
-        with _set_thread_name(self.context.environment_name):
-            log_path: Path = self.context.log_path/f"env-{self.context.environment_name}.log"
+        with set_thread_name(self._environment_name):
+            log_path: Path = self._log_path / f"env-{self._environment_name}.log"
             with set_current_thread_log(log_path):
                 start_time: datetime.datetime = datetime.datetime.now()
-                success = True
+                failed_cases = []
 
                 try:
                     # Log the environment's name and the variables received from the runbook (note that we need to expand the names of the test suites)
-                    log.info("LISA Environment (for correlation with the LISA log): %s", environment.name)
-                    log.info("Runbook variables:")
-                    for name, value in variables.items():
-                        log.info("    %s: %s", name, value if name != 'c_test_suites' else [t.name for t in value])
+                    log.info("LISA Environment (for correlation with the LISA log): %s", self._lisa_environment_name)
+                    log.info("Test suites: %s", [t.name for t in self._test_suites])
 
-                    test_suite_success = True
+                    self._create_working_directory()
+
+                    if not self._skip_setup:
+                        self._setup_test_run()
 
                     try:
-                        if not self.context.skip_setup:
-                            self._setup()
+                        test_context = self._create_test_context()
 
-                        if not self.context.skip_setup:
-                            self._setup_node()
+                        if not self._skip_setup:
+                            try:
+                                self._setup_test_nodes()
+                            except:
+                                test_suite_success = False
+                                raise
 
-                        # pylint seems to think self.context.test_suites is not iterable. Suppressing warning, since its type is List[AgentTestSuite]
-                        # E1133: Non-iterable value self.context.test_suites is used in an iterating context (not-an-iterable)
-                        for suite in self.context.test_suites:  # pylint: disable=E1133
-                            log.info("Executing test suite %s", suite.name)
-                            self.context.lisa_log.info("Executing Test Suite %s", suite.name)
-                            test_suite_success = self._execute_test_suite(suite) and test_suite_success
+                        check_log_start_time = datetime.datetime.min
 
-                        test_suite_success = self._check_agent_log() and test_suite_success
+                        for suite in self._test_suites:
+                            log.info("Executing test suite %s", suite.name)
+                            self._lisa_log.info("Executing Test Suite %s", suite.name)
+                            case_success, check_log_start_time = self._execute_test_suite(suite, test_context, check_log_start_time)
+                            test_suite_success = case_success and test_suite_success
+                            if not case_success:
+                                failed_cases.append(suite.name)
                     finally:
-                        collect = self.context.collect_logs
-                        if collect == CollectLogs.Always or collect == CollectLogs.Failed and not test_suite_success:
-                            self._collect_node_logs()
+                        if self._collect_logs == CollectLogs.Always or self._collect_logs == CollectLogs.Failed and not test_suite_success:
+                            self._collect_logs_from_test_nodes()
 
                 except Exception as e:   # pylint: disable=bare-except
                     # Report the error and raise an exception to let LISA know that the test errored out.
-                    success = False
+                    unexpected_error = True
                     log.exception("UNEXPECTED ERROR.")
                     self._report_test_result(
-                        self.context.environment_name,
+                        self._environment_name,
                         "Unexpected Error",
                         TestStatus.FAILED,
                         start_time,
                         message="UNEXPECTED ERROR.",
                         add_exception_stack_trace=True)
-                    raise Exception(f"[{self.context.environment_name}] Unexpected error in AgentTestSuite: {e}")
+                    raise Exception(f"[{self._environment_name}] Unexpected error in AgentTestSuite: {e}")
 
                 finally:
-                    self._clean_up()
-                    if not success:
+                    self._clean_up(test_suite_success and not unexpected_error)
+                    if unexpected_error:
                         self._mark_log_as_failed()
 
-    def _execute_test_suite(self, suite: TestSuiteInfo) -> bool:
+                # Check if any test failures or unexpected errors occurred. If so, raise an Exception here so that
+                # lisa marks the environment as failed. Otherwise, lisa would mark this environment as passed and
+                # clean up regardless of the value of 'keep_environment'. This should be the last thing that
+                # happens during suite execution.
+                if not test_suite_success or unexpected_error:
+                    raise TestFailedException(self._environment_name, failed_cases)
+
+    def _execute_test_suite(self, suite: TestSuiteInfo, test_context: AgentTestContext, check_log_start_time: datetime.datetime) -> Tuple[bool, datetime.datetime]:
         """
-        Executes the given test suite and returns True if all the tests in the suite succeeded.
+        Executes the given test suite and returns a tuple of a bool indicating whether all the tests in the suite succeeded, and the timestamp that should be used
+        for the next check of the agent log.
         """
         suite_name = suite.name
-        suite_full_name = f"{suite_name}-{self.context.environment_name}"
+        suite_full_name = f"{suite_name}-{self._environment_name}"
         suite_start_time: datetime.datetime = datetime.datetime.now()
+        check_log_start_time_override = datetime.datetime.max  # tests can override the timestamp for the agent log check with the get_ignore_errors_before_timestamp() method
 
-        success: bool = True  # True if all the tests succeed
-
-        with _set_thread_name(suite_full_name):  # The thread name is added to the LISA log
-            log_path: Path = self.context.log_path/f"{suite_full_name}.log"
+        with set_thread_name(suite_full_name):  # The thread name is added to the LISA log
+            log_path: Path = self._log_path / f"{suite_full_name}.log"
             with set_current_thread_log(log_path):
+                suite_success: bool = True
+
                 try:
                     log.info("")
                     log.info("**************************************** %s ****************************************", suite_name)
                     log.info("")
 
                     summary: List[str] = []
+                    ignore_error_rules: List[Dict[str, Any]] = []
 
                     for test in suite.tests:
-                        test_name = test.__name__
-                        test_full_name = f"{suite_name}-{test_name}"
+                        test_full_name = f"{suite_name}-{test.name}"
                         test_start_time: datetime.datetime = datetime.datetime.now()
 
-                        log.info("******** Executing %s", test_name)
-                        self.context.lisa_log.info("Executing test %s", test_full_name)
-
-                        try:
+                        log.info("******** Executing %s", test.name)
+                        self._lisa_log.info("Executing test %s", test_full_name)
 
-                            test(self.context).run()
+                        test_success: bool = True
 
-                            summary.append(f"[Passed] {test_name}")
-                            log.info("******** [Passed] %s", test_name)
-                            self.context.lisa_log.info("[Passed] %s", test_full_name)
+                        test_instance = test.test_class(test_context)
+                        try:
+                            test_instance.run()
+                            summary.append(f"[Passed] {test.name}")
+                            log.info("******** [Passed] %s", test.name)
+                            self._lisa_log.info("[Passed] %s", test_full_name)
                             self._report_test_result(
                                 suite_full_name,
-                                test_name,
+                                test.name,
                                 TestStatus.PASSED,
                                 test_start_time)
                         except TestSkipped as e:
-                            summary.append(f"[Skipped] {test_name}")
-                            log.info("******** [Skipped] %s: %s", test_name, e)
-                            self.context.lisa_log.info("******** [Skipped] %s", test_full_name)
+                            summary.append(f"[Skipped] {test.name}")
+                            log.info("******** [Skipped] %s: %s", test.name, e)
+                            self._lisa_log.info("******** [Skipped] %s", test_full_name)
                             self._report_test_result(
                                 suite_full_name,
-                                test_name,
+                                test.name,
                                 TestStatus.SKIPPED,
                                 test_start_time,
                                 message=str(e))
                         except AssertionError as e:
-                            success = False
-                            summary.append(f"[Failed] {test_name}")
-                            log.error("******** [Failed] %s: %s", test_name, e)
-                            self.context.lisa_log.error("******** [Failed] %s", test_full_name)
+                            test_success = False
+                            summary.append(f"[Failed] {test.name}")
+                            log.error("******** [Failed] %s: %s", test.name, e)
+                            self._lisa_log.error("******** [Failed] %s", test_full_name)
                             self._report_test_result(
                                 suite_full_name,
-                                test_name,
+                                test.name,
                                 TestStatus.FAILED,
                                 test_start_time,
                                 message=str(e))
+                        except RemoteTestError as e:
+                            test_success = False
+                            summary.append(f"[Failed] {test.name}")
+                            message = f"UNEXPECTED ERROR IN [{e.command}] {e.stderr}\n{e.stdout}"
+                            log.error("******** [Failed] %s: %s", test.name, message)
+                            self._lisa_log.error("******** [Failed] %s", test_full_name)
+                            self._report_test_result(
+                                suite_full_name,
+                                test.name,
+                                TestStatus.FAILED,
+                                test_start_time,
+                                message=str(message))
                         except:  # pylint: disable=bare-except
-                            success = False
-                            summary.append(f"[Error] {test_name}")
-                            log.exception("UNHANDLED EXCEPTION IN %s", test_name)
-                            self.context.lisa_log.exception("UNHANDLED EXCEPTION IN %s", test_full_name)
+                            test_success = False
+                            summary.append(f"[Error] {test.name}")
+                            log.exception("UNEXPECTED ERROR IN %s", test.name)
+                            self._lisa_log.exception("UNEXPECTED ERROR IN %s", test_full_name)
                             self._report_test_result(
                                 suite_full_name,
-                                test_name,
+                                test.name,
                                 TestStatus.FAILED,
                                 test_start_time,
-                                message="Unhandled exception.",
+                                message="Unexpected error.",
                                 add_exception_stack_trace=True)
 
                         log.info("")
-                    log.info("********* [Test Results]")
+                        suite_success = suite_success and test_success
+
+                        ignore_error_rules.extend(test_instance.get_ignore_error_rules())
+
+                        # Check if the test is requesting to override the timestamp for the agent log check.
+                        # Note that if multiple tests in the suite provide an override, we'll use the earliest timestamp.
+                        test_check_log_start_time = test_instance.get_ignore_errors_before_timestamp()
+                        if test_check_log_start_time != datetime.datetime.min:
+                            check_log_start_time_override = min(check_log_start_time_override, test_check_log_start_time)
+
+                        if not test_success and test.blocks_suite:
+                            log.warning("%s failed and blocks the suite. Stopping suite execution.", test.name)
+                            break
+
+                    log.info("")
+                    log.info("******** [Test Results]")
                     log.info("")
                     for r in summary:
                         log.info("\t%s", r)
                     log.info("")
 
                 except:  # pylint: disable=bare-except
-                    success = False
+                    suite_success = False
                     self._report_test_result(
                         suite_full_name,
                         suite_name,
@@ -539,66 +735,112 @@ def _execute_test_suite(self, suite: TestSuiteInfo) -> bool:
                         message=f"Unhandled exception while executing test suite {suite_name}.",
                         add_exception_stack_trace=True)
                 finally:
-                    if not success:
+                    if not suite_success:
                         self._mark_log_as_failed()
 
-        return success
+                    next_check_log_start_time = datetime.datetime.utcnow()
+                    suite_success = suite_success and self._check_agent_log_on_test_nodes(ignore_error_rules, check_log_start_time_override if check_log_start_time_override != datetime.datetime.max else check_log_start_time)
 
-    def _check_agent_log(self) -> bool:
+        return suite_success, next_check_log_start_time
+
+    def _check_agent_log_on_test_nodes(self, ignore_error_rules: List[Dict[str, Any]], check_log_start_time: datetime.datetime) -> bool:
         """
-        Checks the agent log for errors; returns true on success (no errors int the log)
+        Checks the agent log on the test nodes for errors; returns true on success (no errors in the logs)
         """
-        start_time: datetime.datetime = datetime.datetime.now()
+        success: bool = True
+
+        for node in self._test_nodes:
+            node_name = node.name
+            ssh_client = SshClient(ip_address=node.ip_address, username=self._user, identity_file=Path(self._identity_file))
+
+            test_result_name = self._environment_name
+            if len(self._test_nodes) > 1:
+                # If there are multiple test nodes, as in a scale set, append the name of the node to the name of the result
+                test_result_name += '_' + node_name.split('_')[-1]
+
+            start_time: datetime.datetime = datetime.datetime.now()
+
+            try:
+                message = f"Checking agent log on test node {node_name}, starting at {check_log_start_time.strftime('%Y-%m-%dT%H:%M:%S.%fZ')}"
+                self._lisa_log.info(message)
+                log.info(message)
+
+                output = ssh_client.run_command("check-agent-log.py -j")
+                errors = json.loads(output, object_hook=AgentLogRecord.from_dictionary)
+
+                # Filter out errors that occurred before the starting timestamp or that match an ignore rule
+                errors = [e for e in errors if e.timestamp >= check_log_start_time and (len(ignore_error_rules) == 0 or not AgentLog.matches_ignore_rule(e, ignore_error_rules))]
+
+                if len(errors) == 0:
+                    # If no errors, we are done; don't create a log or test result.
+                    log.info("There are no errors in the agent log")
+                else:
+                    message = f"Detected {len(errors)} error(s) in the agent log on {node_name}"
+                    self._lisa_log.error(message)
+                    log.error("%s:\n\n%s\n", message, '\n'.join(['\t\t' + e.text.replace('\n', '\n\t\t') for e in errors]))
+                    self._mark_log_as_failed()
+                    success = False
 
-        try:
-            self.context.lisa_log.info("Checking agent log on the test node")
-            log.info("Checking agent log on the test node")
-
-            output = self.context.ssh_client.run_command("check-agent-log.py -j")
-            errors = json.loads(output, object_hook=AgentLogRecord.from_dictionary)
-
-            # Individual tests may have rules to ignore known errors; filter those out
-            ignore_error_rules = []
-            # pylint seems to think self.context.test_suites is not iterable. Suppressing warning, since its type is List[AgentTestSuite]
-            # E1133: Non-iterable value self.context.test_suites is used in an iterating context (not-an-iterable)
-            for suite in self.context.test_suites:  # pylint: disable=E1133
-                for test in suite.tests:
-                    ignore_error_rules.extend(test(self.context).get_ignore_error_rules())
-
-            if len(ignore_error_rules) > 0:
-                new = []
-                for e in errors:
-                    if not AgentLog.matches_ignore_rule(e, ignore_error_rules):
-                        new.append(e)
-                errors = new
-
-            if len(errors) == 0:
-                # If no errors, we are done; don't create a log or test result.
-                log.info("There are no errors in the agent log")
-                return True
-
-            message = f"Detected {len(errors)} error(s) in the agent log"
-            self.context.lisa_log.error(message)
-            log.error("%s:\n\n%s\n", message, '\n'.join(['\t\t' + e.text.replace('\n', '\n\t\t') for e in errors]))
-            self._mark_log_as_failed()
-
-            self._report_test_result(
-                self.context.environment_name,
-                "CheckAgentLog",
-                TestStatus.FAILED,
-                start_time,
-                message=message + ' - First few errors:\n' + '\n'.join([e.text for e in errors[0:3]]))
-        except:  # pylint: disable=bare-except
-            log.exception("Error checking agent log")
-            self._report_test_result(
-                self.context.environment_name,
-                "CheckAgentLog",
-                TestStatus.FAILED,
-                start_time,
-                "Error checking agent log",
-                add_exception_stack_trace=True)
-
-        return False
+                    self._report_test_result(
+                        test_result_name,
+                        "CheckAgentLog",
+                        TestStatus.FAILED,
+                        start_time,
+                        message=message + ' - First few errors:\n' + '\n'.join([e.text for e in errors[0:3]]))
+            except:  # pylint: disable=bare-except
+                log.exception("Error checking agent log on %s", node_name)
+                success = False
+                self._report_test_result(
+                    test_result_name,
+                    "CheckAgentLog",
+                    TestStatus.FAILED,
+                    start_time,
+                    "Error checking agent log",
+                    add_exception_stack_trace=True)
+
+        return success
+
+    def _create_test_context(self,) -> AgentTestContext:
+        """
+        Creates the context for the test run.
+        """
+        if self._vm_name is not None:
+            self._lisa_log.info("Creating test context for virtual machine")
+            vm: VirtualMachineClient = VirtualMachineClient(
+                cloud=self._cloud,
+                location=self._location,
+                subscription=self._subscription_id,
+                resource_group=self._resource_group_name,
+                name=self._vm_name)
+            return AgentVmTestContext(
+                working_directory=self._working_directory,
+                vm=vm,
+                ip_address=self._vm_ip_address,
+                username=self._user,
+                identity_file=self._identity_file)
+        else:
+            log.info("Creating test context for scale set")
+            if self._create_scale_set:
+                self._create_test_scale_set()
+            else:
+                log.info("Using existing scale set %s", self._vmss_name)
+
+            scale_set = VirtualMachineScaleSetClient(
+                cloud=self._cloud,
+                location=self._location,
+                subscription=self._subscription_id,
+                resource_group=self._resource_group_name,
+                name=self._vmss_name)
+
+            # If we created the scale set, fill up the test nodes
+            if self._create_scale_set:
+                self._test_nodes = [_TestNode(name=i.instance_name, ip_address=i.ip_address) for i in scale_set.get_instances_ip_address()]
+
+            return AgentVmssTestContext(
+                working_directory=self._working_directory,
+                vmss=scale_set,
+                username=self._user,
+                identity_file=self._identity_file)
 
     @staticmethod
     def _mark_log_as_failed():
@@ -642,4 +884,56 @@ def _report_test_result(
 
         notifier.notify(msg)
 
+    def _create_test_scale_set(self) -> None:
+        """
+        Creates a scale set for the test run
+        """
+        self._lisa_log.info("Creating resource group %s", self._resource_group_name)
+        resource_group = ResourceGroupClient(cloud=self._cloud, location=self._location, subscription=self._subscription_id, name=self._resource_group_name)
+        resource_group.create()
+        self._delete_scale_set = True
+
+        self._lisa_log.info("Creating scale set %s", self._vmss_name)
+        log.info("Creating scale set %s", self._vmss_name)
+        template, parameters = self._get_scale_set_deployment_template(self._vmss_name)
+        resource_group.deploy_template(template, parameters)
+
+    def _get_scale_set_deployment_template(self, scale_set_name: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
+        """
+        Returns the deployment template for scale sets and its parameters
+        """
+        def read_file(path: str) -> str:
+            with open(path, "r") as file_:
+                return file_.read().strip()
+
+        publisher, offer, sku, version = self._image.replace(":", " ").split(' ')
+
+        template: Dict[str, Any] = json.loads(read_file(str(self._test_source_directory/"orchestrator"/"templates/vmss.json")))
+
+        # Scale sets for some images need to be deployed with 'plan' property
+        plan_required_images = ["almalinux", "kinvolk", "erockyenterprisesoftwarefoundationinc1653071250513"]
+        if publisher in plan_required_images:
+            resources: List[Dict[str, Any]] = template.get('resources')
+            for resource in resources:
+                if resource.get('type') == "Microsoft.Compute/virtualMachineScaleSets":
+                    resource["plan"] = {
+                        "name": "[parameters('sku')]",
+                        "product": "[parameters('offer')]",
+                        "publisher": "[parameters('publisher')]"
+                    }
+
+        if self._allow_ssh != '':
+            NetworkSecurityRule(template, is_lisa_template=False).add_allow_ssh_rule(self._allow_ssh)
+
+        return template, {
+            "username": {"value": self._user},
+            "sshPublicKey": {"value": read_file(f"{self._identity_file}.pub")},
+            "vmName": {"value": scale_set_name},
+            "publisher": {"value": publisher},
+            "offer": {"value": offer},
+            "sku": {"value": sku},
+            "version": {"value": version}
+        }
+
+
diff --git a/tests_e2e/orchestrator/lib/agent_test_suite_combinator.py b/tests_e2e/orchestrator/lib/agent_test_suite_combinator.py
index 28fca0fad..ffecaf363 100644
--- a/tests_e2e/orchestrator/lib/agent_test_suite_combinator.py
+++ b/tests_e2e/orchestrator/lib/agent_test_suite_combinator.py
@@ -1,8 +1,12 @@
 # Copyright (c) Microsoft Corporation.
 # Licensed under the MIT license.
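The `_check_ssh_connectivity` helper introduced earlier in this patch retries a trivial remote command until sshd accepts logins, tolerating only boot-time errors. A minimal standalone sketch of that retry pattern, for illustration only: `wait_for_ssh`, the `run_remote` callable, and `RuntimeError` are stand-ins for the patch's `SshClient.run_command` and its `CommandError`, and the short delay replaces the 15-second sleep.

```python
import time

# Boot-time SSH failures that are worth retrying (from PAM's nologin and key setup races).
RETRYABLE = ("Unprivileged users are not permitted to log in yet", "Permission denied")

def wait_for_ssh(run_remote, max_attempts=5, delay=0.01):
    """Run a no-op remote command until it succeeds; return the number of attempts used."""
    for attempt in range(max_attempts):
        try:
            run_remote("echo 'SSH connectivity check'")
            return attempt + 1
        except RuntimeError as error:
            if not any(m in str(error) for m in RETRYABLE):
                raise  # not a boot-time login error; fail fast
            if attempt >= max_attempts - 1:
                raise Exception(f"SSH connectivity check failed after {max_attempts} attempts")
            time.sleep(delay)  # the real helper waits 15 seconds between attempts

# Simulate a node that rejects the first two logins while it is still booting.
calls = {"n": 0}
def fake_run(command):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Permission denied (publickey)")

print(wait_for_ssh(fake_run))  # -> 3
```

The key design point mirrored from the patch: unknown errors are re-raised immediately rather than retried, so genuine misconfigurations surface on the first attempt.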
+import datetime
 import logging
+import random
 import re
+import traceback
 import urllib.parse
+import uuid
 
 from dataclasses import dataclass, field
 from typing import Any, Dict, List, Optional, Type
@@ -13,74 +17,80 @@
 # Disable those warnings, since 'lisa' is an external, non-standard, dependency
 # E0401: Unable to import 'lisa' (import-error)
 # etc
-from lisa import schema  # pylint: disable=E0401
+from lisa import notifier, schema  # pylint: disable=E0401
 from lisa.combinator import Combinator  # pylint: disable=E0401
+from lisa.messages import TestStatus, TestResultMessage  # pylint: disable=E0401
 from lisa.util import field_metadata  # pylint: disable=E0401
 
-from tests_e2e.orchestrator.lib.agent_test_loader import AgentTestLoader, VmImageInfo
+from tests_e2e.orchestrator.lib.agent_test_loader import AgentTestLoader, VmImageInfo, TestSuiteInfo
+from tests_e2e.tests.lib.logging import set_thread_name
+from tests_e2e.tests.lib.virtual_machine_client import VirtualMachineClient
+from tests_e2e.tests.lib.virtual_machine_scale_set_client import VirtualMachineScaleSetClient
 
 
 @dataclass_json()
 @dataclass
 class AgentTestSuitesCombinatorSchema(schema.Combinator):
-    test_suites: str = field(
-        default_factory=str, metadata=field_metadata(required=True)
-    )
-    cloud: str = field(
-        default_factory=str, metadata=field_metadata(required=True)
-    )
-    location: str = field(
-        default_factory=str, metadata=field_metadata(required=True)
-    )
-    image: str = field(
-        default_factory=str, metadata=field_metadata(required=False)
-    )
-    vm_size: str = field(
-        default_factory=str, metadata=field_metadata(required=False)
-    )
-    vm_name: str = field(
-        default_factory=str, metadata=field_metadata(required=False)
-    )
+    """
+    Defines the parameters passed to the combinator from the runbook.
+
+    The runbook is a static document and always passes all these parameters to the combinator, so they are all
+    marked as required. Optional parameters can pass an empty value to indicate that they are not specified.
+    """
+    allow_ssh: str = field(default_factory=str, metadata=field_metadata(required=True))
+    cloud: str = field(default_factory=str, metadata=field_metadata(required=True))
+    identity_file: str = field(default_factory=str, metadata=field_metadata(required=True))
+    image: str = field(default_factory=str, metadata=field_metadata(required=True))
+    keep_environment: str = field(default_factory=str, metadata=field_metadata(required=True))
+    location: str = field(default_factory=str, metadata=field_metadata(required=True))
+    resource_group_name: str = field(default_factory=str, metadata=field_metadata(required=True))
+    subscription_id: str = field(default_factory=str, metadata=field_metadata(required=True))
+    test_suites: str = field(default_factory=str, metadata=field_metadata(required=True))
+    user: str = field(default_factory=str, metadata=field_metadata(required=True))
+    vm_name: str = field(default_factory=str, metadata=field_metadata(required=True))
+    vm_size: str = field(default_factory=str, metadata=field_metadata(required=True))
+    vmss_name: str = field(default_factory=str, metadata=field_metadata(required=True))
 
 
 class AgentTestSuitesCombinator(Combinator):
     """
-    The "agent_test_suites" combinator returns a list of variables that specify the environments (i.e. test VMs) that the agent
-    test suites must be executed on:
-
-        * c_env_name: Unique name for the environment, e.g. "0001-com-ubuntu-server-focal-20_04-lts-westus2"
-        * c_marketplace_image: e.g. "Canonical UbuntuServer 18.04-LTS latest",
-        * c_location: e.g. "westus2",
-        * c_vm_size: e.g. "Standard_D2pls_v5"
-        * c_vhd: e.g "https://rhel.blob.core.windows.net/images/RHEL_8_Standard-8.3.202006170423.vhd?se=..."
-        * c_test_suites: e.g. [AgentBvt, FastTrack]
-
-    (c_marketplace_image, c_location, c_vm_size) and vhd are mutually exclusive and define the environment (i.e. the test VM)
-    in which the test will be executed. c_test_suites defines the test suites that should be executed in that
-    environment.
-
-    The 'vm_name' runbook parameter can be used to execute the test suites on an existing VM. In that case, the combinator
-    generates a single item with these variables:
-
-        * c_env_name: Name for the environment, same as vm_name
-        * c_vm_name: Name of the test VM
-        * c_location: Location of the test VM e.g. "westus2",
-        * c_test_suites: e.g. [AgentBvt, FastTrack]
+    The "agent_test_suites" combinator returns a list of variables that specify the test environments (i.e. test VMs) that the
+    test suites must be executed on. These variables are prefixed with "c_" to distinguish them from the command line arguments
+    of the runbook. See the runbook definition for details on each of those variables.
+
+    The combinator can generate environments for VMs created and managed by LISA, Scale Sets created and managed by the AgentTestSuite,
+    or existing VMs or Scale Sets.
     """
     def __init__(self, runbook: AgentTestSuitesCombinatorSchema) -> None:
         super().__init__(runbook)
         if self.runbook.cloud not in self._DEFAULT_LOCATIONS:
             raise Exception(f"Invalid cloud: {self.runbook.cloud}")
 
-        if self.runbook.vm_name != '' and (self.runbook.image != '' or self.runbook.vm_size != ''):
-            raise Exception("Invalid runbook parameters: When 'vm_name' is specified, 'image' and 'vm_size' should not be specified.")
+        if self.runbook.vm_name != '' and self.runbook.vmss_name != '':
+            raise Exception("Invalid runbook parameters: 'vm_name' and 'vmss_name' are mutually exclusive.")
 
         if self.runbook.vm_name != '':
-            self._environments = self.create_environment_for_existing_vm()
-        else:
-            self._environments = self.create_environment_list()
-        self._index = 0
-
+            if self.runbook.image != '' or self.runbook.vm_size != '':
+                raise Exception("Invalid runbook parameters: The 'vm_name' parameter indicates an existing VM, 'image' and 'vm_size' should not be specified.")
+            if self.runbook.resource_group_name == '':
+                raise Exception("Invalid runbook parameters: The 'vm_name' parameter indicates an existing VM, a 'resource_group_name' must be specified.")
+
+        if self.runbook.vmss_name != '':
+            if self.runbook.image != '' or self.runbook.vm_size != '':
+                raise Exception("Invalid runbook parameters: The 'vmss_name' parameter indicates an existing VMSS, 'image' and 'vm_size' should not be specified.")
+            if self.runbook.resource_group_name == '':
+                raise Exception("Invalid runbook parameters: The 'vmss_name' parameter indicates an existing VMSS, a 'resource_group_name' must be specified.")
+
+        self._log: logging.Logger = logging.getLogger("lisa")
+
+        with set_thread_name("AgentTestSuitesCombinator"):
+            if self.runbook.vm_name != '':
+                self._environments = [self.create_existing_vm_environment()]
+            elif self.runbook.vmss_name != '':
+                self._environments = [self.create_existing_vmss_environment()]
+            else:
+                self._environments = self.create_environment_list()
+            self._index = 0
 
     @classmethod
     def type_name(cls) -> str:
@@ -98,142 +108,444 @@ def _next(self) -> Optional[Dict[str, Any]]:
         return result
 
     _DEFAULT_LOCATIONS = {
-        "china": "china north 2",
-        "government": "usgovarizona",
-        "public": "westus2"
+        "AzureCloud": "westus2",
+        "AzureChinaCloud": "chinanorth2",
+        "AzureUSGovernment": "usgovarizona",
     }
 
-    def create_environment_for_existing_vm(self) -> List[Dict[str, Any]]:
-        loader = AgentTestLoader(self.runbook.test_suites)
-
-        environment: List[Dict[str, Any]] = [
-            {
-                "c_env_name": self.runbook.vm_name,
-                "c_vm_name": self.runbook.vm_name,
-                "c_location": self.runbook.location,
-                "c_test_suites": loader.test_suites,
-            }
-        ]
-
-        log: logging.Logger = logging.getLogger("lisa")
-        log.info("******** Environment for existing VMs *****")
-        log.info(
-            "{ c_env_name: '%s', c_vm_name: '%s', c_location: '%s', c_test_suites: '%s' }",
-            environment[0]['c_env_name'], environment[0]['c_vm_name'], environment[0]['c_location'], [s.name for s in environment[0]['c_test_suites']])
-        log.info("***************************")
+    _MARKETPLACE_IMAGE_INFORMATION_LOCATIONS = {
+        "AzureCloud": "",  # empty indicates the default location used by LISA
+        "AzureChinaCloud": "chinanorth2",
+        "AzureUSGovernment": "usgovarizona",
+    }
 
-        return environment
+    _SHARED_RESOURCE_GROUP_LOCATIONS = {
+        "AzureCloud": "",  # empty indicates the default location used by LISA
+        "AzureChinaCloud": "chinanorth2",
+        "AzureUSGovernment": "usgovarizona",
+    }
 
     def create_environment_list(self) -> List[Dict[str, Any]]:
-        loader = AgentTestLoader(self.runbook.test_suites)
-
-        #
-        # If the runbook provides any of 'image', 'location', or 'vm_size', those values
-        # override any configuration values on the test suite.
-        #
-        # Check 'images' first and add them to 'runbook_images', if any
-        #
-        if self.runbook.image == "":
-            runbook_images = []
-        else:
-            runbook_images = loader.images.get(self.runbook.image)
-            if runbook_images is None:
-                if not self._is_urn(self.runbook.image) and not self._is_vhd(self.runbook.image):
-                    raise Exception(f"The 'image' parameter must be an image or image set name, a urn, or a vhd: {self.runbook.image}")
-                i = VmImageInfo()
-                i.urn = self.runbook.image  # Note that this could be a URN or the URI for a VHD
-                i.locations = []
-                i.vm_sizes = []
-                runbook_images = [i]
-
-        #
-        # Now walk through all the test_suites and create a list of the environments (test VMs) that need to be created.
-        #
-        environment_list: List[Dict[str, Any]] = []
-        shared_environments: Dict[str, Dict[str, Any]] = {}
-
-        for suite_info in loader.test_suites:
+        """
+        Examines the test_suites specified in the runbook and returns a list of the environments (i.e. test VMs or scale sets) that need to be
+        created in order to execute these suites.
+
+        Note that if the runbook provides an 'image', 'location', or 'vm_size', those values override any values provided in the
+        configuration of the test suites.
+        """
+        environments: List[Dict[str, Any]] = []
+        shared_environments: Dict[str, Dict[str, Any]] = {}  # environments shared by multiple test suites
+
+        loader = AgentTestLoader(self.runbook.test_suites, self.runbook.cloud)
+
+        runbook_images = self._get_runbook_images(loader)
+
+        skip_test_suites: List[str] = []
+        skip_test_suites_images: List[str] = []
+        for test_suite_info in loader.test_suites:
+            if self.runbook.cloud in test_suite_info.skip_on_clouds:
+                skip_test_suites.append(test_suite_info.name)
+                continue
             if len(runbook_images) > 0:
-                images_info = runbook_images
+                images_info: List[VmImageInfo] = runbook_images
             else:
-                # The test suite may be referencing multiple image sets, and sets can intersect, so we need to ensure
-                # we eliminate any duplicates.
-                unique_images: Dict[str, str] = {}
-                for image in suite_info.images:
-                    for i in loader.images[image]:
-                        unique_images[i] = i
-                images_info = unique_images.values()
+                images_info: List[VmImageInfo] = self._get_test_suite_images(test_suite_info, loader)
+
+            skip_images_info: List[VmImageInfo] = self._get_test_suite_skip_images(test_suite_info, loader)
+            if len(skip_images_info) > 0:
+                skip_test_suite_image = f"{test_suite_info.name}: {','.join([i.urn for i in skip_images_info])}"
+                skip_test_suites_images.append(skip_test_suite_image)
 
             for image in images_info:
-                # The URN can actually point to a VHD if the runbook provided a VHD in the 'images' parameter
+                if image in skip_images_info:
+                    continue
+                # 'image.urn' can actually be the URL to a VHD or an image from a gallery if the runbook provided it in the 'image' parameter
                 if self._is_vhd(image.urn):
                     marketplace_image = ""
                     vhd = image.urn
-                    name = "vhd"
+                    image_name = urllib.parse.urlparse(vhd).path.split('/')[-1]  # take the last fragment of the URL's path (e.g.
"RHEL_8_Standard-8.3.202006170423.vhd") + shared_gallery = "" + elif self._is_image_from_gallery(image.urn): + marketplace_image = "" + vhd = "" + image_name = self._get_name_of_image_from_gallery(image.urn) + shared_gallery = image.urn else: marketplace_image = image.urn vhd = "" - match = AgentTestSuitesCombinator._URN.match(image.urn) - if match is None: - raise Exception(f"Invalid URN: {image.urn}") - name = f"{match.group('offer')}-{match.group('sku')}" - - # If the runbook specified a location, use it. Then try the suite location, if any. Otherwise, check if the image specifies - # a list of locations and use any of them. If no location is specified so far, use the default. - if self.runbook.location != "": - location = self.runbook.location - elif suite_info.location != '': - location = suite_info.location - elif len(image.locations) > 0: - location = image.locations[0] - else: - location = AgentTestSuitesCombinator._DEFAULT_LOCATIONS[self.runbook.cloud] - - # If the runbook specified a VM size, use it. Else if the image specifies a list of VM sizes, use any of them. Otherwise, - # set the size to empty and let LISA choose it. 
- if self.runbook.vm_size != '': - vm_size = self.runbook.vm_size - elif len(image.vm_sizes) > 0: - vm_size = image.vm_sizes[0] - else: - vm_size = "" - - if suite_info.owns_vm: - # create an environment for exclusive use by this suite - environment_list.append({ - "c_marketplace_image": marketplace_image, - "c_location": location, - "c_vm_size": vm_size, - "c_vhd": vhd, - "c_test_suites": [suite_info], - "c_env_name": f"{name}-{suite_info.name}" - }) + image_name = self._get_image_name(image.urn) + shared_gallery = "" + + if test_suite_info.executes_on_scale_set and (vhd != "" or shared_gallery != ""): + raise Exception("VHDs and images from galleries are currently not supported on scale sets.") + + location: str = self._get_location(test_suite_info, image) + if location is None: + continue + + vm_size = self._get_vm_size(image) + + if test_suite_info.owns_vm or not test_suite_info.install_test_agent: + # + # Create an environment for exclusive use by this suite + # + # TODO: Allow test suites that set 'install_test_agent' to False to share environments (we need to ensure that + # all the suites in the shared environment have the same value for 'install_test_agent') + # + if test_suite_info.executes_on_scale_set: + env = self.create_vmss_environment( + env_name=f"{image_name}-vmss-{test_suite_info.name}", + marketplace_image=marketplace_image, + location=location, + vm_size=vm_size, + test_suite_info=test_suite_info) + else: + env = self.create_vm_environment( + env_name=f"{image_name}-{test_suite_info.name}", + marketplace_image=marketplace_image, + vhd=vhd, + shared_gallery=shared_gallery, + location=location, + vm_size=vm_size, + test_suite_info=test_suite_info) + environments.append(env) + else: + # add this suite to the shared environments - key: str = f"{name}-{location}" - if key in shared_environments: - shared_environments[key]["c_test_suites"].append(suite_info) + env_name: str = f"{image_name}-vmss-{location}" if test_suite_info.executes_on_scale_set else
f"{image_name}-{location}" + env = shared_environments.get(env_name) + if env is not None: + env["c_test_suites"].append(test_suite_info) + else: + if test_suite_info.executes_on_scale_set: + env = self.create_vmss_environment( + env_name=env_name, + marketplace_image=marketplace_image, + location=location, + vm_size=vm_size, + test_suite_info=test_suite_info) + else: + env = self.create_vm_environment( + env_name=env_name, + marketplace_image=marketplace_image, + vhd=vhd, + shared_gallery=shared_gallery, + location=location, + vm_size=vm_size, + test_suite_info=test_suite_info) + shared_environments[env_name] = env + + if test_suite_info.template != '': + vm_tags = env["vm_tags"] + if "templates" not in vm_tags: + vm_tags["templates"] = test_suite_info.template + else: + vm_tags["templates"] += "," + test_suite_info.template + + environments.extend(shared_environments.values()) + + if len(environments) == 0: + raise Exception("No VM images were found to execute the test suites.") + + # Log a summary of each environment and the suites that will be executed on it + format_suites = lambda suites: ", ".join([s.name for s in suites]) + summary = [f"{e['c_env_name']}: [{format_suites(e['c_test_suites'])}]" for e in environments] + summary.sort() + self._log.info("Executing tests on %d environments\n\n%s\n", len(environments), '\n'.join([f"\t{s}" for s in summary])) + + if len(skip_test_suites) > 0: + self._log.info("Skipping test suites %s", skip_test_suites) + + if len(skip_test_suites_images) > 0: + self._log.info("Skipping test suites run on images \n %s", '\n'.join([f"\t{skip}" for skip in skip_test_suites_images])) + + return environments + + def create_existing_vm_environment(self) -> Dict[str, Any]: + loader = AgentTestLoader(self.runbook.test_suites, self.runbook.cloud) + +
vm: VirtualMachineClient = VirtualMachineClient( + cloud=self.runbook.cloud, + location=self.runbook.location, + subscription=self.runbook.subscription_id, + resource_group=self.runbook.resource_group_name, + name=self.runbook.vm_name) + + ip_address = vm.get_ip_address() + + return { + "c_env_name": self.runbook.vm_name, + "c_platform": [ + { + "type": "ready", + "capture_vm_information": False + } + ], + "c_environment": { + "environments": [ + { + "nodes": [ + { + "type": "remote", + "name": self.runbook.vm_name, + "public_address": ip_address, + "public_port": 22, + "username": self.runbook.user, + "private_key_file": self.runbook.identity_file + } + ], + } + ] + }, + "c_location": self.runbook.location, + "c_test_suites": loader.test_suites, + } + + def create_existing_vmss_environment(self) -> Dict[str, Any]: + loader = AgentTestLoader(self.runbook.test_suites, self.runbook.cloud) + + vmss = VirtualMachineScaleSetClient( + cloud=self.runbook.cloud, + location=self.runbook.location, + subscription=self.runbook.subscription_id, + resource_group=self.runbook.resource_group_name, + name=self.runbook.vmss_name) + + ip_addresses = vmss.get_instances_ip_address() + + return { + "c_env_name": self.runbook.vmss_name, + "c_environment": { + "environments": [ + { + "nodes": [ + { + "type": "remote", + "name": i.instance_name, + "public_address": i.ip_address, + "public_port": 22, + "username": self.runbook.user, + "private_key_file": self.runbook.identity_file + } for i in ip_addresses + ], + } + ] + }, + "c_platform": [ + { + "type": "ready", + "capture_vm_information": False + } + ], + "c_location": self.runbook.location, + "c_test_suites": loader.test_suites, + } + + def create_vm_environment(self, env_name: str, marketplace_image: str, vhd: str, shared_gallery: str, location: str, vm_size: str, test_suite_info: TestSuiteInfo) -> Dict[str, Any]: + # + # Custom ARM templates (to create the test VMs) require special handling. 
These templates are processed by the azure_update_arm_template + # hook, which does not have access to the runbook variables. Instead, we use a dummy VM tag named "templates" and pass the + # names of the custom templates in its value. The hook can then retrieve the value from the Platform object (see wiki for more details). + # We also use a dummy item, "vm_tags" in the environment dictionary in order to concatenate templates from multiple test suites when they + # share the same test environment. Similarly, we use a dummy VM tag named "allow_ssh" to pass the value of the "allow_ssh" runbook parameter. + # + vm_tags = {} + if self.runbook.allow_ssh != '': + vm_tags["allow_ssh"] = self.runbook.allow_ssh + environment = { + "c_platform": [ + { + "type": "azure", + "admin_username": self.runbook.user, + "admin_private_key_file": self.runbook.identity_file, + "keep_environment": self.runbook.keep_environment, + "capture_vm_information": False, + "azure": { + "deploy": True, + "cloud": self.runbook.cloud, + "marketplace_image_information_location": self._MARKETPLACE_IMAGE_INFORMATION_LOCATIONS[self.runbook.cloud], + "shared_resource_group_location": self._SHARED_RESOURCE_GROUP_LOCATIONS[self.runbook.cloud], + "subscription_id": self.runbook.subscription_id, + "wait_delete": False, + "vm_tags": vm_tags + }, + "requirement": { + "core_count": { + "min": 2 + }, + "azure": { + "marketplace": marketplace_image, + "vhd": vhd, + "shared_gallery": shared_gallery, + "location": location, + "vm_size": vm_size } + } + } + ], + + "c_environment": None, + + "c_env_name": env_name, + "c_test_suites": [test_suite_info], + "c_location": location, + "c_image": marketplace_image, + "c_is_vhd": vhd != "", + "vm_tags": vm_tags + } + + if shared_gallery != '': + # Currently all the images in our shared gallery require secure boot + environment['c_platform'][0]['requirement']["features"] = { + "items": [ + { + "type": "Security_Profile", + "security_profile": "secureboot" + } + ] + } + 
return environment - environment_list.extend(shared_environments.values()) + def create_vmss_environment(self, env_name: str, marketplace_image: str, location: str, vm_size: str, test_suite_info: TestSuiteInfo) -> Dict[str, Any]: + return { + "c_platform": [ + { + "type": "ready", + "capture_vm_information": False + } + ], + + "c_environment": { + "environments": [ + { + "nodes": [ + {"type": "local"} + ], + } + ] + }, + + "c_env_name": env_name, + "c_test_suites": [test_suite_info], + "c_location": location, + "c_image": marketplace_image, + "c_is_vhd": False, + "c_vm_size": vm_size, + "vm_tags": {} + } + + def _get_runbook_images(self, loader: AgentTestLoader) -> List[VmImageInfo]: + """ + Returns the images specified in the runbook, or an empty list if none are specified. + """ + if self.runbook.image == "": + return [] + + images = loader.images.get(self.runbook.image) + if images is not None: + return images + + # If it is not an image or image set, it must be a URN, a VHD, or an image from a gallery + if not self._is_urn(self.runbook.image) and not self._is_vhd(self.runbook.image) and not self._is_image_from_gallery(self.runbook.image): + raise Exception(f"The 'image' parameter must be an image, image set name, urn, vhd, or an image from a shared gallery: {self.runbook.image}") - log: logging.Logger = logging.getLogger("lisa") - log.info("******** Environments *****") - for e in environment_list: - log.info( - "{ c_marketplace_image: '%s', c_location: '%s', c_vm_size: '%s', c_vhd: '%s', c_test_suites: '%s', c_env_name: '%s' }", - e['c_marketplace_image'], e['c_location'], e['c_vm_size'], e['c_vhd'], [s.name for s in e['c_test_suites']], e['c_env_name']) - log.info("***************************") + i = VmImageInfo() + i.urn = self.runbook.image # Note that this could be a URN or the URI for a VHD, or an image from a shared gallery + i.locations = [] + i.vm_sizes = [] - return environment_list + return [i] + + @staticmethod + def _get_test_suite_images(suite:
TestSuiteInfo, loader: AgentTestLoader) -> List[VmImageInfo]: + """ + Returns the images used by a test suite. + + A test suite may reference multiple image sets and sets can intersect; this method eliminates any duplicates. + """ + unique: Dict[str, VmImageInfo] = {} + for image in suite.images: + match = AgentTestLoader.RANDOM_IMAGES_RE.match(image) + if match is None: + image_list = loader.images[image] + else: + count = match.group('count') + if count is None: + count = 1 + matching_images = loader.images[match.group('image_set')].copy() + random.shuffle(matching_images) + image_list = matching_images[0:int(count)] + for i in image_list: + unique[i.urn] = i + return [v for k, v in unique.items()] + + @staticmethod + def _get_test_suite_skip_images(suite: TestSuiteInfo, loader: AgentTestLoader) -> List[VmImageInfo]: + """ + Returns images that need to be skipped by the suite. + + A test suite may reference multiple image sets and sets can intersect; this method eliminates any duplicates. + """ + skip_unique: Dict[str, VmImageInfo] = {} + for image in suite.skip_on_images: + image_list = loader.images[image] + for i in image_list: + skip_unique[i.urn] = i + return [v for k, v in skip_unique.items()] + + def _get_location(self, suite_info: TestSuiteInfo, image: VmImageInfo) -> str: + """ + Returns the location on which the test VM for the given test suite and image should be created. + + If the image is not available on any location, returns None, to indicate that the test suite should be skipped. + """ + # If the runbook specified a location, use it. + if self.runbook.location != "": + return self.runbook.location + + # Then try the suite location, if any. + for location in suite_info.locations: + if location.startswith(self.runbook.cloud + ":"): + return location.split(":")[1] + + # If the image has a location restriction, use any location where it is available.
+ # However, if it is not available on any location, skip the image (return None) + if image.locations: + image_locations = image.locations.get(self.runbook.cloud) + if image_locations is not None: + if len(image_locations) == 0: + return None + return image_locations[0] + + # Else use the default. + return AgentTestSuitesCombinator._DEFAULT_LOCATIONS[self.runbook.cloud] + + def _get_vm_size(self, image: VmImageInfo) -> str: + """ + Returns the VM size that should be used to create the test VM for the given image. + + If the size is set to an empty string, LISA will choose an appropriate size. + """ + # If the runbook specified a VM size, use it. + if self.runbook.vm_size != '': + return self.runbook.vm_size + + # If the image specifies a list of VM sizes, use any of them. + if len(image.vm_sizes) > 0: + return image.vm_sizes[0] + + # Otherwise, set the size to empty and LISA will select an appropriate size. + return "" + + + @staticmethod + def _get_image_name(urn: str) -> str: + """ + Creates an image name ("offer-sku") given its URN + """ + match = AgentTestSuitesCombinator._URN.match(urn) + if match is None: + raise Exception(f"Invalid URN: {urn}") + return f"{match.group('offer')}-{match.group('sku')}" _URN = re.compile(r"(?P<publisher>[^\s:]+)[\s:](?P<offer>[^\s:]+)[\s:](?P<sku>[^\s:]+)[\s:](?P<version>[^\s:]+)") @@ -247,3 +559,52 @@ def _is_vhd(vhd: str) -> bool: # VHDs are given as URIs to storage; do some basic validation, not intending to be exhaustive. parsed = urllib.parse.urlparse(vhd) return parsed.scheme == 'https' and parsed.netloc != "" and parsed.path != "" + + # Images from a gallery are given as "<gallery>/<image>/<version>". 
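The `_URN` pattern in the chunk above lost its named groups to formatting; only `offer` and `sku` are confirmed by `_get_image_name`, so the full set of names below (`publisher`, `version`) is a reconstruction based on the standard `publisher:offer:sku:version` layout of marketplace URNs. A minimal, self-contained sketch of the same parsing:

```python
import re

# Same pattern as AgentTestSuitesCombinator._URN; 'publisher' and 'version' are
# assumed names, 'offer' and 'sku' are the ones referenced by _get_image_name.
URN = re.compile(
    r"(?P<publisher>[^\s:]+)[\s:](?P<offer>[^\s:]+)[\s:](?P<sku>[^\s:]+)[\s:](?P<version>[^\s:]+)")


def get_image_name(urn: str) -> str:
    """Builds the "offer-sku" name used in environment names, as _get_image_name does."""
    match = URN.match(urn)
    if match is None:
        raise ValueError(f"Invalid URN: {urn}")
    return f"{match.group('offer')}-{match.group('sku')}"


# The [\s:] separators accept both ':'-separated and space-separated URNs.
print(get_image_name("Canonical:0001-com-ubuntu-server-jammy:22_04-lts-gen2:latest"))
# → 0001-com-ubuntu-server-jammy-22_04-lts-gen2
```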
+ _IMAGE_FROM_GALLERY = re.compile(r"(?P<gallery>[^/]+)/(?P<image>[^/]+)/(?P<version>[^/]+)") + + @staticmethod + def _is_image_from_gallery(image: str) -> bool: + return AgentTestSuitesCombinator._IMAGE_FROM_GALLERY.match(image) is not None + + @staticmethod + def _get_name_of_image_from_gallery(image: str) -> str: + match = AgentTestSuitesCombinator._IMAGE_FROM_GALLERY.match(image) + if match is None: + raise Exception(f"Invalid image from gallery: {image}") + return match.group('image') + + @staticmethod + def _report_test_result( + suite_name: str, + test_name: str, + status: TestStatus, + start_time: datetime.datetime, + message: str = "", + add_exception_stack_trace: bool = False + ) -> None: + """ + Reports a test result to the junit notifier + """ + # The junit notifier requires an initial RUNNING message in order to register the test in its internal cache. + msg: TestResultMessage = TestResultMessage() + msg.type = "AgentTestResultMessage" + msg.id_ = str(uuid.uuid4()) + msg.status = TestStatus.RUNNING + msg.suite_full_name = suite_name + msg.suite_name = msg.suite_full_name + msg.full_name = test_name + msg.name = msg.full_name + msg.elapsed = 0 + + notifier.notify(msg) + + # Now send the actual result. The notifier pipeline makes a deep copy of the message so it is OK to re-use the + # same object and just update a few fields. If using a different object, be sure that the "id_" is the same. 
+ msg.status = status + msg.message = message + if add_exception_stack_trace: + msg.stacktrace = traceback.format_exc() + msg.elapsed = (datetime.datetime.now() - start_time).total_seconds() + + notifier.notify(msg) diff --git a/tests_e2e/orchestrator/lib/update_arm_template_hook.py b/tests_e2e/orchestrator/lib/update_arm_template_hook.py new file mode 100644 index 000000000..801583ff7 --- /dev/null +++ b/tests_e2e/orchestrator/lib/update_arm_template_hook.py @@ -0,0 +1,88 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import importlib.util +import logging + +from pathlib import Path +from typing import Any + +# Disable those warnings, since 'lisa' is an external, non-standard dependency +# E0401: Unable to import 'lisa.*' (import-error) +# pylint: disable=E0401 +from lisa.environment import Environment +from lisa.util import hookimpl, plugin_manager +from lisa.sut_orchestrator.azure.platform_ import AzurePlatformSchema +# pylint: enable=E0401 + +import tests_e2e +from tests_e2e.tests.lib.network_security_rule import NetworkSecurityRule +from tests_e2e.tests.lib.update_arm_template import UpdateArmTemplate + + +class UpdateArmTemplateHook: + """ + This hook allows customizing the ARM template used to create the test VMs (see wiki for details). 
+ """ + @hookimpl + def azure_update_arm_template(self, template: Any, environment: Environment) -> None: + log: logging.Logger = logging.getLogger("lisa") + + azure_runbook: AzurePlatformSchema = environment.platform.runbook.get_extended_runbook(AzurePlatformSchema) + vm_tags = azure_runbook.vm_tags + + # + # Add the allow SSH security rule if requested by the runbook + # + allow_ssh: str = vm_tags.get("allow_ssh") + if allow_ssh is not None: + log.info("******** Waagent: Adding network security rule to allow SSH connections from %s", allow_ssh) + NetworkSecurityRule(template, is_lisa_template=True).add_allow_ssh_rule(allow_ssh) + + # + # Apply any template customizations provided by the tests. + # + # The "templates" tag is a comma-separated list of the template customizations provided by the tests + test_templates = vm_tags.get("templates") + if test_templates is not None: + log.info("******** Waagent: Applying custom templates '%s' to environment '%s'", test_templates, environment.name) + + for t in test_templates.split(","): + update_arm_template = self._get_update_arm_template(t) + update_arm_template().update(template, is_lisa_template=True) + + _SOURCE_CODE_ROOT: Path = Path(tests_e2e.__path__[0]) + + @staticmethod + def _get_update_arm_template(test_template: str) -> UpdateArmTemplate: + """ + Returns the UpdateArmTemplate class that implements the template customization for the test. + """ + source_file: Path = UpdateArmTemplateHook._SOURCE_CODE_ROOT/"tests"/test_template + + spec = importlib.util.spec_from_file_location(f"tests_e2e.tests.templates.{source_file.name}", str(source_file)) + module = importlib.util.module_from_spec(spec) + spec.loader.exec_module(module) + + # find all the classes in the module that are subclasses of UpdateArmTemplate but are not UpdateArmTemplate itself. 
+ matches = [v for v in module.__dict__.values() if isinstance(v, type) and issubclass(v, UpdateArmTemplate) and v != UpdateArmTemplate] + if len(matches) != 1: + raise Exception(f"Error in {source_file}: template files must contain exactly one class derived from UpdateArmTemplate") + return matches[0] + + +plugin_manager.register(UpdateArmTemplateHook()) diff --git a/tests_e2e/orchestrator/runbook.yml b/tests_e2e/orchestrator/runbook.yml index 8075725eb..8b0ef37ec 100644 --- a/tests_e2e/orchestrator/runbook.yml +++ b/tests_e2e/orchestrator/runbook.yml @@ -1,4 +1,4 @@ -name: WALinuxAgent +name: $(name) testcase: - criteria: @@ -9,124 +9,196 @@ extension: variable: # - # These variables define parameters handled by LISA. + # The test environments are generated dynamically by the AgentTestSuitesCombinator using the 'platform' and 'environment' variables. + # Most of the variables below are parameters for the combinator and/or the AgentTestSuite (marked as 'is_case_visible'), but a few of + # them, such as the runbook name and the SSH proxy variables, are handled by LISA. + # + # Many of these variables are optional, depending on the scenario. An empty value indicates that the variable has not been specified. + # + + # + # The name of the runbook; it is added as a prefix ("lisa-") to ARM resources created by the test run. + # + # Set the name to your email alias when doing developer runs. 
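The load-and-discover pattern used by `_get_update_arm_template` above (load a module from a file path with `importlib`, then find the single subclass of a base class) can be sketched in isolation. This is a hedged, self-contained version: the base class, module name, and `update` signature are stand-ins, not the hook's real types.

```python
import importlib.util
import tempfile
from pathlib import Path

# A throwaway module that mimics a test's template file: a base class and
# exactly one subclass (in the real hook, the base is UpdateArmTemplate).
SOURCE = '''
class UpdateArmTemplate:
    pass

class AddNetworkRule(UpdateArmTemplate):
    def update(self, template, is_lisa_template):
        template.setdefault("resources", []).append({"name": "extra-rule"})
'''


def load_single_subclass(path: Path, base_name: str) -> type:
    """Loads the module at 'path' and returns its single subclass of 'base_name'."""
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    base = getattr(module, base_name)
    # Keep only classes derived from the base, excluding the base itself.
    matches = [v for v in module.__dict__.values()
               if isinstance(v, type) and issubclass(v, base) and v is not base]
    if len(matches) != 1:
        raise Exception(f"{path} must contain exactly one class derived from {base_name}")
    return matches[0]


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "add_network_rule.py"
    path.write_text(SOURCE)
    cls = load_single_subclass(path, "UpdateArmTemplate")
    template = {}
    cls().update(template, is_lisa_template=True)
    print(cls.__name__, template)
```

The "exactly one subclass" rule keeps the mapping from template file to customization class unambiguous, which is why the hook raises when zero or several matches are found.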
+ # + - name: name + value: "WALinuxAgent" + is_case_visible: true + + # + # Test suites to execute + # + - name: test_suites + value: "agent_bvt, no_outbound_connections, extensions_disabled, agent_not_provisioned, fips, agent_ext_workflow, agent_status, multi_config_ext, agent_cgroups, ext_cgroups, agent_firewall, ext_telemetry_pipeline, ext_sequencing, agent_persist_firewall, publish_hostname, agent_update, recover_network_interface" + + # + # Parameters used to create test VMs # - name: subscription_id value: "" - - name: user - value: "waagent" - - name: identity_file + is_case_visible: true + - name: cloud + value: "AzureCloud" + is_case_visible: true + - name: location value: "" - is_secret: true - - name: admin_password + - name: image value: "" - is_secret: true + - name: vm_size + value: "" + + # + # Whether to skip deletion of the test VMs after the test run completes. + # + # Possible values: always, no, failed + # - name: keep_environment value: "no" + is_case_visible: true + + # + # Username and SSH public key for the admin user on the test VMs + # + - name: user + value: "waagent" + is_case_visible: true + - name: identity_file + value: "" + is_case_visible: true + # - # These variables define parameters for the AgentTestSuite; see the test wiki for details. + # Set the resource group and vm, or the group and the vmss, to execute the test run on an existing VM or VMSS. 
# - # NOTE: c_test_suites, generated by the AgentTestSuitesCombinator, is also a parameter - # for the AgentTestSuite + - name: resource_group_name + value: "" + is_case_visible: true + - name: vm_name + value: "" + is_case_visible: true + - name: vmss_name + value: "" + is_case_visible: true + + # + # Directory for test logs # - # Root directory for log files (optional) - name: log_path value: "" is_case_visible: true + # # Whether to collect logs from the test VM + # + # Possible values: always, no, failed + # - name: collect_logs value: "failed" is_case_visible: true - # Whether to skip setup of the test VM + # + # Whether to skip setup of the test VMs. This is useful in developer runs when using existing VMs to save initialization time. + # - name: skip_setup value: false is_case_visible: true # - # These variables are parameters for the AgentTestSuitesCombinator + # Takes an IP address as value; if not empty, it adds a Network Security Rule allowing SSH access from the specified IP address to any test VMs created by the runbook execution. # - # The test suites to execute - - name: test_suites - value: "agent_bvt" - - name: cloud - value: "public" - - name: image + - name: allow_ssh value: "" - - name: location + is_case_visible: true + + # + # These variables are handled by LISA to use an SSH proxy when executing the runbook + # + - name: proxy + value: False + - name: proxy_host value: "" - - name: vm_size + - name: proxy_user + value: "foo" + - name: proxy_identity_file value: "" + is_secret: true # - # The values for these variables are generated by the AgentTestSuitesCombinator combinator. They are + # The variables below are generated by the AgentTestSuitesCombinator combinator. They are # prefixed with "c_" to distinguish them from the rest of the variables, whose value can be set from # the command line. 
# - # c_marketplace_image, c_vm_size, c_location, and c_vhd are handled by LISA and define - the set of test VMs that need to be created, while c_test_suites and c_env_name are parameters - for the AgentTestSuite; the former defines the test suites that must be executed on each - of those test VMs and the latter is the name of the environment, which is used for logging - purposes (NOTE: the AgentTestSuite also uses c_vhd). + + # + # The combinator generates the test environments using these two variables, which are passed to LISA + # + - name: c_environment + value: {} + - name: c_platform + value: [] + + # + # Name of the test environment, used mainly for logging purposes # - name: c_env_name value: "" is_case_visible: true - - name: c_marketplace_image + + # + # Test suites assigned for execution in the current test environment. + # + # The combinator splits the test suites specified in the 'test_suites' variable into subsets and assigns each subset + # to a test environment. The AgentTestSuite uses 'c_test_suites' to execute the suites assigned to the current environment. + # + - name: c_test_suites + value: [] + is_case_visible: true + + # + # These parameters are used by the AgentTestSuite to create the test scale sets. + # + # Note that there are three other variables named 'image', 'vm_size' and 'location', which can be passed + # from the command line. The combinator generates the values for these parameters using test metadata, + # but they can be overridden with these command line variables. The final values are passed to the + # AgentTestSuite in the corresponding 'c_*' variables. 
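The override precedence described above (a command-line value wins over test metadata, which wins over a default) is the same fall-through that `_get_location` and `_get_vm_size` implement. A hedged sketch of that rule; the function name and arguments are illustrative, not the combinator's actual API:

```python
from typing import List


def resolve(runbook_value: str, metadata_values: List[str], default: str) -> str:
    """Picks a value the way the combinator does: runbook, then metadata, then default."""
    if runbook_value != "":
        # The runbook (command-line) value overrides everything else.
        return runbook_value
    if metadata_values:
        # Otherwise use any value from the test suite / image metadata.
        return metadata_values[0]
    # Otherwise fall back to the default (for vm_size, "" lets LISA choose).
    return default


print(resolve("", ["eastus2", "westus2"], "westus2"))  # → eastus2
print(resolve("usgovarizona", ["eastus2"], "westus2"))  # → usgovarizona
```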
+ # + - name: c_image value: "" + is_case_visible: true - name: c_vm_size value: "" + is_case_visible: true - name: c_location value: "" - - name: c_vhd - value: "" - is_case_visible: true - - name: c_test_suites - value: [] is_case_visible: true # - # Set these variables to use an SSH proxy when executing the runbook + # True if the image is a VHD (instead of a URN) # - - name: proxy - value: False - - name: proxy_host - value: "" - - name: proxy_user - value: "foo" - - name: proxy_identity_file - value: "" - is_secret: true + - name: c_is_vhd + value: false + is_case_visible: true -platform: - - type: azure - admin_username: $(user) - admin_private_key_file: $(identity_file) - admin_password: $(admin_password) - keep_environment: $(keep_environment) - azure: - deploy: True - subscription_id: $(subscription_id) - wait_delete: false - requirement: - core_count: - min: 2 - azure: - marketplace: $(c_marketplace_image) - vhd: $(c_vhd) - location: $(c_location) - vm_size: $(c_vm_size) +environment: $(c_environment) + +platform: $(c_platform) combinator: type: agent_test_suites - test_suites: $(test_suites) + allow_ssh: $(allow_ssh) cloud: $(cloud) + identity_file: $(identity_file) image: $(image) + keep_environment: $(keep_environment) location: $(location) + resource_group_name: $(resource_group_name) + subscription_id: $(subscription_id) + test_suites: $(test_suites) + user: $(user) + vm_name: $(vm_name) vm_size: $(vm_size) + vmss_name: $(vmss_name) -concurrency: 16 +concurrency: 32 notifier: - type: agent.junit diff --git a/tests_e2e/orchestrator/sample_runbooks/existing_vm.yml b/tests_e2e/orchestrator/sample_runbooks/existing_vm.yml deleted file mode 100644 index 2a5109f41..000000000 --- a/tests_e2e/orchestrator/sample_runbooks/existing_vm.yml +++ /dev/null @@ -1,143 +0,0 @@ -# Microsoft Azure Linux Agent -# -# Copyright 2018 Microsoft Corporation -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance 
with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -# -# Executes the test suites on an existing VM -# -name: ExistingVM - -testcase: - - criteria: - area: waagent - -extension: - - "../lib" - -variable: - # - # These variables identify the existing VM, and the user for SSH connections - # - - name: cloud - value: "public" - - name: subscription_id - value: "" - - name: resource_group_name - value: "" - - name: vm_name - value: "" - - name: location - value: "" - - - name: user - value: "" - - name: identity_file - value: "" - is_secret: true - - # - # The test suites to execute - # - - name: test_suites - value: "agent_bvt" - - # - # These variables define parameters for the AgentTestSuite; see the test wiki for details. - # - # NOTE: c_test_suites, generated by the AgentTestSuitesCombinator, is also a parameter - # for the AgentTestSuite - # - # Root directory for log files (optional) - - name: log_path - value: "" - is_case_visible: true - - # Whether to collect logs from the test VM - - name: collect_logs - value: "failed" - is_case_visible: true - - # Whether to skip setup of the test VM - - name: skip_setup - value: false - is_case_visible: true - - # - # The values for these variables are generated by the AgentTestSuitesCombinator combinator. They are - # prefixed with "c_" to distinguish them from the rest of the variables, whose value can be set from - # the command line. 
- # - # c_marketplace_image, c_vm_size, c_location, and c_vhd are handled by LISA and define - # the set of test VMs that need to be created, while c_test_suites is a parameter - # for the AgentTestSuite and defines the test suites that must be executed on each - # of those test VMs (the AgentTestSuite also uses c_vhd) - # - - name: c_env_name - value: "" - is_case_visible: true - - name: c_vm_name - value: "" - - name: c_location - value: "" - - name: c_test_suites - value: [] - is_case_visible: true - - # - # Set these variables to use an SSH proxy when executing the runbook - # - - name: proxy - value: False - - name: proxy_host - value: "" - - name: proxy_user - value: "foo" - - name: proxy_identity_file - value: "" - is_secret: true - -platform: - - type: azure - admin_username: $(user) - admin_private_key_file: $(identity_file) - azure: - resource_group_name: $(resource_group_name) - deploy: false - subscription_id: $(subscription_id) - requirement: - azure: - name: $(c_vm_name) - location: $(c_location) - -combinator: - type: agent_test_suites - test_suites: $(test_suites) - cloud: $(cloud) - location: $(location) - vm_name: $(vm_name) - -notifier: - - type: env_stats - - type: agent.junit - -dev: - enabled: $(proxy) - mock_tcp_ping: $(proxy) - jump_boxes: - - private_key_file: $(proxy_identity_file) - address: $(proxy_host) - username: $(proxy_user) - password: "dummy" diff --git a/tests_e2e/orchestrator/scripts/agent-service b/tests_e2e/orchestrator/scripts/agent-service new file mode 100755 index 000000000..5c4c7ee09 --- /dev/null +++ b/tests_e2e/orchestrator/scripts/agent-service @@ -0,0 +1,92 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +set -euo pipefail + +# +# The service name is walinuxagent in Ubuntu/debian and waagent elsewhere +# + +usage() ( + echo "Usage: agent-service command" + exit 1 +) + +if [ "$#" -lt 1 ]; then + usage +fi +cmd=$1 +shift + +if [ "$#" -ne 0 ] || [ -z ${cmd+x} ] ; then + usage +fi + +if command -v systemctl &> /dev/null; then + service-status() { systemctl --no-pager -l status $1; } + service-stop() { systemctl stop $1; } + service-restart() { systemctl restart $1; } + service-start() { systemctl start $1; } + service-disable() { systemctl disable $1; } +else + service-status() { service $1 status; } + service-stop() { service $1 stop; } + service-restart() { service $1 restart; } + service-start() { service $1 start; } + service-disable() { service $1 disable; } +fi + +python=$(get-agent-python) +distro=$($python -c 'from azurelinuxagent.common.version import get_distro; print(get_distro()[0])') +distro=$(echo $distro | tr '[:upper:]' '[:lower:]') + +if [[ $distro == *"ubuntu"* || $distro == *"debian"* ]]; then + service_name="walinuxagent" +else + service_name="waagent" +fi + +echo "Service name: $service_name" + +if [[ "$cmd" == "restart" ]]; then + echo "Restarting service..." + service-restart $service_name + echo "Service status..." + service-status $service_name +fi + +if [[ "$cmd" == "start" ]]; then + echo "Starting service..." + service-start $service_name +fi + +if [[ "$cmd" == "stop" ]]; then + echo "Stopping service..." + service-stop $service_name +fi + +if [[ "$cmd" == "status" ]]; then + echo "Service status..." 
+ service-status $service_name +fi + +if [[ "$cmd" == "disable" ]]; then + echo "Disabling service..." + service-disable $service_name +fi diff --git a/tests_e2e/orchestrator/scripts/collect-logs b/tests_e2e/orchestrator/scripts/collect-logs index eadf0483a..c221288a1 100755 --- a/tests_e2e/orchestrator/scripts/collect-logs +++ b/tests_e2e/orchestrator/scripts/collect-logs @@ -10,13 +10,16 @@ logs_file_name="/tmp/waagent-logs.tgz" echo "Collecting logs to $logs_file_name ..." +PYTHON=$(get-agent-python) +waagent_conf=$($PYTHON -c 'from azurelinuxagent.common.osutil import get_osutil; print(get_osutil().agent_conf_file_path)') + tar --exclude='journal/*' --exclude='omsbundle' --exclude='omsagent' --exclude='mdsd' --exclude='scx*' \ --exclude='*.so' --exclude='*__LinuxDiagnostic__*' --exclude='*.zip' --exclude='*.deb' --exclude='*.rpm' \ --warning=no-file-changed \ -czf "$logs_file_name" \ /var/log \ /var/lib/waagent/ \ - /etc/waagent.conf + $waagent_conf set -euxo pipefail diff --git a/tests_e2e/orchestrator/scripts/install-agent b/tests_e2e/orchestrator/scripts/install-agent index 14663d0b8..61181b44d 100755 --- a/tests_e2e/orchestrator/scripts/install-agent +++ b/tests_e2e/orchestrator/scripts/install-agent @@ -70,39 +70,88 @@ if service-status walinuxagent > /dev/null 2>&1;then else service_name="waagent" fi -echo "Service name: $service_name" # # Output the initial version of the agent + # python=$(get-agent-python) waagent=$(get-agent-bin-path) -echo "Agent's path: $waagent" + +echo "========== Initial Status ==========" +echo "Service Name: $service_name" +echo "Agent Path: $waagent" +echo "Agent Version:" $python "$waagent" --version -printf "\n" +echo "Service Status:" + +# Sometimes the service can take a while to start; give it a few minutes, +started=false +for i in {1..6} +do + if service-status $service_name; then + started=true + break + fi + echo "Waiting for service to start..." 
+ sleep 30 +done +if [ $started == false ]; then + echo "Service failed to start." + exit 1 +fi # # Install the package # +echo "========== Installing Agent ==========" echo "Installing $package as version $version..." unzip.py "$package" "/var/lib/waagent/WALinuxAgent-$version" -# Ensure that AutoUpdate is enabled. some distros, e.g. Flatcar, don't have a waagent.conf -# but AutoUpdate defaults to True so there is no need to do anything in that case. -if [[ -e /etc/waagent.conf ]]; then - sed -i 's/AutoUpdate.Enabled=n/AutoUpdate.Enabled=y/g' /etc/waagent.conf +python=$(get-agent-python) +# Ensure that AutoUpdate is enabled. Some distros, e.g. Flatcar, have waagent.conf in a different path +waagent_conf_path=$($python -c 'from azurelinuxagent.common.osutil import get_osutil; osutil=get_osutil(); print(osutil.agent_conf_file_path)') +echo "Agent's conf path: $waagent_conf_path" +sed -i 's/AutoUpdate.Enabled=n/AutoUpdate.Enabled=y/g' "$waagent_conf_path" +# By default the UpdateToLatestVersion flag is set to True, so that the agent goes through the update logic to look for new agents. +# But in e2e tests this flag needs to be off in test version 9.9.9.9 to stop agent updates, so that our scenarios run on 9.9.9.9. +sed -i '$a AutoUpdate.UpdateToLatestVersion=n' "$waagent_conf_path" +# Log and exit the test setup if the Extensions.Enabled flag is disabled on distros other than Debian +if grep -q "Extensions.Enabled=n" $waagent_conf_path; then + pypy_get_distro=$(pypy3 -c 'from azurelinuxagent.common.version import get_distro; print(get_distro())') + python_get_distro=$($python -c 'from azurelinuxagent.common.version import get_distro; print(get_distro())') + # Debian distros disable extensions by default, so we need to enable them to verify agent extension scenarios + # If any other distro disables extensions, we exit the test setup to fail the test.
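The AutoUpdate edits above are plain sed one-liners; they can be exercised against a scratch file instead of the agent's real conf path (a minimal sketch; the temp file is hypothetical):

```shell
# Flip AutoUpdate.Enabled and append UpdateToLatestVersion in a scratch conf,
# mirroring the sed commands used by install-agent
conf=$(mktemp)
printf 'AutoUpdate.Enabled=n\nExtensions.Enabled=y\n' > "$conf"
sed -i 's/AutoUpdate.Enabled=n/AutoUpdate.Enabled=y/g' "$conf"   # enable AutoUpdate
sed -i '$a AutoUpdate.UpdateToLatestVersion=n' "$conf"           # append after the last line
cat "$conf"
rm -f "$conf"
```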
+ if [[ $pypy_get_distro == *"debian"* ]] || [[ $python_get_distro == *"debian"* ]]; then + echo "Extensions.Enabled is disabled; this is expected on Debian, so enabling it" + update-waagent-conf Extensions.Enabled=y + else + echo "Extensions.Enabled is disabled, which is unexpected on this distro; exiting test setup to fail the test" + exit 1 + fi +fi + +# +# TODO: Remove this block once the symlink is created in the Flatcar image +# +# Currently, the Agent looks for /usr/share/oem/waagent.conf, but new Flatcar images use /etc/waagent.conf. Flatcar will create +# this symlink in new images, but we need to create it for now. +if [[ $(uname -a) == *"flatcar"* ]]; then + if [[ ! -f /usr/share/oem/waagent.conf ]]; then + ln -s "$waagent_conf_path" /usr/share/oem/waagent.conf + fi fi # # Restart the service # echo "Restarting service..." -service-stop $service_name +agent-service stop # Rename the previous log to ensure the new log starts with the agent we just installed mv /var/log/waagent.log /var/log/waagent."$(date --iso-8601=seconds)".log -service-start $service_name +agent-service start # # Verify that the new agent is running and output its status. @@ -111,9 +160,10 @@ service-start $service_name echo "Verifying agent installation..." check-version() { - for i in {0..5} + # We need to wait for the extension handler to start; give it a couple of minutes + for i in {1..12} do - if $python "$waagent" --version | grep -E "Goal state agent:\s+$version" > /dev/null; then + if waagent-version | grep -E "Goal state agent:\s+$version" > /dev/null; then return 0 fi sleep 10 @@ -123,15 +173,19 @@ check-version() { if check-version "$version"; then - printf "\nThe agent was installed successfully\n" + printf "The agent was installed successfully\n" exit_code=0 else - printf "\nFailed to install agent.\n" + printf "************************************\n" + printf " * ERROR: Failed to install agent. 
*\n" + printf "************************************\n" exit_code=1 fi +printf "\n" +echo "========== Final Status ==========" $python "$waagent" --version printf "\n" -service-status $service_name +agent-service status exit $exit_code diff --git a/tests_e2e/orchestrator/scripts/prepare-pypy b/tests_e2e/orchestrator/scripts/prepare-pypy new file mode 100755 index 000000000..fe469c914 --- /dev/null +++ b/tests_e2e/orchestrator/scripts/prepare-pypy @@ -0,0 +1,56 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This script is used to prepare a tarball containing Pypy with the assertpy module pre-installed. +# It needs to be run on x64 and arm64 VMs, and the resulting tarballs need to be uploaded to storage, +# from where they are downloaded and installed to the test VMs (see the wiki for details). 
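prepare-pypy builds a separate tarball per architecture (x64 and arm64, as the header above notes); the selection logic can be sketched in isolation (no download performed):

```shell
# Choose the Pypy tarball name by machine architecture, as prepare-pypy does
arch=$(uname -m)
if [[ $arch == "aarch64" ]]; then
  tarball="pypy3.7-arm64.tar.bz2"
else
  tarball="pypy3.7-x64.tar.bz2"
fi
echo "Selected tarball: $tarball"
```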
+# + +set -euo pipefail + +cd /tmp +rm -rf pypy3.7-* + +arch=$(uname -m) +printf "Preparing Pypy for architecture %s...\n" $arch + +printf "\n*** Downloading Pypy...\n" +if [[ $arch == "aarch64" ]]; then + tarball="pypy3.7-arm64.tar.bz2" + wget https://downloads.python.org/pypy/pypy3.7-v7.3.5-aarch64.tar.bz2 -O $tarball +else + tarball="pypy3.7-x64.tar.bz2" + wget https://downloads.python.org/pypy/pypy3.7-v7.3.5-linux64.tar.bz2 -O $tarball +fi + +printf "\n*** Installing assertpy...\n" +tar xf $tarball +./pypy3.7-v7.3.5-*/bin/pypy -m ensurepip +./pypy3.7-v7.3.5-*/bin/pypy -mpip install assertpy + +printf "\n*** Creating new tarball for Pypy...\n" +# Remove the cache files created when running Pypy, and set the owner to 0:0, in order to match the original tarball +find pypy3.7-v7.3.5-* -name '*.pyc' -exec rm {} \; +mv -v $tarball "$tarball.original" +tar cf $tarball --bzip2 --owner 0:0 --group 0:0 pypy3.7-v7.3.5-* +rm -rf pypy3.7-v7.3.5-* + +printf "\nPypy is ready at %s\n" "$(pwd)/$tarball" + diff --git a/tests_e2e/orchestrator/scripts/update-waagent-conf b/tests_e2e/orchestrator/scripts/update-waagent-conf new file mode 100755 index 000000000..43dadeee2 --- /dev/null +++ b/tests_e2e/orchestrator/scripts/update-waagent-conf @@ -0,0 +1,48 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# +# Updates waagent.conf with the specified setting and value (multiple allowed) and restarts the Agent. +# + +set -euo pipefail + +if [[ $# -lt 1 ]]; then + echo "Usage: update-waagent-conf <setting=value> [<setting=value>...]" + exit 1 +fi + +PYTHON=$(get-agent-python) +waagent_conf=$($PYTHON -c 'from azurelinuxagent.common.osutil import get_osutil; print(get_osutil().agent_conf_file_path)') +for setting_value in "$@"; do + IFS='=' read -r -a setting_value_array <<< "$setting_value" + name=${setting_value_array[0]} + value=${setting_value_array[1]} + + if [[ -z "$name" || -z "$value" ]]; then + echo "Invalid setting=value: $setting_value" + exit 1 + fi + echo "Setting $name=$value in $waagent_conf" + sed -i -E "/^$name=/d" "$waagent_conf" + sed -i -E "\$a $name=$value" "$waagent_conf" + updated=$(grep "$name" "$waagent_conf") + echo "Updated value: $updated" +done +agent-service restart \ No newline at end of file diff --git a/azurelinuxagent/distro/suse/__init__.py b/tests_e2e/orchestrator/scripts/waagent-version old mode 100644 new mode 100755 similarity index 75% rename from azurelinuxagent/distro/suse/__init__.py rename to tests_e2e/orchestrator/scripts/waagent-version index de7be3364..842ae91d2 --- a/azurelinuxagent/distro/suse/__init__.py +++ b/tests_e2e/orchestrator/scripts/waagent-version @@ -1,3 +1,7 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# # Copyright 2018 Microsoft Corporation # # Licensed under the Apache License, Version 2.0 (the "License"); @@ -12,6 +16,10 @@ # See the License for the specific language governing permissions and # limitations under the License. 
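The setting=value parsing in update-waagent-conf can be exercised on its own; a minimal sketch (the `parse` helper is hypothetical and does not touch any conf file):

```shell
# Split "name=value" arguments the way update-waagent-conf does and
# reject entries with a missing name or value
parse() {
  for setting_value in "$@"; do
    IFS='=' read -r -a pair <<< "$setting_value"
    local name=${pair[0]} value=${pair[1]:-}
    if [[ -z "$name" || -z "$value" ]]; then
      echo "Invalid setting=value: $setting_value"
      return 1
    fi
    echo "Setting $name=$value"
  done
}
parse "AutoUpdate.Enabled=y" "Extensions.Enabled=y"
```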
# -# Requires Python 2.6+ and Openssl 1.0+ +# returns the version of the agent # +set -euo pipefail +python=$(get-agent-python) +waagent=$(get-agent-bin-path) +$python "$waagent" --version \ No newline at end of file diff --git a/tests_e2e/orchestrator/templates/vmss.json b/tests_e2e/orchestrator/templates/vmss.json new file mode 100644 index 000000000..293edf80c --- /dev/null +++ b/tests_e2e/orchestrator/templates/vmss.json @@ -0,0 +1,253 @@ +{ + "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#", + "contentVersion": "1.0.0.0", + "parameters": { + "username": { + "type": "string" + }, + "sshPublicKey": { + "type": "string" + }, + "vmName": { + "type": "string" + }, + "scenarioPrefix": { + "type": "string", + "defaultValue": "e2e-test" + }, + "publisher": { + "type": "string" + }, + "offer": { + "type": "string" + }, + "sku": { + "type": "string" + }, + "version": { + "type": "string" + } + }, + "variables": { + "nicName": "[concat(parameters('scenarioPrefix'),'Nic')]", + "vnetAddressPrefix": "10.130.0.0/16", + "subnetName": "[concat(parameters('scenarioPrefix'),'Subnet')]", + "subnetPrefix": "10.130.0.0/24", + "publicIPAddressName": "[concat(parameters('scenarioPrefix'),'PublicIp')]", + "lbIpName": "[concat(parameters('scenarioPrefix'),'PublicLbIp')]", + "virtualNetworkName": "[concat(parameters('scenarioPrefix'),'Vnet')]", + "lbName": "[concat(parameters('scenarioPrefix'),'lb')]", + "lbIpId": "[resourceId('Microsoft.Network/publicIPAddresses', variables('lbIpName'))]", + "bepoolName": "[concat(variables('lbName'), 'bepool')]", + "natpoolName": "[concat(variables('lbName'), 'natpool')]", + "feIpConfigName": "[concat(variables('lbName'), 'fepool', 'IpConfig')]", + "sshProbeName": "[concat(variables('lbName'), 'probe')]", + "vnetID": "[resourceId('Microsoft.Network/virtualNetworks',variables('virtualNetworkName'))]", + "subnetRef": "[concat(variables('vnetID'),'/subnets/',variables('subnetName'))]", + "lbId": 
"[resourceId('Microsoft.Network/loadBalancers', variables('lbName'))]", + "bepoolID": "[concat(variables('lbId'), '/backendAddressPools/', variables('bepoolName'))]", + "natpoolID": "[concat(variables('lbId'), '/inboundNatPools/', variables('natpoolName'))]", + "feIpConfigId": "[concat(variables('lbId'), '/frontendIPConfigurations/', variables('feIpConfigName'))]", + "sshProbeId": "[concat(variables('lbId'), '/probes/', variables('sshProbeName'))]", + "sshKeyPath": "[concat('/home/', parameters('username'), '/.ssh/authorized_keys')]" + }, + "resources": [ + { + "apiVersion": "2023-06-01", + "type": "Microsoft.Network/virtualNetworks", + "name": "[variables('virtualNetworkName')]", + "location": "[resourceGroup().location]", + "properties": { + "addressSpace": { + "addressPrefixes": [ + "[variables('vnetAddressPrefix')]" + ] + }, + "subnets": [ + { + "name": "[variables('subnetName')]", + "properties": { + "addressPrefix": "[variables('subnetPrefix')]" + } + } + ] + } + }, + { + "type": "Microsoft.Network/publicIPAddresses", + "name": "[variables('lbIpName')]", + "location": "[resourceGroup().location]", + "apiVersion": "2023-06-01", + "properties": { + "publicIPAllocationMethod": "Dynamic", + "dnsSettings": { + "domainNameLabel": "[parameters('vmName')]" + } + } + }, + { + "type": "Microsoft.Network/loadBalancers", + "name": "[variables('lbName')]", + "location": "[resourceGroup().location]", + "apiVersion": "2020-06-01", + "dependsOn": [ + "[concat('Microsoft.Network/virtualNetworks/', variables('virtualNetworkName'))]", + "[concat('Microsoft.Network/publicIPAddresses/', variables('lbIpName'))]" + ], + "properties": { + "frontendIPConfigurations": [ + { + "name": "[variables('feIpConfigName')]", + "properties": { + "PublicIpAddress": { + "id": "[variables('lbIpId')]" + } + } + } + ], + "backendAddressPools": [ + { + "name": "[variables('bepoolName')]" + } + ], + "inboundNatPools": [ + { + "name": "[variables('natpoolName')]", + "properties": { + 
"FrontendIPConfiguration": { + "Id": "[variables('feIpConfigId')]" + }, + "BackendPort": 22, + "Protocol": "tcp", + "FrontendPortRangeStart": 3500, + "FrontendPortRangeEnd": 4500 + } + } + ], + "loadBalancingRules": [ + { + "name": "ProbeRule", + "properties": { + "frontendIPConfiguration": { + "id": "[variables('feIpConfigId')]" + }, + "backendAddressPool": { + "id": "[variables('bepoolID')]" + }, + "protocol": "Tcp", + "frontendPort": 80, + "backendPort": 80, + "idleTimeoutInMinutes": 5, + "probe": { + "id": "[variables('sshProbeId')]" + } + } + } + ], + "probes": [ + { + "name": "[variables('sshProbeName')]", + "properties": { + "protocol": "tcp", + "port": 22, + "intervalInSeconds": 5, + "numberOfProbes": 2 + } + } + ] + } + }, + { + "apiVersion": "2023-03-01", + "type": "Microsoft.Compute/virtualMachineScaleSets", + "name": "[parameters('vmName')]", + "location": "[resourceGroup().location]", + "dependsOn": [ + "[concat('Microsoft.Network/virtualNetworks/', variables('virtualNetworkName'))]", + "[concat('Microsoft.Network/loadBalancers/', variables('lbName'))]" + ], + "sku": { + "name": "Standard_D2s_v3", + "tier": "Standard", + "capacity": 3 + }, + "properties": { + "orchestrationMode": "Uniform", + "overprovision": false, + "virtualMachineProfile": { + "extensionProfile": { + "extensions": [] + }, + "osProfile": { + "computerNamePrefix": "[parameters('vmName')]", + "adminUsername": "[parameters('username')]", + "linuxConfiguration": { + "disablePasswordAuthentication": true, + "ssh": { + "publicKeys": [ + { + "path": "[variables('sshKeyPath')]", + "keyData": "[parameters('sshPublicKey')]" + } + ] + } + } + }, + "storageProfile": { + "osDisk": { + "osType": "Linux", + "createOption": "FromImage", + "caching": "ReadWrite", + "managedDisk": { + "storageAccountType": "Premium_LRS" + }, + "diskSizeGB": 64 + }, + "imageReference": { + "publisher": "[parameters('publisher')]", + "offer": "[parameters('offer')]", + "sku": "[parameters('sku')]", + "version": 
"[parameters('version')]" + } + }, + "diagnosticsProfile": { + "bootDiagnostics": { + "enabled": true + } + }, + "networkProfile": { + "networkInterfaceConfigurations": [ + { + "name": "[variables('nicName')]", + "properties": { + "primary": true, + "ipConfigurations": [ + { + "name": "ipconfig1", + "properties": { + "primary": true, + "publicIPAddressConfiguration": { + "name": "[variables('publicIPAddressName')]", + "properties": { + "idleTimeoutInMinutes": 15 + } + }, + "subnet": { + "id": "[variables('subnetRef')]" + } + } + } + ] + } + } + ] + } + }, + "upgradePolicy": { + "mode": "Automatic" + }, + "platformFaultDomainCount": 1 + } + } + ] +} diff --git a/tests_e2e/pipeline/pipeline-cleanup.yml b/tests_e2e/pipeline/pipeline-cleanup.yml index ba880a4f4..69e929be5 100644 --- a/tests_e2e/pipeline/pipeline-cleanup.yml +++ b/tests_e2e/pipeline/pipeline-cleanup.yml @@ -1,58 +1,56 @@ # # Pipeline for cleaning up any remaining Resource Groups generated by the Azure.WALinuxAgent pipeline. # -# Deletes any resource groups that are more than a day old and contain string "lisa-WALinuxAgent-" +# Deletes any resource groups that are older than 'older_than' and match the 'name_pattern' regular expression # -schedules: - - cron: "0 */12 * * *" # Run twice a day (every 12 hours) - displayName: cleanup build - branches: - include: - - develop - always: true -trigger: - - develop +parameters: + - name: name_pattern + displayName: Regular expression to match the name of the resource groups to delete + type: string + default: lisa-WALinuxAgent-.* -pr: none + - name: older_than + displayName: Delete resources older than (use the syntax of the "date -d" command) + type: string + default: 12 hours ago -pool: - vmImage: ubuntu-latest + - name: service_connections + type: object + default: + - azuremanagement + - azuremanagement.china + - azuremanagement.government -variables: - - name: azureConnection - value: 'azuremanagement' - - name: rgPrefix - value: 'lisa-WALinuxAgent-' +pool: 
+ name: waagent-pool steps: + - ${{ each service_connection in parameters.service_connections }}: + - task: AzureCLI@2 + inputs: + azureSubscription: ${{ service_connection }} + scriptType: 'bash' + scriptLocation: 'inlineScript' + inlineScript: | + set -euxo pipefail - - task: AzureKeyVault@2 - displayName: "Fetch secrets from KV" - inputs: - azureSubscription: '$(azureConnection)' - KeyVaultName: 'dcrV2SPs' - SecretsFilter: '*' - RunAsPreJob: true + # + # We use the REST API to list the resource groups because we need the createdTime and that + # property is not available via the az-cli commands. + # + subscription_id=$(az account list --all --query "[?isDefault].id" -o tsv) + + date=$(date --utc +%Y-%m-%d'T'%H:%M:%S.%N'Z' -d "${{ parameters.older_than }}") - - task: AzureCLI@2 - inputs: - azureSubscription: '$(azureConnection)' - scriptType: 'bash' - scriptLocation: 'inlineScript' - inlineScript: | - set -euxo pipefail - date=`date --utc +%Y-%m-%d'T'%H:%M:%S.%N'Z' -d "1 day ago"` - - # Using the Azure REST GET resourceGroups API call as we can add the createdTime to the results. 
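The cleanup pipeline's cutoff is computed with GNU `date -d` and then compared against each resource group's createdTime; since both are fixed-width UTC ISO-8601 strings, a lexicographic comparison is valid. A standalone sketch with a hypothetical createdTime:

```shell
# Build the ISO-8601 cutoff the way the cleanup pipeline does, then compare
# a (hypothetical) resource-group createdTime against it
cutoff=$(date --utc +%Y-%m-%d'T'%H:%M:%S.%N'Z' -d "12 hours ago")
created="2000-01-01T00:00:00.000000000Z"   # hypothetical createdTime, long past
if [[ "$created" < "$cutoff" ]]; then
  echo "delete candidate"
else
  echo "keep"
fi
```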
- # This feature is not available via the az-cli commands directly so we have to use the Azure REST APIs - - az rest --method GET \ - --url "https://management.azure.com/subscriptions/$(SUBSCRIPTION-ID)/resourcegroups" \ - --url-parameters api-version=2021-04-01 \$expand=createdTime \ - --output json \ - --query value \ - | jq --arg date "$date" '.[] | select (.createdTime < $date).name' \ - | grep "$(rgPrefix)" \ - | xargs -l -t -r az group delete --no-wait -y -n \ - || echo "No resource groups found to delete" + rest_endpoint=$(az cloud show --query "endpoints.resourceManager" -o tsv) + + pattern="${{ parameters.name_pattern }}" + + az rest --method GET \ + --url "${rest_endpoint}/subscriptions/${subscription_id}/resourcegroups" \ + --url-parameters api-version=2021-04-01 \$expand=createdTime \ + --output json \ + --query value \ + | jq --arg date "$date" '.[] | select (.createdTime < $date).name | match("'${pattern}'"; "g").string' \ + | xargs -l -t -r az group delete --subscription "${subscription_id}" --no-wait -y -n diff --git a/tests_e2e/pipeline/pipeline.yml b/tests_e2e/pipeline/pipeline.yml index 1de541634..35d3fe4c1 100644 --- a/tests_e2e/pipeline/pipeline.yml +++ b/tests_e2e/pipeline/pipeline.yml @@ -5,18 +5,20 @@ # parameters: + # # See the test wiki for a description of the parameters - - name: test_suites - displayName: Test Suites - type: string - default: agent_bvt - + # # NOTES: # * 'image', 'location' and 'vm_size' override any values in the test suites/images definition # files. Those parameters are useful for 1-off tests, like testing a VHD or checking if # an image is supported in a particular location. # * Azure Pipelines do not allow empty string for the parameter value, using "-" instead. 
# + - name: test_suites + displayName: Test Suites (comma-separated list of test suites to run) + type: string + default: "-" + - name: image displayName: Image (image/image set name, URN, or VHD) type: string @@ -41,25 +43,26 @@ parameters: - failed - no + - name: collect_lisa_logs + displayName: Collect LISA logs + type: boolean + default: false + - name: keep_environment displayName: Keep the test VMs (do not delete them) type: string - default: no + default: failed values: - always - failed - no -trigger: - - develop - -pr: none - pool: - vmImage: ubuntu-latest + name: waagent-pool jobs: - job: "ExecuteTests" + timeoutInMinutes: 90 steps: - task: UsePythonVersion@0 @@ -69,9 +72,24 @@ jobs: addToPath: true architecture: 'x64' - # Extract the Azure cloud from the "connection_info" variable and store it in the "cloud" variable. - # The cloud name is used as a suffix of the value for "connection_info" and comes after the last '-'. - - bash: echo "##vso[task.setvariable variable=cloud]$(echo $CONNECTION_INFO | sed 's/^.*-//')" + # Extract the Azure cloud from the "connection_info" variable. Its value includes one of + # 'public', 'china', or 'government' as a suffix (the suffix comes after the last '-'). + - bash: | + case $(echo $CONNECTION_INFO | sed 's/^.*-//') in + public) + echo "##vso[task.setvariable variable=cloud]AzureCloud" + ;; + china) + echo "##vso[task.setvariable variable=cloud]AzureChinaCloud" + ;; + government) + echo "##vso[task.setvariable variable=cloud]AzureUSGovernment" + ;; + *) + echo "Invalid CONNECTION_INFO: $CONNECTION_INFO" >&2 + exit 1 + ;; + esac displayName: "Set Cloud type" - task: DownloadSecureFile@1 @@ -105,12 +123,21 @@ jobs: TEST_SUITES: ${{ parameters.test_suites }} VM_SIZE: ${{ parameters.vm_size }} + - bash: $(Build.SourcesDirectory)/tests_e2e/pipeline/scripts/collect_artifacts.sh + displayName: "Collect test artifacts" + # Collect artifacts even if the previous step is cancelled (e.g. 
timeout) + condition: always() + env: + COLLECT_LISA_LOGS: ${{ parameters.collect_lisa_logs }} + - publish: $(Build.ArtifactStagingDirectory) artifact: 'artifacts' displayName: 'Publish test artifacts' + condition: always() - task: PublishTestResults@2 displayName: 'Publish test results' + condition: always() inputs: testResultsFormat: 'JUnit' testResultsFiles: 'runbook_logs/agent.junit.xml' diff --git a/tests_e2e/pipeline/scripts/collect_artifacts.sh b/tests_e2e/pipeline/scripts/collect_artifacts.sh new file mode 100755 index 000000000..4dc8ae0f5 --- /dev/null +++ b/tests_e2e/pipeline/scripts/collect_artifacts.sh @@ -0,0 +1,69 @@ +#!/usr/bin/env bash +# +# Moves the relevant logs to the staging directory +# +set -euxo pipefail + +# +# The execute_test.sh script gives ownership of the log directory to the 'waagent' user in +# the Docker container; re-take ownership +# +sudo find "$LOGS_DIRECTORY" -exec chown "$USER" {} \; + +# +# Move the logs for failed tests to a temporary location +# +mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp +for log in $(grep -l MARKER-LOG-WITH-ERRORS "$LOGS_DIRECTORY"/*.log); do + mv "$log" "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp +done + +# +# Move the environment logs to "environment_logs" +# +if ls "$LOGS_DIRECTORY"/env-*.log > /dev/null 2>&1; then + mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/environment_logs + mv "$LOGS_DIRECTORY"/env-*.log "$BUILD_ARTIFACTSTAGINGDIRECTORY"/environment_logs +fi + +# +# Move the rest of the logs to "test_logs" +# +if ls "$LOGS_DIRECTORY"/*.log > /dev/null 2>&1; then + mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/test_logs + mv "$LOGS_DIRECTORY"/*.log "$BUILD_ARTIFACTSTAGINGDIRECTORY"/test_logs +fi + +# +# Move the logs for failed tests to the main directory +# +if ls "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp/*.log > /dev/null 2>&1; then + mv "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp/*.log "$BUILD_ARTIFACTSTAGINGDIRECTORY" +fi +rmdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp + +# +# Move the logs collected from the test VMs to 
vm_logs +# +if ls "$LOGS_DIRECTORY"/*.tgz > /dev/null 2>&1; then + mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/vm_logs + mv "$LOGS_DIRECTORY"/*.tgz "$BUILD_ARTIFACTSTAGINGDIRECTORY"/vm_logs +fi + +# +# Move the main LISA log and the JUnit report to "runbook_logs" +# +# Note that files created by LISA are under .../lisa//" +# +mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/runbook_logs +mv "$LOGS_DIRECTORY"/lisa/*/*/lisa-*.log "$BUILD_ARTIFACTSTAGINGDIRECTORY"/runbook_logs +mv "$LOGS_DIRECTORY"/lisa/*/*/agent.junit.xml "$BUILD_ARTIFACTSTAGINGDIRECTORY"/runbook_logs + +# +# Move the rest of the LISA logs to "lisa_logs" +# +if [[ ${COLLECT_LISA_LOGS,,} == 'true' ]]; then # case-insensitive comparison + mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/lisa_logs + mv "$LOGS_DIRECTORY"/lisa/*/*/* "$BUILD_ARTIFACTSTAGINGDIRECTORY"/lisa_logs +fi + diff --git a/tests_e2e/pipeline/scripts/execute_tests.sh b/tests_e2e/pipeline/scripts/execute_tests.sh index 15c9f0b5f..9c185b333 100755 --- a/tests_e2e/pipeline/scripts/execute_tests.sh +++ b/tests_e2e/pipeline/scripts/execute_tests.sh @@ -2,6 +2,9 @@ set -euxo pipefail +echo "Hostname: $(hostname)" +echo "\$USER: $USER" + # # UID of 'waagent' in the Docker container # @@ -10,7 +13,7 @@ WAAGENT_UID=1000 # # Set the correct mode and owner for the private SSH key and generate the public key. 
# -cd "$HOME" +cd "$AGENT_TEMPDIRECTORY" mkdir ssh cp "$DOWNLOADSSHKEY_SECUREFILEPATH" ssh chmod 700 ssh/id_rsa @@ -26,10 +29,17 @@ chmod a+w "$BUILD_SOURCESDIRECTORY" # # Create the directory where the Docker container will create the test logs and give ownership to 'waagent' # -LOGS_DIRECTORY="$HOME/logs" +LOGS_DIRECTORY="$AGENT_TEMPDIRECTORY/logs" +echo "##vso[task.setvariable variable=logs_directory]$LOGS_DIRECTORY" mkdir "$LOGS_DIRECTORY" sudo chown "$WAAGENT_UID" "$LOGS_DIRECTORY" +# +# Give the current user access to the Docker daemon +# +sudo usermod -aG docker $USER +newgrp docker < /dev/null + # # Pull the container image used to execute the tests # @@ -39,6 +49,11 @@ docker pull waagenttests.azurecr.io/waagenttests:latest # Azure Pipelines does not allow an empty string as the value for a pipeline parameter; instead we use "-" to indicate # an empty value. Change "-" to "" for the variables that capture the parameter values. +if [[ $TEST_SUITES == "-" ]]; then + TEST_SUITES="" # Don't set the test_suites variable +else + TEST_SUITES="-v test_suites:\"$TEST_SUITES\"" +fi if [[ $IMAGE == "-" ]]; then IMAGE="" fi @@ -49,13 +64,14 @@ if [[ $VM_SIZE == "-" ]]; then VM_SIZE="" fi -# A test failure will cause automation to exit with an error code and we don't want this script to stop so we force the command -# to succeed and capture the exit code to return it at the end of the script. -echo "exit 0" > /tmp/exit.sh +# +# Get the external IP address of the VM. 
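Azure Pipelines rejects an empty string as a parameter value, so execute_tests.sh uses "-" as a sentinel and maps it back to empty before building the runbook command line; the mapping in isolation:

```shell
# Map the "-" sentinel back to an empty value, as execute_tests.sh does
# for TEST_SUITES, IMAGE, LOCATION and VM_SIZE
IMAGE="-"
if [[ $IMAGE == "-" ]]; then
  IMAGE=""
fi
echo "image=[$IMAGE]"
```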
+# +IP_ADDRESS=$(curl -4 ifconfig.io/ip) docker run --rm \ --volume "$BUILD_SOURCESDIRECTORY:/home/waagent/WALinuxAgent" \ - --volume "$HOME"/ssh:/home/waagent/.ssh \ + --volume "$AGENT_TEMPDIRECTORY"/ssh:/home/waagent/.ssh \ --volume "$LOGS_DIRECTORY":/home/waagent/logs \ --env AZURE_CLIENT_ID \ --env AZURE_CLIENT_SECRET \ @@ -69,52 +85,11 @@ docker run --rm \ -v cloud:$CLOUD \ -v subscription_id:$SUBSCRIPTION_ID \ -v identity_file:\$HOME/.ssh/id_rsa \ - -v test_suites:\"$TEST_SUITES\" \ -v log_path:\$HOME/logs \ -v collect_logs:\"$COLLECT_LOGS\" \ -v keep_environment:\"$KEEP_ENVIRONMENT\" \ -v image:\"$IMAGE\" \ -v location:\"$LOCATION\" \ - -v vm_size:\"$VM_SIZE\"" \ -|| echo "exit $?" > /tmp/exit.sh - -# -# Re-take ownership of the logs directory -# -sudo find "$LOGS_DIRECTORY" -exec chown "$USER" {} \; - -# -# Move the relevant logs to the staging directory -# -# Move the logs for failed tests to a temporary location -mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp -for log in $(grep -l MARKER-LOG-WITH-ERRORS "$LOGS_DIRECTORY"/*.log); do - mv "$log" "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp -done -# Move the environment logs to "environment_logs" -if ls "$LOGS_DIRECTORY"/env-*.log > /dev/null 2>&1; then - mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/environment_logs - mv "$LOGS_DIRECTORY"/env-*.log "$BUILD_ARTIFACTSTAGINGDIRECTORY"/environment_logs -fi -# Move the rest of the logs to "test_logs" -if ls "$LOGS_DIRECTORY"/*.log > /dev/null 2>&1; then - mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/test_logs - mv "$LOGS_DIRECTORY"/*.log "$BUILD_ARTIFACTSTAGINGDIRECTORY"/test_logs -fi -# Move the logs for failed tests to the main directory -if ls "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp/*.log > /dev/null 2>&1; then - mv "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp/*.log "$BUILD_ARTIFACTSTAGINGDIRECTORY" -fi -rmdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/tmp -# Move the logs collected from the test VMs to vm_logs -if ls "$LOGS_DIRECTORY"/*.tgz > /dev/null 2>&1; then - mkdir 
"$BUILD_ARTIFACTSTAGINGDIRECTORY"/vm_logs - mv "$LOGS_DIRECTORY"/*.tgz "$BUILD_ARTIFACTSTAGINGDIRECTORY"/vm_logs -fi -# Files created by LISA are under .../lisa//" -mkdir "$BUILD_ARTIFACTSTAGINGDIRECTORY"/runbook_logs -mv "$LOGS_DIRECTORY"/lisa/*/*/lisa-*.log "$BUILD_ARTIFACTSTAGINGDIRECTORY"/runbook_logs -mv "$LOGS_DIRECTORY"/lisa/*/*/agent.junit.xml "$BUILD_ARTIFACTSTAGINGDIRECTORY"/runbook_logs - -cat /tmp/exit.sh -bash /tmp/exit.sh + -v vm_size:\"$VM_SIZE\" \ + -v allow_ssh:\"$IP_ADDRESS\" \ + $TEST_SUITES" diff --git a/tests_e2e/pipeline/scripts/setup-agent.sh b/tests_e2e/pipeline/scripts/setup-agent.sh new file mode 100755 index 000000000..9b1316059 --- /dev/null +++ b/tests_e2e/pipeline/scripts/setup-agent.sh @@ -0,0 +1,54 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# Script to setup the agent VM for the Azure Pipelines agent pool; it simply installs the Azure CLI, the Docker Engine and jq. 
+# + +set -euox pipefail + +# Add delay per Azure Pipelines documentation +sleep 30 + +# Install Azure CLI +curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash + +# Add Docker's official GPG key: +sudo apt-get update +sudo apt-get install ca-certificates curl gnupg +sudo install -m 0755 -d /etc/apt/keyrings +curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg +sudo chmod a+r /etc/apt/keyrings/docker.gpg + +# Add the repository to Apt sources: +echo \ +"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \ +$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \ +sudo tee /etc/apt/sources.list.d/docker.list > /dev/null +sudo apt-get update + +# Install Docker Engine +sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin + +# Verify that Docker Engine is installed correctly by running the hello-world image. +sudo docker run hello-world + +# Install jq; it is used by the cleanup pipeline to parse the JSON output of the Azure CLI +sudo apt-get install -y jq + diff --git a/tests_e2e/test_suites/agent_bvt.yml b/tests_e2e/test_suites/agent_bvt.yml index 1f0f91405..8c840670f 100644 --- a/tests_e2e/test_suites/agent_bvt.yml +++ b/tests_e2e/test_suites/agent_bvt.yml @@ -1,8 +1,8 @@ name: "AgentBvt" tests: - - "bvts/extension_operations.py" - - "bvts/run_command.py" - - "bvts/vm_access.py" + - "agent_bvt/extension_operations.py" + - "agent_bvt/run_command.py" + - "agent_bvt/vm_access.py" images: - "endorsed" - "endorsed-arm64" diff --git a/tests_e2e/test_suites/agent_cgroups.yml b/tests_e2e/test_suites/agent_cgroups.yml new file mode 100644 index 000000000..d6d1fc0f1 --- /dev/null +++ b/tests_e2e/test_suites/agent_cgroups.yml @@ -0,0 +1,9 @@ +# +# The test suite verifies that the agent runs in the expected cgroups, and checks that the agent tracks those cgroups for polling resource metrics.
Also, it verifies that the agent's CPU quota is set as expected. +# +name: "AgentCgroups" +tests: + - "agent_cgroups/agent_cgroups.py" + - "agent_cgroups/agent_cpu_quota.py" +images: "cgroups-endorsed" +owns_vm: true \ No newline at end of file diff --git a/tests_e2e/test_suites/agent_ext_workflow.yml b/tests_e2e/test_suites/agent_ext_workflow.yml new file mode 100644 index 000000000..2d506c00d --- /dev/null +++ b/tests_e2e/test_suites/agent_ext_workflow.yml @@ -0,0 +1,14 @@ +name: "AgentExtWorkflow" +tests: + - "agent_ext_workflow/extension_workflow.py" +images: + - "centos_79" + - "suse_12" + - "rhel_79" + - "ubuntu_1604" + - "ubuntu_1804" +# This test suite uses the DCR Test Extension, which is only published in the southcentralus region in the public cloud +locations: "AzureCloud:southcentralus" +skip_on_clouds: + - "AzureChinaCloud" + - "AzureUSGovernment" diff --git a/tests_e2e/test_suites/agent_firewall.yml b/tests_e2e/test_suites/agent_firewall.yml new file mode 100644 index 000000000..0e095ba39 --- /dev/null +++ b/tests_e2e/test_suites/agent_firewall.yml @@ -0,0 +1,15 @@ +# +# This test verifies that the agent firewall rules are set correctly. The expected firewall rules are: +# 0 0 ACCEPT tcp -- * * 0.0.0.0/0 168.63.129.16 tcp dpt:53 +# 0 0 ACCEPT tcp -- * * 0.0.0.0/0 168.63.129.16 owner UID match 0 +# 0 0 DROP tcp -- * * 0.0.0.0/0 168.63.129.16 ctstate INVALID,NEW +# The first rule allows TCP traffic to port 53 on the wireserver for non-root users. The second rule allows all traffic to the wireserver for root. +# The third rule drops any other traffic to the wireserver. +# +name: "AgentFirewall" +tests: + - "agent_firewall/agent_firewall.py" +images: + - "endorsed" + - "endorsed-arm64" +owns_vm: true # This VM cannot be shared with other tests because it modifies the firewall rules and agent status.
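The three expected rules listed in agent_firewall.yml above lend themselves to a mechanical check. The following is a hedged sketch of how such a verification could work against `iptables`-style listing output; it is illustrative only (the function name, the patterns, and the sample text are assumptions, not the suite's actual implementation):

```python
import re

# Wireserver address referenced by the expected rules above.
WIRESERVER = "168.63.129.16"

def find_expected_rules(iptables_output: str) -> dict:
    """Classify OUTPUT-chain lines against the three expected wireserver rules."""
    found = {"dns": False, "root": False, "drop": False}
    for line in iptables_output.splitlines():
        if WIRESERVER not in line:
            continue
        if "ACCEPT" in line and "dpt:53" in line:
            found["dns"] = True   # rule 1: DNS (port 53) traffic to the wireserver
        elif "ACCEPT" in line and re.search(r"owner UID match 0\b", line):
            found["root"] = True  # rule 2: root's traffic to the wireserver
        elif "DROP" in line and "ctstate INVALID,NEW" in line:
            found["drop"] = True  # rule 3: everything else is dropped
    return found

sample = """\
    0     0 ACCEPT  tcp -- * * 0.0.0.0/0 168.63.129.16 tcp dpt:53
    0     0 ACCEPT  tcp -- * * 0.0.0.0/0 168.63.129.16 owner UID match 0
    0     0 DROP    tcp -- * * 0.0.0.0/0 168.63.129.16 ctstate INVALID,NEW
"""

print(find_expected_rules(sample))  # all three rules present
```

A real check would feed in the output of something like `iptables -t security -nxvL OUTPUT` (exact table and flags depend on the agent's configuration) and fail the test if any of the three flags is still False.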
\ No newline at end of file diff --git a/tests_e2e/test_suites/agent_not_provisioned.yml b/tests_e2e/test_suites/agent_not_provisioned.yml new file mode 100644 index 000000000..7c85353f0 --- /dev/null +++ b/tests_e2e/test_suites/agent_not_provisioned.yml @@ -0,0 +1,12 @@ +# +# Disables Agent provisioning using osProfile.linuxConfiguration.provisionVMAgent and verifies that the agent is disabled +# and extension operations are not allowed. +# +name: "AgentNotProvisioned" +tests: + - "agent_not_provisioned/agent_not_provisioned.py" +images: "random(endorsed)" +template: "agent_not_provisioned/disable_agent_provisioning.py" +owns_vm: true +install_test_agent: false + diff --git a/tests_e2e/test_suites/agent_persist_firewall.yml b/tests_e2e/test_suites/agent_persist_firewall.yml new file mode 100644 index 000000000..137f3af87 --- /dev/null +++ b/tests_e2e/test_suites/agent_persist_firewall.yml @@ -0,0 +1,19 @@ +# +# The iptables rules that the agent adds are not persisted across reboots, so we use the firewalld service if the distro supports it; otherwise the agent creates a custom service that runs at boot, before the network comes up, +# so that an attacker has no window to contact the wireserver. +# This test verifies that one of these services is active, and that the rules are added on boot and work as expected. +# +name: "AgentPersistFirewall" +tests: + - "agent_persist_firewall/agent_persist_firewall.py" +images: + - "endorsed" + - "endorsed-arm64" +owns_vm: true # This VM cannot be shared with other tests because it modifies the firewall rules and agent status. +# The agent persist-firewall service does not run on the Flatcar distro, since the agent can't install a custom service on its read-only filesystem, +# so the test run is skipped on Flatcar.
+# (2023-11-14T19:04:13.738695Z ERROR ExtHandler ExtHandler Unable to setup the persistent firewall rules: [Errno 30] Read-only file system: '/lib/systemd/system/waagent-network-setup.service) +skip_on_images: + - "flatcar" + - "flatcar_arm64" + - "debian_9" # TODO: Reboot is slow on debian_9. Need to investigate further. \ No newline at end of file diff --git a/tests_e2e/test_suites/agent_publish.yml b/tests_e2e/test_suites/agent_publish.yml new file mode 100644 index 000000000..3ab29c6a0 --- /dev/null +++ b/tests_e2e/test_suites/agent_publish.yml @@ -0,0 +1,12 @@ +# +# This test is used to verify that the agent will be updated after publishing a new version to the agent update channel. +# +name: "AgentPublish" +tests: + - "agent_publish/agent_publish.py" +images: + - "random(endorsed, 10)" + - "random(endorsed-arm64, 2)" +locations: "AzureCloud:centraluseuap" +owns_vm: true +install_test_agent: false \ No newline at end of file diff --git a/tests_e2e/test_suites/agent_status.yml b/tests_e2e/test_suites/agent_status.yml new file mode 100644 index 000000000..86acfe05e --- /dev/null +++ b/tests_e2e/test_suites/agent_status.yml @@ -0,0 +1,9 @@ +# +# This scenario validates that the agent status is updated without any goal state changes +# +name: "AgentStatus" +tests: + - "agent_status/agent_status.py" +images: + - "endorsed" + - "endorsed-arm64" diff --git a/tests_e2e/test_suites/agent_update.yml b/tests_e2e/test_suites/agent_update.yml new file mode 100644 index 000000000..3d3d4918f --- /dev/null +++ b/tests_e2e/test_suites/agent_update.yml @@ -0,0 +1,15 @@ +# The scenario validates the RSM and self-update paths +# RSM update: if the VM is enrolled into RSM, it validates that the agent uses RSM to update to the target version +# Self-update: if the VM is not enrolled into RSM, it validates that the agent uses self-update to update to the latest published version +name: "AgentUpdate" +tests: + - "agent_update/rsm_update.py" + - "agent_update/self_update.py" +images: + - "random(endorsed, 10)" +# -
"random(endorsed-arm64, 2)" TODO: HGPA not deployed on some arm64 hosts(so agent stuck on Vmesttings calls as per contract) and will enable once HGPA deployed there +locations: "AzureCloud:eastus2euap" +owns_vm: true +skip_on_clouds: + - "AzureChinaCloud" + - "AzureUSGovernment" \ No newline at end of file diff --git a/tests_e2e/test_suites/agent_wait_for_cloud_init.yml b/tests_e2e/test_suites/agent_wait_for_cloud_init.yml new file mode 100644 index 000000000..727803811 --- /dev/null +++ b/tests_e2e/test_suites/agent_wait_for_cloud_init.yml @@ -0,0 +1,13 @@ +# +# This test verifies that the Agent waits for cloud-init to complete before it starts processing extensions. +# +# NOTE: This test is not fully automated. It requires a custom image where the test Agent has been installed and Extensions.WaitForCloudInit is enabled in waagent.conf. +# To execute it manually, create a custom image and use the 'image' runbook parameter, for example: "-v: image:gallery/wait-cloud-init/1.0.1". +# +name: "AgentWaitForCloudInit" +tests: + - "agent_wait_for_cloud_init/agent_wait_for_cloud_init.py" +template: "agent_wait_for_cloud_init/add_cloud_init_script.py" +install_test_agent: false +# Dummy image, since the parameter is required. The actual image needs to be passed as a parameter to the runbook. +images: "ubuntu_2204" diff --git a/tests_e2e/test_suites/ext_cgroups.yml b/tests_e2e/test_suites/ext_cgroups.yml new file mode 100644 index 000000000..4603393bf --- /dev/null +++ b/tests_e2e/test_suites/ext_cgroups.yml @@ -0,0 +1,13 @@ +# +# The test suite installs the few extensions and +# verify those extensions are running in expected cgroups and also, checks agent tracking those cgroups for polling resource metrics. +# +name: "ExtCgroups" +tests: + - "ext_cgroups/ext_cgroups.py" +images: "cgroups-endorsed" +# The DCR test extension installs sample service, so this test suite uses it to test services cgroups but this is only published in southcentralus region in public cloud. 
+locations: "AzureCloud:southcentralus" +skip_on_clouds: + - "AzureChinaCloud" + - "AzureUSGovernment" \ No newline at end of file diff --git a/tests_e2e/test_suites/ext_sequencing.yml b/tests_e2e/test_suites/ext_sequencing.yml new file mode 100644 index 000000000..1976a8502 --- /dev/null +++ b/tests_e2e/test_suites/ext_sequencing.yml @@ -0,0 +1,10 @@ +# +# Adds extensions with multiple dependencies to VMSS using 'provisionAfterExtensions' property and validates they are +# enabled in order of dependencies. +# +name: "ExtSequencing" +tests: + - "ext_sequencing/ext_sequencing.py" +images: "endorsed" +# This scenario is executed on instances of a scaleset created by the agent test suite. +executes_on_scale_set: true \ No newline at end of file diff --git a/tests_e2e/test_suites/ext_telemetry_pipeline.yml b/tests_e2e/test_suites/ext_telemetry_pipeline.yml new file mode 100644 index 000000000..f309f5cb8 --- /dev/null +++ b/tests_e2e/test_suites/ext_telemetry_pipeline.yml @@ -0,0 +1,9 @@ +# +# This test ensures that the agent does not throw any errors while trying to transmit events to wireserver. It does not +# validate if the events actually make it to wireserver +# +name: "ExtTelemetryPipeline" +tests: + - "agent_bvt/vm_access.py" + - "ext_telemetry_pipeline/ext_telemetry_pipeline.py" +images: "random(endorsed)" diff --git a/tests_e2e/test_suites/extensions_disabled.yml b/tests_e2e/test_suites/extensions_disabled.yml new file mode 100644 index 000000000..1e98dd9cc --- /dev/null +++ b/tests_e2e/test_suites/extensions_disabled.yml @@ -0,0 +1,9 @@ +# +# The test suite disables extension processing and verifies that extensions +# are not processed, but the agent continues reporting status. 
+# +name: "ExtensionsDisabled" +tests: + - "extensions_disabled/extensions_disabled.py" +images: "random(endorsed)" +owns_vm: true diff --git a/tests_e2e/test_suites/fail.yml b/tests_e2e/test_suites/fail.yml index 6cd3b01af..ae38db062 100644 --- a/tests_e2e/test_suites/fail.yml +++ b/tests_e2e/test_suites/fail.yml @@ -1,5 +1,7 @@ name: "Fail" tests: - - "fail_test.py" - - "error_test.py" -images: "ubuntu_1804" + - "samples/fail_test.py" + - "samples/fail_remote_test.py" + - "samples/error_test.py" + - "samples/error_remote_test.py" +images: "ubuntu_2004" diff --git a/tests_e2e/test_suites/fips.yml b/tests_e2e/test_suites/fips.yml new file mode 100644 index 000000000..bdff00098 --- /dev/null +++ b/tests_e2e/test_suites/fips.yml @@ -0,0 +1,16 @@ +# +# FIPS should not affect extension processing. The test enables FIPS and then executes an extension. +# +# NOTE: Enabling FIPS is very specific to the distro. This test is only executed on Mariner 2. +# +# TODO: Add other distros. +# +# NOTE: FIPS can be enabled on RHEL9 using these instructions: see https://access.redhat.com/solutions/137833#rhel9), +# but extensions with protected settings do not work end-to-end, since the Agent can't decrypt the tenant +# certificate. 
+# +name: "FIPS" +tests: + - source: "fips/fips.py" +images: "mariner_2" +owns_vm: true diff --git a/tests_e2e/test_suites/images.yml b/tests_e2e/test_suites/images.yml index 253f8a138..03c1bfd77 100644 --- a/tests_e2e/test_suites/images.yml +++ b/tests_e2e/test_suites/images.yml @@ -5,13 +5,13 @@ image-sets: # Endorsed distros that are tested on the daily runs endorsed: # -# TODO: Add CentOS 6.10 and Debian 8 +# TODO: Add Debian 8 # -# - "centos_610" # - "debian_8" # - "alma_9" - "centos_79" + - "centos_82" - "debian_9" - "debian_10" - "debian_11" @@ -28,6 +28,7 @@ image-sets: - "ubuntu_1804" - "ubuntu_2004" - "ubuntu_2204" + - "ubuntu_2404" # Endorsed distros (ARM64) that are tested on the daily runs endorsed-arm64: @@ -37,6 +38,20 @@ image-sets: - "rhel_90_arm64" - "ubuntu_2204_arm64" + # As of today agent only support and enabled resource governance feature on following distros + cgroups-endorsed: + - "centos_82" + - "rhel_82" + - "ubuntu_1604" + - "ubuntu_1804" + - "ubuntu_2004" + + # These distros use Python 2.6. Currently they are not tested on the daily runs; this image set is here just for reference. + python-26: + - "centos_610" + - "oracle_610" + - "rhel_610" + # # An image can be specified by a string giving its urn, as in # @@ -47,7 +62,7 @@ image-sets: # mariner_2_arm64: # urn: "microsoftcblmariner cbl-mariner cbl-mariner-2-arm64 latest" # locations: -# - "eastus" +# - AzureCloud: ["eastus"] # vm_sizes: # - "Standard_D2pls_v5" # @@ -55,40 +70,117 @@ image-sets: # two properties can be used to specify that the image is available only in # some locations, or that it can be used only on some VM sizes. # +# The 'locations' property consists of 3 items, one for each cloud (AzureCloud, +# AzureUSGovernment and AzureChinaCloud). For each of these items: +# +# - If the item is not present, the image is available in all locations for that cloud. 
+# - If the value is a list of locations, the image is available only in those locations +# - If the value is an empty list, the image is not available in that cloud. +# # URNs follow the format ' ' or # ':::' # images: - alma_9: "almalinux almalinux 9-gen2 latest" + alma_9: + urn: "almalinux almalinux 9-gen2 latest" + locations: + AzureChinaCloud: [] centos_610: "OpenLogic CentOS 6.10 latest" + centos_75: "OpenLogic CentOS 7.5 latest" centos_79: "OpenLogic CentOS 7_9 latest" + centos_82: + urn: "OpenLogic CentOS 8_2 latest" + vm_sizes: + # Since centos derived from redhat, please see the comment for vm size in rhel_82 + - "Standard_B2s" debian_8: "credativ Debian 8 latest" debian_9: "credativ Debian 9 latest" debian_10: "Debian debian-10 10 latest" debian_11: "Debian debian-11 11 latest" - debian_11_arm64: "Debian debian-11 11-backports-arm64 latest" - flatcar: "kinvolk flatcar-container-linux-free stable latest" + debian_11_arm64: + urn: "Debian debian-11 11-backports-arm64 latest" + locations: + AzureUSGovernment: [] + AzureChinaCloud: [] + flatcar: + urn: "kinvolk flatcar-container-linux-free stable latest" + locations: + AzureChinaCloud: [] + AzureUSGovernment: [] flatcar_arm64: urn: "kinvolk flatcar-container-linux-corevm stable latest" - vm_sizes: - - "Standard_D2pls_v5" - mariner_1: "microsoftcblmariner cbl-mariner cbl-mariner-1 latest" + locations: + AzureChinaCloud: [] + AzureUSGovernment: [] + mariner_1: + urn: "microsoftcblmariner cbl-mariner cbl-mariner-1 latest" + locations: + AzureChinaCloud: [] mariner_2: "microsoftcblmariner cbl-mariner cbl-mariner-2 latest" mariner_2_arm64: urn: "microsoftcblmariner cbl-mariner cbl-mariner-2-arm64 latest" locations: - - "eastus" + AzureChinaCloud: [] + AzureUSGovernment: [] + oracle_610: "Oracle Oracle-Linux 6.10 latest" + oracle_75: + urn: "Oracle Oracle-Linux 7.5 latest" + locations: + AzureChinaCloud: [] + AzureUSGovernment: [] + oracle_79: + urn: "Oracle Oracle-Linux ol79-gen2 latest" + locations: + 
AzureChinaCloud: [] + oracle_82: + urn: "Oracle Oracle-Linux ol82-gen2 latest" + locations: + AzureChinaCloud: [] + AzureUSGovernment: [] + rhel_610: "RedHat RHEL 6.10 latest" + rhel_75: + urn: "RedHat RHEL 7.5 latest" + locations: + AzureChinaCloud: [] + rhel_79: + urn: "RedHat RHEL 7_9 latest" + locations: + AzureChinaCloud: [] + rhel_82: + urn: "RedHat RHEL 8.2 latest" + locations: + AzureChinaCloud: [] vm_sizes: - - "Standard_D2pls_v5" - rocky_9: "erockyenterprisesoftwarefoundationinc1653071250513 rockylinux-9 rockylinux-9 latest" + # Previously, a user reported an agent hang on this VM size on Red Hat 7+, though it was not observed on RHEL 8. We use the same VM size for the agent cgroups scenario on RHEL 8 to make sure the issue does not show up in automation. + - "Standard_B2s" + rhel_90: + urn: "RedHat RHEL 9_0 latest" + locations: + AzureChinaCloud: [] + rhel_90_arm64: + urn: "RedHat rhel-arm64 9_0-arm64 latest" + locations: + AzureChinaCloud: [] + AzureUSGovernment: [] + rocky_9: + urn: "erockyenterprisesoftwarefoundationinc1653071250513 rockylinux-9 rockylinux-9 latest" + locations: + AzureChinaCloud: [] + AzureUSGovernment: [] suse_12: "SUSE sles-12-sp5-basic gen1 latest" suse_15: "SUSE sles-15-sp2-basic gen2 latest" - rhel_79: "RedHat RHEL 7_9 latest" - rhel_82: "RedHat RHEL 8.2 latest" - rhel_90: "RedHat RHEL 9_0 latest" - rhel_90_arm64: "RedHat rhel-arm64 9_0-arm64 latest" ubuntu_1604: "Canonical UbuntuServer 16.04-LTS latest" ubuntu_1804: "Canonical UbuntuServer 18.04-LTS latest" ubuntu_2004: "Canonical 0001-com-ubuntu-server-focal 20_04-lts latest" ubuntu_2204: "Canonical 0001-com-ubuntu-server-jammy 22_04-lts latest" - ubuntu_2204_arm64: "Canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest" + ubuntu_2204_arm64: + urn: "Canonical 0001-com-ubuntu-server-jammy 22_04-lts-arm64 latest" + locations: + AzureChinaCloud: [] + AzureUSGovernment: [] + ubuntu_2404: + # TODO: Currently using the daily build, update to the release build once it is available + urn:
"Canonical 0001-com-ubuntu-server-noble-daily 24_04-daily-lts-gen2 latest" + locations: + AzureChinaCloud: [] + AzureUSGovernment: [] diff --git a/tests_e2e/test_suites/keyvault_certificates.yml b/tests_e2e/test_suites/keyvault_certificates.yml new file mode 100644 index 000000000..00c51db7d --- /dev/null +++ b/tests_e2e/test_suites/keyvault_certificates.yml @@ -0,0 +1,9 @@ +# +# This test verifies that the Agent can download and extract KeyVault certificates that use different encryption algorithms +# +name: "KeyvaultCertificates" +tests: + - "keyvault_certificates/keyvault_certificates.py" +images: + - "endorsed" + - "endorsed-arm64" diff --git a/tests_e2e/test_suites/multi_config_ext.yml b/tests_e2e/test_suites/multi_config_ext.yml new file mode 100644 index 000000000..24bdaa736 --- /dev/null +++ b/tests_e2e/test_suites/multi_config_ext.yml @@ -0,0 +1,9 @@ +# +# Multi-config extensions are no longer supported but there are still customers running RCv2 and we don't want to break +# them. This test suite is used to verify that the agent processes RCv2 (a multi-config extension) as expected. +# +name: "MultiConfigExt" +tests: + - "multi_config_ext/multi_config_ext.py" +images: + - "endorsed" diff --git a/tests_e2e/test_suites/no_outbound_connections.yml b/tests_e2e/test_suites/no_outbound_connections.yml new file mode 100644 index 000000000..b256b5146 --- /dev/null +++ b/tests_e2e/test_suites/no_outbound_connections.yml @@ -0,0 +1,20 @@ +# +# This suite is used to test the scenario where outbound connections are blocked on the VM. In this case, +# the agent should fallback to the HostGAPlugin to request any downloads. +# +# The suite uses a custom ARM template to create a VM with a Network Security Group that blocks all outbound +# connections. The first test in the suite verifies that the setup of the NSG was successful, then the rest +# of the tests exercise different extension operations. 
The last test in the suite checks the agent log +# to verify it did fallback to the HostGAPlugin to execute the extensions. +# +name: "NoOutboundConnections" +tests: + - source: "no_outbound_connections/check_no_outbound_connections.py" + blocks_suite: true # If the NSG is not setup correctly, there is no point in executing the rest of the tests. + - "agent_bvt/extension_operations.py" + - "agent_bvt/run_command.py" + - "agent_bvt/vm_access.py" + - "no_outbound_connections/check_fallback_to_hgap.py" +images: "random(endorsed)" +template: "no_outbound_connections/deny_outbound_connections.py" +owns_vm: true diff --git a/tests_e2e/test_suites/pass.yml b/tests_e2e/test_suites/pass.yml index 40b0e60b4..b80db63f5 100644 --- a/tests_e2e/test_suites/pass.yml +++ b/tests_e2e/test_suites/pass.yml @@ -1,4 +1,5 @@ name: "Pass" tests: - - "pass_test.py" + - "samples/pass_test.py" + - "samples/pass_remote_test.py" images: "ubuntu_2004" diff --git a/tests_e2e/test_suites/publish_hostname.yml b/tests_e2e/test_suites/publish_hostname.yml new file mode 100644 index 000000000..09864a4d6 --- /dev/null +++ b/tests_e2e/test_suites/publish_hostname.yml @@ -0,0 +1,8 @@ +# +# Changes hostname and checks that the agent published the updated hostname to dns. +# +name: "PublishHostname" +tests: + - "publish_hostname/publish_hostname.py" +images: + - "endorsed" \ No newline at end of file diff --git a/tests_e2e/test_suites/recover_network_interface.yml b/tests_e2e/test_suites/recover_network_interface.yml new file mode 100644 index 000000000..3021013d2 --- /dev/null +++ b/tests_e2e/test_suites/recover_network_interface.yml @@ -0,0 +1,17 @@ +# +# Brings the primary network interface down and checks that the agent can recover the network. +# +name: "RecoverNetworkInterface" +tests: + - "recover_network_interface/recover_network_interface.py" +images: +# TODO: This scenario should be run on all distros which bring the network interface down to publish hostname. 
Currently, only RedhatOSUtil attempts to recover the network interface if down after hostname publishing. + - "centos_79" + - "centos_75" + - "centos_82" + - "rhel_75" + - "rhel_79" + - "rhel_82" + - "oracle_75" + - "oracle_79" + - "oracle_82" \ No newline at end of file diff --git a/tests_e2e/test_suites/vmss.yml b/tests_e2e/test_suites/vmss.yml new file mode 100644 index 000000000..d9ca6be01 --- /dev/null +++ b/tests_e2e/test_suites/vmss.yml @@ -0,0 +1,8 @@ +# +# Sample test for scale sets +# +name: "VMSS" +tests: + - "samples/vmss_test.py" +executes_on_scale_set: true +images: "ubuntu_2004" diff --git a/tests_e2e/tests/bvts/extension_operations.py b/tests_e2e/tests/agent_bvt/extension_operations.py similarity index 81% rename from tests_e2e/tests/bvts/extension_operations.py rename to tests_e2e/tests/agent_bvt/extension_operations.py index e8a45ee44..52f39c775 100755 --- a/tests_e2e/tests/bvts/extension_operations.py +++ b/tests_e2e/tests/agent_bvt/extension_operations.py @@ -31,23 +31,20 @@ from azure.core.exceptions import ResourceNotFoundError -from tests_e2e.tests.lib.agent_test import AgentTest -from tests_e2e.tests.lib.identifiers import VmExtensionIds, VmExtensionIdentifier +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds, VmExtensionIdentifier from tests_e2e.tests.lib.logging import log from tests_e2e.tests.lib.ssh_client import SshClient -from tests_e2e.tests.lib.vm_extension import VmExtension +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient -class ExtensionOperationsBvt(AgentTest): +class ExtensionOperationsBvt(AgentVmTest): def run(self): - ssh_client: SshClient = SshClient( - ip_address=self._context.vm_ip_address, - username=self._context.username, - private_key_file=self._context.private_key_file) + ssh_client: SshClient = self._context.create_ssh_client() is_arm64: bool = ssh_client.get_architecture() == "aarch64" - 
custom_script_2_0 = VmExtension( + custom_script_2_0 = VirtualMachineExtensionClient( self._context.vm, VmExtensionIds.CustomScript, resource_name="CustomScript") @@ -58,14 +55,14 @@ def run(self): log.info("Installing %s", custom_script_2_0) message = f"Hello {uuid.uuid4()}!" custom_script_2_0.enable( - settings={ + protected_settings={ 'commandToExecute': f"echo \'{message}\'" }, auto_upgrade_minor_version=False ) custom_script_2_0.assert_instance_view(expected_version="2.0", expected_message=message) - custom_script_2_1 = VmExtension( + custom_script_2_1 = VirtualMachineExtensionClient( self._context.vm, VmExtensionIdentifier(VmExtensionIds.CustomScript.publisher, VmExtensionIds.CustomScript.type, "2.1"), resource_name="CustomScript") @@ -73,11 +70,11 @@ def run(self): if is_arm64: log.info("Installing %s", custom_script_2_1) else: - log.info("Updating %s to %s", custom_script_2_0, custom_script_2_1) + log.info("Updating %s", custom_script_2_0) message = f"Hello {uuid.uuid4()}!" custom_script_2_1.enable( - settings={ + protected_settings={ 'commandToExecute': f"echo \'{message}\'" } ) diff --git a/tests_e2e/tests/bvts/run_command.py b/tests_e2e/tests/agent_bvt/run_command.py similarity index 80% rename from tests_e2e/tests/bvts/run_command.py rename to tests_e2e/tests/agent_bvt/run_command.py index 188c12d3f..df5cdcf2b 100755 --- a/tests_e2e/tests/bvts/run_command.py +++ b/tests_e2e/tests/agent_bvt/run_command.py @@ -31,28 +31,25 @@ from assertpy import assert_that, soft_assertions from typing import Callable, Dict -from tests_e2e.tests.lib.agent_test import AgentTest -from tests_e2e.tests.lib.identifiers import VmExtensionIds +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds from tests_e2e.tests.lib.logging import log from tests_e2e.tests.lib.ssh_client import SshClient -from tests_e2e.tests.lib.vm_extension import VmExtension +from tests_e2e.tests.lib.virtual_machine_extension_client 
import VirtualMachineExtensionClient -class RunCommandBvt(AgentTest): +class RunCommandBvt(AgentVmTest): class TestCase: - def __init__(self, extension: VmExtension, get_settings: Callable[[str], Dict[str, str]]): + def __init__(self, extension: VirtualMachineExtensionClient, get_settings: Callable[[str], Dict[str, str]]): self.extension = extension self.get_settings = get_settings def run(self): - ssh_client = SshClient( - ip_address=self._context.vm_ip_address, - username=self._context.username, - private_key_file=self._context.private_key_file) + ssh_client: SshClient = self._context.create_ssh_client() test_cases = [ RunCommandBvt.TestCase( - VmExtension(self._context.vm, VmExtensionIds.RunCommand, resource_name="RunCommand"), + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommand, resource_name="RunCommand"), lambda s: { "script": base64.standard_b64encode(bytearray(s, 'utf-8')).decode('utf-8') }) @@ -63,7 +60,7 @@ def run(self): else: test_cases.append( RunCommandBvt.TestCase( - VmExtension(self._context.vm, VmExtensionIds.RunCommandHandler, resource_name="RunCommandHandler"), + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, resource_name="RunCommandHandler"), lambda s: { "source": { "script": s diff --git a/tests_e2e/tests/bvts/vm_access.py b/tests_e2e/tests/agent_bvt/vm_access.py similarity index 74% rename from tests_e2e/tests/bvts/vm_access.py rename to tests_e2e/tests/agent_bvt/vm_access.py index 1af0f99e1..1c231809f 100755 --- a/tests_e2e/tests/bvts/vm_access.py +++ b/tests_e2e/tests/agent_bvt/vm_access.py @@ -28,19 +28,19 @@ from assertpy import assert_that from pathlib import Path -from tests_e2e.tests.lib.agent_test import AgentTest, TestSkipped -from tests_e2e.tests.lib.identifiers import VmExtensionIds +from tests_e2e.tests.lib.agent_test import AgentVmTest, TestSkipped +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds from tests_e2e.tests.lib.logging import log from 
tests_e2e.tests.lib.ssh_client import SshClient -from tests_e2e.tests.lib.vm_extension import VmExtension +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient -class VmAccessBvt(AgentTest): +class VmAccessBvt(AgentVmTest): def run(self): - ssh: SshClient = SshClient(ip_address=self._context.vm_ip_address, username=self._context.username, private_key_file=self._context.private_key_file) - if "-flatcar" in ssh.run_command("uname -a"): - raise TestSkipped("Currently VMAccess is not supported on Flatcar") + ssh_client: SshClient = self._context.create_ssh_client() + if not VmExtensionIds.VmAccess.supports_distro(ssh_client.run_command("get_distro.py").rstrip()): + raise TestSkipped("Currently VMAccess is not supported on this distro") # Try to use a unique username for each test run (note that we truncate to 32 chars to # comply with the rules for usernames) @@ -52,13 +52,13 @@ def run(self): private_key_file: Path = self._context.working_directory/f"{username}_rsa" public_key_file: Path = self._context.working_directory/f"{username}_rsa.pub" log.info("Generating SSH key as %s", private_key_file) - ssh = SshClient(ip_address=self._context.vm_ip_address, username=username, private_key_file=private_key_file) - ssh.generate_ssh_key(private_key_file) + ssh_client = SshClient(ip_address=self._context.ip_address, username=username, identity_file=private_key_file) + ssh_client.generate_ssh_key(private_key_file) with public_key_file.open() as f: public_key = f.read() # Invoke the extension - vm_access = VmExtension(self._context.vm, VmExtensionIds.VmAccess, resource_name="VmAccess") + vm_access = VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.VmAccess, resource_name="VmAccess") vm_access.enable( protected_settings={ 'username': username, @@ -70,7 +70,7 @@ def run(self): # Verify the user was added correctly by starting an SSH session to the VM log.info("Verifying SSH connection to the test VM") - stdout = 
ssh.run_command("echo -n $USER") + stdout = ssh_client.run_command("echo -n $USER") assert_that(stdout).described_as("Output from SSH command").is_equal_to(username) log.info("SSH command output ($USER): %s", stdout) diff --git a/tests_e2e/tests/agent_cgroups/agent_cgroups.py b/tests_e2e/tests/agent_cgroups/agent_cgroups.py new file mode 100644 index 000000000..449c5c362 --- /dev/null +++ b/tests_e2e/tests/agent_cgroups/agent_cgroups.py @@ -0,0 +1,43 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.logging import log + + +class AgentCgroups(AgentVmTest): + """ + This test verifies that the agent is running in the expected cgroups. 
+ """ + + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client = self._context.create_ssh_client() + + def run(self): + log.info("=====Prepare agent=====") + log.info("Restarting agent service to make sure service starts with new configuration that was setup by the cgroupconfigurator") + self._ssh_client.run_command("agent-service restart", use_sudo=True) + log.info("=====Validating agent cgroups=====") + self._run_remote_test(self._ssh_client, "agent_cgroups-check_cgroups_agent.py") + log.info("Successfully Verified that agent present in correct cgroups") + + +if __name__ == "__main__": + AgentCgroups.run_from_command_line() diff --git a/tests_e2e/tests/agent_cgroups/agent_cpu_quota.py b/tests_e2e/tests/agent_cgroups/agent_cpu_quota.py new file mode 100644 index 000000000..be66428b9 --- /dev/null +++ b/tests_e2e/tests/agent_cgroups/agent_cpu_quota.py @@ -0,0 +1,40 @@ +from typing import List, Dict, Any + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.logging import log + + +class AgentCPUQuota(AgentVmTest): + """ + The test verify that the agent detects when it is throttled for using too much CPU, that it detects processes that do belong to the agent's cgroup, and that resource metrics are generated. 
+ """ + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client = self._context.create_ssh_client() + + def run(self): + log.info("=====Validating agent cpu quota checks") + self._run_remote_test(self._ssh_client, "agent_cpu_quota-check_agent_cpu_quota.py", use_sudo=True) + log.info("Successfully Verified that agent running in expected CPU quotas") + + def get_ignore_error_rules(self) -> List[Dict[str, Any]]: + ignore_rules = [ + # This is produced by the test, so it is expected + # Examples: + # 2023-10-03T17:59:03.007572Z INFO MonitorHandler ExtHandler [CGW] Disabling resource usage monitoring. Reason: Check on cgroups failed: + # [CGroupsException] The agent's cgroup includes unexpected processes: ['[PID: 3190] /usr/bin/python3\x00/home/azureuser/bin/agent_cpu_quota-start_servi', '[PID: 3293] dd\x00if=/dev/zero\x00of=/dev/null\x00'] + # [CGroupsException] The agent has been throttled for 5.7720997 seconds + {'message': r"Disabling resource usage monitoring. Reason: Check on cgroups failed"}, + # This may happen during service stop while terminating the process + # Example: + # 2022-03-11T21:11:11.713161Z ERROR E2ETest [Errno 3] No such process: + {'message': r'E2ETest.*No such process'}, + # 2022-10-26T15:38:39.655677Z ERROR E2ETest 'dd if=/dev/zero of=/dev/null' failed: -15 (): + {'message': r"E2ETest.*dd.*failed: -15"} + ] + return ignore_rules + + +if __name__ == "__main__": + AgentCPUQuota.run_from_command_line() diff --git a/tests_e2e/tests/agent_ext_workflow/README.md b/tests_e2e/tests/agent_ext_workflow/README.md new file mode 100644 index 000000000..a8d59fc15 --- /dev/null +++ b/tests_e2e/tests/agent_ext_workflow/README.md @@ -0,0 +1,45 @@ +# Agent Extension Worflow Test + +This scenario tests if the correct extension workflow sequence is being executed from the agent. + +### GuestAgentDcrTestExtension + +This is a test extension that exists for the sole purpose of testing the extension workflow of agent. 
This is currently deployed to SCUS only. + +All the extension does is print settings['name'] to stdout. It is run every time enable is called. + +Another important feature of this extension is that it maintains an `operations-.log` **for every operation that the agent executes on that extension**. We use this to confirm that the agent executed the correct sequence of operations. + +Sample operations-.log file snippet - +```text +Date:2019-07-30T21:54:03Z; Operation:install; SeqNo:0 +Date:2019-07-30T21:54:05Z; Operation:enable; SeqNo:0 +Date:2019-07-30T21:54:37Z; Operation:enable; SeqNo:1 +Date:2019-07-30T21:55:20Z; Operation:disable; SeqNo:1 +Date:2019-07-30T21:55:22Z; Operation:uninstall; SeqNo:1 +``` +The setting for this extension is of the format - +```json +{ + "name": String +} +``` +##### Repo link +https://github.com/larohra/GuestAgentDcrTestExtension + +##### Available Versions: +- 1.1.5 - Version with basic functionality as mentioned above +- 1.2.0 - Same functionality as above with `"updateMode": "UpdateWithInstall"` in HandlerManifest.json to test the update case +- 1.3.0 - Same functionality as above with `"updateMode": "UpdateWithoutInstall"` in HandlerManifest.json to test the update case + +### Test Sequence + +- Install the test extension on the VM +- Assert the extension status by checking if our Enable string matches the status message (we receive the status message by using the Azure SDK to poll the VM instance view and parse the extension status message) + +The Enable string of our test is of the following format (this is set in the `Settings` object when we call enable from the tests) - +```text +[ExtensionName]-[Version], Count: [Enable-count] +``` +- Match the operation sequence as per the test and make sure they are in the correct chronological order +- Restart the agent and verify if the correct operation sequence is followed \ No newline at end of file diff --git a/tests_e2e/tests/agent_ext_workflow/extension_workflow.py
b/tests_e2e/tests/agent_ext_workflow/extension_workflow.py new file mode 100644 index 000000000..b5a377e72 --- /dev/null +++ b/tests_e2e/tests/agent_ext_workflow/extension_workflow.py @@ -0,0 +1,443 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from azure.mgmt.compute.models import VirtualMachineExtensionInstanceView +from assertpy import assert_that, soft_assertions +from random import choice +import uuid + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds, VmExtensionIdentifier +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.ssh_client import SshClient +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient + + +class ExtensionWorkflow(AgentVmTest): + """ + This scenario tests if the correct extension workflow sequence is being executed from the agent. It installs the + GuestAgentDcrTestExtension on the test VM and makes requests to install, enable, update, and delete the extension + from the VM. The GuestAgentDcrTestExtension maintains a local `operations-.log` for every operation that + the agent executes on that extension. We use this to confirm that the agent executed the correct sequence of + operations. 
+ + Sample operations-.log file snippet - + Date:2019-07-30T21:54:03Z; Operation:install; SeqNo:0 + Date:2019-07-30T21:54:05Z; Operation:enable; SeqNo:0 + Date:2019-07-30T21:54:37Z; Operation:enable; SeqNo:1 + Date:2019-07-30T21:55:20Z; Operation:disable; SeqNo:1 + Date:2019-07-30T21:55:22Z; Operation:uninstall; SeqNo:1 + + The setting for the GuestAgentDcrTestExtension is of the format - + { + "name": String + } + + Test sequence - + - Install the test extension on the VM + - Assert the extension status by checking if our Enable string matches the status message + - The Enable string of our test is of the following format (this is set in the `Settings` object when we call enable from the tests): [ExtensionName]-[Version], Count: [Enable-count] + - Match the operation sequence as per the test and make sure they are in the correct chronological order + - Restart the agent and verify if the correct operation sequence is followed + """ + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client = context.create_ssh_client() + + # This class represents the GuestAgentDcrTestExtension running on the test VM + class GuestAgentDcrTestExtension: + COUNT_KEY_NAME = "Count" + NAME_KEY_NAME = "name" + DATA_KEY_NAME = "data" + + def __init__(self, extension: VirtualMachineExtensionClient, ssh_client: SshClient, version: str): + self.extension = extension + self.name = "GuestAgentDcrTestExt" + self.version = version + self.expected_message = "" + self.enable_count = 0 + self.ssh_client = ssh_client + self.data = None + + def modify_ext_settings_and_enable(self, data=None): + self.enable_count += 1 + + # Settings follow the format: [ExtensionName]-[Version], Count: [Enable-count] + setting_name = "%s-%s, %s: %s" % (self.name, self.version, self.COUNT_KEY_NAME, self.enable_count) + # We include data in the settings to test the special characters case.
The settings with data follows the + # following format: [ExtensionName]-[Version], Count: [Enable-count], data: [data] + if data is not None: + setting_name = "{0}, {1}: {2}".format(setting_name, self.DATA_KEY_NAME, data) + + self.expected_message = setting_name + settings = {self.NAME_KEY_NAME: setting_name.encode('utf-8')} + + log.info("") + log.info("Add or update extension {0} , version {1}, settings {2}".format(self.extension, self.version, + settings)) + self.extension.enable(settings=settings, auto_upgrade_minor_version=False) + + def assert_instance_view(self, data=None): + log.info("") + + # If data is not None, we want to assert the instance view has the expected data + if data is None: + log.info("Assert instance view has expected message for test extension. Expected version: {0}, " + "Expected message: {1}".format(self.version, self.expected_message)) + self.extension.assert_instance_view(expected_version=self.version, + expected_message=self.expected_message) + else: + self.data = data + log.info("Assert instance view has expected data for test extension. 
Expected version: {0}, " + "Expected data: {1}".format(self.version, data)) + self.extension.assert_instance_view(expected_version=self.version, + assert_function=self.assert_data_in_instance_view) + + def assert_data_in_instance_view(self, instance_view: VirtualMachineExtensionInstanceView): + log.info("Asserting extension status ...") + status_message = instance_view.statuses[0].message + log.info("Status message: %s" % status_message) + + with soft_assertions(): + expected_ext_version = "%s-%s" % (self.name, self.version) + assert_that(expected_ext_version in status_message).described_as( + f"Specific extension version name should be in the InstanceView message ({expected_ext_version})").is_true() + + expected_count = "%s: %s" % (self.COUNT_KEY_NAME, self.enable_count) + assert_that(expected_count in status_message).described_as( + f"Expected count should be in the InstanceView message ({expected_count})").is_true() + + if self.data is not None: + expected_data = "{0}: {1}".format(self.DATA_KEY_NAME, self.data) + assert_that(expected_data in status_message).described_as( + f"Expected data should be in the InstanceView message ({expected_data})").is_true() + + def execute_assertion_script(self, file_name, args): + log.info("") + log.info("Running {0} remotely with arguments {1}".format(file_name, args)) + result = self.ssh_client.run_command(f"{file_name} {args}", use_sudo=True) + log.info(result) + log.info("Assertion completed successfully") + + def assert_scenario(self, file_name: str, command_args: str, assert_status: bool = False, restart_agent: list = None, data: str = None): + # Assert the extension status by checking if our Enable string matches the status message in the instance + # view + if assert_status: + self.assert_instance_view(data=data) + + # Remotely execute the assertion script + self.execute_assertion_script(file_name, command_args) + + # Restart the agent and test the status again if enabled (by checking the operations.log file in the VM) + 
# Restarting agent should just run enable again and rerun the same settings + if restart_agent is not None: + log.info("") + log.info("Restarting the agent...") + output = self.ssh_client.run_command("agent-service restart", use_sudo=True) + log.info("Restart completed:\n%s", output) + + for args in restart_agent: + self.execute_assertion_script('agent_ext_workflow-assert_operation_sequence.py', args) + + if assert_status: + self.assert_instance_view() + + def update_ext_version(self, extension: VirtualMachineExtensionClient, version: str): + self.extension = extension + self.version = version + + def run(self): + is_arm64: bool = self._ssh_client.get_architecture() == "aarch64" + + if is_arm64: + log.info("Skipping test case for %s, since it has not been published on ARM64", VmExtensionIds.GuestAgentDcrTestExtension) + else: + log.info("") + log.info("*******Verifying the extension install scenario*******") + + # Record the time we start the test + start_time = self._ssh_client.run_command("date '+%Y-%m-%dT%TZ'").rstrip() + + # Create DcrTestExtension with version 1.1.5 + dcr_test_ext_id_1_1 = VmExtensionIdentifier( + VmExtensionIds.GuestAgentDcrTestExtension.publisher, + VmExtensionIds.GuestAgentDcrTestExtension.type, + "1.1" + ) + dcr_test_ext_client = VirtualMachineExtensionClient( + self._context.vm, + dcr_test_ext_id_1_1, + resource_name="GuestAgentDcrTestExt" + ) + dcr_ext = ExtensionWorkflow.GuestAgentDcrTestExtension( + extension=dcr_test_ext_client, + ssh_client=self._ssh_client, + version="1.1.5" + ) + + # Install test extension on the VM + dcr_ext.modify_ext_settings_and_enable() + + # command_args are the args we pass to the agent_ext_workflow-assert_operation_sequence.py file to verify + # the operation sequence for the current test + command_args = f"--start-time {start_time} " \ + f"normal_ops_sequence " \ + f"--version {dcr_ext.version} " \ + f"--ops install enable" + # restart_agentcommand_args are the args we pass to the 
agent_ext_workflow-assert_operation_sequence.py file + # to verify the operation sequence after restarting the agent. Restarting agent should just run enable again + # and rerun the same settings + restart_agent_command_args = [f"--start-time {start_time} " + f"normal_ops_sequence " + f"--version {dcr_ext.version} " + f"--ops install enable enable"] + + # Assert the operation sequence to confirm the agent executed the operations in the correct chronological + # order + dcr_ext.assert_scenario( + file_name='agent_ext_workflow-assert_operation_sequence.py', + command_args=command_args, + assert_status=True, + restart_agent=restart_agent_command_args + ) + + log.info("") + log.info("*******Verifying the extension enable scenario*******") + + # Record the time we start the test + start_time = self._ssh_client.run_command("date '+%Y-%m-%dT%TZ'").rstrip() + + # Enable test extension on the VM + dcr_ext.modify_ext_settings_and_enable() + + command_args = f"--start-time {start_time} " \ + f"normal_ops_sequence " \ + f"--version {dcr_ext.version} " \ + f"--ops enable" + restart_agent_command_args = [f"--start-time {start_time} " + f"normal_ops_sequence " + f"--version {dcr_ext.version} " + f"--ops enable enable"] + + dcr_ext.assert_scenario( + file_name='agent_ext_workflow-assert_operation_sequence.py', + command_args=command_args, + assert_status=True, + restart_agent=restart_agent_command_args + ) + + log.info("") + log.info("*******Verifying the extension enable with special characters scenario*******") + + test_guid = str(uuid.uuid4()) + random_special_char_sentences = [ + "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen Wolther spillede på xylofon.", + "Falsches Üben von Xylophonmusik quält jeden größeren Zwerg", + "Zwölf Boxkämpfer jagten Eva quer über den Sylter Deich", + "Heizölrückstoßabdämpfung", + "Γαζέες καὶ μυρτιὲς δὲν θὰ βρῶ πιὰ στὸ χρυσαφὶ ξέφωτο", + "Ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία", + "El pingüino Wenceslao hizo kilómetros bajo exhaustiva 
lluvia y frío, añoraba a su querido cachorro.", + "Portez ce vieux whisky au juge blond qui fume sur son île intérieure, à côté de l'alcôve ovoïde, où les bûches" + ] + sentence = choice(random_special_char_sentences) + test_str = "{0}; Special chars: {1}".format(test_guid, sentence) + + # Enable test extension on the VM + dcr_ext.modify_ext_settings_and_enable(data=test_str) + + command_args = f"--data {test_guid}" + + # We first ensure that the stdout contains the special characters and then we check if the test_guid is + # logged atleast once in the agent log to ensure that there were no errors when handling special characters + # in the agent + dcr_ext.assert_scenario( + file_name='agent_ext_workflow-check_data_in_agent_log.py', + command_args=command_args, + assert_status=True, + data=test_guid + ) + + log.info("") + log.info("*******Verifying the extension uninstall scenario*******") + + # Record the time we start the test + start_time = self._ssh_client.run_command("date '+%Y-%m-%dT%TZ'").rstrip() + + # Remove the test extension on the VM + log.info("Delete %s from VM", dcr_test_ext_client) + dcr_ext.extension.delete() + + command_args = f"--start-time {start_time} " \ + f"normal_ops_sequence " \ + f"--version {dcr_ext.version} " \ + f"--ops disable uninstall" + restart_agent_command_args = [f"--start-time {start_time} " + f"normal_ops_sequence " + f"--version {dcr_ext.version} " + f"--ops disable uninstall"] + + dcr_ext.assert_scenario( + file_name='agent_ext_workflow-assert_operation_sequence.py', + command_args=command_args, + restart_agent=restart_agent_command_args + ) + + log.info("") + log.info("*******Verifying the extension update with install scenario*******") + + # Record the time we start the test + start_time = self._ssh_client.run_command("date '+%Y-%m-%dT%TZ'").rstrip() + + # Version 1.2.0 of the test extension has the same functionalities as 1.1.5 with + # "updateMode": "UpdateWithInstall" in HandlerManifest.json to test update case + 
new_version_update_mode_with_install = "1.2.0" + old_version = "1.1.5" + + # Create DcrTestExtension with version 1.1 and 1.2 + dcr_test_ext_id_1_2 = VmExtensionIdentifier( + VmExtensionIds.GuestAgentDcrTestExtension.publisher, + VmExtensionIds.GuestAgentDcrTestExtension.type, + "1.2" + ) + dcr_test_ext_client_1_2 = VirtualMachineExtensionClient( + self._context.vm, + dcr_test_ext_id_1_2, + resource_name="GuestAgentDcrTestExt" + ) + dcr_ext = ExtensionWorkflow.GuestAgentDcrTestExtension( + extension=dcr_test_ext_client, + ssh_client=self._ssh_client, + version=old_version + ) + + # Install test extension v1.1.5 on the VM and assert instance view + dcr_ext.modify_ext_settings_and_enable() + dcr_ext.assert_instance_view() + + # Update extension object & version to new version + dcr_ext.update_ext_version(dcr_test_ext_client_1_2, new_version_update_mode_with_install) + + # Install test extension v1.2.0 on the VM + dcr_ext.modify_ext_settings_and_enable() + + command_args = f"--start-time {start_time} " \ + f"update_sequence " \ + f"--old-version {old_version} " \ + f"--old-ver-ops disable uninstall " \ + f"--new-version {new_version_update_mode_with_install} " \ + f"--new-ver-ops update install enable " \ + f"--final-ops disable update uninstall install enable" + restart_agent_command_args = [ + f"--start-time {start_time} " + f"normal_ops_sequence " + f"--version {old_version} " + f"--ops disable uninstall", + f"--start-time {start_time} " + f"normal_ops_sequence " + f"--version {new_version_update_mode_with_install} " + f"--ops update install enable enable" + ] + + dcr_ext.assert_scenario( + file_name='agent_ext_workflow-assert_operation_sequence.py', + command_args=command_args, + assert_status=True, + restart_agent=restart_agent_command_args + ) + + log.info("") + log.info("Delete %s from VM", dcr_test_ext_client_1_2) + dcr_ext.extension.delete() + + log.info("") + log.info("*******Verifying the extension update without install scenario*******") + + # Record the 
time we start the test + start_time = self._ssh_client.run_command("date '+%Y-%m-%dT%TZ'").rstrip() + + # Version 1.3.0 of the test extension has the same functionalities as 1.1.5 with + # "updateMode": "UpdateWithoutInstall" in HandlerManifest.json to test update case + new_version_update_mode_without_install = "1.3.0" + + # Create DcrTestExtension with version 1.1 and 1.3 + dcr_test_ext_id_1_3 = VmExtensionIdentifier( + VmExtensionIds.GuestAgentDcrTestExtension.publisher, + VmExtensionIds.GuestAgentDcrTestExtension.type, + "1.3") + dcr_test_ext_client_1_3 = VirtualMachineExtensionClient( + self._context.vm, + dcr_test_ext_id_1_3, + resource_name="GuestAgentDcrTestExt" + ) + dcr_ext = ExtensionWorkflow.GuestAgentDcrTestExtension( + extension=dcr_test_ext_client, + ssh_client=self._ssh_client, + version=old_version + ) + + # Install test extension v1.1.5 on the VM and assert instance view + dcr_ext.modify_ext_settings_and_enable() + dcr_ext.assert_instance_view() + + # Update extension object & version to new version + dcr_ext.update_ext_version(dcr_test_ext_client_1_3, new_version_update_mode_without_install) + + # Install test extension v1.3.0 on the VM + dcr_ext.modify_ext_settings_and_enable() + + command_args = f"--start-time {start_time} " \ + f"update_sequence " \ + f"--old-version {old_version} " \ + f"--old-ver-ops disable uninstall " \ + f"--new-version {new_version_update_mode_without_install} " \ + f"--new-ver-ops update enable " \ + f"--final-ops disable update uninstall enable" + restart_agent_command_args = [ + f"--start-time {start_time} " + f"normal_ops_sequence " + f"--version {old_version} " + f"--ops disable uninstall", + f"--start-time {start_time} " + f"normal_ops_sequence " + f"--version {new_version_update_mode_without_install} " + f"--ops update enable enable" + ] + + dcr_ext.assert_scenario( + file_name='agent_ext_workflow-assert_operation_sequence.py', + command_args=command_args, + assert_status=True, + 
restart_agent=restart_agent_command_args + ) + + log.info("") + log.info("*******Verifying no lag between agent start and gs processing*******") + + log.info("") + log.info("Running agent_ext_workflow-validate_no_lag_between_agent_start_and_gs_processing.py remotely...") + result = self._ssh_client.run_command("agent_ext_workflow-validate_no_lag_between_agent_start_and_gs_processing.py", use_sudo=True) + log.info(result) + log.info("Validation for no lag time between agent start and gs processing completed successfully") + + +if __name__ == "__main__": + ExtensionWorkflow.run_from_command_line() diff --git a/tests_e2e/tests/agent_firewall/agent_firewall.py b/tests_e2e/tests/agent_firewall/agent_firewall.py new file mode 100644 index 000000000..c5b789dea --- /dev/null +++ b/tests_e2e/tests/agent_firewall/agent_firewall.py @@ -0,0 +1,42 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.logging import log + + +class AgentFirewall(AgentVmTest): + """ + This test verifies the agent firewall rules are added properly. It checks each firewall rule is present and working as expected. 
+ """ + + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client = self._context.create_ssh_client() + + def run(self): + log.info("Checking iptable rules added by the agent") + self._run_remote_test(self._ssh_client, f"agent_firewall-verify_all_firewall_rules.py --user {self._context.username}", use_sudo=True) + log.info("Successfully verified all rules present and working as expected.") + + +if __name__ == "__main__": + AgentFirewall.run_from_command_line() + + diff --git a/tests_e2e/tests/agent_not_provisioned/agent_not_provisioned.py b/tests_e2e/tests/agent_not_provisioned/agent_not_provisioned.py new file mode 100755 index 000000000..103c8b44c --- /dev/null +++ b/tests_e2e/tests/agent_not_provisioned/agent_not_provisioned.py @@ -0,0 +1,99 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +from assertpy import fail, assert_that +from typing import Any, Dict, List + +from azure.mgmt.compute.models import VirtualMachineInstanceView + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.shell import CommandError +from tests_e2e.tests.lib.ssh_client import SshClient +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient + + +class AgentNotProvisioned(AgentVmTest): + """ + When osProfile.linuxConfiguration.provisionVMAgent is set to 'false', this test verifies that + the agent is disabled and that extension operations are not allowed. + """ + def run(self): + # + # Check the agent's log for the messages that indicate it is disabled. + # + ssh_client: SshClient = self._context.create_ssh_client() + + log.info("Checking the Agent's log to verify that it is disabled.") + try: + output = ssh_client.run_command(""" + # We need to wait for the agent to start and hit the disable code, give it a few minutes + n=18 + for i in $(seq $n); do + grep -E 'WARNING.*Daemon.*Disabling guest agent in accordance with ovf-env.xml' /var/log/waagent.log || \ + grep -E 'WARNING.*Daemon.*Disabling the guest agent by sleeping forever; to re-enable, remove /var/lib/waagent/disable_agent and restart' /var/log/waagent.log + if [[ $? == 0 ]]; then + exit 0 + fi + echo "Did not find the expected message in the agent's log, retrying after sleeping for a few seconds (attempt $i/$n)..." + sleep 10 + done + echo "Did not find the expected message in the agent's log, giving up." + exit 1 + """) + log.info("The Agent is disabled, log message: [%s]", output.rstrip()) + except CommandError as e: + fail(f"The agent's log does not contain the expected messages: {e}") + + # + # Validate that the agent is not reporting status. + # + log.info("Verifying that the Agent status is 'Not Ready' (i.e. 
it is not reporting status).") + instance_view: VirtualMachineInstanceView = self._context.vm.get_instance_view() + log.info("Instance view of VM Agent:\n%s", instance_view.vm_agent.serialize()) + assert_that(instance_view.vm_agent.statuses).described_as("The VM agent should have exactly 1 status").is_length(1) + assert_that(instance_view.vm_agent.statuses[0].code).described_as("The VM Agent should not be available").is_equal_to('ProvisioningState/Unavailable') + assert_that(instance_view.vm_agent.statuses[0].display_status).described_as("The VM Agent should not be ready").is_equal_to('Not Ready') + log.info("The Agent status is 'Not Ready'") + + # + # Validate that extensions cannot be executed. + # + log.info("Verifying that extension processing is disabled.") + log.info("Executing CustomScript; it should fail.") + custom_script = VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.CustomScript, resource_name="CustomScript") + try: + custom_script.enable(settings={'commandToExecute': "date"}, force_update=True, timeout=20 * 60) + fail("CustomScript should have failed") + except Exception as error: + assert_that("OperationNotAllowed" in str(error)) \ + .described_as(f"Expected an OperationNotAllowed: {error}") \ + .is_true() + log.info("CustomScript failed, as expected: %s", error) + + def get_ignore_error_rules(self) -> List[Dict[str, Any]]: + return [ + {'message': 'Disabling guest agent in accordance with ovf-env.xml'}, + {'message': 'Disabling the guest agent by sleeping forever; to re-enable, remove /var/lib/waagent/disable_agent and restart'} + ] + + +if __name__ == "__main__": + AgentNotProvisioned.run_from_command_line() + diff --git a/tests_e2e/tests/agent_not_provisioned/disable_agent_provisioning.py b/tests_e2e/tests/agent_not_provisioned/disable_agent_provisioning.py new file mode 100755 index 000000000..af3bc738a --- /dev/null +++ b/tests_e2e/tests/agent_not_provisioned/disable_agent_provisioning.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 +
+# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from typing import Any, Dict + +from tests_e2e.tests.lib.update_arm_template import UpdateArmTemplate + + +class DisableAgentProvisioning(UpdateArmTemplate): + """ + Updates the ARM template to set osProfile.linuxConfiguration.provisionVMAgent to false. + """ + def update(self, template: Dict[str, Any], is_lisa_template: bool) -> None: + if not is_lisa_template: + raise Exception('This test can only customize LISA ARM templates.') + + # + # NOTE: LISA's template uses this function to generate the value for osProfile.linuxConfiguration. The function is + # under the 'lisa' namespace. We set 'provisionVMAgent' to False. + # + # "getLinuxConfiguration": { + # "parameters": [ + # ... + # ], + # "output": { + # "type": "object", + # "value": { + # "disablePasswordAuthentication": true, + # "ssh": { + # "publicKeys": [ + # { + # "path": "[parameters('keyPath')]", + # "keyData": "[parameters('publicKeyData')]" + # } + # ] + # }, + # "provisionVMAgent": true + # } + # } + # } + # + get_linux_configuration = self.get_lisa_function(template, 'getLinuxConfiguration') + output = self.get_function_output(get_linux_configuration) + if output.get('customData') is not None: + raise Exception(f"The getOSProfile function already has a 'customData'. Won't override it. 
Definition: {get_linux_configuration}") + output['provisionVMAgent'] = False + diff --git a/tests_e2e/tests/agent_persist_firewall/agent_persist_firewall.py b/tests_e2e/tests/agent_persist_firewall/agent_persist_firewall.py new file mode 100644 index 000000000..5bfeb403a --- /dev/null +++ b/tests_e2e/tests/agent_persist_firewall/agent_persist_firewall.py @@ -0,0 +1,78 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.ssh_client import SshClient + + +class AgentPersistFirewallTest(AgentVmTest): + """ + This test verifies that the agent persists firewall rules using the custom network setup service or the firewalld service, and ensures those rules are added on boot and work as expected.
+ """ + + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client: SshClient = self._context.create_ssh_client() + + def run(self): + self._test_setup() + # Test case 1: After test agent install, verify firewalld or network.setup is running + self._verify_persist_firewall_service_running() + # Test case 2: Perform reboot and ensure firewall rules added on boot and working as expected + self._context.vm.restart(wait_for_boot=True, ssh_client=self._ssh_client) + self._verify_persist_firewall_service_running() + self._verify_firewall_rules_on_boot("first_boot") + # Test case 3: Disable the agent(so that agent won't get started after reboot) + # perform reboot and ensure firewall rules added on boot even after agent is disabled + self._disable_agent() + self._context.vm.restart(wait_for_boot=True, ssh_client=self._ssh_client) + self._verify_persist_firewall_service_running() + self._verify_firewall_rules_on_boot("second_boot") + # Test case 4: perform firewalld rules deletion and ensure deleted rules added back to rule set after agent start + self._verify_firewall_rules_readded() + + def _test_setup(self): + log.info("Doing test setup") + self._run_remote_test(self._ssh_client, f"agent_persist_firewall-test_setup {self._context.username}", + use_sudo=True) + log.info("Successfully completed test setup\n") + + def _verify_persist_firewall_service_running(self): + log.info("Verifying persist firewall service is running") + self._run_remote_test(self._ssh_client, "agent_persist_firewall-verify_persist_firewall_service_running.py", + use_sudo=True) + log.info("Successfully verified persist firewall service is running\n") + + def _verify_firewall_rules_on_boot(self, boot_name): + log.info("Verifying firewall rules on {0}".format(boot_name)) + self._run_remote_test(self._ssh_client, f"agent_persist_firewall-verify_firewall_rules_on_boot.py --user {self._context.username} --boot_name {boot_name}", + use_sudo=True) + 
log.info("Successfully verified firewall rules on {0}".format(boot_name)) + + def _disable_agent(self): + log.info("Disabling agent") + self._run_remote_test(self._ssh_client, "agent-service disable", use_sudo=True) + log.info("Successfully disabled agent\n") + + def _verify_firewall_rules_readded(self): + log.info("Verifying firewall rules readded") + self._run_remote_test(self._ssh_client, "agent_persist_firewall-verify_firewalld_rules_readded.py", + use_sudo=True) + log.info("Successfully verified firewall rules readded\n") diff --git a/tests_e2e/tests/agent_publish/agent_publish.py b/tests_e2e/tests/agent_publish/agent_publish.py new file mode 100644 index 000000000..0cf51c331 --- /dev/null +++ b/tests_e2e/tests/agent_publish/agent_publish.py @@ -0,0 +1,105 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +import uuid +from datetime import datetime +from typing import Any, Dict, List + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds, VmExtensionIdentifier +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.ssh_client import SshClient +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient + + +class AgentPublishTest(AgentVmTest): + """ + This test verifies that the agent update was performed on the VM. + """ + + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client: SshClient = self._context.create_ssh_client() + + def run(self): + """ + We run the scenario in the following steps: + 1. Print the current agent version before the update + 2. Prepare the agent for the update + 3. Check for agent update from the log + 4. Print the agent version after the update + 5.
Ensure CSE is working + """ + self._get_agent_info() + self._prepare_agent() + self._check_update() + self._get_agent_info() + self._check_cse() + + def get_ignore_errors_before_timestamp(self) -> datetime: + timestamp = self._ssh_client.run_command("agent_publish-get_agent_log_record_timestamp.py") + return datetime.strptime(timestamp.strip(), u'%Y-%m-%d %H:%M:%S.%f') + + def _get_agent_info(self) -> None: + stdout: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + log.info('Agent info \n%s', stdout) + + def _prepare_agent(self) -> None: + log.info("Modifying agent update related config flags and renaming the log file") + self._run_remote_test(self._ssh_client, "sh -c 'agent-service stop && mv /var/log/waagent.log /var/log/waagent.$(date --iso-8601=seconds).log && update-waagent-conf AutoUpdate.UpdateToLatestVersion=y AutoUpdate.GAFamily=Test AutoUpdate.Enabled=y Extensions.Enabled=y'", use_sudo=True) + log.info('Renamed log file and updated agent-update DownloadNewAgents GAFamily config flags') + + def _check_update(self) -> None: + log.info("Verifying for agent update status") + self._run_remote_test(self._ssh_client, "agent_publish-check_update.py") + log.info('Successfully checked the agent update') + + def _check_cse(self) -> None: + custom_script_2_1 = VirtualMachineExtensionClient( + self._context.vm, + VmExtensionIdentifier(VmExtensionIds.CustomScript.publisher, VmExtensionIds.CustomScript.type, "2.1"), + resource_name="CustomScript") + + log.info("Installing %s", custom_script_2_1) + message = f"Hello {uuid.uuid4()}!" 
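Aside: the `uuid.uuid4()` above makes the CustomScript message unique per run, so the instance-view check cannot pass on stale output from an earlier enable. The technique in isolation, with a local `echo` as a stand-in for the extension:

```python
import subprocess
import uuid

# A fresh UUID makes the expected message unique to this run, so a match
# proves this run's command actually executed.
message = f"Hello {uuid.uuid4()}!"
result = subprocess.run(f"echo '{message}'", shell=True, capture_output=True, text=True)
assert message in result.stdout  # stale output from a previous run cannot match
```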
+ custom_script_2_1.enable( + settings={ + 'commandToExecute': f"echo \'{message}\'" + }, + auto_upgrade_minor_version=False + ) + custom_script_2_1.assert_instance_view(expected_version="2.1", expected_message=message) + + def get_ignore_error_rules(self) -> List[Dict[str, Any]]: + ignore_rules = [ + # + # This is expected as latest version can be the less than test version + # + # WARNING ExtHandler ExtHandler Agent WALinuxAgent-9.9.9.9 is permanently blacklisted + # + { + 'message': r"Agent WALinuxAgent-9.9.9.9 is permanently blacklisted" + } + + ] + return ignore_rules + + +if __name__ == "__main__": + AgentPublishTest.run_from_command_line() diff --git a/tests_e2e/tests/agent_status/agent_status.py b/tests_e2e/tests/agent_status/agent_status.py new file mode 100644 index 000000000..c02a3f4bf --- /dev/null +++ b/tests_e2e/tests/agent_status/agent_status.py @@ -0,0 +1,192 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
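Reviewer note: the AgentStatus test below distinguishes retryable instance-view problems from hard failures. A simplified sketch of that validation, dict-based here as an assumption (the real test walks the `azure.mgmt.compute` model classes):

```python
class RetryableAgentStatusException(Exception):
    """Raised for instance-view states that may resolve on a later poll."""

def validate_vm_agent(vm_agent: dict) -> str:
    # Incomplete or not-ready states are treated as retryable, not hard failures.
    if vm_agent.get("vm_agent_version") is None:
        raise RetryableAgentStatusException("'vm_agent_version' is None")
    statuses = vm_agent.get("statuses") or []
    if not statuses:
        raise RetryableAgentStatusException("missing an agent status entry")
    if statuses[0].get("display_status") != "Ready":
        raise RetryableAgentStatusException("agent is not Ready")
    return statuses[0]["display_status"]
```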
+# + +# +# Validates the agent status is updated without processing additional goal states (aside from the first goal state +# from fabric) +# + +from azure.mgmt.compute.models import VirtualMachineInstanceView, InstanceViewStatus, VirtualMachineAgentInstanceView +from assertpy import assert_that +from datetime import datetime, timedelta +from time import sleep +import json + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.logging import log + + +class RetryableAgentStatusException(BaseException): + pass + + +class AgentStatus(AgentVmTest): + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client = self._context.create_ssh_client() + + def validate_instance_view_vmagent_status(self, instance_view: VirtualMachineInstanceView): + status: InstanceViewStatus = instance_view.vm_agent.statuses[0] + + # Validate message field + if status.message is None: + raise RetryableAgentStatusException("Agent status is invalid: 'message' property in instance view is None") + elif 'unresponsive' in status.message: + raise RetryableAgentStatusException("Agent status is invalid: Instance view shows unresponsive agent") + + # Validate display status field + if status.display_status is None: + raise RetryableAgentStatusException("Agent status is invalid: 'display_status' property in instance view is None") + elif 'Not Ready' in status.display_status: + raise RetryableAgentStatusException("Agent status is invalid: Instance view shows agent status is not ready") + + # Validate time field + if status.time is None: + raise RetryableAgentStatusException("Agent status is invalid: 'time' property in instance view is None") + + def validate_instance_view_vmagent(self, instance_view: VirtualMachineInstanceView): + """ + Checks that instance view has vm_agent.statuses and vm_agent.vm_agent_version properties which report the Guest + Agent as running and 
Ready: + + "vm_agent": { + "extension_handlers": [], + "vm_agent_version": "9.9.9.9", + "statuses": [ + { + "level": "Info", + "time": "2023-08-11T09:13:01.000Z", + "message": "Guest Agent is running", + "code": "ProvisioningState/succeeded", + "display_status": "Ready" + } + ] + } + """ + # Using dot operator for properties here because azure.mgmt.compute.models has classes for InstanceViewStatus + # and VirtualMachineAgentInstanceView. All the properties we validate are attributes of these classes and + # initialized to None + if instance_view.vm_agent is None: + raise RetryableAgentStatusException("Agent status is invalid: 'vm_agent' property in instance view is None") + + # Validate vm_agent_version field + vm_agent: VirtualMachineAgentInstanceView = instance_view.vm_agent + if vm_agent.vm_agent_version is None: + raise RetryableAgentStatusException("Agent status is invalid: 'vm_agent_version' property in instance view is None") + elif 'Unknown' in vm_agent.vm_agent_version: + raise RetryableAgentStatusException("Agent status is invalid: Instance view shows agent version is unknown") + + # Validate statuses field + if vm_agent.statuses is None: + raise RetryableAgentStatusException("Agent status is invalid: 'statuses' property in instance view is None") + elif len(instance_view.vm_agent.statuses) < 1: + raise RetryableAgentStatusException("Agent status is invalid: Instance view is missing an agent status entry") + else: + self.validate_instance_view_vmagent_status(instance_view=instance_view) + + log.info("Instance view has valid agent status, agent version: {0}, status: {1}" + .format(vm_agent.vm_agent_version, vm_agent.statuses[0].display_status)) + + def check_status_updated(self, status_timestamp: datetime, prev_status_timestamp: datetime, gs_processed_log: str, prev_gs_processed_log: str): + log.info("") + log.info("Check that the agent status updated without processing any additional goal states...") + + # If prev_ variables are not updated, then this is 
the first reported agent status + if prev_status_timestamp is not None and prev_gs_processed_log is not None: + # The agent status timestamp should be greater than the prev timestamp + if status_timestamp > prev_status_timestamp: + log.info( + "Current agent status timestamp {0} is greater than previous status timestamp {1}" + .format(status_timestamp, prev_status_timestamp)) + else: + raise RetryableAgentStatusException("Agent status failed to update: Current agent status timestamp {0} " + "is not greater than previous status timestamp {1}" + .format(status_timestamp, prev_status_timestamp)) + + # The last goal state processed in the agent log should be the same as before + if prev_gs_processed_log == gs_processed_log: + log.info( + "The last processed goal state is the same as the last processed goal state in the last agent " + "status update: \n{0}".format(gs_processed_log) + .format(status_timestamp, prev_status_timestamp)) + else: + raise Exception("Agent status failed to update without additional goal state: The agent processed an " + "additional goal state since the last agent status update. 
\n{0}" + "".format(gs_processed_log)) + + log.info("") + log.info("The agent status successfully updated without additional goal states") + + def run(self): + log.info("") + log.info("*******Verifying the agent status updates 3 times*******") + + timeout = datetime.now() + timedelta(minutes=6) + instance_view_exception = None + status_updated = 0 + prev_status_timestamp = None + prev_gs_processed_log = None + + # Retry validating agent status updates 2 times with timeout of 6 minutes + while datetime.now() <= timeout and status_updated < 2: + instance_view = self._context.vm.get_instance_view() + log.info("") + log.info( + "Check instance view to validate that the Guest Agent reports valid status...") + log.info("Instance view of VM is:\n%s", json.dumps(instance_view.serialize(), indent=2)) + + try: + # Validate the guest agent reports valid status + self.validate_instance_view_vmagent(instance_view) + + status_timestamp = instance_view.vm_agent.statuses[0].time + gs_processed_log = self._ssh_client.run_command( + "agent_status-get_last_gs_processed.py", use_sudo=True) + + self.check_status_updated(status_timestamp, prev_status_timestamp, gs_processed_log, prev_gs_processed_log) + + # Update variables with timestamps for this update + status_updated += 1 + prev_status_timestamp = status_timestamp + prev_gs_processed_log = gs_processed_log + + # Sleep 30s to allow agent status to update before we check again + sleep(30) + + except RetryableAgentStatusException as e: + instance_view_exception = str(e) + log.info("") + log.info(instance_view_exception) + log.info("Waiting 30s before retry...") + sleep(30) + + # If status_updated is 0, we know the agent status in the instance view was never valid + log.info("") + assert_that(status_updated > 0).described_as( + "Timeout has expired, instance view has invalid agent status: {0}".format( + instance_view_exception)).is_true() + + # Fail the test if we weren't able to validate the agent status updated 3 times + 
assert_that(status_updated == 2).described_as( + "Timeout has expired, the agent status failed to update 2 times").is_true() + + +if __name__ == "__main__": + AgentStatus.run_from_command_line() diff --git a/tests_e2e/tests/agent_update/rsm_update.py b/tests_e2e/tests/agent_update/rsm_update.py new file mode 100644 index 000000000..89c186a2f --- /dev/null +++ b/tests_e2e/tests/agent_update/rsm_update.py @@ -0,0 +1,279 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# BVT for the agent update scenario +# +# The test verifies agent update for rsm workflow. This test covers three scenarios downgrade, upgrade and no update. +# For each scenario, we initiate the rsm request with target version and then verify agent updated to that target version. 
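The three scenarios named in the comment above (downgrade, upgrade, no update) are simply the three possible orderings of the target version against the installed version. A sketch under the assumption of four-part numeric versions; note that plain string comparison (e.g. `'9.9.9.9' > '2.3.15.0'`) would be fragile, which is why the comparison is done on integer tuples:

```python
def classify_update(installed: str, target: str) -> str:
    # Compare as integer tuples so that "2.3.15.0" < "2.3.15.1" < "9.9.9.9".
    installed_parts = tuple(int(p) for p in installed.split("."))
    target_parts = tuple(int(p) for p in target.split("."))
    if target_parts < installed_parts:
        return "downgrade"
    if target_parts > installed_parts:
        return "upgrade"
    return "no update"
```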
+# +import json +import re +from typing import List, Dict, Any + +import requests +from assertpy import assert_that, fail +from azure.identity import DefaultAzureCredential +from azure.mgmt.compute.models import VirtualMachine +from msrestazure.azure_cloud import Cloud + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.azure_clouds import AZURE_CLOUDS +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import retry_if_false +from tests_e2e.tests.lib.virtual_machine_client import VirtualMachineClient + + +class RsmUpdateBvt(AgentVmTest): + + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client = self._context.create_ssh_client() + self._installed_agent_version = "9.9.9.9" + self._downgrade_version = "9.9.9.9" + + def get_ignore_error_rules(self) -> List[Dict[str, Any]]: + ignore_rules = [ + # + # This is expected as we validate the downgrade scenario + # + # WARNING ExtHandler ExtHandler Agent WALinuxAgent-9.9.9.9 is permanently blacklisted + # Note: Version varies depending on the pipeline branch the test is running on + { + 'message': rf"Agent WALinuxAgent-{self._installed_agent_version} is permanently blacklisted", + 'if': lambda r: r.prefix == 'ExtHandler' and self._installed_agent_version > self._downgrade_version + }, + # We don't allow downgrades below the daemon version + # 2023-07-11T02:28:21.249836Z WARNING ExtHandler ExtHandler [AgentUpdateError] The Agent received a request to downgrade to version 1.4.0.0, but downgrading to a version less than the Agent installed on the image (1.4.0.1) is not supported. Skipping downgrade.
+ # + { + 'message': r"downgrading to a version less than the Agent installed on the image.* is not supported" + } + + ] + return ignore_rules + + def run(self) -> None: + # retrieve the installed agent version in the vm before running the scenario + self._retrieve_installed_agent_version() + # Allow agent to send supported feature flag + self._verify_agent_reported_supported_feature_flag() + + log.info("*******Verifying the Agent Downgrade scenario*******") + stdout: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + log.info("Current agent version running on the vm before update is \n%s", stdout) + self._downgrade_version: str = "2.3.15.0" + log.info("Attempting downgrade version %s", self._downgrade_version) + self._request_rsm_update(self._downgrade_version) + self._check_rsm_gs(self._downgrade_version) + self._prepare_agent() + # Verify downgrade scenario + self._verify_guest_agent_update(self._downgrade_version) + self._verify_agent_reported_update_status(self._downgrade_version) + + # Verify upgrade scenario + log.info("*******Verifying the Agent Upgrade scenario*******") + stdout: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + log.info("Current agent version running on the vm before update is \n%s", stdout) + upgrade_version: str = "2.3.15.1" + log.info("Attempting upgrade version %s", upgrade_version) + self._request_rsm_update(upgrade_version) + self._check_rsm_gs(upgrade_version) + self._verify_guest_agent_update(upgrade_version) + self._verify_agent_reported_update_status(upgrade_version) + + # verify no version update.
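Reviewer note: `_request_rsm_update` further down posts to the ARM `UpgradeVMAgent` action. How that URL and body are composed can be sketched as a pure function (placeholder IDs; the real test fills them from the VM context and authenticates with a bearer token):

```python
def build_upgrade_request(base_url, subscription, resource_group, vm_name, version):
    # Same shape as the request composed in _request_rsm_update.
    url = ("{0}/subscriptions/{1}/resourceGroups/{2}/providers/Microsoft.Compute/"
           "virtualMachines/{3}/UpgradeVMAgent?api-version=2022-08-01").format(
        base_url, subscription, resource_group, vm_name)
    data = {"target": "Microsoft.OSTCLinuxAgent.Test", "targetVersion": version}
    return url, data
```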
+ log.info("*******Verifying the no version update scenario*******") + stdout: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + log.info("Current agent version running on the vm before update is \n%s", stdout) + current_version: str = "2.3.15.1" + log.info("Attempting update version same as current version %s", current_version) + self._request_rsm_update(current_version) + self._check_rsm_gs(current_version) + self._verify_guest_agent_update(current_version) + self._verify_agent_reported_update_status(current_version) + + # verify requested version below daemon version + # All the daemons set to 2.2.53, so requesting version below daemon version + log.info("*******Verifying requested version below daemon version scenario*******") + stdout: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + log.info("Current agent version running on the vm before update is \n%s", stdout) + version: str = "1.5.0.0" + log.info("Attempting requested version %s", version) + self._request_rsm_update(version) + self._check_rsm_gs(version) + self._verify_no_guest_agent_update(version) + self._verify_agent_reported_update_status(version) + + def _check_rsm_gs(self, requested_version: str) -> None: + # This checks if RSM GS available to the agent after we send the rsm update request + log.info( + 'Executing wait_for_rsm_gs.py remote script to verify latest GS contain requested version after rsm update requested') + self._run_remote_test(self._ssh_client, f"agent_update-wait_for_rsm_gs.py --version {requested_version}", + use_sudo=True) + log.info('Verified latest GS contain requested version after rsm update requested') + + def _prepare_agent(self) -> None: + """ + This method is to ensure agent is ready for accepting rsm updates. As part of that we update following flags + 1) Changing daemon version since daemon has a hard check on agent version in order to update agent. It doesn't allow versions which are less than daemon version. 
+ 2) Updating GAFamily type "Test" and GAUpdates flag to process agent updates on test versions. + """ + log.info( + 'Executing modify_agent_version remote script to update agent installed version to lower than requested version') + output: str = self._ssh_client.run_command("agent_update-modify_agent_version 2.2.53", use_sudo=True) + log.info('Successfully updated agent installed version \n%s', output) + log.info( + 'Executing update-waagent-conf remote script to update agent update config flags to allow and download test versions') + output: str = self._ssh_client.run_command( + "update-waagent-conf AutoUpdate.UpdateToLatestVersion=y Debug.EnableGAVersioning=y AutoUpdate.GAFamily=Test", use_sudo=True) + log.info('Successfully updated agent update config \n %s', output) + + @staticmethod + def _verify_agent_update_flag_enabled(vm: VirtualMachineClient) -> bool: + result: VirtualMachine = vm.get_model() + flag: bool = result.os_profile.linux_configuration.enable_vm_agent_platform_updates + if flag is None: + return False + return flag + + def _enable_agent_update_flag(self, vm: VirtualMachineClient) -> None: + osprofile = { + "location": self._context.vm.location, # location is required field + "properties": { + "osProfile": { + "linuxConfiguration": { + "enableVMAgentPlatformUpdates": True + } + } + } + } + log.info("updating the vm with osProfile property:\n%s", osprofile) + vm.update(osprofile) + + def _request_rsm_update(self, requested_version: str) -> None: + """ + This method is to simulate the rsm request. 
+ First we ensure the PlatformUpdates enabled in the vm and then make a request using rest api + """ + if not self._verify_agent_update_flag_enabled(self._context.vm): + # enable the flag + log.info("Attempting vm update to set the enableVMAgentPlatformUpdates flag") + self._enable_agent_update_flag(self._context.vm) + log.info("Updated the enableVMAgentPlatformUpdates flag to True") + else: + log.info("Already enableVMAgentPlatformUpdates flag set to True") + + cloud: Cloud = AZURE_CLOUDS[self._context.vm.cloud] + credential: DefaultAzureCredential = DefaultAzureCredential(authority=cloud.endpoints.active_directory) + token = credential.get_token(cloud.endpoints.resource_manager + "/.default") + headers = {'Authorization': 'Bearer ' + token.token, 'Content-Type': 'application/json'} + # Later this api call will be replaced by azure-python-sdk wrapper + base_url = cloud.endpoints.resource_manager + url = base_url + "/subscriptions/{0}/resourceGroups/{1}/providers/Microsoft.Compute/virtualMachines/{2}/" \ + "UpgradeVMAgent?api-version=2022-08-01".format(self._context.vm.subscription, + self._context.vm.resource_group, + self._context.vm.name) + data = { + "target": "Microsoft.OSTCLinuxAgent.Test", + "targetVersion": requested_version + } + + log.info("Attempting rsm upgrade post request to endpoint: {0} with data: {1}".format(url, data)) + response = requests.post(url, data=json.dumps(data), headers=headers) + if response.status_code == 202: + log.info("RSM upgrade request accepted") + else: + raise Exception("Error occurred while making RSM upgrade request. 
Status code : {0} and msg: {1}".format( + response.status_code, response.content)) + + def _verify_guest_agent_update(self, requested_version: str) -> None: + """ + Verify current agent version running on rsm requested version + """ + + def _check_agent_version(requested_version: str) -> bool: + waagent_version: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + expected_version = f"Goal state agent: {requested_version}" + if expected_version in waagent_version: + return True + else: + return False + + waagent_version: str = "" + log.info("Verifying agent updated to requested version: {0}".format(requested_version)) + success: bool = retry_if_false(lambda: _check_agent_version(requested_version)) + if not success: + fail("Guest agent didn't update to requested version {0} but found \n {1}. \n " + "To debug verify if CRP has upgrade operation around that time and also check if agent log has any errors ".format( + requested_version, waagent_version)) + waagent_version: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + log.info( + f"Successfully verified agent updated to requested version. 
Current agent version running:\n {waagent_version}") + + def _verify_no_guest_agent_update(self, version: str) -> None: + """ + Verify that the current agent version was not updated to the requested version + """ + log.info("Verifying no update happened to agent") + current_agent: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + assert_that(current_agent).does_not_contain(version).described_as( + f"Agent version changed.\n Current agent {current_agent}") + log.info("Verified agent was not updated to requested version") + + def _verify_agent_reported_supported_feature_flag(self): + """ + RSM updates rely on the supported-feature flag that the agent sends to CRP. So, check whether the GA reports the feature flag in the agent log + """ + + log.info( + "Executing verify_versioning_supported_feature.py remote script to verify agent reported supported feature flag, so that CRP can send RSM update request") + self._run_remote_test(self._ssh_client, "agent_update-verify_versioning_supported_feature.py", use_sudo=True) + log.info("Successfully verified that Agent reported VersioningGovernance supported feature flag") + + def _verify_agent_reported_update_status(self, version: str): + """ + Verify that the agent reported update status to CRP after the update was performed + """ + + log.info( + "Executing verify_agent_reported_update_status.py remote script to verify agent reported update status for version {0}".format( + version)) + self._run_remote_test(self._ssh_client, + f"agent_update-verify_agent_reported_update_status.py --version {version}", use_sudo=True) + log.info("Successfully verified that Agent reported update status for version {0}".format(version)) + + def _retrieve_installed_agent_version(self): + """ + Retrieve the installed agent version + """ + log.info("Retrieving installed agent version") + stdout: str = self._ssh_client.run_command("waagent-version", use_sudo=True) + log.info("Retrieved installed agent version \n {0}".format(stdout)) + match = re.search(r'.*Goal state agent: (\S*)', stdout) + 
if match: + self._installed_agent_version = match.groups()[0] + else: + log.warning("Unable to retrieve installed agent version and set to default value {0}".format( + self._installed_agent_version)) + + +if __name__ == "__main__": + RsmUpdateBvt.run_from_command_line() diff --git a/tests_e2e/tests/agent_update/self_update.py b/tests_e2e/tests/agent_update/self_update.py new file mode 100644 index 000000000..2aedb72f4 --- /dev/null +++ b/tests_e2e/tests/agent_update/self_update.py @@ -0,0 +1,172 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
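The regex in `_retrieve_installed_agent_version` above, isolated as a pure function (the default mirrors the test's `9.9.9.9` fallback):

```python
import re

def parse_goal_state_agent_version(stdout: str, default: str = "9.9.9.9") -> str:
    # Pull the version out of `waagent-version` output,
    # e.g. a line like "Goal state agent: 2.10.0.0".
    match = re.search(r'.*Goal state agent: (\S*)', stdout)
    return match.groups()[0] if match else default
```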
+# + +import os +import shutil +from pathlib import Path +from threading import RLock + +from assertpy import fail + +import azurelinuxagent +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import retry_if_false +from tests_e2e.tests.lib.shell import run_command + + +class SelfUpdateBvt(AgentVmTest): + """ + This test case verifies that the agent can update itself to the latest version via the self-update path when the VM is not enrolled in RSM updates + """ + + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client = self._context.create_ssh_client() + self._test_version = "2.8.9.9" + self._test_pkg_name = f"WALinuxAgent-{self._test_version}.zip" + + _setup_lock = RLock() + + def run(self): + log.info("Verifying agent updated to latest version from custom test version") + self._test_setup() + self._verify_agent_updated_to_latest_version() + + log.info("Verifying agent remains on custom test version when AutoUpdate.UpdateToLatestVersion=n") + self._test_setup_and_update_to_latest_version_false() + self._verify_agent_remains_on_custom_test_version() + + def _test_setup(self) -> None: + """ + Builds the custom test agent pkg at a lower version and installs it on the VM + """ + self._build_custom_test_agent() + output: str = self._ssh_client.run_command( + f"agent_update-self_update_test_setup --package ~/tmp/{self._test_pkg_name} --version {self._test_version} --update_to_latest_version y", + use_sudo=True) + log.info("Successfully installed custom test agent pkg version \n%s", output) + + def _build_custom_test_agent(self) -> None: + """ + Builds the custom test pkg + """ + with self._setup_lock: + agent_source_path: Path = self._context.working_directory / "source" + source_pkg_path: Path = agent_source_path / "eggs" / f"{self._test_pkg_name}" + if source_pkg_path.exists(): + log.info("The
test pkg already exists at %s, skipping build", source_pkg_path) + else: + if agent_source_path.exists(): + shutil.rmtree(agent_source_path) # Remove if a partial build exists (rmtree, since the copied source tree is not empty) + source_directory: Path = Path(azurelinuxagent.__path__[0]).parent + copy_cmd: str = f"cp -r {source_directory} {agent_source_path}" + log.info("Copying agent source %s to %s", source_directory, agent_source_path) + run_command(copy_cmd, shell=True) + if not agent_source_path.exists(): + raise Exception( + f"The agent source was not copied to the expected path {agent_source_path}") + version_file: Path = agent_source_path / "azurelinuxagent" / "common" / "version.py" + version_cmd = rf"""sed -E -i "s/^AGENT_VERSION\s+=\s+'[0-9.]+'/AGENT_VERSION = '{self._test_version}'/g" {version_file}""" + log.info("Setting agent version to %s to build new pkg", self._test_version) + run_command(version_cmd, shell=True) + makepkg_file: Path = agent_source_path / "makepkg.py" + build_cmd: str = f"env PYTHONPATH={agent_source_path} python3 {makepkg_file} -o {agent_source_path}" + log.info("Building custom test agent pkg version %s", self._test_version) + run_command(build_cmd, shell=True) + if not source_pkg_path.exists(): + raise Exception( + f"The test pkg was not created at the expected path {source_pkg_path}") + target_path: Path = Path("~") / "tmp" + log.info("Copying %s to %s:%s", source_pkg_path, self._context.vm, target_path) + self._ssh_client.copy_to_node(source_pkg_path, target_path)
+        We retrieve the latest version from the goal state and verify that the running agent is on that version
+        """
+        latest_version: str = self._ssh_client.run_command("agent_update-self_update_latest_version.py",
+                                                           use_sudo=True).rstrip()
+        self._verify_guest_agent_update(latest_version)
+        # Verify the custom test agent updated itself to the latest version
+        self._ssh_client.run_command(
+            "agent_update-self_update_check.py --latest-version {0} --current-version {1}".format(latest_version,
+                                                                                                  self._test_version))
+
+    def _verify_guest_agent_update(self, latest_version: str) -> None:
+        """
+        Verifies that the current agent is running on the latest version
+        """
+
+        def _check_agent_version(latest_version: str) -> bool:
+            waagent_version: str = self._ssh_client.run_command("waagent-version", use_sudo=True)
+            expected_version = f"Goal state agent: {latest_version}"
+            return expected_version in waagent_version
+
+        log.info("Verifying agent updated to latest version: {0}".format(latest_version))
+        success: bool = retry_if_false(lambda: _check_agent_version(latest_version), delay=60)
+        waagent_version: str = self._ssh_client.run_command("waagent-version", use_sudo=True)
+        if not success:
+            fail("Guest agent didn't update to latest version {0} but found \n {1}".format(
+                latest_version, waagent_version))
+        log.info(
+            f"Successfully verified agent updated to latest version.
Current agent version running:\n {waagent_version}")
+
+    def _test_setup_and_update_to_latest_version_false(self) -> None:
+        """
+        Builds the custom test agent package with a lower version, installs it on the VM,
+        and sets the configuration AutoUpdate.UpdateToLatestVersion=n
+        """
+        self._build_custom_test_agent()
+        output: str = self._ssh_client.run_command(
+            f"agent_update-self_update_test_setup --package ~/tmp/{self._test_pkg_name} --version {self._test_version} --update_to_latest_version n",
+            use_sudo=True)
+        log.info("Successfully installed custom test agent pkg version \n%s", output)
+
+    def _verify_agent_remains_on_custom_test_version(self) -> None:
+        """
+        Verifies the agent remains on the custom test version when UpdateToLatestVersion=n
+        """
+
+        def _check_agent_version(version: str) -> bool:
+            waagent_version: str = self._ssh_client.run_command("waagent-version", use_sudo=True)
+            expected_version = f"Goal state agent: {version}"
+            return expected_version in waagent_version
+
+        log.info("Verifying the current agent is on version: {0}".format(self._test_version))
+        success: bool = retry_if_false(lambda: _check_agent_version(self._test_version), delay=60)
+        waagent_version: str = self._ssh_client.run_command("waagent-version", use_sudo=True)
+        if not success:
+            fail("Guest agent was on a different version than expected version {0} and found \n {1}".format(
+                self._test_version, waagent_version))
+        log.info(
+            f"Successfully verified agent stayed on test version.
Current agent version running:\n {waagent_version}") + + +if __name__ == "__main__": + SelfUpdateBvt.run_from_command_line() diff --git a/tests_e2e/tests/agent_wait_for_cloud_init/add_cloud_init_script.py b/tests_e2e/tests/agent_wait_for_cloud_init/add_cloud_init_script.py new file mode 100755 index 000000000..1fbc60adc --- /dev/null +++ b/tests_e2e/tests/agent_wait_for_cloud_init/add_cloud_init_script.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +import base64 + +from typing import Any, Dict + +from tests_e2e.tests.agent_wait_for_cloud_init.agent_wait_for_cloud_init import AgentWaitForCloudInit +from tests_e2e.tests.lib.update_arm_template import UpdateArmTemplate + + +class AddCloudInitScript(UpdateArmTemplate): + """ + Adds AgentWaitForCloudInit.CloudInitScript to the ARM template as osProfile.customData. + """ + def update(self, template: Dict[str, Any], is_lisa_template: bool) -> None: + if not is_lisa_template: + raise Exception('This test can only customize LISA ARM templates.') + + # + # cloud-init configuration needs to be added in the osProfile.customData property as a base64-encoded string. + # + # LISA uses the getOSProfile function to generate the value for osProfile; add customData to its output, checking that we do not + # override any existing value (the current LISA template does not have any). 
+ # + # "getOSProfile": { + # "parameters": [ + # ... + # ], + # "output": { + # "type": "object", + # "value": { + # "computername": "[parameters('computername')]", + # "adminUsername": "[parameters('admin_username')]", + # "adminPassword": "[if(parameters('has_password'), parameters('admin_password'), json('null'))]", + # "linuxConfiguration": "[if(parameters('has_linux_configuration'), parameters('linux_configuration'), json('null'))]" + # } + # } + # } + # + encoded_script = base64.b64encode(AgentWaitForCloudInit.CloudInitScript.encode('utf-8')).decode('utf-8') + + get_os_profile = self.get_lisa_function(template, 'getOSProfile') + output = self.get_function_output(get_os_profile) + if output.get('customData') is not None: + raise Exception(f"The getOSProfile function already has a 'customData'. Won't override it. Definition: {get_os_profile}") + output['customData'] = encoded_script + diff --git a/tests_e2e/tests/agent_wait_for_cloud_init/agent_wait_for_cloud_init.py b/tests_e2e/tests/agent_wait_for_cloud_init/agent_wait_for_cloud_init.py new file mode 100755 index 000000000..d9b4ecaef --- /dev/null +++ b/tests_e2e/tests/agent_wait_for_cloud_init/agent_wait_for_cloud_init.py @@ -0,0 +1,91 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#
+
+import time
+
+from assertpy import fail
+
+from tests_e2e.tests.lib.agent_test import AgentVmTest
+from tests_e2e.tests.lib.logging import log
+from tests_e2e.tests.lib.shell import CommandError
+from tests_e2e.tests.lib.ssh_client import SshClient
+
+
+class AgentWaitForCloudInit(AgentVmTest):
+    """
+    This test verifies that the Agent waits for cloud-init to complete before it starts processing extensions.
+
+    To do this, it adds 'CloudInitScript' in cloud-init's custom data. The script first ensures that the Agent
+    is waiting for cloud-init, and then sleeps for a couple of minutes before completing. The script appends
+    a set of known messages to waagent.log, and the test simply verifies that the messages are present in the
+    log in the expected order, and that they occur before the Agent reports that it is processing extensions.
+    """
+    CloudInitScript = """#!/usr/bin/env bash
+        set -euox pipefail
+
+        echo ">>> $(date) cloud-init script begin" >> /var/log/waagent.log
+        while !
grep 'Waiting for cloud-init to complete' /var/log/waagent.log; do + sleep 15 + done + echo ">>> $(date) The Agent is waiting for cloud-init, will pause for a couple of minutes" >> /var/log/waagent.log + sleep 120 + echo ">>> $(date) cloud-init script end" >> /var/log/waagent.log + """ + + def run(self): + ssh_client: SshClient = self._context.create_ssh_client() + + log.info("Waiting for Agent to start processing extensions") + for _ in range(15): + try: + ssh_client.run_command("grep 'ProcessExtensionsGoalState started' /var/log/waagent.log") + break + except CommandError: + log.info("The Agent has not started to process extensions, will check again after a short delay") + time.sleep(60) + else: + raise Exception("Timeout while waiting for the Agent to start processing extensions") + + log.info("The Agent has started to process extensions") + + output = ssh_client.run_command( + "grep -E '^>>>|" + + "INFO ExtHandler ExtHandler cloud-init completed|" + + "INFO ExtHandler ExtHandler ProcessExtensionsGoalState started' /var/log/waagent.log") + + output = output.rstrip().splitlines() + + expected = [ + 'cloud-init script begin', + 'The Agent is waiting for cloud-init, will pause for a couple of minutes', + 'cloud-init script end', + 'cloud-init completed', + 'ProcessExtensionsGoalState started' + ] + + indent = lambda lines: "\n".join([f" {ln}" for ln in lines]) + if len(output) == len(expected) and all([expected[i] in output[i] for i in range(len(expected))]): + log.info("The Agent waited for cloud-init before processing extensions.\nLog messages:\n%s", indent(output)) + else: + fail(f"The Agent did not wait for cloud-init before processing extensions.\nExpected:\n{indent(expected)}\nActual:\n{indent(output)}") + + +if __name__ == "__main__": + AgentWaitForCloudInit.run_from_command_line() + diff --git a/tests_e2e/tests/bvts/__init__.py b/tests_e2e/tests/bvts/__init__.py deleted file mode 100644 index e69de29bb..000000000 diff --git 
a/tests_e2e/tests/ext_cgroups/ext_cgroups.py b/tests_e2e/tests/ext_cgroups/ext_cgroups.py
new file mode 100644
index 000000000..94a0c9725
--- /dev/null
+++ b/tests_e2e/tests/ext_cgroups/ext_cgroups.py
@@ -0,0 +1,43 @@
+#!/usr/bin/env python3
+
+# Microsoft Azure Linux Agent
+#
+# Copyright 2018 Microsoft Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from tests_e2e.tests.ext_cgroups.install_extensions import InstallExtensions
+from tests_e2e.tests.lib.agent_test import AgentVmTest
+from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext
+from tests_e2e.tests.lib.logging import log
+
+
+class ExtCgroups(AgentVmTest):
+    """
+    This test verifies that the installed extensions are assigned to the correct cgroups.
+ """ + + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._ssh_client = self._context.create_ssh_client() + + def run(self): + log.info("=====Installing extensions to validate ext cgroups scenario") + InstallExtensions(self._context).run() + log.info("=====Executing remote script check_cgroups_extensions.py to validate extension cgroups") + self._run_remote_test(self._ssh_client, "ext_cgroups-check_cgroups_extensions.py", use_sudo=True) + log.info("Successfully verified that extensions present in correct cgroup") + + +if __name__ == "__main__": + ExtCgroups.run_from_command_line() diff --git a/tests_e2e/tests/ext_cgroups/install_extensions.py b/tests_e2e/tests/ext_cgroups/install_extensions.py new file mode 100644 index 000000000..aebc6e3c0 --- /dev/null +++ b/tests_e2e/tests/ext_cgroups/install_extensions.py @@ -0,0 +1,112 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+#
+from datetime import datetime, timedelta
+from pathlib import Path
+
+from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext
+from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds
+from tests_e2e.tests.lib.logging import log
+from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient
+
+
+class InstallExtensions:
+    """
+    This test installs multiple extensions in order to verify extension cgroups in the next test.
+    """
+
+    def __init__(self, context: AgentVmTestContext):
+        self._context = context
+        self._ssh_client = self._context.create_ssh_client()
+
+    def run(self):
+        self._prepare_agent()
+        # Install the GATest extension to test service cgroups
+        self._install_gatest_extension()
+        # Install the Azure Monitor Agent to test long running process cgroup
+        self._install_ama()
+        # Install the VM Access extension to test sample extension
+        self._install_vmaccess()
+        # Install the CSE extension to test extension cgroup
+        self._install_cse()
+
+    def _prepare_agent(self):
+        log.info("=====Executing update-waagent-conf remote script to update monitoring deadline flag for tracking azuremonitoragent service")
+        future_date = datetime.utcnow() + timedelta(days=2)
+        expiry_time = future_date.date().strftime("%Y-%m-%d")
+        # The agent needs extension and service information from the handler manifest to monitor and limit resource usage.
+        # As part of a pilot, the agent hardcoded the azuremonitoragent service name so that it could be monitored in
+        # production for a while, without requiring a manifest update on the extension side, to give a sense of the
+        # extension's resource usage. After a few months, that monitoring was disabled in production.
+        # This test moves the config flag's expiry time to a future date so that the test agent starts tracking the
+        # cgroups used by the service again.
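The comment above describes the gate the test relies on: the agent only tracks the service's cgroup while the configured monitor deadline (`Debug.CgroupMonitorExpiryTime`) lies in the future, which is why the test sets it two days out. A minimal sketch of that expiry check; `should_monitor_service` is a hypothetical helper for illustration, not the agent's actual implementation:

```python
from datetime import datetime, timezone
from typing import Optional


def should_monitor_service(expiry_time: str, now: Optional[datetime] = None) -> bool:
    """Return True while the configured monitor deadline (YYYY-MM-DD) is still in the future."""
    deadline = datetime.strptime(expiry_time, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return current < deadline
```

With the deadline the test computes (`utcnow() + 2 days`), this check stays true for the duration of the run, so cgroup tracking remains active.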
+ result = self._ssh_client.run_command(f"update-waagent-conf Debug.CgroupMonitorExpiryTime={expiry_time}", use_sudo=True) + log.info(result) + log.info("Updated agent cgroups config(CgroupMonitorExpiryTime)") + + def _install_ama(self): + ama_extension = VirtualMachineExtensionClient( + self._context.vm, VmExtensionIds.AzureMonitorLinuxAgent, + resource_name="AMAAgent") + log.info("Installing %s", ama_extension) + ama_extension.enable() + ama_extension.assert_instance_view() + + def _install_vmaccess(self): + # fetch the public key + public_key_file: Path = Path(self._context.identity_file).with_suffix(".pub") + with public_key_file.open() as f: + public_key = f.read() + # Invoke the extension + vm_access = VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.VmAccess, resource_name="VmAccess") + log.info("Installing %s", vm_access) + vm_access.enable( + protected_settings={ + 'username': self._context.username, + 'ssh_key': public_key, + 'reset_ssh': 'false' + } + ) + vm_access.assert_instance_view() + + def _install_gatest_extension(self): + gatest_extension = VirtualMachineExtensionClient( + self._context.vm, VmExtensionIds.GATestExtension, + resource_name="GATestExt") + log.info("Installing %s", gatest_extension) + gatest_extension.enable() + gatest_extension.assert_instance_view() + + + def _install_cse(self): + # Use custom script to output the cgroups assigned to it at runtime and save to /var/lib/waagent/tmp/custom_script_check. 
+ script_contents = """ +mkdir /var/lib/waagent/tmp +cp /proc/$$/cgroup /var/lib/waagent/tmp/custom_script_check +""" + custom_script_2_0 = VirtualMachineExtensionClient( + self._context.vm, + VmExtensionIds.CustomScript, + resource_name="CustomScript") + + log.info("Installing %s", custom_script_2_0) + custom_script_2_0.enable( + protected_settings={ + 'commandToExecute': f"echo \'{script_contents}\' | bash" + } + ) + custom_script_2_0.assert_instance_view() + diff --git a/tests_e2e/tests/ext_sequencing/ext_seq_test_cases.py b/tests_e2e/tests/ext_sequencing/ext_seq_test_cases.py new file mode 100644 index 000000000..d1c942d0a --- /dev/null +++ b/tests_e2e/tests/ext_sequencing/ext_seq_test_cases.py @@ -0,0 +1,318 @@ +def add_one_dependent_ext_without_settings(): + # Dependent extensions without settings should be enabled with dependencies + return [ + { + "name": "AzureMonitorLinuxAgent", + "properties": { + "provisionAfterExtensions": ["CustomScript"], + "publisher": "Microsoft.Azure.Monitor", + "type": "AzureMonitorLinuxAgent", + "typeHandlerVersion": "1.5", + "autoUpgradeMinorVersion": True + } + }, + { + "name": "CustomScript", + "properties": { + "publisher": "Microsoft.Azure.Extensions", + "type": "CustomScript", + "typeHandlerVersion": "2.1", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + } + ] + + +def add_two_extensions_with_dependencies(): + # Checks that extensions are enabled in the correct order when there is only one valid sequence + return [ + { + "name": "AzureMonitorLinuxAgent", + "properties": { + "provisionAfterExtensions": [], + "publisher": "Microsoft.Azure.Monitor", + "type": "AzureMonitorLinuxAgent", + "typeHandlerVersion": "1.5", + "autoUpgradeMinorVersion": True + } + }, + { + "name": "RunCommandLinux", + "properties": { + "provisionAfterExtensions": ["AzureMonitorLinuxAgent"], + "publisher": "Microsoft.CPlat.Core", + "type": "RunCommandLinux", + "typeHandlerVersion": "1.0", + 
"autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + }, + { + "name": "CustomScript", + "properties": { + "provisionAfterExtensions": ["RunCommandLinux", "AzureMonitorLinuxAgent"], + "publisher": "Microsoft.Azure.Extensions", + "type": "CustomScript", + "typeHandlerVersion": "2.1", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + } + ] + + +def remove_one_dependent_extension(): + # Checks that remaining extensions with dependencies are enabled in the correct order after removing a dependent + # extension + return [ + { + "name": "AzureMonitorLinuxAgent", + "properties": { + "publisher": "Microsoft.Azure.Monitor", + "type": "AzureMonitorLinuxAgent", + "typeHandlerVersion": "1.5", + "autoUpgradeMinorVersion": True + } + }, + { + "name": "CustomScript", + "properties": { + "provisionAfterExtensions": ["AzureMonitorLinuxAgent"], + "publisher": "Microsoft.Azure.Extensions", + "type": "CustomScript", + "typeHandlerVersion": "2.1", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + } + ] + + +def remove_all_dependencies(): + # Checks that extensions are enabled after adding and removing dependencies + return [ + { + "name": "AzureMonitorLinuxAgent", + "properties": { + "publisher": "Microsoft.Azure.Monitor", + "type": "AzureMonitorLinuxAgent", + "typeHandlerVersion": "1.5", + "autoUpgradeMinorVersion": True + } + }, + { + "name": "RunCommandLinux", + "properties": { + "publisher": "Microsoft.CPlat.Core", + "type": "RunCommandLinux", + "typeHandlerVersion": "1.0", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + }, + { + "name": "CustomScript", + "properties": { + "publisher": "Microsoft.Azure.Extensions", + "type": "CustomScript", + "typeHandlerVersion": "2.1", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + } + ] + + +def add_one_dependent_extension(): + # Checks that a valid 
enable sequence occurs when only one extension has dependencies + return [ + { + "name": "AzureMonitorLinuxAgent", + "properties": { + "provisionAfterExtensions": ["RunCommandLinux", "CustomScript"], + "publisher": "Microsoft.Azure.Monitor", + "type": "AzureMonitorLinuxAgent", + "typeHandlerVersion": "1.5", + "autoUpgradeMinorVersion": True + } + }, + { + "name": "RunCommandLinux", + "properties": { + "publisher": "Microsoft.CPlat.Core", + "type": "RunCommandLinux", + "typeHandlerVersion": "1.0", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + }, + { + "name": "CustomScript", + "properties": { + "publisher": "Microsoft.Azure.Extensions", + "type": "CustomScript", + "typeHandlerVersion": "2.1", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + } + ] + + +def add_single_dependencies(): + # Checks that extensions are enabled in the correct order when there is only one valid sequence and each extension + # has no more than one dependency + return [ + { + "name": "AzureMonitorLinuxAgent", + "properties": { + "provisionAfterExtensions": [], + "publisher": "Microsoft.Azure.Monitor", + "type": "AzureMonitorLinuxAgent", + "typeHandlerVersion": "1.5", + "autoUpgradeMinorVersion": True + } + }, + { + "name": "RunCommandLinux", + "properties": { + "provisionAfterExtensions": ["CustomScript"], + "publisher": "Microsoft.CPlat.Core", + "type": "RunCommandLinux", + "typeHandlerVersion": "1.0", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + }, + { + "name": "CustomScript", + "properties": { + "provisionAfterExtensions": ["AzureMonitorLinuxAgent"], + "publisher": "Microsoft.Azure.Extensions", + "type": "CustomScript", + "typeHandlerVersion": "2.1", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "date" + } + } + } + ] + + +def remove_all_dependent_extensions(): + # Checks that remaining extensions with dependencies are enabled in the 
correct order after removing all dependent
+    # extensions
+    return [
+        {
+            "name": "AzureMonitorLinuxAgent",
+            "properties": {
+                "publisher": "Microsoft.Azure.Monitor",
+                "type": "AzureMonitorLinuxAgent",
+                "typeHandlerVersion": "1.5",
+                "autoUpgradeMinorVersion": True
+            }
+        }
+    ]
+
+
+def add_failing_dependent_extension_with_one_dependency():
+    # This case tests that extensions dependent on a failing extension are skipped, but extensions that are not
+    # dependent on the failing extension still get enabled
+    return [
+        {
+            "name": "AzureMonitorLinuxAgent",
+            "properties": {
+                "provisionAfterExtensions": ["CustomScript"],
+                "publisher": "Microsoft.Azure.Monitor",
+                "type": "AzureMonitorLinuxAgent",
+                "typeHandlerVersion": "1.5",
+                "autoUpgradeMinorVersion": True,
+                "settings": {}
+            }
+        },
+        {
+            "name": "RunCommandLinux",
+            "properties": {
+                "publisher": "Microsoft.CPlat.Core",
+                "type": "RunCommandLinux",
+                "typeHandlerVersion": "1.0",
+                "autoUpgradeMinorVersion": True,
+                "settings": {
+                    "commandToExecute": "date"
+                }
+            }
+        },
+        {
+            "name": "CustomScript",
+            "properties": {
+                "publisher": "Microsoft.Azure.Extensions",
+                "type": "CustomScript",
+                "typeHandlerVersion": "2.1",
+                "autoUpgradeMinorVersion": True,
+                "settings": {
+                    "commandToExecute": "exit 1"
+                }
+            }
+        }
+    ]
+
+
+def add_failing_dependent_extension_with_two_dependencies():
+    # This case tests that all extensions dependent on a failing extension are skipped
+    return [
+        {
+            "name": "AzureMonitorLinuxAgent",
+            "properties": {
+                "provisionAfterExtensions": ["CustomScript"],
+                "publisher": "Microsoft.Azure.Monitor",
+                "type": "AzureMonitorLinuxAgent",
+                "typeHandlerVersion": "1.5",
+                "autoUpgradeMinorVersion": True
+            }
+        },
+        {
+            "name": "RunCommandLinux",
+            "properties": {
+                "provisionAfterExtensions": ["CustomScript"],
+                "publisher": "Microsoft.CPlat.Core",
+                "type": "RunCommandLinux",
+                "typeHandlerVersion": "1.0",
+                "autoUpgradeMinorVersion": True,
+                "settings": {
+                    "commandToExecute": "date"
+                }
+            }
+        }
}, + { + "name": "CustomScript", + "properties": { + "publisher": "Microsoft.Azure.Extensions", + "type": "CustomScript", + "typeHandlerVersion": "2.1", + "autoUpgradeMinorVersion": True, + "settings": { + "commandToExecute": "exit 1" + } + } + } + ] diff --git a/tests_e2e/tests/ext_sequencing/ext_sequencing.py b/tests_e2e/tests/ext_sequencing/ext_sequencing.py new file mode 100644 index 000000000..e50b0d6ab --- /dev/null +++ b/tests_e2e/tests/ext_sequencing/ext_sequencing.py @@ -0,0 +1,309 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This test adds extensions with multiple dependencies to a VMSS using the 'provisionAfterExtensions' property and +# validates they are enabled in order of dependencies. 
+#
+import copy
+import re
+import uuid
+from datetime import datetime
+from typing import List, Dict, Any
+
+from assertpy import fail
+from azure.mgmt.compute.models import VirtualMachineScaleSetVMExtensionsSummary
+
+from tests_e2e.tests.ext_sequencing.ext_seq_test_cases import add_one_dependent_ext_without_settings, add_two_extensions_with_dependencies, \
+    remove_one_dependent_extension, remove_all_dependencies, add_one_dependent_extension, \
+    add_single_dependencies, remove_all_dependent_extensions, add_failing_dependent_extension_with_one_dependency, add_failing_dependent_extension_with_two_dependencies
+from tests_e2e.tests.lib.agent_test import AgentVmssTest, TestSkipped
+from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext
+from tests_e2e.tests.lib.virtual_machine_scale_set_client import VmssInstanceIpAddress
+from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds
+from tests_e2e.tests.lib.logging import log
+from tests_e2e.tests.lib.resource_group_client import ResourceGroupClient
+from tests_e2e.tests.lib.ssh_client import SshClient
+
+
+class ExtSequencing(AgentVmssTest):
+
+    def __init__(self, context: AgentVmTestContext):
+        super().__init__(context)
+        self._scenario_start = datetime.min
+
+    # Cases to test different dependency scenarios
+    _test_cases = [
+        add_one_dependent_ext_without_settings,
+        add_two_extensions_with_dependencies,
+        # remove_one_dependent_extension should only be run after another test case which has RunCommandLinux in the
+        # model
+        remove_one_dependent_extension,
+        # remove_all_dependencies should only be run after another test case which has extension dependencies in the
+        # model
+        remove_all_dependencies,
+        add_one_dependent_extension,
+        add_single_dependencies,
+        # remove_all_dependent_extensions should only be run after another test case which has dependent extensions in
+        # the model
+        remove_all_dependent_extensions,
+        add_failing_dependent_extension_with_one_dependency,
add_failing_dependent_extension_with_two_dependencies + ] + + @staticmethod + def _get_dependency_map(extensions: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]: + dependency_map: Dict[str, Dict[str, Any]] = dict() + + for ext in extensions: + ext_name = ext['name'] + provisioned_after = ext['properties'].get('provisionAfterExtensions') + depends_on = provisioned_after if provisioned_after else [] + # We know an extension should fail if commandToExecute is exactly "exit 1" + ext_settings = ext['properties'].get("settings") + ext_command = ext['properties']['settings'].get("commandToExecute") if ext_settings else None + should_fail = ext_command == "exit 1" + dependency_map[ext_name] = {"should_fail": should_fail, "depends_on": depends_on} + + return dependency_map + + @staticmethod + def _get_sorted_extension_names(extensions: List[VirtualMachineScaleSetVMExtensionsSummary], ssh_client: SshClient, test_case_start: datetime) -> List[str]: + # Using VmExtensionIds to get publisher for each ext to be used in remote script + extension_full_names = { + "AzureMonitorLinuxAgent": VmExtensionIds.AzureMonitorLinuxAgent, + "RunCommandLinux": VmExtensionIds.RunCommand, + "CustomScript": VmExtensionIds.CustomScript + } + enabled_times = [] + for ext in extensions: + # Only check extensions which succeeded provisioning + if "succeeded" in ext.statuses_summary[0].code: + enabled_time = ssh_client.run_command(f"ext_sequencing-get_ext_enable_time.py --ext '{extension_full_names[ext.name]}'", use_sudo=True) + formatted_time = datetime.strptime(enabled_time.strip(), u'%Y-%m-%dT%H:%M:%SZ') + if formatted_time < test_case_start: + fail("Extension {0} was not enabled".format(extension_full_names[ext.name])) + enabled_times.append( + { + "name": ext.name, + "enabled_time": formatted_time + } + ) + + # sort the extensions based on their enabled datetime + sorted_extensions = sorted(enabled_times, key=lambda ext_: ext_["enabled_time"]) + log.info("") + log.info("Extensions sorted by 
time they were enabled: {0}".format(
+            ', '.join(["{0}: {1}".format(ext["name"], ext["enabled_time"]) for ext in sorted_extensions])))
+        sorted_extension_names = [ext["name"] for ext in sorted_extensions]
+        return sorted_extension_names
+
+    @staticmethod
+    def _validate_extension_sequencing(dependency_map: Dict[str, Dict[str, Any]], sorted_extension_names: List[str], relax_check: bool):
+        installed_ext = dict()
+
+        # Iterate through the extensions in the order they were enabled and validate that the extensions each one
+        # depends on were enabled before it.
+        for ext in sorted_extension_names:
+            # Check whether the extensions this one depends on are already installed
+            if ext not in dependency_map:
+                # There should not be any unexpected extensions on the scale set, even in the case we share the VMSS,
+                # because we update the scale set model with the extensions. Any extensions that are not in the scale
+                # set model would be disabled.
+                fail("Unwanted extension found in VMSS Instance view: {0}".format(ext))
+            if dependency_map[ext] is not None:
+                dependencies = dependency_map[ext].get('depends_on')
+                for dep in dependencies:
+                    if installed_ext.get(dep) is None:
+                        # The depended-on extension was not installed prior to the current extension
+                        if relax_check:
+                            log.info("{0} is not installed prior to {1}".format(dep, ext))
+                        else:
+                            fail("{0} is not installed prior to {1}".format(dep, ext))
+
+            # Mark the current extension as installed
+            installed_ext[ext] = ext
+
+        # Validate that only extensions expected to fail, and their dependent extensions, failed
+        for ext, details in dependency_map.items():
+            failing_ext_dependencies = [dep for dep in details['depends_on'] if dependency_map[dep]['should_fail']]
+            if ext not in installed_ext:
+                if details['should_fail']:
+                    log.info("Extension {0} failed as expected".format(ext))
+                elif failing_ext_dependencies:
+                    log.info("Extension {0} failed as expected because it is dependent on {1}".format(ext, ' and '.join(failing_ext_dependencies)))
+                else:
fail("{0} unexpectedly failed. Only extensions that are expected to fail or depend on a failing extension should fail".format(ext)) + + log.info("Validated extension sequencing") + + def run(self): + instances_ip_address: List[VmssInstanceIpAddress] = self._context.vmss.get_instances_ip_address() + ssh_clients: Dict[str, SshClient] = dict() + for instance in instances_ip_address: + ssh_clients[instance.instance_name] = SshClient(ip_address=instance.ip_address, username=self._context.username, identity_file=self._context.identity_file) + + if not VmExtensionIds.AzureMonitorLinuxAgent.supports_distro(next(iter(ssh_clients.values())).run_command("get_distro.py").rstrip()): + raise TestSkipped("Currently AzureMonitorLinuxAgent is not supported on this distro") + + # This is the base ARM template that's used for deploying extensions for this scenario + base_extension_template = { + "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json", + "contentVersion": "1.0.0.0", + "resources": [ + { + "type": "Microsoft.Compute/virtualMachineScaleSets", + "name": f"{self._context.vmss.name}", + "location": "[resourceGroup().location]", + "apiVersion": "2018-06-01", + "properties": { + "virtualMachineProfile": { + "extensionProfile": { + "extensions": [] + } + } + } + } + ] + } + + for case in self._test_cases: + test_case_start = datetime.now() + if self._scenario_start == datetime.min: + self._scenario_start = test_case_start + + # Assign unique guid to forceUpdateTag for each extension to make sure they're always unique to force CRP + # to generate a new sequence number each time + test_guid = str(uuid.uuid4()) + extensions = case() + for ext in extensions: + ext["properties"].update({ + "forceUpdateTag": test_guid + }) + + # We update the extension template here with extensions that are specific to the scenario that we want to + # test out + log.info("") + log.info("Test case: {0}".format(case.__name__.replace('_', ' '))) + ext_template = 
copy.deepcopy(base_extension_template) + ext_template['resources'][0]['properties']['virtualMachineProfile']['extensionProfile'][ + 'extensions'] = extensions + + # Log the dependency map for the extensions in this test case + dependency_map = self._get_dependency_map(extensions) + log.info("") + log.info("The dependency map of the extensions for this test case is:") + for ext, details in dependency_map.items(): + dependencies = details.get('depends_on') + dependency_list = "-" if not dependencies else ' and '.join(dependencies) + log.info("{0} depends on {1}".format(ext, dependency_list)) + + # Deploy updated extension template to the scale set. + log.info("") + log.info("Deploying extensions with the above dependencies to the scale set...") + rg_client = ResourceGroupClient(self._context.vmss.cloud, self._context.vmss.subscription, + self._context.vmss.resource_group, self._context.vmss.location) + try: + rg_client.deploy_template(template=ext_template) + except Exception as e: + # We only expect to catch an exception during deployment if we are forcing one of the extensions to + # fail. We know an extension should fail if "failing" is in the case name. Otherwise, report the + # failure. + deployment_failure_pattern = r"[\s\S]*\"details\": [\s\S]* \"code\": \"(?P<code>.*)\"[\s\S]* \"message\": \"(?P<msg>.*)\"[\s\S]*" + msg_pattern = r"Multiple VM extensions failed to be provisioned on the VM. Please see the VM extension instance view for other failures. The first extension failed due to the error: VM Extension '.*' is marked as failed since it depends upon the VM Extension 'CustomScript' which has failed." 
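Editorial aside: the `deployment_failure_pattern` above relies on the named groups `code` and `msg` to pull the error details out of the stringified deployment exception. A minimal, self-contained sketch of that extraction, run against an invented sample error string (not output captured from CRP):

```python
import re

# Same shape as the test's deployment_failure_pattern, with the named
# groups ("code" and "msg") spelled out explicitly.
deployment_failure_pattern = (
    r"[\s\S]*\"details\": [\s\S]* \"code\": \"(?P<code>.*)\""
    r"[\s\S]* \"message\": \"(?P<msg>.*)\"[\s\S]*"
)

# Invented sample; real CRP deployment errors carry much more structure.
sample_error = (
    '{ "status": "Failed", "details": [ { "code": "VMExtensionProvisioningError", '
    '"message": "VM Extension \'AzureMonitorLinuxAgent\' is marked as failed since it '
    'depends upon the VM Extension \'CustomScript\' which has failed." } ] }'
)

match = re.match(deployment_failure_pattern, sample_error)
print(match.group("code"))  # the provisioning error code
print(match.group("msg"))   # the human-readable message
```

Because `.` does not match newlines without `re.DOTALL`, the `code` and `message` values must each sit on a single line of the stringified error, which holds for the exception text the test parses.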
+ deployment_failure_match = re.match(deployment_failure_pattern, str(e)) + if "failing" not in case.__name__: + fail("Extension template deployment unexpectedly failed: {0}".format(e)) + elif not deployment_failure_match or deployment_failure_match.group("code") != "VMExtensionProvisioningError" or not re.match(msg_pattern, deployment_failure_match.group("msg")): + fail("Extension template deployment failed as expected, but with an unexpected error: {0}".format(e)) + + # Get the extensions on the VMSS from the instance view + log.info("") + instance_view_extensions = self._context.vmss.get_instance_view().extensions + + # Validate that the extensions were enabled in the correct order on each instance of the scale set + for instance_name, ssh_client in ssh_clients.items(): + log.info("") + log.info("Validate extension sequencing on {0}:{1}...".format(instance_name, ssh_client.ip_address)) + + # Sort the VM extensions by the time they were enabled + sorted_extension_names = self._get_sorted_extension_names(instance_view_extensions, ssh_client, test_case_start) + + # Validate that the extensions were enabled in the correct order. We relax this check if no settings + # are provided for a dependent extension, since the guest agent currently ignores dependencies in this + # case. 
+ relax_check = True if "settings" in case.__name__ else False + self._validate_extension_sequencing(dependency_map, sorted_extension_names, relax_check) + + log.info("------") + + def get_ignore_errors_before_timestamp(self) -> datetime: + # Ignore errors in the agent log before the first test case starts + return self._scenario_start + + def get_ignore_error_rules(self) -> List[Dict[str, Any]]: + ignore_rules = [ + # + # WARNING ExtHandler ExtHandler Missing dependsOnExtension on extension Microsoft.Azure.Monitor.AzureMonitorLinuxAgent + # This message appears when an extension doesn't depend on another extension + # + { + 'message': r"Missing dependsOnExtension on extension .*" + }, + # + # WARNING ExtHandler ExtHandler Extension Microsoft.Azure.Monitor.AzureMonitorLinuxAgent does not have any settings. Will ignore dependency (dependency level: 1) + # We currently ignore dependencies for extensions without settings + # + { + 'message': r"Extension .* does not have any settings\. Will ignore dependency \(dependency level: \d\)" + }, + # + # 2023-10-31T17:46:59.675959Z WARNING ExtHandler ExtHandler Dependent extension Microsoft.Azure.Extensions.CustomScript failed or timed out, will skip processing the rest of the extensions + # We intentionally make CustomScript fail to test that dependent extensions are skipped + # + { + 'message': r"Dependent extension Microsoft.Azure.Extensions.CustomScript failed or timed out, will skip processing the rest of the extensions" + }, + # + # 2023-10-31T17:48:13.349214Z ERROR ExtHandler ExtHandler Event: name=Microsoft.Azure.Extensions.CustomScript, op=ExtensionProcessing, message=Dependent Extension Microsoft.Azure.Extensions.CustomScript did not succeed. 
Status was error, duration=0 + # We intentionally make CustomScript fail to test that dependent extensions are skipped + # + { + 'message': r"Event: name=Microsoft.Azure.Extensions.CustomScript, op=ExtensionProcessing, message=Dependent Extension Microsoft.Azure.Extensions.CustomScript did not succeed. Status was error, duration=0" + }, + # + # 2023-10-31T17:47:07.689083Z WARNING ExtHandler ExtHandler [PERIODIC] This status is being reported by the Guest Agent since no status file was reported by extension Microsoft.Azure.Monitor.AzureMonitorLinuxAgent: [ExtensionStatusError] Status file /var/lib/waagent/Microsoft.Azure.Monitor.AzureMonitorLinuxAgent-1.28.11/status/6.status does not exist + # We expect extensions that are dependent on a failing extension to not report status + # + { + 'message': r"\[PERIODIC\] This status is being reported by the Guest Agent since no status file was reported by extension .*: \[ExtensionStatusError\] Status file \/var\/lib\/waagent\/.*\/status\/\d.status does not exist" + }, + # + # 2023-10-31T17:48:11.306835Z WARNING ExtHandler ExtHandler A new goal state was received, but not all the extensions in the previous goal state have completed: [('Microsoft.Azure.Extensions.CustomScript', 'error'), ('Microsoft.Azure.Monitor.AzureMonitorLinuxAgent', 'transitioning'), ('Microsoft.CPlat.Core.RunCommandLinux', 'success')] + # This message appears when the previous test scenario had failing extensions due to extension dependencies + # + { + 'message': r"A new goal state was received, but not all the extensions in the previous goal state have completed: \[(\(u?'.*', u?'(error|transitioning|success)'\),?)+\]" + } + ] + return ignore_rules + + +if __name__ == "__main__": + ExtSequencing.run_from_command_line() diff --git a/tests_e2e/tests/ext_telemetry_pipeline/ext_telemetry_pipeline.py b/tests_e2e/tests/ext_telemetry_pipeline/ext_telemetry_pipeline.py new file mode 100755 index 000000000..e13f0ce6a --- /dev/null +++ 
b/tests_e2e/tests/ext_telemetry_pipeline/ext_telemetry_pipeline.py @@ -0,0 +1,111 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This test ensures that the agent does not throw any errors while trying to transmit events to wireserver. It does not +# validate if the events actually make it to wireserver +# TODO: Update this test suite to verify that the agent picks up AND sends telemetry produced by extensions +# (work item https://dev.azure.com/msazure/One/_workitems/edit/24903999) +# + +import random +from typing import List, Dict, Any + +from azurelinuxagent.common.conf import get_etp_collection_period + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.ssh_client import SshClient +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient + + +class ExtTelemetryPipeline(AgentVmTest): + def run(self): + ssh_client: SshClient = self._context.create_ssh_client() + + # Extensions we will create events for + extensions = ["Microsoft.Azure.Extensions.CustomScript"] + if VmExtensionIds.VmAccess.supports_distro(ssh_client.run_command("get_distro.py").rstrip()): + extensions.append("Microsoft.OSTCExtensions.VMAccessForLinux") + + # Set the etp collection period to 30 seconds 
instead of default 5 minutes + default_collection_period = get_etp_collection_period() + log.info("") + log.info("Set ETP collection period to 30 seconds on the test VM [%s]", self._context.vm.name) + output = ssh_client.run_command("update-waagent-conf Debug.EtpCollectionPeriod=30", use_sudo=True) + log.info("Updated waagent conf with Debug.ETPCollectionPeriod=30 completed:\n%s", output) + + # Add CSE to the test VM twice to ensure its events directory still exists after re-enabling + log.info("") + log.info("Add CSE to the test VM...") + cse = VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.CustomScript, resource_name="CustomScript") + cse.enable(settings={'commandToExecute': "echo 'enable'"}, protected_settings={}) + cse.assert_instance_view() + + log.info("") + log.info("Add CSE to the test VM again...") + cse.enable(settings={'commandToExecute': "echo 'enable again'"}, protected_settings={}) + cse.assert_instance_view() + + # Check agent log to verify ETP is enabled + command = "agent_ext_workflow-check_data_in_agent_log.py --data 'Extension Telemetry pipeline enabled: True'" + log.info("") + log.info("Check agent log to verify ETP is enabled...") + log.info("Remote command [%s] completed:\n%s", command, ssh_client.run_command(command)) + + # Add good extension events for each extension and check that the TelemetryEventsCollector collects them + # TODO: Update test suite to check that the agent is picking up the events generated by the extension, instead + # of generating on the extensions' behalf + # (work item - https://dev.azure.com/msazure/One/_workitems/edit/24903999) + log.info("") + log.info("Add good extension events and check they are reported...") + max_events = random.randint(10, 50) + self._run_remote_test(ssh_client, + f"ext_telemetry_pipeline-add_extension_events.py " + f"--extensions {','.join(extensions)} " + f"--num_events_total {max_events}", use_sudo=True) + log.info("") + log.info("Good extension events were successfully 
reported.") + + # Add invalid events for each extension and check that the TelemetryEventsCollector drops them + log.info("") + log.info("Add bad extension events and check they are dropped...") + self._run_remote_test(ssh_client, + f"ext_telemetry_pipeline-add_extension_events.py " + f"--extensions {','.join(extensions)} " + f"--num_events_total {max_events} " + f"--num_events_bad {random.randint(5, max_events-5)}", use_sudo=True) + log.info("") + log.info("Bad extension events were successfully dropped.") + + # Reset the etp collection period to the default value so this VM can be shared with other suites + log.info("") + log.info("Reset ETP collection period to {0} seconds on the test VM [{1}]".format(default_collection_period, self._context.vm.name)) + output = ssh_client.run_command("update-waagent-conf Debug.EtpCollectionPeriod={0}".format(default_collection_period), use_sudo=True) + log.info("Updated waagent conf with default collection period completed:\n%s", output) + + def get_ignore_error_rules(self) -> List[Dict[str, Any]]: + return [ + {'message': r"Dropped events for Extension.*"} + ] + + +if __name__ == "__main__": + ExtTelemetryPipeline.run_from_command_line() diff --git a/tests_e2e/tests/extensions_disabled/extensions_disabled.py b/tests_e2e/tests/extensions_disabled/extensions_disabled.py new file mode 100755 index 000000000..002d83357 --- /dev/null +++ b/tests_e2e/tests/extensions_disabled/extensions_disabled.py @@ -0,0 +1,142 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent + +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This test disables extension processing on waagent.conf and verifies that extensions are not processed, but the +# agent continues reporting status. +# + +import datetime +import pytz +import uuid + +from assertpy import assert_that, fail +from typing import Any + +from azure.mgmt.compute.models import VirtualMachineInstanceView + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.ssh_client import SshClient +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient + + +class ExtensionsDisabled(AgentVmTest): + class TestCase: + def __init__(self, extension: VirtualMachineExtensionClient, settings: Any): + self.extension = extension + self.settings = settings + + def run(self): + ssh_client: SshClient = self._context.create_ssh_client() + + # Disable extension processing on the test VM + log.info("") + log.info("Disabling extension processing on the test VM [%s]", self._context.vm.name) + output = ssh_client.run_command("update-waagent-conf Extensions.Enabled=n", use_sudo=True) + log.info("Disable completed:\n%s", output) + disabled_timestamp: datetime.datetime = datetime.datetime.utcnow() - datetime.timedelta(minutes=60) + + # Prepare test cases + unique = str(uuid.uuid4()) + test_file = f"waagent-test.{unique}" + test_cases = [ + ExtensionsDisabled.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.CustomScript, + 
resource_name="CustomScript"), + {'commandToExecute': f"echo '{unique}' > /tmp/{test_file}"} + ), + ExtensionsDisabled.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, + resource_name="RunCommandHandler"), + {'source': {'script': f"echo '{unique}' > /tmp/{test_file}"}} + ) + ] + + for t in test_cases: + log.info("") + log.info("Test case: %s", t.extension) + # + # Validate that the agent is not processing extensions by attempting to enable extension & checking that + # provisioning fails fast + # + log.info( + "Executing {0}; the agent should report a VMExtensionProvisioningError without processing the extension" + .format(t.extension.__str__())) + + try: + t.extension.enable(settings=t.settings, force_update=True, timeout=6 * 60) + fail("The agent should have reported an error processing the goal state") + except Exception as error: + assert_that("VMExtensionProvisioningError" in str(error)) \ + .described_as(f"Expected a VMExtensionProvisioningError error, but actual error was: {error}") \ + .is_true() + assert_that("Extension will not be processed since extension processing is disabled" in str(error)) \ + .described_as( + f"Error message should communicate that extension will not be processed, but actual error " + f"was: {error}").is_true() + log.info("Goal state processing for {0} failed as expected".format(t.extension.__str__())) + + # + # Validate the agent did not process the extension by checking it did not execute the extension settings + # + output = ssh_client.run_command("dir /tmp", use_sudo=True) + assert_that(output) \ + .described_as( + f"Contents of '/tmp' on test VM contains {test_file}. Contents: {output}. 
\n This indicates " + f"{t.extension.__str__()} was unexpectedly processed") \ + .does_not_contain(f"{test_file}") + log.info("The agent did not process the extension settings for {0} as expected".format(t.extension.__str__())) + + # + # Validate that the agent continued reporting status even if it is not processing extensions + # + log.info("") + instance_view: VirtualMachineInstanceView = self._context.vm.get_instance_view() + log.info("Instance view of VM Agent:\n%s", instance_view.vm_agent.serialize()) + assert_that(instance_view.vm_agent.statuses).described_as("The VM agent should have exactly 1 status").is_length(1) + assert_that(instance_view.vm_agent.statuses[0].display_status).described_as("The VM Agent should be ready").is_equal_to('Ready') + # The time in the status is time zone aware and 'disabled_timestamp' is not; we need to make the latter time zone aware before comparing them + assert_that(instance_view.vm_agent.statuses[0].time)\ + .described_as("The VM Agent should have reported status even after extensions were disabled")\ + .is_greater_than(pytz.utc.localize(disabled_timestamp)) + log.info("The VM Agent reported status after extensions were disabled, as expected.") + + # + # Validate that the agent processes extensions after re-enabling extension processing + # + log.info("") + log.info("Enabling extension processing on the test VM [%s]", self._context.vm.name) + output = ssh_client.run_command("update-waagent-conf Extensions.Enabled=y", use_sudo=True) + log.info("Enable completed:\n%s", output) + + for t in test_cases: + try: + log.info("") + log.info("Executing {0}; the agent should process the extension".format(t.extension.__str__())) + t.extension.enable(settings=t.settings, force_update=True, timeout=15 * 60) + log.info("Goal state processing for {0} succeeded as expected".format(t.extension.__str__())) + except Exception as error: + fail(f"Unexpected error while processing {t.extension.__str__()} after re-enabling extension " + 
f"processing: {error}") + + +if __name__ == "__main__": + ExtensionsDisabled.run_from_command_line() diff --git a/tests_e2e/tests/fips/fips.py b/tests_e2e/tests/fips/fips.py new file mode 100755 index 000000000..a5e2438a4 --- /dev/null +++ b/tests_e2e/tests/fips/fips.py @@ -0,0 +1,73 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import uuid +from assertpy import fail + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.shell import CommandError +from tests_e2e.tests.lib.ssh_client import SshClient +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds + + +class Fips(AgentVmTest): + """ + Enables FIPS on the test VM, which is Mariner 2 VM, and verifies that extensions with protected settings are handled correctly under FIPS. 
+ """ + def run(self): + ssh_client: SshClient = self._context.create_ssh_client() + + try: + command = "fips-enable_fips_mariner" + log.info("Enabling FIPS on the test VM [%s]", command) + output = ssh_client.run_command(command) + log.info("Enable FIPS completed\n%s", output) + except CommandError as e: + raise Exception(f"Failed to enable FIPS: {e}") + + log.info("Restarting test VM") + self._context.vm.restart(wait_for_boot=True, ssh_client=ssh_client) + + try: + command = "fips-check_fips_mariner" + log.info("Verifying that FIPS is enabled [%s]", command) + output = ssh_client.run_command(command).rstrip() + if output != "FIPS mode is enabled.": + fail(f"FIPS is not enabled - '{command}' returned '{output}'") + log.info(output) + except CommandError as e: + raise Exception(f"Failed to verify that FIPS is enabled: {e}") + + # Execute an extension with protected settings to ensure the tenant certificate can be decrypted under FIPS + custom_script = VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.CustomScript, resource_name="CustomScript") + log.info("Installing %s", custom_script) + message = f"Hello {uuid.uuid4()}!" + custom_script.enable( + protected_settings={ + 'commandToExecute': f"echo \'{message}\'" + } + ) + custom_script.assert_instance_view(expected_message=message) + + +if __name__ == "__main__": + Fips.run_from_command_line() + diff --git a/tests_e2e/tests/keyvault_certificates/keyvault_certificates.py b/tests_e2e/tests/keyvault_certificates/keyvault_certificates.py new file mode 100755 index 000000000..7be3f272c --- /dev/null +++ b/tests_e2e/tests/keyvault_certificates/keyvault_certificates.py @@ -0,0 +1,95 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This test verifies that the Agent can download and extract KeyVault certificates that use different encryption algorithms (currently EC and RSA). +# +from assertpy import fail + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.shell import CommandError +from tests_e2e.tests.lib.ssh_client import SshClient + + +class KeyvaultCertificates(AgentVmTest): + def run(self): + test_certificates = { + 'C49A06B3044BD1778081366929B53EBF154133B3': { + 'AzureCloud': 'https://waagenttests.vault.azure.net/secrets/ec-cert/39862f0c6dff4b35bc8a83a5770c2102', + 'AzureChinaCloud': 'https://waagenttests.vault.azure.cn/secrets/ec-cert/bb610217ef70412bb3b3c8d7a7fabfdc', + 'AzureUSGovernment': 'https://waagenttests.vault.usgovcloudapi.net/secrets/ec-cert/9c20ef55c7074a468f04a168b3488933' + }, + '2F846E657258E50C7011E1F68EA9AD129BA4AB31': { + 'AzureCloud': 'https://waagenttests.vault.azure.net/secrets/rsa-cert/0b5eac1e66fb457bb3c3419fce17e705', + 'AzureChinaCloud': 'https://waagenttests.vault.azure.cn/secrets/rsa-cert/98679243f8d6493e95281a852d8cee00', + 'AzureUSGovernment': 'https://waagenttests.vault.usgovcloudapi.net/secrets/rsa-cert/463a8a6be3b3436d85d3d4e406621c9e' + } + } + thumbprints = test_certificates.keys() + certificate_urls = [u[self._context.vm.cloud] for u in test_certificates.values()] + + # The test certificates should be downloaded to these locations + expected_certificates = " ".join([f"/var/lib/waagent/{t}.{{crt,prv}}" for t in thumbprints]) + + # The test may be running on 
a VM that has already been tested (e.g. while debugging the test), so we need to delete any existing test certificates first + # (note that rm -f does not fail if the given files do not exist) + ssh_client: SshClient = self._context.create_ssh_client() + log.info("Deleting any existing test certificates on the test VM.") + existing_certificates = ssh_client.run_command(f"rm -f -v {expected_certificates}", use_sudo=True) + if existing_certificates == "": + log.info("No existing test certificates were found on the test VM.") + else: + log.info("Some test certificates had already been downloaded to the test VM (they have been deleted now):\n%s", existing_certificates) + + osprofile = { + "location": self._context.vm.location, + "properties": { + "osProfile": { + "secrets": [ + { + "sourceVault": { + "id": f"/subscriptions/{self._context.vm.subscription}/resourceGroups/waagent-tests/providers/Microsoft.KeyVault/vaults/waagenttests" + }, + "vaultCertificates": [{"certificateUrl": url} for url in certificate_urls] + } + ], + } + } + } + log.info("updating the vm's osProfile with the certificates to download:\n%s", osprofile) + self._context.vm.update(osprofile) + + # If the test has already run on the VM, force a new goal state to ensure the certificates are downloaded since the VM model most likely already had the certificates + # and the update operation would not have triggered a goal state + if existing_certificates != "": + log.info("Reapplying the goal state to ensure the test certificates are downloaded.") + self._context.vm.reapply() + + try: + output = ssh_client.run_command(f"ls {expected_certificates}", use_sudo=True) + log.info("Found all the expected certificates:\n%s", output) + except CommandError as error: + if error.stdout != "": + log.info("Found some of the expected certificates:\n%s", error.stdout) + fail(f"Failed to find certificates\n{error.stderr}") + + +if __name__ == "__main__": + KeyvaultCertificates.run_from_command_line() diff --git 
a/tests_e2e/tests/lib/agent_log.py b/tests_e2e/tests/lib/agent_log.py index 657b72928..60d42ec75 100644 --- a/tests_e2e/tests/lib/agent_log.py +++ b/tests_e2e/tests/lib/agent_log.py @@ -64,8 +64,22 @@ def from_dictionary(dictionary: Dict[str, str]): @property def timestamp(self) -> datetime: + # Extension logs may follow different timestamp formats + # 2023/07/10 20:50:13.459260 + ext_timestamp_regex_1 = r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}[.\d]+" + # 2023/07/10 20:50:13 + ext_timestamp_regex_2 = r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}" + + if re.match(ext_timestamp_regex_1, self.when): + return datetime.strptime(self.when, u'%Y/%m/%d %H:%M:%S.%f') + elif re.match(ext_timestamp_regex_2, self.when): + return datetime.strptime(self.when, u'%Y/%m/%d %H:%M:%S') + # Logs from agent follow this format: 2023-07-10T20:50:13.038599Z return datetime.strptime(self.when, u'%Y-%m-%dT%H:%M:%S.%fZ') + def __str__(self): + return self.text + class AgentLog(object): """ @@ -92,19 +106,6 @@ def get_errors(self) -> List[AgentLogRecord]: # # NOTE: This list was taken from the older agent tests and needs to be cleaned up. Feel free to un-comment rules as new tests are added. 
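Editorial aside: the new `timestamp` property above falls back across three formats, and the order of the checks matters, since the second pattern is a prefix of the first. A standalone sketch of the same parsing logic (the helper name `parse_log_timestamp` is mine, not from the PR):

```python
import re
from datetime import datetime

# Extension logs may use '2023/07/10 20:50:13.459260' or '2023/07/10 20:50:13',
# while agent logs use '2023-07-10T20:50:13.038599Z'. Try the most specific
# extension format first, since its regex is a superset of the second one.
def parse_log_timestamp(when: str) -> datetime:
    if re.match(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}[.\d]+", when):
        return datetime.strptime(when, "%Y/%m/%d %H:%M:%S.%f")
    if re.match(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}", when):
        return datetime.strptime(when, "%Y/%m/%d %H:%M:%S")
    # Agent log format
    return datetime.strptime(when, "%Y-%m-%dT%H:%M:%S.%fZ")

# All three formats parse to comparable naive datetimes
print(parse_log_timestamp("2023/07/10 20:50:13.459260"))
print(parse_log_timestamp("2023-07-10T20:50:13.038599Z"))
```

If the two extension checks were swapped, a fractional-seconds timestamp would match the coarser regex first and `%H:%M:%S` would raise `ValueError` on the trailing `.459260`.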
# - # # This warning is expected on CentOS/RedHat 7.4, 7.8 and Redhat 7.6 - # { - # 'message': r"Move rules file 70-persistent-net.rules to /var/lib/waagent/70-persistent-net.rules", - # 'if': lambda r: - # re.match(r"(((centos|redhat)7\.[48])|(redhat7\.6)|(redhat8\.2))\D*", DISTRO_NAME, flags=re.IGNORECASE) is not None - # and r.level == "WARNING" - # and r.prefix == "ExtHandler" and r.thread in ("", "EnvHandler") - # }, - # # This warning is expected on SUSE 12 - # { - # 'message': r"WARNING EnvHandler ExtHandler Move rules file 75-persistent-net-generator.rules to /var/lib/waagent/75-persistent-net-generator.rules", - # 'if': lambda _: re.match(r"((sles15\.2)|suse12)\D*", DISTRO_NAME, flags=re.IGNORECASE) is not None - # }, # # The following message is expected to log an error if systemd is not enabled on it # { # 'message': r"Did not detect Systemd, unable to set wa(|linux)agent-network-setup.service", @@ -146,15 +147,14 @@ def get_errors(self) -> List[AgentLogRecord]: # and r.level == "ERROR" # and r.prefix == "Daemon" # }, - # # - # # 2022-01-20T06:52:21.515447Z WARNING Daemon Daemon Fetch failed: [HttpError] [HTTP Failed] GET https://dcrgajhx62.blob.core.windows.net/$system/edprpwqbj6.5c2ddb5b-d6c3-4d73-9468-54419ca87a97.vmSettings -- IOError timed out -- 6 attempts made - # # - # # The daemon does not need the artifacts profile blob, but the request is done as part of protocol initialization. This timeout can be ignored, if the issue persist the log would include additional instances. 
- # # - # { - # 'message': r"\[HTTP Failed\] GET https://.*\.vmSettings -- IOError timed out", - # 'if': lambda r: r.level == "WARNING" and r.prefix == "Daemon" - # }, + # + # 2023-06-28T09:31:38.903835Z WARNING EnvHandler ExtHandler Move rules file 75-persistent-net-generator.rules to /var/lib/waagent/75-persistent-net-generator.rules + # The environment thread performs this operation periodically + # + { + 'message': r"Move rules file (70|75)-persistent.*.rules to /var/lib/waagent/(70|75)-persistent.*.rules", + 'if': lambda r: r.level == "WARNING" + }, # # Probably the agent should log this as INFO, but for now it is a warning # e.g. @@ -209,6 +209,24 @@ def get_errors(self) -> List[AgentLogRecord]: 'if': lambda r: DISTRO_NAME == 'ubuntu' and DISTRO_VERSION >= '22.00' }, # + # Old daemons can produce this message + # + # 2023-05-24T18:04:27.467009Z WARNING Daemon Daemon Could not mount cgroups: [Errno 1] Operation not permitted: '/sys/fs/cgroup/cpu,cpuacct' -> '/sys/fs/cgroup/cpu' + # + { + 'message': r"Could not mount cgroups: \[Errno 1\] Operation not permitted", + 'if': lambda r: r.prefix == 'Daemon' + }, + # + # The daemon does not need the artifacts profile blob, but the request is done as part of protocol initialization. This timeout can be ignored, if the issue persist the log would include additional instances. 
+ # + # 2022-01-20T06:52:21.515447Z WARNING Daemon Daemon Fetch failed: [HttpError] [HTTP Failed] GET https://dcrgajhx62.blob.core.windows.net/$system/edprpwqbj6.5c2ddb5b-d6c3-4d73-9468-54419ca87a97.vmSettings -- IOError timed out -- 6 attempts made + # + { + 'message': r"\[HTTP Failed\] GET https://.*\.vmSettings -- IOError timed out", + 'if': lambda r: r.level == "WARNING" and r.prefix == "Daemon" + }, + # # 2022-02-09T04:50:37.384810Z ERROR ExtHandler ExtHandler Error fetching the goal state: [ProtocolError] GET vmSettings [correlation ID: 2bed9b62-188e-4668-b1a8-87c35cfa4927 eTag: 7031887032544600793]: [Internal error in HostGAPlugin] [HTTP Failed] [502: Bad Gateway] b'{ "errorCode": "VMArtifactsProfileBlobContentNotFound", "message": "VM artifacts profile blob has no content in it.", "details": ""}' # # Fetching the goal state may catch the HostGAPlugin in the process of computing the vmSettings. This can be ignored, if the issue persist the log would include other errors as well. @@ -289,12 +307,21 @@ def get_errors(self) -> List[AgentLogRecord]: 'message': r"SendHostPluginHeartbeat:.*ResourceGoneError.*410", 'if': lambda r: r.level == "WARNING" and self._increment_counter("SendHostPluginHeartbeat-ResourceGoneError-410") < 2 # ignore unless there are 2 or more instances }, + # # 2023-01-18T02:58:25.589492Z ERROR SendTelemetryHandler ExtHandler Event: name=WALinuxAgent, op=ReportEventErrors, message=DroppedEventsCount: 1 # Reasons (first 5 errors): [ProtocolError] [Wireserver Exception] [ProtocolError] [Wireserver Failed] URI http://168.63.129.16/machine?comp=telemetrydata [HTTP Failed] Status Code 400: Traceback (most recent call last): # { - 'message': r"(?s)SendTelemetryHandler.*http://168.63.129.16/machine\?comp=telemetrydata.*Status Code 400", - 'if': lambda _: self._increment_counter("SendTelemetryHandler-telemetrydata-Status Code 400") < 2 # ignore unless there are 2 or more instances + 'message': 
r"(?s)\[ProtocolError\].*http://168.63.129.16/machine\?comp=telemetrydata.*Status Code 400", + 'if': lambda r: r.thread == 'SendTelemetryHandler' and self._increment_counter("SendTelemetryHandler-telemetrydata-Status Code 400") < 2 # ignore unless there are 2 or more instances + }, + # + # 2023-07-26T22:05:42.841692Z ERROR SendTelemetryHandler ExtHandler Event: name=WALinuxAgent, op=ReportEventErrors, message=DroppedEventsCount: 1 + # Reasons (first 5 errors): [ProtocolError] Failed to send events:[ResourceGoneError] [HTTP Failed] [410: Gone] b'\n\n ResourceNotAvailable\n The resource requested is no longer available. Please refresh your cache.\n
\n
': Traceback (most recent call last): + # + { + 'message': r"(?s)\[ProtocolError\].*Failed to send events.*\[410: Gone\]", + 'if': lambda r: r.thread == 'SendTelemetryHandler' and self._increment_counter("SendTelemetryHandler-telemetrydata-Status Code 410") < 2 # ignore unless there are 2 or more instances }, # # Ignore these errors in flatcar: @@ -324,7 +351,23 @@ def get_errors(self) -> List[AgentLogRecord]: { 'message': r"Microsoft.Azure.Security.Monitoring.AzureSecurityLinuxAgent.*op=Install.*Non-zero exit code: 56,", }, - + # + # Ignore LogCollector failure to fetch vmSettings if it recovers + # + # 2023-08-27T08:13:42.520557Z WARNING MainThread LogCollector Fetch failed: [HttpError] [HTTP Failed] GET https://md-hdd-tkst3125n3x0.blob.core.chinacloudapi.cn/$system/lisa-WALinuxAgent-20230827-080144-029-e0-n0.cb9a406f-584b-4702-98bb-41a3ad5e334f.vmSettings -- IOError timed out -- 6 attempts made + # + { + 'message': r"Fetch failed:.*GET.*vmSettings.*timed out", + 'if': lambda r: r.prefix == 'LogCollector' and self.agent_log_contains("LogCollector Log collection successfully completed", after_timestamp=r.timestamp) + }, + # + # In tests, we use both autoupdate flags to install test agent with different value and changing it depending on the scenario. So, we can ignore this warning. + # + # 2024-01-30T22:22:37.299911Z WARNING ExtHandler ExtHandler AutoUpdate.Enabled property is **Deprecated** now but it's set to different value from AutoUpdate.UpdateToLatestVersion. 
Please consider removing it if added by mistake + { + 'message': r"AutoUpdate.Enabled property is \*\*Deprecated\*\* now but it's set to different value from AutoUpdate.UpdateToLatestVersion", + 'if': lambda r: r.prefix == 'ExtHandler' and r.thread == 'ExtHandler' + } ] def is_error(r: AgentLogRecord) -> bool: @@ -354,6 +397,19 @@ def is_error(r: AgentLogRecord) -> bool: return errors + def agent_log_contains(self, data: str, after_timestamp: str = datetime.min): + """ + This function looks for the specified test data string in the WALinuxAgent logs and returns if found or not. + :param data: The string to look for in the agent logs + :param after_timestamp: A timestamp + appears after this timestamp + :return: True if test data string found in the agent log after after_timestamp and False if not. + """ + for record in self.read(): + if data in record.text and record.timestamp > after_timestamp: + return True + return False + @staticmethod def _is_systemd(): # Taken from azurelinuxagent/common/osutil/systemd.py; repeated here because it is available only on agents >= 2.3 @@ -390,12 +446,15 @@ def matches_ignore_rule(record: AgentLogRecord, ignore_rules: List[Dict[str, Any # # Older Agent: 2021/03/30 19:35:35.971742 INFO Daemon Azure Linux Agent Version:2.2.45 # + # Oldest Agent: 2023/06/07 08:04:35.336313 WARNING Disabling guest agent in accordance with ovf-env.xml + # # Extension: 2021/03/30 19:45:31 Azure Monitoring Agent for Linux started to handle. 
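The log formats enumerated above are parsed by per-format regular expressions; in this rendering of the patch the named groups appear truncated to `(?P`. For illustration, a runnable reconstruction of the newer-format pattern: the `level`, `thread`, and `prefix` group names follow the fields the ignore rules reference (`r.level`, `r.thread`, `r.prefix`), while `when` and `message` are assumed names for the remaining groups.

```python
import re

# Reconstruction of the newer-format agent record pattern for illustration.
# Field order: timestamp, level, thread, prefix, message.
NEWER_AGENT_RECORD = re.compile(
    r'(?P<when>[\d-]+T[\d:.]+Z)\s'
    r'(?P<level>VERBOSE|INFO|WARNING|ERROR)\s'
    r'(?P<thread>\S+)\s'
    r'(?P<prefix>(Daemon)|(ExtHandler)|(LogCollector)|(\[\S+\]))\s'
    r'(?P<message>.*)')

line = ("2023-05-24T18:04:27.467009Z WARNING Daemon Daemon "
        "Could not mount cgroups: [Errno 1] Operation not permitted")
m = NEWER_AGENT_RECORD.match(line)
print(m.group('level'), m.group('thread'), m.group('prefix'))  # WARNING Daemon Daemon
```

`match_record` tries each pattern in order (newer, 2.2.46, older, oldest) and keeps the first match, which is why the stricter formats must come first.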
# 2021/03/30 19:45:31 [Microsoft.Azure.Monitor.AzureMonitorLinuxAgent-1.7.0] cwd is /var/lib/waagent/Microsoft.Azure.Monitor.AzureMonitorLinuxAgent-1.7.0 # - _NEWER_AGENT_RECORD = re.compile(r'(?P[\d-]+T[\d:.]+Z)\s(?PVERBOSE|INFO|WARNING|ERROR)\s(?P\S+)\s(?P(Daemon)|(ExtHandler)|(\[\S+\]))\s(?P.*)') + _NEWER_AGENT_RECORD = re.compile(r'(?P[\d-]+T[\d:.]+Z)\s(?PVERBOSE|INFO|WARNING|ERROR)\s(?P\S+)\s(?P(Daemon)|(ExtHandler)|(LogCollector)|(\[\S+\]))\s(?P.*)') _2_2_46_AGENT_RECORD = re.compile(r'(?P[\d-]+T[\d:.]+Z)\s(?PVERBOSE|INFO|WARNING|ERROR)\s(?P)(?PDaemon|ExtHandler|\[\S+\])\s(?P.*)') - _OLDER_AGENT_RECORD = re.compile(r'(?P[\d/]+\s[\d:.]+)\s(?PVERBOSE|INFO|WARNING|ERROR)\s(?P)(?P\S*)\s(?P.*)') + _OLDER_AGENT_RECORD = re.compile(r'(?P[\d/]+\s[\d:.]+)\s(?PVERBOSE|INFO|WARNING|ERROR)\s(?P)(?PDaemon|ExtHandler)\s(?P.*)') + _OLDEST_AGENT_RECORD = re.compile(r'(?P[\d/]+\s[\d:.]+)\s(?PVERBOSE|INFO|WARNING|ERROR)\s(?P)(?P)(?P.*)') _EXTENSION_RECORD = re.compile(r'(?P[\d/]+\s[\d:.]+)\s(?P)(?P)((?P\[[^\]]+\])\s)?(?P.*)') def read(self) -> Iterable[AgentLogRecord]: @@ -412,7 +471,7 @@ def read(self) -> Iterable[AgentLogRecord]: raise IOError('{0} does not exist'.format(self._path)) def match_record(): - for regex in [self._NEWER_AGENT_RECORD, self._2_2_46_AGENT_RECORD, self._OLDER_AGENT_RECORD]: + for regex in [self._NEWER_AGENT_RECORD, self._2_2_46_AGENT_RECORD, self._OLDER_AGENT_RECORD, self._OLDEST_AGENT_RECORD]: m = regex.match(line) if m is not None: return m diff --git a/tests_e2e/tests/lib/agent_test.py b/tests_e2e/tests/lib/agent_test.py index 22f865a6f..0021a8d74 100644 --- a/tests_e2e/tests/lib/agent_test.py +++ b/tests_e2e/tests/lib/agent_test.py @@ -20,10 +20,16 @@ import sys from abc import ABC, abstractmethod +from datetime import datetime + +from assertpy import fail from typing import Any, Dict, List -from tests_e2e.tests.lib.agent_test_context import AgentTestContext +from tests_e2e.tests.lib.agent_test_context import AgentTestContext, AgentVmTestContext, 
AgentVmssTestContext from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import FAIL_EXIT_CODE +from tests_e2e.tests.lib.shell import CommandError +from tests_e2e.tests.lib.ssh_client import ATTEMPTS, ATTEMPT_DELAY, SshClient class TestSkipped(Exception): @@ -33,34 +39,84 @@ class TestSkipped(Exception): """ +class RemoteTestError(CommandError): + """ + Raised when a remote test fails with an unexpected error. + """ + + class AgentTest(ABC): """ - Defines the interface for agent tests, which are simply constructed from an AgentTestContext and expose a single method, - run(), to execute the test. + Abstract base class for Agent tests """ def __init__(self, context: AgentTestContext): - self._context = context + self._context: AgentTestContext = context @abstractmethod def run(self): - pass + """ + Test must define this method, which is used to execute the test. + """ def get_ignore_error_rules(self) -> List[Dict[str, Any]]: - # Tests can override this method to return a list with rules to ignore errors in the agent log (see agent_log.py for sample rules). + """ + Tests can override this method to return a list with rules to ignore errors in the agent log (see agent_log.py for sample rules). + """ return [] + def get_ignore_errors_before_timestamp(self) -> datetime: + # Ignore errors in the agent log before this timestamp + return datetime.min + @classmethod def run_from_command_line(cls): """ Convenience method to execute the test when it is being invoked directly from the command line (as opposed as - being invoked from a test framework or library. + being invoked from a test framework or library.) 
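The `FAIL_EXIT_CODE`, `CommandError`, and `RemoteTestError` names imported above feed the remote-test helper in this file: a remote script exiting with `FAIL_EXIT_CODE` marks an assertion failure, any other non-zero code an unexpected error. A sketch of that dispatch; the value 100 is an assumption for illustration, the real constant is defined in remote_test.py:

```python
FAIL_EXIT_CODE = 100  # assumed value; the real constant lives in remote_test.py

def classify_remote_exit(exit_code: int) -> str:
    # Mirrors _run_remote_test: 0 passes, FAIL_EXIT_CODE is a test failure
    # (reported via assertpy.fail), anything else is raised as RemoteTestError.
    if exit_code == 0:
        return "passed"
    if exit_code == FAIL_EXIT_CODE:
        return "test failure"
    return "unexpected error"

print(classify_remote_exit(0), "|", classify_remote_exit(100), "|", classify_remote_exit(2))
```

Separating the two non-zero cases lets the framework report assertion failures with the remote stdout, while environment problems surface as errors with the full command context.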
""" try: - cls(AgentTestContext.from_args()).run() + if issubclass(cls, AgentVmTest): + cls(AgentVmTestContext.from_args()).run() + elif issubclass(cls, AgentVmssTest): + cls(AgentVmssTestContext.from_args()).run() + else: + raise Exception(f"Class {cls.__name__} is not a valid test class") except SystemExit: # Bad arguments pass + except AssertionError as e: + log.error("%s", e) + sys.exit(1) except: # pylint: disable=bare-except log.exception("Test failed") sys.exit(1) sys.exit(0) + + def _run_remote_test(self, ssh_client: SshClient, command: str, use_sudo: bool = False, attempts: int = ATTEMPTS, attempt_delay: int = ATTEMPT_DELAY) -> None: + """ + Derived classes can use this method to execute a remote test (a test that runs over SSH). + """ + try: + output = ssh_client.run_command(command=command, use_sudo=use_sudo, attempts=attempts, attempt_delay=attempt_delay) + log.info("*** PASSED: [%s]\n%s", command, self._indent(output)) + except CommandError as error: + if error.exit_code == FAIL_EXIT_CODE: + fail(f"[{command}] {error.stderr}{self._indent(error.stdout)}") + raise RemoteTestError(command=error.command, exit_code=error.exit_code, stdout=self._indent(error.stdout), stderr=error.stderr) + + @staticmethod + def _indent(text: str, indent: str = " " * 8): + return "\n".join(f"{indent}{line}" for line in text.splitlines()) + + +class AgentVmTest(AgentTest): + """ + Base class for Agent tests that run on a single VM + """ + + +class AgentVmssTest(AgentTest): + """ + Base class for Agent tests that run on a scale set + """ + diff --git a/tests_e2e/tests/lib/agent_test_context.py b/tests_e2e/tests/lib/agent_test_context.py index ca9fc64ad..b818b1298 100644 --- a/tests_e2e/tests/lib/agent_test_context.py +++ b/tests_e2e/tests/lib/agent_test_context.py @@ -17,148 +17,107 @@ import argparse import os +from abc import ABC from pathlib import Path -import tests_e2e -from tests_e2e.tests.lib.identifiers import VmIdentifier +from 
tests_e2e.tests.lib.virtual_machine_client import VirtualMachineClient +from tests_e2e.tests.lib.virtual_machine_scale_set_client import VirtualMachineScaleSetClient +from tests_e2e.tests.lib.ssh_client import SshClient -class AgentTestContext: +class AgentTestContext(ABC): """ - Execution context for agent tests. Defines the test VM, working directories and connection info for the tests. - - NOTE: The context is shared by all tests in the same runbook execution. Tests within the same test suite - are executed sequentially, but multiple test suites may be executed concurrently depending on the - concurrency level of the runbook. + Base class for the execution context of agent tests; includes the working directories and SSH info for the tests. """ - class Paths: - DEFAULT_TEST_SOURCE_DIRECTORY = Path(tests_e2e.__path__[0]) - - def __init__( - self, - working_directory: Path, - remote_working_directory: Path, - test_source_directory: Path = DEFAULT_TEST_SOURCE_DIRECTORY - ): - self._test_source_directory: Path = test_source_directory - self._working_directory: Path = working_directory - self._remote_working_directory: Path = remote_working_directory - - class Connection: - DEFAULT_SSH_PORT = 22 - - def __init__( - self, - ip_address: str, - username: str, - private_key_file: Path, - ssh_port: int = DEFAULT_SSH_PORT - ): - self._ip_address: str = ip_address - self._username: str = username - self._private_key_file: Path = private_key_file - self._ssh_port: int = ssh_port - - def __init__(self, vm: VmIdentifier, paths: Paths, connection: Connection): - self._vm: VmIdentifier = vm - self._paths = paths - self._connection = connection - - @property - def vm(self) -> VmIdentifier: - """ - The test VM (the VM on which the tested Agent is running) - """ - return self._vm + DEFAULT_SSH_PORT = 22 - @property - def vm_ip_address(self) -> str: - """ - The IP address of the test VM - """ - return self._connection._ip_address + def __init__(self, working_directory: Path, 
username: str, identity_file: Path, ssh_port: int): + self.working_directory: Path = working_directory + self.username: str = username + self.identity_file: Path = identity_file + self.ssh_port: int = ssh_port - @property - def test_source_directory(self) -> Path: + @staticmethod + def _create_argument_parser() -> argparse.ArgumentParser: """ - Root directory for the source code of the tests. Used to build paths to specific scripts. + Creates an ArgumentParser that includes the arguments common to the concrete classes derived from AgentTestContext """ - return self._paths._test_source_directory + parser = argparse.ArgumentParser() + parser.add_argument('-c', '--cloud', dest="cloud", required=False, choices=['AzureCloud', 'AzureChinaCloud', 'AzureUSGovernment'], default="AzureCloud") + parser.add_argument('-g', '--group', required=True) + parser.add_argument('-l', '--location', required=True) + parser.add_argument('-s', '--subscription', required=True) - @property - def working_directory(self) -> Path: - """ - Tests can create temporary files under this directory. + parser.add_argument('-w', '--working-directory', dest="working_directory", required=False, default=str(Path().home() / "tmp")) - """ - return self._paths._working_directory + parser.add_argument('-u', '--username', required=False, default=os.getenv("USER")) + parser.add_argument('-k', '--identity-file', dest="identity_file", required=False, default=str(Path.home() / ".ssh" / "id_rsa")) + parser.add_argument('-p', '--ssh-port', dest="ssh_port", required=False, default=AgentTestContext.DEFAULT_SSH_PORT) - @property - def remote_working_directory(self) -> Path: - """ - Tests can create temporary files under this directory on the test VM. 
- """ - return self._paths._remote_working_directory + return parser - @property - def username(self) -> str: - """ - The username to use for SSH connections - """ - return self._connection._username - @property - def private_key_file(self) -> Path: - """ - The file containing the private SSH key for the username - """ - return self._connection._private_key_file +class AgentVmTestContext(AgentTestContext): + """ + Execution context for agent tests targeted to individual VMs. + """ + def __init__(self, working_directory: Path, vm: VirtualMachineClient, ip_address: str, username: str, identity_file: Path, ssh_port: int = AgentTestContext.DEFAULT_SSH_PORT): + super().__init__(working_directory, username, identity_file, ssh_port) + self.vm: VirtualMachineClient = vm + self.ip_address: str = ip_address - @property - def ssh_port(self) -> int: + def create_ssh_client(self) -> SshClient: """ - Port for SSH connections + Convenience method to create an SSH client using the connection info from the context. """ - return self._connection._ssh_port + return SshClient( + ip_address=self.ip_address, + username=self.username, + identity_file=self.identity_file, + port=self.ssh_port) @staticmethod def from_args(): """ - Creates an AgentTestContext from the command line arguments. + Creates an AgentVmTestContext from the command line arguments. 
""" - parser = argparse.ArgumentParser() - parser.add_argument('-g', '--group', required=True) - parser.add_argument('-l', '--location', required=True) - parser.add_argument('-s', '--subscription', required=True) + parser = AgentTestContext._create_argument_parser() parser.add_argument('-vm', '--vm', required=True) + parser.add_argument('-a', '--ip-address', dest="ip_address", required=False) # Use the vm name as default - parser.add_argument('-rw', '--remote-working-directory', dest="remote_working_directory", required=False, default=str(Path('/home')/os.getenv("USER"))) - parser.add_argument('-t', '--test-source-directory', dest="test_source_directory", required=False, default=str(AgentTestContext.Paths.DEFAULT_TEST_SOURCE_DIRECTORY)) - parser.add_argument('-w', '--working-directory', dest="working_directory", required=False, default=str(Path().home()/"tmp")) + args = parser.parse_args() - parser.add_argument('-a', '--ip-address', dest="ip_address", required=False) # Use the vm name as default - parser.add_argument('-u', '--username', required=False, default=os.getenv("USER")) - parser.add_argument('-k', '--private-key-file', dest="private_key_file", required=False, default=Path.home()/".ssh"/"id_rsa") - parser.add_argument('-p', '--ssh-port', dest="ssh_port", required=False, default=AgentTestContext.Connection.DEFAULT_SSH_PORT) + working_directory: Path = Path(args.working_directory) + if not working_directory.exists(): + working_directory.mkdir(exist_ok=True) + + vm: VirtualMachineClient = VirtualMachineClient(cloud=args.cloud, location=args.location, subscription=args.subscription, resource_group=args.group, name=args.vm) + ip_address = args.ip_address if args.ip_address is not None else args.vm + return AgentVmTestContext(working_directory=working_directory, vm=vm, ip_address=ip_address, username=args.username, identity_file=Path(args.identity_file), ssh_port=args.ssh_port) + + +class AgentVmssTestContext(AgentTestContext): + """ + Execution context for agent 
tests targeted to VM Scale Sets. + """ + def __init__(self, working_directory: Path, vmss: VirtualMachineScaleSetClient, username: str, identity_file: Path, ssh_port: int = AgentTestContext.DEFAULT_SSH_PORT): + super().__init__(working_directory, username, identity_file, ssh_port) + self.vmss: VirtualMachineScaleSetClient = vmss + + @staticmethod + def from_args(): + """ + Creates an AgentVmssTestContext from the command line arguments. + """ + parser = AgentTestContext._create_argument_parser() + parser.add_argument('-vmss', '--vmss', required=True) args = parser.parse_args() - working_directory = Path(args.working_directory) + working_directory: Path = Path(args.working_directory) if not working_directory.exists(): working_directory.mkdir(exist_ok=True) - return AgentTestContext( - vm=VmIdentifier( - location=args.location, - subscription=args.subscription, - resource_group=args.group, - name=args.vm), - paths=AgentTestContext.Paths( - working_directory=working_directory, - remote_working_directory=Path(args.remote_working_directory), - test_source_directory=Path(args.test_source_directory)), - connection=AgentTestContext.Connection( - ip_address=args.ip_address if args.ip_address is not None else args.vm, - username=args.username, - private_key_file=Path(args.private_key_file), - ssh_port=args.ssh_port)) + vmss = VirtualMachineScaleSetClient(cloud=args.cloud, location=args.location, subscription=args.subscription, resource_group=args.group, name=args.vmss) + return AgentVmssTestContext(working_directory=working_directory, vmss=vmss, username=args.username, identity_file=Path(args.identity_file), ssh_port=args.ssh_port) + diff --git a/tests_e2e/tests/lib/azure_clouds.py b/tests_e2e/tests/lib/azure_clouds.py new file mode 100644 index 000000000..2e1f5674e --- /dev/null +++ b/tests_e2e/tests/lib/azure_clouds.py @@ -0,0 +1,24 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the 
"License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +from typing import Dict +from msrestazure.azure_cloud import Cloud, AZURE_PUBLIC_CLOUD, AZURE_CHINA_CLOUD, AZURE_US_GOV_CLOUD + +AZURE_CLOUDS: Dict[str, Cloud] = { + "AzureCloud": AZURE_PUBLIC_CLOUD, + "AzureChinaCloud": AZURE_CHINA_CLOUD, + "AzureUSGovernment": AZURE_US_GOV_CLOUD +} diff --git a/tests_e2e/tests/lib/azure_sdk_client.py b/tests_e2e/tests/lib/azure_sdk_client.py new file mode 100644 index 000000000..f76d83ca7 --- /dev/null +++ b/tests_e2e/tests/lib/azure_sdk_client.py @@ -0,0 +1,59 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +from typing import Any, Callable + +from azure.identity import DefaultAzureCredential +from azure.core.polling import LROPoller + +from tests_e2e.tests.lib.azure_clouds import AZURE_CLOUDS +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import execute_with_retry + + +class AzureSdkClient: + """ + Base class for classes implementing clients of the Azure SDK. + """ + _DEFAULT_TIMEOUT = 10 * 60 # (in seconds) + + @staticmethod + def create_client(client_type: type, cloud: str, subscription_id: str): + """ + Creates an SDK client of the given 'client_type' + """ + azure_cloud = AZURE_CLOUDS[cloud] + return client_type( + base_url=azure_cloud.endpoints.resource_manager, + credential=DefaultAzureCredential(authority=azure_cloud.endpoints.active_directory), + credential_scopes=[azure_cloud.endpoints.resource_manager + "/.default"], + subscription_id=subscription_id) + + @staticmethod + def _execute_async_operation(operation: Callable[[], LROPoller], operation_name: str, timeout: int) -> Any: + """ + Starts an async operation and waits for its completion. Returns the operation's result.
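The wait-and-check pattern used for async operations can be sketched with a stand-in poller; `FakePoller` mimics only the slice of `azure.core.polling.LROPoller` exercised here (`wait`, `done`, `result`):

```python
# FakePoller mimics the slice of azure.core.polling.LROPoller used by
# _execute_async_operation above.
class FakePoller:
    def __init__(self, result, completes=True):
        self._result = result
        self._completes = completes

    def wait(self, timeout=None):
        pass  # the real poller blocks here for up to `timeout` seconds

    def done(self):
        return self._completes

    def result(self):
        return self._result


def execute_async_operation(poller, operation_name, timeout):
    # wait() returns when the operation finishes or the timeout elapses;
    # done() distinguishes the two outcomes.
    poller.wait(timeout=timeout)
    if not poller.done():
        raise TimeoutError(f"[{operation_name}] did not complete within {timeout} seconds")
    return poller.result()


print(execute_async_operation(FakePoller("succeeded"), "Restart VM", 600))
```

Checking `done()` after `wait()` is needed because the SDK's `wait` returns silently on timeout rather than raising.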
+ """ + log.info("Starting [%s]", operation_name) + poller: LROPoller = execute_with_retry(operation) + log.info("Waiting for [%s]", operation_name) + poller.wait(timeout=timeout) + if not poller.done(): + raise TimeoutError(f"[{operation_name}] did not complete within {timeout} seconds") + log.info("[%s] completed", operation_name) + return poller.result() + diff --git a/tests_e2e/tests/lib/cgroup_helpers.py b/tests_e2e/tests/lib/cgroup_helpers.py new file mode 100644 index 000000000..6da2865c2 --- /dev/null +++ b/tests_e2e/tests/lib/cgroup_helpers.py @@ -0,0 +1,150 @@ +import os +import re + +from assertpy import assert_that, fail + +from azurelinuxagent.common.osutil import systemd +from azurelinuxagent.common.utils import shellutil +from azurelinuxagent.common.version import DISTRO_NAME, DISTRO_VERSION +from tests_e2e.tests.lib.agent_log import AgentLog +from tests_e2e.tests.lib.logging import log + +BASE_CGROUP = '/sys/fs/cgroup' +AGENT_CGROUP_NAME = 'WALinuxAgent' +AGENT_SERVICE_NAME = systemd.get_agent_unit_name() +AGENT_CONTROLLERS = ['cpu', 'memory'] +EXT_CONTROLLERS = ['cpu', 'memory'] + +CGROUP_TRACKED_PATTERN = re.compile(r'Started tracking cgroup ([^\s]+)\s+\[(?P[^\s]+)\]') + +GATESTEXT_FULL_NAME = "Microsoft.Azure.Extensions.Edp.GATestExtGo" +GATESTEXT_SERVICE = "gatestext.service" +AZUREMONITOREXT_FULL_NAME = "Microsoft.Azure.Monitor.AzureMonitorLinuxAgent" +AZUREMONITORAGENT_SERVICE = "azuremonitoragent.service" +MDSD_SERVICE = "mdsd.service" + + +def verify_if_distro_supports_cgroup(): + """ + checks if agent is running in a distro that supports cgroups + """ + log.info("===== Checking if distro supports cgroups") + + base_cgroup_fs_exists = os.path.exists(BASE_CGROUP) + + assert_that(base_cgroup_fs_exists).is_true().described_as("Cgroup file system:{0} not found in Distro {1}-{2}".format(BASE_CGROUP, DISTRO_NAME, DISTRO_VERSION)) + + log.info('Distro %s-%s supports cgroups\n', DISTRO_NAME, DISTRO_VERSION) + + +def print_cgroups(): + """ + log the 
mounted cgroups information + """ + log.info("====== Currently mounted cgroups ======") + for m in shellutil.run_command(['mount']).splitlines(): + # output is similar to + # mount + # sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel) + # proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) + # devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=1842988k,nr_inodes=460747,mode=755) + # cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd) + # cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids) + # cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory) + # cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio) + # cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb) + if 'type cgroup' in m: + log.info('\t%s', m) + + +def print_service_status(): + log.info("====== Agent Service status ======") + output = shellutil.run_command(["systemctl", "status", systemd.get_agent_unit_name()]) + for line in output.splitlines(): + log.info("\t%s", line) + + +def get_agent_cgroup_mount_path(): + return os.path.join('/', 'azure.slice', AGENT_SERVICE_NAME) + + +def get_extension_cgroup_mount_path(extension_name): + return os.path.join('/', 'azure.slice/azure-vmextensions.slice', + "azure-vmextensions-" + extension_name + ".slice") + + +def get_unit_cgroup_mount_path(unit_name): + """ + Returns the cgroup mount path for the given unit + """ + output = shellutil.run_command(["systemctl", "show", unit_name, "--property", "ControlGroup"]) + # Output is similar to + # systemctl show walinuxagent.service --property ControlGroup + # ControlGroup=/azure.slice/walinuxagent.service + # matches above output and extract right side value + match = re.match("[^=]+=(?P.+)", output) + if match is not None: + return 
match.group('value') + return None + + +def verify_agent_cgroup_assigned_correctly(): + """ + This method checks agent is running and assigned to the correct cgroup using service status output + """ + log.info("===== Verifying the daemon and the agent are assigned to the same correct cgroup using systemd") + service_status = shellutil.run_command(["systemctl", "status", systemd.get_agent_unit_name()]) + log.info("Agent service status output:\n%s", service_status) + is_active = False + is_cgroup_assigned = False + cgroup_mount_path = get_agent_cgroup_mount_path() + is_active_pattern = re.compile(r".*Active:\s+active.*") + + for line in service_status.splitlines(): + if re.match(is_active_pattern, line): + is_active = True + elif cgroup_mount_path in line: + is_cgroup_assigned = True + + if not is_active: + fail('walinuxagent service was not active/running. Service status:{0}'.format(service_status)) + if not is_cgroup_assigned: + fail('walinuxagent service was not assigned to the expected cgroup:{0}'.format(cgroup_mount_path)) + + log.info("Successfully verified the agent cgroup assigned correctly by systemd\n") + + +def get_agent_cpu_quota(): + """ + Returns the cpu quota for the agent service + """ + output = shellutil.run_command(["systemctl", "show", AGENT_SERVICE_NAME, "--property", "CPUQuotaPerSecUSec"]) + # Output is similar to + # systemctl show walinuxagent --property CPUQuotaPerSecUSec + # CPUQuotaPerSecUSec=infinity + match = re.match("[^=]+=(?P.+)", output) + if match is not None: + return match.group('value') + return None + + +def check_agent_quota_disabled(): + """ + Returns True if the cpu quota is infinity + """ + cpu_quota = get_agent_cpu_quota() + # the quota can be expressed as seconds (s) or milliseconds (ms); no quota is expressed as "infinity" + return cpu_quota == 'infinity' + + +def check_cgroup_disabled_with_unknown_process(): + """ + Returns True if the cgroup is disabled with unknown process + """ + for record in AgentLog().read(): + 
match = re.search("Disabling resource usage monitoring. Reason: Check on cgroups failed:.+UNKNOWN", + record.message, flags=re.DOTALL) + if match is not None: + log.info("Found message:\n\t%s", record.text.replace("\n", "\n\t")) + return True + return False diff --git a/tests_e2e/tests/lib/firewall_helpers.py b/tests_e2e/tests/lib/firewall_helpers.py new file mode 100644 index 000000000..0e6ddd405 --- /dev/null +++ b/tests_e2e/tests/lib/firewall_helpers.py @@ -0,0 +1,209 @@ +from typing import List, Tuple + +from assertpy import fail + +from azurelinuxagent.common.future import ustr +from azurelinuxagent.common.utils import shellutil +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import retry_if_false + +WIRESERVER_ENDPOINT_FILE = '/var/lib/waagent/WireServerEndpoint' +WIRESERVER_IP = '168.63.129.16' +FIREWALL_PERIOD = 30 + +# helper methods shared by multiple tests + +class IPTableRules(object): + # -D deletes the specific rule in the iptable chain + DELETE_COMMAND = "-D" + + # -C checks if a specific rule exists + CHECK_COMMAND = "-C" + + +class FirewalldRules(object): + # checks if a specific rule exists + QUERY_PASSTHROUGH = "--query-passthrough" + + # removes a specific rule + REMOVE_PASSTHROUGH = "--remove-passthrough" + + +def get_wireserver_ip() -> str: + try: + with open(WIRESERVER_ENDPOINT_FILE, 'r') as f: + wireserver_ip = f.read() + except Exception: + wireserver_ip = WIRESERVER_IP + return wireserver_ip + + +def get_root_accept_rule_command(command: str) -> List[str]: + return ['sudo', 'iptables', '-t', 'security', command, 'OUTPUT', '-d', get_wireserver_ip(), '-p', 'tcp', '-m', + 'owner', + '--uid-owner', + '0', '-j', 'ACCEPT', '-w'] + + +def get_non_root_accept_rule_command(command: str) -> List[str]: + return ['sudo', 'iptables', '-t', 'security', command, 'OUTPUT', '-d', get_wireserver_ip(), '-p', 'tcp', + '--destination-port', '53', '-j', + 'ACCEPT', '-w'] + + +def get_non_root_drop_rule_command(command: str) -> 
List[str]: + return ['sudo', 'iptables', '-t', 'security', command, 'OUTPUT', '-d', get_wireserver_ip(), '-p', 'tcp', '-m', + 'conntrack', '--ctstate', + 'INVALID,NEW', '-j', 'DROP', '-w'] + + +def get_non_root_accept_tcp_firewalld_rule(command): + return ["firewall-cmd", "--permanent", "--direct", command, "ipv4", "-t", "security", "-A", "OUTPUT", "-d", + get_wireserver_ip(), + "-p", "tcp", "--destination-port", "53", "-j", "ACCEPT"] + + +def get_root_accept_firewalld_rule(command): + return ["firewall-cmd", "--permanent", "--direct", command, "ipv4", "-t", "security", "-A", "OUTPUT", "-d", + get_wireserver_ip(), + "-p", "tcp", "-m", "owner", "--uid-owner", "0", "-j", "ACCEPT"] + + +def get_non_root_drop_firewalld_rule(command): + return ["firewall-cmd", "--permanent", "--direct", command, "ipv4", "-t", "security", "-A", "OUTPUT", "-d", + get_wireserver_ip(), + "-p", "tcp", "-m", "conntrack", "--ctstate", "INVALID,NEW", "-j", "DROP"] + + +def execute_cmd(cmd: List[str]): + """ + Note: The shellutil.run_command return stdout if exit_code=0, otherwise returns Exception + """ + return shellutil.run_command(cmd, track_process=False) + + +def execute_cmd_return_err_code(cmd: List[str]): + """ + Note: The shellutil.run_command return err_code plus stdout/stderr + """ + try: + stdout = execute_cmd(cmd) + return 0, stdout + except Exception as error: + return -1, ustr(error) + + +def check_if_iptable_rule_is_available(full_command: List[str]) -> bool: + """ + This function is used to check if given rule is present in iptable rule set + "-C" return exit code 0 if the rule is available. 
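`iptables -C` reports a rule's presence purely through its exit status, which is what `check_if_iptable_rule_is_available` relies on. A sketch of that check, using `true`/`false` as stand-ins for the real iptables invocation so it runs unprivileged:

```python
import subprocess
from typing import List

def rule_is_available(full_command: List[str]) -> bool:
    # "iptables -t security -C <rule>" exits 0 when the rule exists and
    # non-zero when it does not; only the return code matters here.
    return subprocess.run(full_command, capture_output=True).returncode == 0

print(rule_is_available(["true"]))   # True
print(rule_is_available(["false"]))  # False
```

The firewalld variant works the same way with `--query-passthrough` in place of `-C`.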
+ """ + exit_code, _ = execute_cmd_return_err_code(full_command) + return exit_code == 0 + + +def print_current_iptable_rules() -> None: + """ + This function prints the current iptable rules + """ + try: + cmd = ["sudo", "iptables", "-t", "security", "-L", "-nxv"] + stdout = execute_cmd(cmd) + for line in stdout.splitlines(): + log.info(str(line)) + except Exception as error: + log.warning("Error -- Failed to fetch the ip table rule set {0}".format(error)) + + +def get_all_iptable_rule_commands(command: str) -> Tuple[List[str], List[str], List[str]]: + return get_root_accept_rule_command(command), get_non_root_accept_rule_command(command), get_non_root_drop_rule_command(command) + + +def verify_all_rules_exist() -> None: + """ + This function is used to verify all the iptable rules are present in the rule set + """ + def check_all_iptables() -> bool: + root_accept, non_root_accept, non_root_drop = get_all_iptable_rule_commands(IPTableRules.CHECK_COMMAND) + found: bool = check_if_iptable_rule_is_available(root_accept) and check_if_iptable_rule_is_available( + non_root_accept) and check_if_iptable_rule_is_available(non_root_drop) + return found + + log.info("Verifying all ip table rules are present in rule set") + # Agent will re-add rules within OS.EnableFirewallPeriod, So waiting that time + some buffer + found: bool = retry_if_false(check_all_iptables, attempts=2, delay=FIREWALL_PERIOD+15) + + if not found: + fail("IP table rules missing in rule set.\n Current iptable rules: {0}".format( + print_current_iptable_rules())) + + log.info("verified All ip table rules are present in rule set") + + +def firewalld_service_running(): + """ + Checks if firewalld service is running on the VM + Eg: firewall-cmd --state + > running + """ + cmd = ["firewall-cmd", "--state"] + exit_code, output = execute_cmd_return_err_code(cmd) + if exit_code != 0: + log.warning("Firewall service not running: {0}".format(output)) + return exit_code == 0 and output.rstrip() == "running" + + 
+def get_all_firewalld_rule_commands(command): + return get_root_accept_firewalld_rule(command), get_non_root_accept_tcp_firewalld_rule( + command), get_non_root_drop_firewalld_rule(command) + + +def check_if_firewalld_rule_is_available(command): + """ + This function is used to check if the given firewalld rule is present in the rule set + --query-passthrough returns exit code 0 if the rule is available + """ + exit_code, _ = execute_cmd_return_err_code(command) + return exit_code == 0 + + +def verify_all_firewalld_rules_exist(): + """ + This function is used to verify that all the firewalld rules are present in the rule set + """ + + def check_all_firewalld_rules(): + root_accept, non_root_accept, non_root_drop = get_all_firewalld_rule_commands(FirewalldRules.QUERY_PASSTHROUGH) + found = check_if_firewalld_rule_is_available(root_accept) and check_if_firewalld_rule_is_available( + non_root_accept) and check_if_firewalld_rule_is_available(non_root_drop) + return found + + log.info("Verifying all firewalld rules are present in the rule set") + found = retry_if_false(check_all_firewalld_rules, attempts=2) + + if not found: + print_current_firewalld_rules() + fail("Firewalld rules missing in rule set; see the current firewalld rules in the log above.") + + print_current_firewalld_rules() + log.info("Verified all firewalld rules are present in the rule set") + + +def print_current_firewalld_rules(): + """ + This function prints the current firewalld rules + """ + try: + cmd = ["firewall-cmd", "--permanent", "--direct", "--get-all-passthroughs"] + exit_code, stdout = execute_cmd_return_err_code(cmd) + if exit_code != 0: + log.warning("Warning -- Failed to fetch firewalld rules with error code: %s and error: %s", exit_code, + stdout) + else: + log.info("Current firewalld rules:") + for line in stdout.splitlines(): + log.info(str(line)) + except Exception as error: + log.warning("Error -- Failed to fetch the firewalld rule set {0}".format(error)) diff --git a/tests_e2e/tests/lib/logging.py b/tests_e2e/tests/lib/logging.py index ff636b63d..e713dce9d 100644 --- a/tests_e2e/tests/lib/logging.py +++ b/tests_e2e/tests/lib/logging.py @@ -20,6 +20,8 @@ # for logging. # import contextlib +import sys + from logging import FileHandler, Formatter, Handler, Logger, StreamHandler, INFO from pathlib import Path from threading import current_thread @@ -46,7 +48,7 @@ class _AgentLoggingHandler(Handler): def __init__(self): super().__init__() self.formatter: Formatter = Formatter('%(asctime)s.%(msecs)03d [%(levelname)s] %(message)s', datefmt="%Y-%m-%dT%H:%M:%SZ") - self.default_handler = StreamHandler() + self.default_handler = StreamHandler(sys.stdout) self.default_handler.setFormatter(self.formatter) self.per_thread_handlers: Dict[int, FileHandler] = {} @@ -153,3 +155,18 @@ def set_current_thread_log(log_file: Path): log.close_current_thread_log() if initial_value is not None: log.set_current_thread_log(initial_value) + + +@contextlib.contextmanager +def set_thread_name(name: str): + """ + Context manager that temporarily changes the name of the current thread + """ + initial_name = current_thread().name + current_thread().name = name + try: + yield + finally: + current_thread().name =
initial_name + + diff --git a/tests_e2e/tests/lib/network_security_rule.py b/tests_e2e/tests/lib/network_security_rule.py new file mode 100644 index 000000000..8df51b204 --- /dev/null +++ b/tests_e2e/tests/lib/network_security_rule.py @@ -0,0 +1,182 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import json + +from typing import Any, Dict, List + +from tests_e2e.tests.lib.update_arm_template import UpdateArmTemplate + + +class NetworkSecurityRule: + """ + Provides methods to add network security rules to the given ARM template. + + The security rules are added under _NETWORK_SECURITY_GROUP, which is also added to the template. 
+ """ + def __init__(self, template: Dict[str, Any], is_lisa_template: bool): + self._template = template + self._is_lisa_template = is_lisa_template + + _NETWORK_SECURITY_GROUP: str = "waagent-nsg" + + def add_allow_ssh_rule(self, ip_address: str) -> None: + self.add_security_rule( + json.loads(f"""{{ + "name": "waagent-ssh", + "properties": {{ + "description": "Allows inbound SSH connections from the orchestrator machine.", + "protocol": "Tcp", + "sourcePortRange": "*", + "destinationPortRange": "22", + "sourceAddressPrefix": "{ip_address}", + "destinationAddressPrefix": "*", + "access": "Allow", + "priority": 100, + "direction": "Inbound" + }} + }}""")) + + def add_security_rule(self, security_rule: Dict[str, Any]) -> None: + self._get_network_security_group()["properties"]["securityRules"].append(security_rule) + + def _get_network_security_group(self) -> Dict[str, Any]: + resources: List[Dict[str, Any]] = self._template["resources"] + # + # If the NSG already exists, just return it + # + try: + return UpdateArmTemplate.get_resource_by_name(resources, self._NETWORK_SECURITY_GROUP, "Microsoft.Network/networkSecurityGroups") + except KeyError: + pass + + # + # Otherwise, create it and append it to the list of resources + # + network_security_group = json.loads(f"""{{ + "type": "Microsoft.Network/networkSecurityGroups", + "name": "{self._NETWORK_SECURITY_GROUP}", + "location": "[resourceGroup().location]", + "apiVersion": "2020-05-01", + "properties": {{ + "securityRules": [] + }} + }}""") + resources.append(network_security_group) + + # + # Add a dependency on the NSG to the virtual network + # + network_resource = UpdateArmTemplate.get_resource(resources, "Microsoft.Network/virtualNetworks") + network_resource_dependencies = network_resource.get("dependsOn") + nsg_reference = f"[resourceId('Microsoft.Network/networkSecurityGroups', '{self._NETWORK_SECURITY_GROUP}')]" + if network_resource_dependencies is None: + network_resource["dependsOn"] = [nsg_reference] + 
else: + network_resource_dependencies.append(nsg_reference) + + # + # Add a reference to the NSG to the properties of the subnets. + # + nsg_reference = json.loads(f"""{{ + "networkSecurityGroup": {{ + "id": "[resourceId('Microsoft.Network/networkSecurityGroups', '{self._NETWORK_SECURITY_GROUP}')]" + }} + }}""") + + if self._is_lisa_template: + # The subnets are a copy property of the virtual network in LISA's ARM template: + # + # { + # "condition": "[empty(parameters('virtual_network_resource_group'))]", + # "apiVersion": "2020-05-01", + # "type": "Microsoft.Network/virtualNetworks", + # "name": "[parameters('virtual_network_name')]", + # "location": "[parameters('location')]", + # "properties": { + # "addressSpace": { + # "addressPrefixes": [ + # "10.0.0.0/16" + # ] + # }, + # "copy": [ + # { + # "name": "subnets", + # "count": "[parameters('subnet_count')]", + # "input": { + # "name": "[concat(parameters('subnet_prefix'), copyIndex('subnets'))]", + # "properties": { + # "addressPrefix": "[concat('10.0.', copyIndex('subnets'), '.0/24')]" + # } + # } + # } + # ] + # } + # } + # + subnets_copy = network_resource["properties"].get("copy") if network_resource.get("properties") is not None else None + if subnets_copy is None: + raise Exception("Cannot find the copy property of the virtual network in the ARM template") + + subnets = [i for i in subnets_copy if "name" in i and i["name"] == 'subnets'] + if len(subnets) == 0: + raise Exception("Cannot find the subnets of the virtual network in the ARM template") + + subnets_input = subnets[0].get("input") + if subnets_input is None: + raise Exception("Cannot find the input property of the subnets in the ARM template") + + subnets_properties = subnets_input.get("properties") + if subnets_properties is None: + subnets_input["properties"] = nsg_reference + else: + subnets_properties.update(nsg_reference) + else: + # + # The subnets are a simple property of the virtual network in the template for scale sets: + # { + #
"apiVersion": "2023-06-01", + # "type": "Microsoft.Network/virtualNetworks", + # "name": "[variables('virtualNetworkName')]", + # "location": "[resourceGroup().location]", + # "properties": { + # "addressSpace": { + # "addressPrefixes": [ + # "[variables('vnetAddressPrefix')]" + # ] + # }, + # "subnets": [ + # { + # "name": "[variables('subnetName')]", + # "properties": { + # "addressPrefix": "[variables('subnetPrefix')]", + # } + # } + # ] + # } + # } + subnets = network_resource["properties"].get("subnets") if network_resource.get("properties") is not None else None + if subnets is None: + raise Exception("Cannot find the subnets property of the virtual network in the ARM template") + + subnets_properties = subnets[0].get("properties") + if subnets_properties is None: + subnets["properties"] = nsg_reference + else: + subnets_properties.update(nsg_reference) + + return network_security_group diff --git a/tests_e2e/tests/lib/remote_test.py b/tests_e2e/tests/lib/remote_test.py new file mode 100644 index 000000000..ad71ae69b --- /dev/null +++ b/tests_e2e/tests/lib/remote_test.py @@ -0,0 +1,48 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +import sys + +from typing import Callable + +from tests_e2e.tests.lib.logging import log + +SUCCESS_EXIT_CODE = 0 +FAIL_EXIT_CODE = 100 +ERROR_EXIT_CODE = 200 + + +def run_remote_test(test_method: Callable[[], None]) -> None: + """ + Helper function to run a remote test; implements coding conventions for remote tests, e.g. error message goes + to stderr, test log goes to stdout, etc. + """ + try: + test_method() + log.info("*** PASSED") + except AssertionError as e: + print(f"{e}", file=sys.stderr) + log.error("%s", e) + sys.exit(FAIL_EXIT_CODE) + except Exception as e: + print(f"UNEXPECTED ERROR: {e}", file=sys.stderr) + log.exception("*** UNEXPECTED ERROR") + sys.exit(ERROR_EXIT_CODE) + + sys.exit(SUCCESS_EXIT_CODE) + diff --git a/tests_e2e/tests/lib/resource_group_client.py b/tests_e2e/tests/lib/resource_group_client.py new file mode 100644 index 000000000..9ca07a260 --- /dev/null +++ b/tests_e2e/tests/lib/resource_group_client.py @@ -0,0 +1,74 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
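The exit-code convention in `run_remote_test` (0 on pass, 100 on an assertion failure, 200 on an unexpected error) lets the orchestrator tell a failed check apart from broken test infrastructure. A hedged distillation of just that mapping (`classify_outcome` is a hypothetical helper, not part of the module):

```python
SUCCESS_EXIT_CODE = 0
FAIL_EXIT_CODE = 100
ERROR_EXIT_CODE = 200


def classify_outcome(exc: BaseException = None) -> int:
    # AssertionError means the test's check itself failed; any other
    # exception is an unexpected (infrastructure or coding) error.
    if exc is None:
        return SUCCESS_EXIT_CODE
    if isinstance(exc, AssertionError):
        return FAIL_EXIT_CODE
    return ERROR_EXIT_CODE
```
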
+# + +# +# This module includes facilities to create a resource group and deploy an arm template to it +# +from typing import Dict, Any + +from azure.mgmt.compute import ComputeManagementClient +from azure.mgmt.resource import ResourceManagementClient +from azure.mgmt.resource.resources.models import DeploymentProperties, DeploymentMode + +from tests_e2e.tests.lib.azure_sdk_client import AzureSdkClient +from tests_e2e.tests.lib.logging import log + + +class ResourceGroupClient(AzureSdkClient): + """ + Provides operations on resource groups (create, template deployment, etc). + """ + def __init__(self, cloud: str, subscription: str, name: str, location: str = ""): + super().__init__() + self.cloud: str = cloud + self.location = location + self.subscription: str = subscription + self.name: str = name + self._compute_client = AzureSdkClient.create_client(ComputeManagementClient, cloud, subscription) + self._resource_client = AzureSdkClient.create_client(ResourceManagementClient, cloud, subscription) + + def create(self) -> None: + """ + Creates a resource group + """ + log.info("Creating resource group %s", self) + self._resource_client.resource_groups.create_or_update(self.name, {"location": self.location}) + + def deploy_template(self, template: Dict[str, Any], parameters: Dict[str, Any] = None): + """ + Deploys an ARM template to the resource group + """ + if parameters: + properties = DeploymentProperties(template=template, parameters=parameters, mode=DeploymentMode.incremental) + else: + properties = DeploymentProperties(template=template, mode=DeploymentMode.incremental) + + log.info("Deploying template to resource group %s...", self) + self._execute_async_operation( + operation=lambda: self._resource_client.deployments.begin_create_or_update(self.name, 'TestDeployment', {'properties': properties}), + operation_name=f"Deploy template to resource group {self}", + timeout=AzureSdkClient._DEFAULT_TIMEOUT) + + def delete(self) -> None: + """ + Deletes the resource 
group + """ + log.info("Deleting resource group %s (no wait)", self) + self._resource_client.resource_groups.begin_delete(self.name) # Do not wait for the deletion to complete + + def __str__(self): + return f"{self.name}" diff --git a/tests_e2e/tests/lib/retry.py b/tests_e2e/tests/lib/retry.py index bbd327cda..db0a52fcf 100644 --- a/tests_e2e/tests/lib/retry.py +++ b/tests_e2e/tests/lib/retry.py @@ -40,20 +40,55 @@ def execute_with_retry(operation: Callable[[], Any]) -> Any: time.sleep(30) -def retry_ssh_run(operation: Callable[[], Any]) -> Any: +def retry_ssh_run(operation: Callable[[], Any], attempts: int, attempt_delay: int) -> Any: """ This method attempts to retry ssh run command a few times if operation failed with connection time out """ - attempts = 3 + i = 0 + while True: + i += 1 + try: + return operation() + except CommandError as e: + retryable = ((e.exit_code == 255 and ("Connection timed out" in e.stderr or "Connection refused" in e.stderr)) or + "Unprivileged users are not permitted to log in yet" in e.stderr) + if not retryable or i >= attempts: + raise + log.warning("The SSH operation failed, retrying in %s secs [Attempt %s/%s].\n%s", attempt_delay, i, attempts, e) + time.sleep(attempt_delay) + + +def retry_if_false(operation: Callable[[], bool], attempts: int = 5, delay: int = 30) -> bool: + """ + This method attempts the given operation retrying a few times + (after a short delay) + Note: Method used for operations which are return True or False + """ + success: bool = False + while attempts > 0 and not success: + attempts -= 1 + try: + success = operation() + except Exception as e: + log.warning("Error in operation: %s", e) + if attempts == 0: + raise + if not success and attempts != 0: + log.info("Current operation failed, retrying in %s secs.", delay) + time.sleep(delay) + return success + + +def retry(operation: Callable[[], Any], attempts: int = 5, delay: int = 30) -> Any: + """ + This method attempts the given operation retrying a few 
times on exceptions. Returns the value returned by the operation. + """ while attempts > 0: attempts -= 1 try: return operation() except Exception as e: - # We raise CommandError on !=0 exit codes in the called method - if isinstance(e, CommandError): - # Instance of 'Exception' has no 'exit_code' member (no-member) - Disabled: e is actually an CommandError - if e.exit_code != 255 or attempts == 0: # pylint: disable=no-member - raise - log.warning("The operation failed with %s, retrying in 30 secs.", e) - time.sleep(30) + if attempts == 0: + raise + log.warning("Error in operation, retrying in %s secs: %s", delay, e) + time.sleep(delay) diff --git a/tests_e2e/tests/lib/shell.py b/tests_e2e/tests/lib/shell.py index a5288439a..af5b30b80 100644 --- a/tests_e2e/tests/lib/shell.py +++ b/tests_e2e/tests/lib/shell.py @@ -38,7 +38,7 @@ def __str__(self): def run_command(command: Any, shell=False) -> str: """ This function is a thin wrapper around Popen/communicate in the subprocess module. It executes the given command - and returns its stdout. If the command returns a non-zero exit code, the function raises a RunCommandException. + and returns its stdout. If the command returns a non-zero exit code, the function raises a CommandError. Similarly to Popen, the 'command' can be a string or a list of strings, and 'shell' indicates whether to execute the command through the shell. 
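`retry_if_false` above targets boolean probes: it retries on `False`, swallows exceptions except on the last attempt, and sleeps between attempts. A self-contained copy of that logic (logging omitted) together with a flaky probe, using `delay=0` so it runs instantly:

```python
import time


def retry_if_false(operation, attempts=5, delay=30):
    # Mirrors the helper above: retry a boolean operation until it
    # returns True or the attempts are exhausted.
    success = False
    while attempts > 0 and not success:
        attempts -= 1
        try:
            success = operation()
        except Exception:
            if attempts == 0:
                raise
        if not success and attempts != 0:
            time.sleep(delay)
    return success


calls = []


def probe():
    # Flaky probe: succeeds only on the third call.
    calls.append(1)
    return len(calls) == 3
```

Note that a final `False` is returned rather than raised, which is why `verify_all_rules_exist` checks the return value and calls `fail()` itself.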
diff --git a/tests_e2e/tests/lib/ssh_client.py b/tests_e2e/tests/lib/ssh_client.py index a6e1ab9fd..ae7600c11 100644 --- a/tests_e2e/tests/lib/ssh_client.py +++ b/tests_e2e/tests/lib/ssh_client.py @@ -23,30 +23,36 @@ from tests_e2e.tests.lib import shell from tests_e2e.tests.lib.retry import retry_ssh_run +ATTEMPTS: int = 3 +ATTEMPT_DELAY: int = 30 + class SshClient(object): - def __init__(self, ip_address: str, username: str, private_key_file: Path, port: int = 22): - self._ip_address: str = ip_address - self._username: str = username - self._private_key_file: Path = private_key_file - self._port: int = port + def __init__(self, ip_address: str, username: str, identity_file: Path, port: int = 22): + self.ip_address: str = ip_address + self.username: str = username + self.identity_file: Path = identity_file + self.port: int = port - def run_command(self, command: str, use_sudo: bool = False) -> str: + def run_command(self, command: str, use_sudo: bool = False, attempts: int = ATTEMPTS, attempt_delay: int = ATTEMPT_DELAY) -> str: """ Executes the given command over SSH and returns its stdout. If the command returns a non-zero exit code, - the function raises a RunCommandException. + the function raises a CommandError. """ if re.match(r"^\s*sudo\s*", command): raise Exception("Do not include 'sudo' in the 'command' argument, use the 'use_sudo' parameter instead") - destination = f"ssh://{self._username}@{self._ip_address}:{self._port}" + destination = f"ssh://{self.username}@{self.ip_address}:{self.port}" # Note that we add ~/bin to the remote PATH, since Python (Pypy) and other test tools are installed there. 
# Note, too, that when using sudo we need to carry over the value of PATH to the sudo session sudo = "sudo env PATH=$PATH PYTHONPATH=$PYTHONPATH" if use_sudo else '' - return retry_ssh_run(lambda: shell.run_command([ - "ssh", "-o", "StrictHostKeyChecking=no", "-i", self._private_key_file, destination, - f"if [[ -e ~/bin/set-agent-env ]]; then source ~/bin/set-agent-env; fi; {sudo} {command}"])) + command = [ + "ssh", "-o", "StrictHostKeyChecking=no", "-i", self.identity_file, + destination, + f"if [[ -e ~/bin/set-agent-env ]]; then source ~/bin/set-agent-env; fi; {sudo} {command}" + ] + return retry_ssh_run(lambda: shell.run_command(command), attempts, attempt_delay) @staticmethod def generate_ssh_key(private_key_file: Path): @@ -59,27 +65,27 @@ def generate_ssh_key(private_key_file: Path): def get_architecture(self): return self.run_command("uname -m").rstrip() - def copy_to_node(self, local_path: Path, remote_path: Path, recursive: bool = False) -> None: + def copy_to_node(self, local_path: Path, remote_path: Path, recursive: bool = False, attempts: int = ATTEMPTS, attempt_delay: int = ATTEMPT_DELAY) -> None: """ File copy to a remote node """ - self._copy(local_path, remote_path, remote_source=False, remote_target=True, recursive=recursive) + self._copy(local_path, remote_path, remote_source=False, remote_target=True, recursive=recursive, attempts=attempts, attempt_delay=attempt_delay) - def copy_from_node(self, remote_path: Path, local_path: Path, recursive: bool = False) -> None: + def copy_from_node(self, remote_path: Path, local_path: Path, recursive: bool = False, attempts: int = ATTEMPTS, attempt_delay: int = ATTEMPT_DELAY) -> None: """ File copy from a remote node """ - self._copy(remote_path, local_path, remote_source=True, remote_target=False, recursive=recursive) + self._copy(remote_path, local_path, remote_source=True, remote_target=False, recursive=recursive, attempts=attempts, attempt_delay=attempt_delay) - def _copy(self, source: Path, target: 
Path, remote_source: bool, remote_target: bool, recursive: bool) -> None: + def _copy(self, source: Path, target: Path, remote_source: bool, remote_target: bool, recursive: bool, attempts: int, attempt_delay: int) -> None: if remote_source: - source = f"{self._username}@{self._ip_address}:{source}" + source = f"{self.username}@{self.ip_address}:{source}" if remote_target: - target = f"{self._username}@{self._ip_address}:{target}" + target = f"{self.username}@{self.ip_address}:{target}" - command = ["scp", "-o", "StrictHostKeyChecking=no", "-i", self._private_key_file] + command = ["scp", "-o", "StrictHostKeyChecking=no", "-i", self.identity_file] if recursive: command.append("-r") command.extend([str(source), str(target)]) - shell.run_command(command) + return retry_ssh_run(lambda: shell.run_command(command), attempts, attempt_delay) diff --git a/tests_e2e/tests/lib/update_arm_template.py b/tests_e2e/tests/lib/update_arm_template.py new file mode 100644 index 000000000..010178ab9 --- /dev/null +++ b/tests_e2e/tests/lib/update_arm_template.py @@ -0,0 +1,141 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
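The remote command string assembled in `run_command` above does two things: it sources `~/bin/set-agent-env` when present, and it carries `PATH`/`PYTHONPATH` into the sudo session via `sudo env`, since sudo resets the environment by default. A minimal sketch of just that string construction (`build_remote_command` is a hypothetical name):

```python
def build_remote_command(command: str, use_sudo: bool = False) -> str:
    # "sudo env PATH=$PATH ..." carries the caller's PATH/PYTHONPATH into the
    # sudo session; ~/bin holds Pypy and the other test tools on the node.
    sudo = "sudo env PATH=$PATH PYTHONPATH=$PYTHONPATH" if use_sudo else ""
    return f"if [[ -e ~/bin/set-agent-env ]]; then source ~/bin/set-agent-env; fi; {sudo} {command}"
```
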
+# + +from abc import ABC, abstractmethod +from typing import Any, Dict, List + + +class UpdateArmTemplate(ABC): + + @abstractmethod + def update(self, template: Dict[str, Any], is_lisa_template: bool) -> None: + """ + Derived classes implement this method to customize the ARM template used to create the test VMs. The 'template' parameter is a dictionary + created from the template's JSON document, as parsed by json.loads(). + + If the 'is_lisa_template' parameter is True, the template was created by LISA. The original JSON document is located at + https://github.com/microsoft/lisa/blob/main/lisa/sut_orchestrator/azure/arm_template.json + """ + + @staticmethod + def get_resource(resources: List[Dict[str, Any]], type_name: str) -> Any: + """ + Returns the first resource of the specified type in the given 'resources' list. + + Raises KeyError if no resource of the specified type is found. + """ + for item in resources: + if item["type"] == type_name: + return item + raise KeyError(f"Cannot find a resource of type {type_name} in the ARM template") + + @staticmethod + def get_resource_by_name(resources: List[Dict[str, Any]], resource_name: str, type_name: str) -> Any: + """ + Returns the first resource of the specified type and name in the given 'resources' list. + + Raises KeyError if no resource of the specified type and name is found. + """ + for item in resources: + if item["type"] == type_name and item["name"] == resource_name: + return item + raise KeyError(f"Cannot find a resource {resource_name} of type {type_name} in the ARM template") + + @staticmethod + def get_lisa_function(template: Dict[str, Any], function_name: str) -> Dict[str, Any]: + """ + Looks for the given function name in the LISA namespace and returns its definition. Raises KeyError if the function is not found. 
+ """ + # + # NOTE: LISA's functions are in the "lisa" namespace, for example: + # + # "functions": [ + # { + # "namespace": "lisa", + # "members": { + # "getOSProfile": { + # "parameters": [ + # { + # "name": "computername", + # "type": "string" + # }, + # etc. + # ], + # "output": { + # "type": "object", + # "value": { + # "computername": "[parameters('computername')]", + # "adminUsername": "[parameters('admin_username')]", + # "adminPassword": "[if(parameters('has_password'), parameters('admin_password'), json('null'))]", + # "linuxConfiguration": "[if(parameters('has_linux_configuration'), parameters('linux_configuration'), json('null'))]" + # } + # } + # }, + # } + # } + # ] + functions = template.get("functions") + if functions is None: + raise Exception('Cannot find "functions" in the LISA template.') + + for namespace in functions: + name = namespace.get("namespace") + if name is None: + raise Exception(f'Cannot find "namespace" in the LISA template: {namespace}') + if name == "lisa": + lisa_functions = namespace.get('members') + if lisa_functions is None: + raise Exception(f'Cannot find the members of the lisa namespace in the LISA template: {namespace}') + function_definition = lisa_functions.get(function_name) + if function_definition is None: + raise KeyError(f'Cannot find function {function_name} in the lisa namespace in the LISA template: {namespace}') + return function_definition + raise Exception(f'Cannot find the "lisa" namespace in the LISA template: {functions}') + + @staticmethod + def get_function_output(function: Dict[str, Any]) -> Dict[str, Any]: + """ + Returns the "value" property of the output for the given function. + + Sample function: + + { + "parameters": [ + { + "name": "computername", + "type": "string" + }, + etc. 
+ ], + "output": { + "type": "object", + "value": { + "computername": "[parameters('computername')]", + "adminUsername": "[parameters('admin_username')]", + "adminPassword": "[if(parameters('has_password'), parameters('admin_password'), json('null'))]", + "linuxConfiguration": "[if(parameters('has_linux_configuration'), parameters('linux_configuration'), json('null'))]" + } + } + } + """ + output = function.get('output') + if output is None: + raise Exception(f'Cannot find the "output" of the given function: {function}') + value = output.get('value') + if value is None: + raise Exception(f"Cannot find the output's value of the given function: {function}") + return value diff --git a/tests_e2e/tests/lib/virtual_machine.py b/tests_e2e/tests/lib/virtual_machine.py deleted file mode 100644 index 032a7e0f5..000000000 --- a/tests_e2e/tests/lib/virtual_machine.py +++ /dev/null @@ -1,143 +0,0 @@ -# Microsoft Azure Linux Agent -# -# Copyright 2018 Microsoft Corporation -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -# -# This module includes facilities to execute some operations on virtual machines and scale sets (list extensions, restart, etc). 
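`UpdateArmTemplate.get_resource` and `get_resource_by_name` above are plain linear searches over the template's `resources` list. A self-contained sketch of the same lookups (shortened error messages):

```python
from typing import Any, Dict, List


def get_resource(resources: List[Dict[str, Any]], type_name: str) -> Any:
    # Return the first resource of the given type; KeyError if none matches.
    for item in resources:
        if item["type"] == type_name:
            return item
    raise KeyError(f"Cannot find a resource of type {type_name}")


def get_resource_by_name(resources: List[Dict[str, Any]], resource_name: str, type_name: str) -> Any:
    # Same search, but also matching on the resource name.
    for item in resources:
        if item["type"] == type_name and item["name"] == resource_name:
            return item
    raise KeyError(f"Cannot find a resource {resource_name} of type {type_name}")
```

Raising `KeyError` (rather than returning None) is what lets `NetworkSecurityRule._get_network_security_group` use a try/except to mean "create the NSG only if it does not exist yet".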
-# - -from abc import ABC, abstractmethod -from builtins import TimeoutError -from typing import Any, List - -from azure.core.polling import LROPoller -from azure.identity import DefaultAzureCredential -from azure.mgmt.compute import ComputeManagementClient -from azure.mgmt.compute.models import VirtualMachineExtension, VirtualMachineScaleSetExtension, VirtualMachineInstanceView, VirtualMachineScaleSetInstanceView -from azure.mgmt.resource import ResourceManagementClient - -from tests_e2e.tests.lib.identifiers import VmIdentifier -from tests_e2e.tests.lib.logging import log -from tests_e2e.tests.lib.retry import execute_with_retry - - -class VirtualMachineBaseClass(ABC): - """ - Abstract base class for VirtualMachine and VmScaleSet. - - Defines the interface common to both classes and provides the implementation of some methods in that interface. - """ - def __init__(self, vm: VmIdentifier): - super().__init__() - self._identifier: VmIdentifier = vm - self._compute_client = ComputeManagementClient(credential=DefaultAzureCredential(), subscription_id=vm.subscription) - self._resource_client = ResourceManagementClient(credential=DefaultAzureCredential(), subscription_id=vm.subscription) - - @abstractmethod - def get_instance_view(self) -> Any: # Returns VirtualMachineInstanceView or VirtualMachineScaleSetInstanceView - """ - Retrieves the instance view of the virtual machine or scale set - """ - - @abstractmethod - def get_extensions(self) -> Any: # Returns List[VirtualMachineExtension] or List[VirtualMachineScaleSetExtension] - """ - Retrieves the extensions installed on the virtual machine or scale set - """ - - def restart(self, timeout=5 * 60) -> None: - """ - Restarts the virtual machine or scale set - """ - log.info("Initiating restart of %s", self._identifier) - - poller: LROPoller = execute_with_retry(self._begin_restart) - - poller.wait(timeout=timeout) - - if not poller.done(): - raise TimeoutError(f"Failed to restart {self._identifier.name} after {timeout} 
seconds") - - log.info("Restarted %s", self._identifier.name) - - @abstractmethod - def _begin_restart(self) -> LROPoller: - """ - Derived classes must provide the implementation for this method using their corresponding begin_restart() implementation - """ - - def __str__(self): - return f"{self._identifier}" - - -class VirtualMachine(VirtualMachineBaseClass): - def get_instance_view(self) -> VirtualMachineInstanceView: - log.info("Retrieving instance view for %s", self._identifier) - return execute_with_retry(lambda: self._compute_client.virtual_machines.get( - resource_group_name=self._identifier.resource_group, - vm_name=self._identifier.name, - expand="instanceView" - ).instance_view) - - def get_extensions(self) -> List[VirtualMachineExtension]: - log.info("Retrieving extensions for %s", self._identifier) - return execute_with_retry(lambda: self._compute_client.virtual_machine_extensions.list( - resource_group_name=self._identifier.resource_group, - vm_name=self._identifier.name)) - - def _begin_restart(self) -> LROPoller: - return self._compute_client.virtual_machines.begin_restart( - resource_group_name=self._identifier.resource_group, - vm_name=self._identifier.name) - - -class VmScaleSet(VirtualMachineBaseClass): - def get_instance_view(self) -> VirtualMachineScaleSetInstanceView: - log.info("Retrieving instance view for %s", self._identifier) - - # TODO: Revisit this implementation. Currently this method returns the instance view of the first VM instance available. 
- # For the instance view of the complete VMSS, use the compute_client.virtual_machine_scale_sets function - # https://docs.microsoft.com/en-us/python/api/azure-mgmt-compute/azure.mgmt.compute.v2019_12_01.operations.virtualmachinescalesetsoperations?view=azure-python - for vm in execute_with_retry(lambda: self._compute_client.virtual_machine_scale_set_vms.list(self._identifier.resource_group, self._identifier.name)): - try: - return execute_with_retry(lambda: self._compute_client.virtual_machine_scale_set_vms.get_instance_view( - resource_group_name=self._identifier.resource_group, - vm_scale_set_name=self._identifier.name, - instance_id=vm.instance_id)) - except Exception as e: - log.warning("Unable to retrieve instance view for scale set instance %s. Trying out other instances.\nError: %s", vm, e) - - raise Exception(f"Unable to retrieve instance view of any instances for scale set {self._identifier}") - - - @property - def vm_func(self): - return self._compute_client.virtual_machine_scale_set_vms - - @property - def extension_func(self): - return self._compute_client.virtual_machine_scale_set_extensions - - def get_extensions(self) -> List[VirtualMachineScaleSetExtension]: - log.info("Retrieving extensions for %s", self._identifier) - return execute_with_retry(lambda: self._compute_client.virtual_machine_scale_set_extensions.list( - resource_group_name=self._identifier.resource_group, - vm_scale_set_name=self._identifier.name)) - - def _begin_restart(self) -> LROPoller: - return self._compute_client.virtual_machine_scale_sets.begin_restart( - resource_group_name=self._identifier.resource_group, - vm_scale_set_name=self._identifier.name) diff --git a/tests_e2e/tests/lib/virtual_machine_client.py b/tests_e2e/tests/lib/virtual_machine_client.py new file mode 100644 index 000000000..5d6e471b9 --- /dev/null +++ b/tests_e2e/tests/lib/virtual_machine_client.py @@ -0,0 +1,196 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed 
under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This module includes facilities to execute operations on virtual machines (list extensions, restart, etc). +# + +import datetime +import json +import time +from typing import Any, Dict, List + +from azure.mgmt.compute import ComputeManagementClient +from azure.mgmt.compute.models import VirtualMachineExtension, VirtualMachineInstanceView, VirtualMachine +from azure.mgmt.network import NetworkManagementClient +from azure.mgmt.network.models import NetworkInterface, PublicIPAddress +from azure.mgmt.resource import ResourceManagementClient + +from tests_e2e.tests.lib.azure_sdk_client import AzureSdkClient +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import execute_with_retry +from tests_e2e.tests.lib.shell import CommandError +from tests_e2e.tests.lib.ssh_client import SshClient + + +class VirtualMachineClient(AzureSdkClient): + """ + Provides operations on virtual machines (get instance view, update, restart, etc). 
+ """ + def __init__(self, cloud: str, location: str, subscription: str, resource_group: str, name: str): + super().__init__() + self.cloud: str = cloud + self.location = location + self.subscription: str = subscription + self.resource_group: str = resource_group + self.name: str = name + self._compute_client = AzureSdkClient.create_client(ComputeManagementClient, cloud, subscription) + self._resource_client = AzureSdkClient.create_client(ResourceManagementClient, cloud, subscription) + self._network_client = AzureSdkClient.create_client(NetworkManagementClient, cloud, subscription) + + def get_ip_address(self) -> str: + """ + Retrieves the public IP address of the virtual machine + """ + vm_model = self.get_model() + nic: NetworkInterface = self._network_client.network_interfaces.get( + resource_group_name=self.resource_group, + network_interface_name=vm_model.network_profile.network_interfaces[0].id.split('/')[-1]) # the name of the interface is the last component of the id + public_ip: PublicIPAddress = self._network_client.public_ip_addresses.get( + resource_group_name=self.resource_group, + public_ip_address_name=nic.ip_configurations[0].public_ip_address.id.split('/')[-1]) # the name of the ip address is the last component of the id + return public_ip.ip_address + + def get_private_ip_address(self) -> str: + """ + Retrieves the private IP address of the virtual machine + """ + vm_model = self.get_model() + nic: NetworkInterface = self._network_client.network_interfaces.get( + resource_group_name=self.resource_group, + network_interface_name=vm_model.network_profile.network_interfaces[0].id.split('/')[ + -1]) # the name of the interface is the last component of the id + private_ip = nic.ip_configurations[0].private_ip_address + return private_ip + + def get_model(self) -> VirtualMachine: + """ + Retrieves the model of the virtual machine. 
+ """ + log.info("Retrieving VM model for %s", self) + return execute_with_retry( + lambda: self._compute_client.virtual_machines.get( + resource_group_name=self.resource_group, + vm_name=self.name)) + + def get_instance_view(self) -> VirtualMachineInstanceView: + """ + Retrieves the instance view of the virtual machine + """ + log.info("Retrieving instance view for %s", self) + return execute_with_retry(lambda: self._compute_client.virtual_machines.get( + resource_group_name=self.resource_group, + vm_name=self.name, + expand="instanceView" + ).instance_view) + + def get_extensions(self) -> List[VirtualMachineExtension]: + """ + Retrieves the extensions installed on the virtual machine + """ + log.info("Retrieving extensions for %s", self) + return execute_with_retry( + lambda: self._compute_client.virtual_machine_extensions.list( + resource_group_name=self.resource_group, + vm_name=self.name)) + + def update(self, properties: Dict[str, Any], timeout: int = AzureSdkClient._DEFAULT_TIMEOUT) -> None: + """ + Updates a set of properties on the virtual machine + """ + # location is a required by begin_create_or_update, always add it + properties_copy = properties.copy() + properties_copy["location"] = self.location + + log.info("Updating %s with properties: %s", self, properties_copy) + + self._execute_async_operation( + lambda: self._compute_client.virtual_machines.begin_create_or_update( + self.resource_group, + self.name, + properties_copy), + operation_name=f"Update {self}", + timeout=timeout) + + def reapply(self, timeout: int = AzureSdkClient._DEFAULT_TIMEOUT) -> None: + """ + Reapplies the goal state on the virtual machine + """ + self._execute_async_operation( + lambda: self._compute_client.virtual_machines.begin_reapply(self.resource_group, self.name), + operation_name=f"Reapply {self}", + timeout=timeout) + + def restart( + self, + wait_for_boot, + ssh_client: SshClient = None, + boot_timeout: datetime.timedelta = datetime.timedelta(minutes=5), + timeout: int 
= AzureSdkClient._DEFAULT_TIMEOUT) -> None: + """ + Restarts (reboots) the virtual machine. + + NOTES: + * If wait_for_boot is True, an SshClient must be provided in order to verify that the restart was successful. + * 'timeout' is the timeout for the restart operation itself, while 'boot_timeout' is the timeout for waiting for + the boot to complete. + """ + if wait_for_boot and ssh_client is None: + raise ValueError("An SshClient must be provided if wait_for_boot is True") + + before_restart = datetime.datetime.utcnow() + + self._execute_async_operation( + lambda: self._compute_client.virtual_machines.begin_restart( + resource_group_name=self.resource_group, + vm_name=self.name), + operation_name=f"Restart {self}", + timeout=timeout) + + if not wait_for_boot: + return + + start = datetime.datetime.utcnow() + while datetime.datetime.utcnow() < start + boot_timeout: + log.info("Waiting for VM %s to boot", self) + time.sleep(15) # Note that we always sleep at least once, to give the reboot time to start + instance_view = self.get_instance_view() + power_state = [s.code for s in instance_view.statuses if "PowerState" in s.code] + if len(power_state) != 1: + raise Exception(f"Could not find PowerState in the instance view statuses:\n{json.dumps(instance_view.statuses)}") + log.info("VM's Power State: %s", power_state[0]) + if power_state[0] == "PowerState/running": + # We may get an instance view captured before the reboot actually happened; verify + # that the reboot actually happened by checking the system's uptime. + log.info("Verifying VM's uptime to ensure the reboot has completed...") + try: + uptime = ssh_client.run_command("cat /proc/uptime | sed 's/ .*//'", attempts=1).rstrip() # The uptime is the first field in the file + log.info("Uptime: %s", uptime) + boot_time = datetime.datetime.utcnow() - datetime.timedelta(seconds=float(uptime)) + if boot_time > before_restart: + log.info("VM %s completed boot and is running. 
Boot time: %s", self, boot_time) + return + log.info("The VM has not rebooted yet. Restart time: %s. Boot time: %s", before_restart, boot_time) + except CommandError as e: + if (e.exit_code == 255 and ("Connection refused" in str(e) or "Connection timed out" in str(e))) or "Unprivileged users are not permitted to log in yet" in str(e): + log.info("VM %s is not yet accepting SSH connections", self) + else: + raise + raise Exception(f"VM {self} did not boot after {boot_timeout}") + + def __str__(self): + return f"{self.resource_group}:{self.name}" diff --git a/tests_e2e/tests/lib/vm_extension.py b/tests_e2e/tests/lib/virtual_machine_extension_client.py similarity index 51% rename from tests_e2e/tests/lib/vm_extension.py rename to tests_e2e/tests/lib/virtual_machine_extension_client.py index eab676e75..d1f3e61a1 100644 --- a/tests_e2e/tests/lib/vm_extension.py +++ b/tests_e2e/tests/lib/virtual_machine_extension_client.py @@ -16,42 +16,47 @@ # # -# This module includes facilities to execute VM extension operations (enable, remove, etc) on single virtual machines (using -# class VmExtension) or virtual machine scale sets (using class VmssExtension). +# This module includes facilities to execute VM extension operations (enable, remove, etc). 
# - +import json import uuid -from abc import ABC, abstractmethod from assertpy import assert_that, soft_assertions -from typing import Any, Callable, Dict, Type +from typing import Any, Callable, Dict -from azure.core.polling import LROPoller from azure.mgmt.compute import ComputeManagementClient -from azure.mgmt.compute.models import VirtualMachineExtension, VirtualMachineScaleSetExtension, VirtualMachineExtensionInstanceView -from azure.identity import DefaultAzureCredential +from azure.mgmt.compute.models import VirtualMachineExtension, VirtualMachineExtensionInstanceView -from tests_e2e.tests.lib.identifiers import VmIdentifier, VmExtensionIdentifier +from tests_e2e.tests.lib.azure_sdk_client import AzureSdkClient +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIdentifier from tests_e2e.tests.lib.logging import log from tests_e2e.tests.lib.retry import execute_with_retry +from tests_e2e.tests.lib.virtual_machine_client import VirtualMachineClient -_TIMEOUT = 5 * 60 # Timeout for extension operations (in seconds) - - -class _VmExtensionBaseClass(ABC): +class VirtualMachineExtensionClient(AzureSdkClient): """ - Abstract base class for VmExtension and VmssExtension. - - Implements the operations that are common to virtual machines and scale sets. Derived classes must provide the specific types and methods for the - virtual machine or scale set. + Client for operations on virtual machine extensions. 
""" - def __init__(self, vm: VmIdentifier, extension: VmExtensionIdentifier, resource_name: str): + def __init__(self, vm: VirtualMachineClient, extension: VmExtensionIdentifier, resource_name: str = None): super().__init__() - self._vm: VmIdentifier = vm + self._vm: VirtualMachineClient = vm self._identifier = extension - self._resource_name = resource_name - self._compute_client: ComputeManagementClient = ComputeManagementClient(credential=DefaultAzureCredential(), subscription_id=vm.subscription) + self._resource_name = resource_name or extension.type + self._compute_client: ComputeManagementClient = AzureSdkClient.create_client(ComputeManagementClient, self._vm.cloud, self._vm.subscription) + + def get_instance_view(self) -> VirtualMachineExtensionInstanceView: + """ + Retrieves the instance view of the extension + """ + log.info("Retrieving instance view for %s...", self._identifier) + + return execute_with_retry(lambda: self._compute_client.virtual_machine_extensions.get( + resource_group_name=self._vm.resource_group, + vm_name=self._vm.name, + vm_extension_name=self._resource_name, + expand="instanceView" + ).instance_view) def enable( self, @@ -59,12 +64,13 @@ def enable( protected_settings: Dict[str, Any] = None, auto_upgrade_minor_version: bool = True, force_update: bool = False, - force_update_tag: str = None + force_update_tag: str = None, + timeout: int = AzureSdkClient._DEFAULT_TIMEOUT ) -> None: """ Performs an enable operation on the extension. - NOTE: 'force_update' is not a parameter of the actual ARM API. It is provided for convenience: If set to True, + NOTE: 'force_update' is not a parameter of the actual ARM API. It is provided here for convenience: If set to True, the 'force_update_tag' can be left unspecified and this method will generate a random tag. 
""" if force_update_tag is not None and not force_update: @@ -73,7 +79,7 @@ def enable( if force_update and force_update_tag is None: force_update_tag = str(uuid.uuid4()) - extension_parameters = self._ExtensionType( + extension_parameters = VirtualMachineExtension( publisher=self._identifier.publisher, location=self._vm.location, type_properties_type=self._identifier.type, @@ -91,30 +97,28 @@ def enable( # Now set the actual protected settings before invoking the extension extension_parameters.protected_settings = protected_settings - result: VirtualMachineExtension = execute_with_retry( - lambda: self._begin_create_or_update( + result: VirtualMachineExtension = self._execute_async_operation( + lambda: self._compute_client.virtual_machine_extensions.begin_create_or_update( self._vm.resource_group, self._vm.name, self._resource_name, - extension_parameters - ).result(timeout=_TIMEOUT)) + extension_parameters), + operation_name=f"Enable {self._identifier}", + timeout=timeout) - if result.provisioning_state not in ('Succeeded', 'Updating'): - raise Exception(f"Enable {self._identifier} failed. 
Provisioning state: {result.provisioning_state}") - log.info("Enable completed (provisioning state: %s).", result.provisioning_state) + log.info("Provisioning state: %s", result.provisioning_state) - def get_instance_view(self) -> VirtualMachineExtensionInstanceView: # TODO: Check type for scale sets + def delete(self, timeout: int = AzureSdkClient._DEFAULT_TIMEOUT) -> None: """ - Retrieves the instance view of the extension + Performs a delete operation on the extension """ - log.info("Retrieving instance view for %s...", self._identifier) - - return execute_with_retry(lambda: self._get( - resource_group_name=self._vm.resource_group, - vm_name=self._vm.name, - vm_extension_name=self._resource_name, - expand="instanceView" - ).instance_view) + self._execute_async_operation( + lambda: self._compute_client.virtual_machine_extensions.begin_delete( + self._vm.resource_group, + self._vm.name, + self._resource_name), + operation_name=f"Delete {self._identifier}", + timeout=timeout) def assert_instance_view( self, @@ -130,7 +134,15 @@ def assert_instance_view( If 'assert_function' is provided, it is invoked passing as parameter the instance view. This function can be used to perform additional validations. """ + # Sometimes we get incomplete instance view with only 'name' property which causes issues during assertions. + # Retry attempt to get instance view if only 'name' property is populated. 
+ attempt = 1 instance_view = self.get_instance_view() + while instance_view.name is not None and instance_view.type_handler_version is None and instance_view.statuses is None and attempt < 3: + log.info("Instance view is incomplete: %s\nRetrying attempt to get instance view...", instance_view.serialize()) + instance_view = self.get_instance_view() + attempt += 1 + log.info("Instance view:\n%s", json.dumps(instance_view.serialize(), indent=4)) with soft_assertions(): if expected_version is not None: @@ -151,89 +163,9 @@ def assert_instance_view( log.info("The instance view matches the expected values") - @abstractmethod - def delete(self) -> None: - """ - Performs a delete operation on the extension - """ - - @property - @abstractmethod - def _ExtensionType(self) -> Type: - """ - Type of the extension object for the virtual machine or scale set (i.e. VirtualMachineExtension or VirtualMachineScaleSetExtension) - """ - - @property - @abstractmethod - def _begin_create_or_update(self) -> Callable[[str, str, str, Any], LROPoller[Any]]: # "Any" can be VirtualMachineExtension or VirtualMachineScaleSetExtension - """ - The begin_create_or_update method for the virtual machine or scale set extension - """ - - @property - @abstractmethod - def _get(self) -> Any: # VirtualMachineExtension or VirtualMachineScaleSetExtension - """ - The get method for the virtual machine or scale set extension - """ - def __str__(self): return f"{self._identifier}" -class VmExtension(_VmExtensionBaseClass): - """ - Extension operations on a single virtual machine. 
- """ - @property - def _ExtensionType(self) -> Type: - return VirtualMachineExtension - - @property - def _begin_create_or_update(self) -> Callable[[str, str, str, VirtualMachineExtension], LROPoller[VirtualMachineExtension]]: - return self._compute_client.virtual_machine_extensions.begin_create_or_update - @property - def _get(self) -> VirtualMachineExtension: - return self._compute_client.virtual_machine_extensions.get - - def delete(self) -> None: - log.info("Deleting %s", self._identifier) - - execute_with_retry(lambda: self._compute_client.virtual_machine_extensions.begin_delete( - self._vm.resource_group, - self._vm.name, - self._resource_name - ).wait(timeout=_TIMEOUT)) - - -class VmssExtension(_VmExtensionBaseClass): - """ - Extension operations on virtual machine scale sets. - """ - @property - def _ExtensionType(self) -> Type: - return VirtualMachineScaleSetExtension - - @property - def _begin_create_or_update(self) -> Callable[[str, str, str, VirtualMachineScaleSetExtension], LROPoller[VirtualMachineScaleSetExtension]]: - return self._compute_client.virtual_machine_scale_set_extensions.begin_create_or_update - - @property - def _get(self) -> VirtualMachineScaleSetExtension: - return self._compute_client.virtual_machine_scale_set_extensions.get - - def delete(self) -> None: # TODO: Implement this method - raise NotImplementedError() - - def delete_from_instance(self, instance_id: str) -> None: - log.info("Deleting %s from scale set instance %s", self._identifier, instance_id) - - execute_with_retry(lambda: self._compute_client.virtual_machine_scale_set_vm_extensions.begin_delete( - resource_group_name=self._vm.resource_group, - vm_scale_set_name=self._vm.name, - vm_extension_name=self._resource_name, - instance_id=instance_id - ).wait(timeout=_TIMEOUT)) diff --git a/tests_e2e/tests/lib/virtual_machine_scale_set_client.py b/tests_e2e/tests/lib/virtual_machine_scale_set_client.py new file mode 100644 index 000000000..92738576c --- /dev/null +++ 
b/tests_e2e/tests/lib/virtual_machine_scale_set_client.py @@ -0,0 +1,107 @@ +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This module includes facilities to execute operations on virtual machine scale sets (list instances, delete, etc). +# + +import re + +from typing import List + +from azure.mgmt.compute import ComputeManagementClient +from azure.mgmt.compute.models import VirtualMachineScaleSetVM, VirtualMachineScaleSetInstanceView +from azure.mgmt.network import NetworkManagementClient + +from tests_e2e.tests.lib.azure_sdk_client import AzureSdkClient +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import execute_with_retry + + +class VmssInstanceIpAddress(object): + """ + IP address of a virtual machine scale set instance + """ + def __init__(self, instance_name: str, ip_address: str): + self.instance_name: str = instance_name + self.ip_address: str = ip_address + + def __str__(self): + return f"{self.instance_name}:{self.ip_address}" + + +class VirtualMachineScaleSetClient(AzureSdkClient): + """ + Provides operations on virtual machine scale sets. 
+ """ + def __init__(self, cloud: str, location: str, subscription: str, resource_group: str, name: str): + super().__init__() + self.cloud: str = cloud + self.location = location + self.subscription: str = subscription + self.resource_group: str = resource_group + self.name: str = name + self._compute_client = AzureSdkClient.create_client(ComputeManagementClient, cloud, subscription) + self._network_client = AzureSdkClient.create_client(NetworkManagementClient, cloud, subscription) + + def list_vms(self) -> List[VirtualMachineScaleSetVM]: + """ + Returns the VM instances of the virtual machine scale set + """ + log.info("Retrieving instances of scale set %s", self) + return list(self._compute_client.virtual_machine_scale_set_vms.list(resource_group_name=self.resource_group, virtual_machine_scale_set_name=self.name)) + + def get_instances_ip_address(self) -> List[VmssInstanceIpAddress]: + """ + Returns a list containing the IP addresses of scale set instances + """ + log.info("Retrieving IP addresses of scale set %s", self) + ip_addresses = self._network_client.public_ip_addresses.list_virtual_machine_scale_set_public_ip_addresses(resource_group_name=self.resource_group, virtual_machine_scale_set_name=self.name) + ip_addresses = list(ip_addresses) + + def parse_instance(resource_id: str) -> str: + # the resource_id looks like /subscriptions/{subs}}/resourceGroups/{rg}/providers/Microsoft.Compute/virtualMachineScaleSets/{vmss}/virtualMachines/{instance}/networkInterfaces/{netiace}/ipConfigurations/ipconfig1/publicIPAddresses/{name} + match = re.search(r'virtualMachines/(?P[0-9])/networkInterfaces', resource_id) + if match is None: + raise Exception(f"Unable to parse instance from IP address ID:{resource_id}") + return match.group('instance') + + return [VmssInstanceIpAddress(instance_name=f"{self.name}_{parse_instance(a.id)}", ip_address=a.ip_address) for a in ip_addresses if a.ip_address is not None] + + def delete_extension(self, extension: str, timeout: int = 
AzureSdkClient._DEFAULT_TIMEOUT) -> None: + """ + Deletes the given extension + """ + log.info("Deleting extension %s from %s", extension, self) + self._execute_async_operation( + operation=lambda: self._compute_client.virtual_machine_scale_set_extensions.begin_delete(resource_group_name=self.resource_group, vm_scale_set_name=self.name, vmss_extension_name=extension), + operation_name=f"Delete {extension} from {self}", + timeout=timeout) + + def get_instance_view(self) -> VirtualMachineScaleSetInstanceView: + """ + Retrieves the instance view of the virtual machine scale set + """ + log.info("Retrieving instance view for %s", self) + return execute_with_retry(lambda: self._compute_client.virtual_machine_scale_sets.get_instance_view( + resource_group_name=self.resource_group, + vm_scale_set_name=self.name + )) + + def __str__(self): + return f"{self.resource_group}:{self.name}" + diff --git a/tests_e2e/tests/lib/identifiers.py b/tests_e2e/tests/lib/vm_extension_identifier.py similarity index 55% rename from tests_e2e/tests/lib/identifiers.py rename to tests_e2e/tests/lib/vm_extension_identifier.py index 48794140b..9a11e4352 100644 --- a/tests_e2e/tests/lib/identifiers.py +++ b/tests_e2e/tests/lib/vm_extension_identifier.py @@ -15,35 +15,37 @@ # limitations under the License. # - -class VmIdentifier(object): - def __init__(self, location, subscription, resource_group, name): - """ - Represents the information that identifies a VM to the ARM APIs - """ - self.location = location - self.subscription: str = subscription - self.resource_group: str = resource_group - self.name: str = name - - def __str__(self): - return f"{self.resource_group}:{self.name}" +from typing import Dict, List class VmExtensionIdentifier(object): - def __init__(self, publisher, ext_type, version): - """ - Represents the information that identifies an extension to the ARM APIs + """ + Represents the information that identifies an extension to the ARM APIs - publisher - e.g. 
Microsoft.Azure.Extensions - type - e.g. CustomScript - version - e.g. 2.1, 2.* - name - arbitrary name for the extension ARM resource - """ + publisher - e.g. Microsoft.Azure.Extensions + type - e.g. CustomScript + version - e.g. 2.1, 2.* + name - arbitrary name for the extension ARM resource + """ + def __init__(self, publisher: str, ext_type: str, version: str): self.publisher: str = publisher self.type: str = ext_type self.version: str = version + unsupported_distros: Dict[str, List[str]] = { + "Microsoft.OSTCExtensions.VMAccessForLinux": ["flatcar"], + "Microsoft.Azure.Monitor.AzureMonitorLinuxAgent": ["flatcar", "mariner_1", "ubuntu_2404"] + } + + def supports_distro(self, system_info: str) -> bool: + """ + Returns True unless an unsupported distro name for the extension is found in the provided system info + """ + ext_unsupported_distros = VmExtensionIdentifier.unsupported_distros.get(self.publisher + "." + self.type) + if ext_unsupported_distros is not None and any(distro in system_info for distro in ext_unsupported_distros): + return False + return True + def __str__(self): return f"{self.publisher}.{self.type}" @@ -61,3 +63,6 @@ class VmExtensionIds(object): # New run command extension, with support for multi-config RunCommandHandler: VmExtensionIdentifier = VmExtensionIdentifier(publisher='Microsoft.CPlat.Core', ext_type='RunCommandHandlerLinux', version="1.0") VmAccess: VmExtensionIdentifier = VmExtensionIdentifier(publisher='Microsoft.OSTCExtensions', ext_type='VMAccessForLinux', version="1.0") + GuestAgentDcrTestExtension: VmExtensionIdentifier = VmExtensionIdentifier(publisher='Microsoft.Azure.TestExtensions.Edp', ext_type='GuestAgentDcrTest', version='1.0') + AzureMonitorLinuxAgent: VmExtensionIdentifier = VmExtensionIdentifier(publisher='Microsoft.Azure.Monitor', ext_type='AzureMonitorLinuxAgent', version="1.5") + GATestExtension: VmExtensionIdentifier = VmExtensionIdentifier(publisher='Microsoft.Azure.Extensions.Edp', ext_type='GATestExtGo', 
version="1.2") diff --git a/tests_e2e/tests/multi_config_ext/multi_config_ext.py b/tests_e2e/tests/multi_config_ext/multi_config_ext.py new file mode 100644 index 000000000..4df75fd2b --- /dev/null +++ b/tests_e2e/tests/multi_config_ext/multi_config_ext.py @@ -0,0 +1,162 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This test adds multiple instances of RCv2 and verifies that the extensions are processed and deleted as expected. 
+# + +import uuid +from typing import Dict, Callable, Any + +from assertpy import fail +from azure.mgmt.compute.models import VirtualMachineInstanceView + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.virtual_machine_client import VirtualMachineClient +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient + + +class MultiConfigExt(AgentVmTest): + class TestCase: + def __init__(self, extension: VirtualMachineExtensionClient, get_settings: Callable[[str], Dict[str, str]]): + self.extension = extension + self.get_settings = get_settings + self.test_guid: str = str(uuid.uuid4()) + + def enable_and_assert_test_cases(self, cases_to_enable: Dict[str, TestCase], cases_to_assert: Dict[str, TestCase], delete_extensions: bool = False): + for resource_name, test_case in cases_to_enable.items(): + log.info("") + log.info("Adding {0} to the test VM. 
guid={1}".format(resource_name, test_case.test_guid)) + test_case.extension.enable(settings=test_case.get_settings(test_case.test_guid)) + test_case.extension.assert_instance_view() + + log.info("") + log.info("Check that each extension has the expected guid in its status message...") + for resource_name, test_case in cases_to_assert.items(): + log.info("") + log.info("Checking {0} has expected status message with {1}".format(resource_name, test_case.test_guid)) + test_case.extension.assert_instance_view(expected_message=f"{test_case.test_guid}") + + # Delete each extension on the VM + if delete_extensions: + log.info("") + log.info("Delete each extension...") + self.delete_extensions(cases_to_assert) + + def delete_extensions(self, test_cases: Dict[str, TestCase]): + for resource_name, test_case in test_cases.items(): + log.info("") + log.info("Deleting {0} from the test VM".format(resource_name)) + test_case.extension.delete() + + log.info("") + + vm: VirtualMachineClient = VirtualMachineClient( + cloud=self._context.vm.cloud, + location=self._context.vm.location, + subscription=self._context.vm.subscription, + resource_group=self._context.vm.resource_group, + name=self._context.vm.name) + + instance_view: VirtualMachineInstanceView = vm.get_instance_view() + + if instance_view.extensions is not None: + for ext in instance_view.extensions: + if ext.name in test_cases.keys(): + fail("Extension was not deleted: \n{0}".format(ext)) + log.info("") + log.info("All extensions were successfully deleted.") + + def run(self): + # Create 3 different RCv2 extensions and a single config extension (CSE) and assign each a unique guid. Each + # extension will have settings that echo its assigned guid. We will use this guid to verify the extension + # statuses later. 
+ mc_settings: Callable[[Any], Dict[str, Dict[str, str]]] = lambda s: { + "source": {"script": f"echo {s}"}} + sc_settings: Callable[[Any], Dict[str, str]] = lambda s: {'commandToExecute': f"echo {s}"} + + test_cases: Dict[str, MultiConfigExt.TestCase] = { + "MCExt1": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, + resource_name="MCExt1"), mc_settings), + "MCExt2": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, + resource_name="MCExt2"), mc_settings), + "MCExt3": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, + resource_name="MCExt3"), mc_settings), + "CSE": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.CustomScript), sc_settings) + } + + # Add each extension to the VM and validate the instance view has succeeded status with its assigned guid in the + # status message + log.info("") + log.info("Add CSE and 3 instances of RCv2 to the VM. 
Each instance will echo a unique guid...") + self.enable_and_assert_test_cases(cases_to_enable=test_cases, cases_to_assert=test_cases) + + # Update MCExt3 and CSE with new guids and add a new instance of RCv2 to the VM + updated_test_cases: Dict[str, MultiConfigExt.TestCase] = { + "MCExt3": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, + resource_name="MCExt3"), mc_settings), + "MCExt4": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, + resource_name="MCExt4"), mc_settings), + "CSE": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.CustomScript), sc_settings) + } + test_cases.update(updated_test_cases) + + # Enable only the updated extensions, verify every extension has the correct test guid is in status message, and + # remove all extensions from the test vm + log.info("") + log.info("Update MCExt3 and CSE with new guids and add a new instance of RCv2 to the VM...") + self.enable_and_assert_test_cases(cases_to_enable=updated_test_cases, cases_to_assert=test_cases, + delete_extensions=True) + + # Enable, verify, and remove only multi config extensions + log.info("") + log.info("Add only multi-config extensions to the VM...") + mc_test_cases: Dict[str, MultiConfigExt.TestCase] = { + "MCExt5": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, + resource_name="MCExt5"), mc_settings), + "MCExt6": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.RunCommandHandler, + resource_name="MCExt6"), mc_settings) + } + self.enable_and_assert_test_cases(cases_to_enable=mc_test_cases, cases_to_assert=mc_test_cases, + delete_extensions=True) + + # Enable, verify, and delete only single config extensions + log.info("") + log.info("Add only single-config extension to the VM...") + sc_test_cases: Dict[str, 
MultiConfigExt.TestCase] = { + "CSE": MultiConfigExt.TestCase( + VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.CustomScript), sc_settings) + } + self.enable_and_assert_test_cases(cases_to_enable=sc_test_cases, cases_to_assert=sc_test_cases, + delete_extensions=True) + + +if __name__ == "__main__": + MultiConfigExt.run_from_command_line() diff --git a/tests_e2e/tests/no_outbound_connections/check_fallback_to_hgap.py b/tests_e2e/tests/no_outbound_connections/check_fallback_to_hgap.py new file mode 100755 index 000000000..48827dbe1 --- /dev/null +++ b/tests_e2e/tests/no_outbound_connections/check_fallback_to_hgap.py @@ -0,0 +1,51 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +from assertpy import assert_that + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.ssh_client import SshClient + + +class CheckFallbackToHGAP(AgentVmTest): + """ + Check the agent log to verify that the default channel was changed to HostGAPlugin before executing any extensions. + """ + def run(self): + # 2023-04-14T14:49:43.005530Z INFO ExtHandler ExtHandler Default channel changed to HostGAPlugin channel. 
+ # 2023-04-14T14:49:44.625061Z INFO ExtHandler [Microsoft.Azure.Monitor.AzureMonitorLinuxAgent-1.25.2] Target handler state: enabled [incarnation_2] + + ssh_client: SshClient = self._context.create_ssh_client() + log.info("Parsing agent log on the test VM") + output = ssh_client.run_command("grep -E 'INFO ExtHandler.*(Default channel changed to HostGAPlugin)|(Target handler state:)' /var/log/waagent.log | head").split('\n') + log.info("Output (first 10 lines) from the agent log:\n\t\t%s", '\n\t\t'.join(output)) + + assert_that(len(output) > 1).is_true().described_as( + "The agent log should contain multiple matching records" + ) + assert_that(output[0]).contains("Default channel changed to HostGAPlugin").described_as( + "The agent log should contain a record indicating that the default channel was changed to HostGAPlugin before executing any extensions" + ) + + log.info("The agent log indicates that the default channel was changed to HostGAPlugin before executing any extensions") + + +if __name__ == "__main__": + CheckFallbackToHGAP.run_from_command_line() + diff --git a/tests_e2e/tests/no_outbound_connections/check_no_outbound_connections.py b/tests_e2e/tests/no_outbound_connections/check_no_outbound_connections.py new file mode 100755 index 000000000..985e77b70 --- /dev/null +++ b/tests_e2e/tests/no_outbound_connections/check_no_outbound_connections.py @@ -0,0 +1,59 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
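The ordering assertion in `CheckFallbackToHGAP` can be exercised locally; a minimal sketch using the sample records quoted in the test's comments (the log lines are illustrative, and the regex below uses an equivalent but explicitly grouped form of the grep pattern):

```python
import re

# Illustrative records, modeled on the sample lines quoted in the test comments.
log_lines = [
    "2023-04-14T14:49:43.005530Z INFO ExtHandler ExtHandler Default channel changed to HostGAPlugin channel.",
    "2023-04-14T14:49:44.625061Z INFO ExtHandler [Microsoft.Azure.Monitor.AzureMonitorLinuxAgent-1.25.2] Target handler state: enabled [incarnation_2]",
]

# Same alternation the grep command uses: channel-change records OR handler-state records.
pattern = re.compile(r"INFO ExtHandler.*(Default channel changed to HostGAPlugin|Target handler state:)")
matches = [line for line in log_lines if pattern.search(line)]

# The channel change must appear before any extension is handled.
assert len(matches) > 1
assert "Default channel changed to HostGAPlugin" in matches[0]
```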
+# See the License for the specific language governing permissions and +# limitations under the License. +# +from assertpy import fail + +from tests_e2e.tests.lib.agent_test import AgentVmTest +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.shell import CommandError +from tests_e2e.tests.lib.ssh_client import SshClient + + +class CheckNoOutboundConnections(AgentVmTest): + """ + Verifies that there is no outbound connectivity on the test VM. + """ + def run(self): + # This script is executed on the test VM. It tries to connect to a well-known DNS server (DNS is on port 53). + script: str = """ +import socket, sys + +try: + socket.setdefaulttimeout(5) + socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect(("8.8.8.8", 53)) +except socket.timeout: + print("No outbound connectivity [expected]") + exit(0) +print("There is outbound connectivity [unexpected: the custom ARM template should not allow it]", file=sys.stderr) +exit(1) +""" + ssh_client: SshClient = self._context.create_ssh_client() + try: + log.info("Verifying that there is no outbound connectivity on the test VM") + ssh_client.run_command("pypy3 -c '{0}'".format(script.replace('"', '\"'))) + log.info("There is no outbound connectivity, as expected.") + except CommandError as e: + if e.exit_code == 1 and "There is outbound connectivity" in e.stderr: + fail("There is outbound connectivity on the test VM, the custom ARM template should not allow it") + else: + raise Exception(f"Unexpected error while checking outbound connectivity on the test VM: {e}") + + +if __name__ == "__main__": + CheckNoOutboundConnections.run_from_command_line() + diff --git a/tests_e2e/tests/no_outbound_connections/deny_outbound_connections.py b/tests_e2e/tests/no_outbound_connections/deny_outbound_connections.py new file mode 100755 index 000000000..b7cc87886 --- /dev/null +++ b/tests_e2e/tests/no_outbound_connections/deny_outbound_connections.py @@ -0,0 +1,47 @@ +#!/usr/bin/env python3 + +# Microsoft Azure 
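Embedding an inline script in a shell command is sensitive to quoting; a hedged sketch using `shlex.quote`, which handles embedded quotes generally (the interpreter name and script below are illustrative):

```python
import shlex

# An illustrative inline script containing double quotes.
script = 'print("no outbound connectivity")'

# shlex.quote wraps the script so embedded quotes survive the shell.
# Note: script.replace('"', '\"') is a no-op, since '\"' and '"' are the
# same one-character string in Python source.
command = "pypy3 -c {0}".format(shlex.quote(script))

assert command == "pypy3 -c 'print(\"no outbound connectivity\")'"
```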
Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import json + +from typing import Any, Dict + +from tests_e2e.tests.lib.network_security_rule import NetworkSecurityRule +from tests_e2e.tests.lib.update_arm_template import UpdateArmTemplate + + +class DenyOutboundConnections(UpdateArmTemplate): + """ + Updates the ARM template to add a security rule that denies all outbound connections. + """ + def update(self, template: Dict[str, Any], is_lisa_template: bool) -> None: + NetworkSecurityRule(template, is_lisa_template).add_security_rule( + json.loads("""{ + "name": "waagent-no-outbound", + "properties": { + "description": "Denies all outbound connections.", + "protocol": "*", + "sourcePortRange": "*", + "destinationPortRange": "*", + "sourceAddressPrefix": "*", + "destinationAddressPrefix": "Internet", + "access": "Deny", + "priority": 200, + "direction": "Outbound" + } + }""")) diff --git a/tests_e2e/tests/publish_hostname/publish_hostname.py b/tests_e2e/tests/publish_hostname/publish_hostname.py new file mode 100644 index 000000000..45a7be85f --- /dev/null +++ b/tests_e2e/tests/publish_hostname/publish_hostname.py @@ -0,0 +1,209 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This test updates the hostname and checks that the agent published the hostname to DNS. It also checks that the +# primary network is up after publishing the hostname. This test was added in response to a bug in publishing the +# hostname on fedora distros, where there was a race condition between NetworkManager restart and Network Interface +# restart which caused the primary interface to go down. +# + +import datetime +import re + +from assertpy import fail +from time import sleep + +from tests_e2e.tests.lib.shell import CommandError +from tests_e2e.tests.lib.agent_test import AgentVmTest, TestSkipped +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.logging import log + + +class PublishHostname(AgentVmTest): + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._context = context + self._ssh_client = context.create_ssh_client() + self._private_ip = context.vm.get_private_ip_address() + self._vm_password = "" + + def add_vm_password(self): + # Add password to VM to help with debugging in case of failure + # REMOVE PWD FROM LOGS IF WE EVER MAKE THESE RUNS/LOGS PUBLIC + username = self._ssh_client.username + pwd = self._ssh_client.run_command("openssl rand -base64 32 | tr : .").rstrip() + self._vm_password = pwd + log.info("VM Username: {0}; VM Password: {1}".format(username, pwd)) + self._ssh_client.run_command("echo '{0}:{1}' | sudo -S chpasswd".format(username, pwd)) + + def check_and_install_dns_tools(self): + lookup_cmd = "dig -x {0}".format(self._private_ip) + 
dns_regex = r"[\S\s]*;; ANSWER SECTION:\s.*PTR\s*(?P<hostname>.*)\.internal\.(cloudapp\.net|chinacloudapp\.cn|usgovcloudapp\.net).*[\S\s]*" + + # Not all distros come with dig. Install dig if not on machine + try: + self._ssh_client.run_command("dig -v") + except CommandError as e: + if "dig: command not found" in e.stderr: + distro = self._ssh_client.run_command("get_distro.py").rstrip().lower() + if "debian_9" in distro: + # Debian 9 hostname look up needs to be done with "host" instead of dig + lookup_cmd = "host {0}".format(self._private_ip) + dns_regex = r".*pointer\s(?P<hostname>.*)\.internal\.(cloudapp\.net|chinacloudapp\.cn|usgovcloudapp\.net).*" + elif "debian" in distro: + self._ssh_client.run_command("apt install -y dnsutils", use_sudo=True) + elif "alma" in distro or "rocky" in distro: + self._ssh_client.run_command("dnf install -y bind-utils", use_sudo=True) + else: + raise + else: + raise + + return lookup_cmd, dns_regex + + def check_agent_reports_status(self): + status_updated = False + last_agent_status_time = self._context.vm.get_instance_view().vm_agent.statuses[0].time + log.info("Agent reported status at {0}".format(last_agent_status_time)) + retries = 3 + + while retries > 0 and not status_updated: + agent_status_time = self._context.vm.get_instance_view().vm_agent.statuses[0].time + if agent_status_time != last_agent_status_time: + status_updated = True + log.info("Agent reported status at {0}".format(last_agent_status_time)) + else: + retries -= 1 + sleep(60) + + if not status_updated: + fail("Agent hasn't reported status since {0} and ssh connection failed. Use the serial console in portal " + "to check the contents of '/sys/class/net/eth0/operstate'. If the contents of this file are 'up', " + "no further action is needed.
If contents are 'down', that indicates the network interface is down " + "and more debugging needs to be done to confirm this is not caused by the agent.\n VM: {1}\n RG: {2}" + "\nSubscriptionId: {3}\nUsername: {4}\nPassword: {5}".format(last_agent_status_time, + self._context.vm, + self._context.vm.resource_group, + self._context.vm.subscription, + self._context.username, + self._vm_password)) + + def retry_ssh_if_connection_reset(self, command: str, use_sudo=False): + # The agent may bring the network down and back up to publish the hostname, which can reset the ssh connection. + # Adding retry here for connection reset. + retries = 3 + while retries > 0: + try: + return self._ssh_client.run_command(command, use_sudo=use_sudo) + except CommandError as e: + retries -= 1 + retryable = e.exit_code == 255 and "Connection reset by peer" in e.stderr + if not retryable or retries == 0: + raise + log.warning("The SSH operation failed, retrying in 30 secs") + sleep(30) + + def run(self): + # TODO: Investigate why hostname is not being published on Ubuntu, alma, and rocky as expected + distros_with_known_publishing_issues = ["ubuntu", "alma", "rocky"] + distro = self._ssh_client.run_command("get_distro.py").lower() + if any(d in distro for d in distros_with_known_publishing_issues): + raise TestSkipped("Known issue with hostname publishing on this distro. Will skip test until we continue " + "investigation.") + + # Add password to VM and log. This allows us to debug with serial console if necessary + self.add_vm_password() + + # This test looks up what hostname is published to dns. Check that the tools necessary to get hostname are + # installed, and if not install them. + lookup_cmd, dns_regex = self.check_and_install_dns_tools() + + # Check if this distro monitors hostname changes. If it does, we should check that the agent detects the change + # and publishes the host name. If it doesn't, we should check that the hostname is automatically published. 
+ monitors_hostname = self._ssh_client.run_command("get-waagent-conf-value Provisioning.MonitorHostName", use_sudo=True).rstrip().lower() + + hostname_change_ctr = 0 + # Update the hostname 3 times + while hostname_change_ctr < 3: + try: + hostname = "hostname-monitor-{0}".format(hostname_change_ctr) + log.info("Update hostname to {0}".format(hostname)) + self.retry_ssh_if_connection_reset("hostnamectl set-hostname {0}".format(hostname), use_sudo=True) + + # Wait for the agent to detect the hostname change for up to 2 minutes if hostname monitoring is enabled + if monitors_hostname == "y" or monitors_hostname == "yes": + log.info("Agent hostname monitoring is enabled") + timeout = datetime.datetime.now() + datetime.timedelta(minutes=2) + hostname_detected = "" + while datetime.datetime.now() <= timeout: + try: + hostname_detected = self.retry_ssh_if_connection_reset("grep -n 'Detected hostname change:.*-> {0}' /var/log/waagent.log".format(hostname), use_sudo=True) + if hostname_detected: + log.info("Agent detected hostname change: {0}".format(hostname_detected)) + break + except CommandError as e: + # Exit code 1 indicates grep did not find a match. Sleep if exit code is 1, otherwise raise. 
+ if e.exit_code != 1: + raise + sleep(15) + + if not hostname_detected: + fail("Agent did not detect hostname change: {0}".format(hostname)) + else: + log.info("Agent hostname monitoring is disabled") + + # Check that the expected hostname is published with 4 minute timeout + timeout = datetime.datetime.now() + datetime.timedelta(minutes=4) + published_hostname = "" + while datetime.datetime.now() <= timeout: + try: + dns_info = self.retry_ssh_if_connection_reset(lookup_cmd) + actual_hostname = re.match(dns_regex, dns_info) + if actual_hostname: + # Compare published hostname to expected hostname + published_hostname = actual_hostname.group('hostname') + if hostname == published_hostname: + log.info("SUCCESS Hostname {0} was published successfully".format(hostname)) + break + else: + log.info("Unable to parse the dns info: {0}".format(dns_info)) + except CommandError as e: + if "NXDOMAIN" in e.stdout: + log.info("DNS Lookup could not find domain. Will try again.") + else: + raise + sleep(30) + + if published_hostname == "" or published_hostname != hostname: + fail("Hostname {0} was not published successfully. Actual host name is: {1}".format(hostname, published_hostname)) + + hostname_change_ctr += 1 + + except CommandError as e: + # If failure is ssh issue, we should confirm that the VM did not lose network connectivity due to the + # agent's operations on the network. If agent reports status after this failure, then we know the + # network is up. 
+ if e.exit_code == 255 and ("Connection timed out" in e.stderr or "Connection refused" in e.stderr): + self.check_agent_reports_status() + raise + + +if __name__ == "__main__": + PublishHostname.run_from_command_line() diff --git a/tests_e2e/tests/recover_network_interface/recover_network_interface.py b/tests_e2e/tests/recover_network_interface/recover_network_interface.py new file mode 100644 index 000000000..39799d375 --- /dev/null +++ b/tests_e2e/tests/recover_network_interface/recover_network_interface.py @@ -0,0 +1,139 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# This test uses CSE to bring the network down and call check_and_recover_nic_state to bring the network back into an +# 'up' and 'connected' state. The intention of the test is to alert us if there is some change in newer distros which +# affects this logic. 
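Both waits in `run` above (detecting the hostname change within 2 minutes, then the DNS publication within 4) follow the same poll-until-deadline shape; a generic sketch of that pattern (the helper name is hypothetical):

```python
import datetime
from time import sleep

def poll_until(condition, timeout: datetime.timedelta, interval_seconds: float = 0):
    """Calls 'condition' until it returns a truthy value or the deadline passes; returns that value or None."""
    deadline = datetime.datetime.now() + timeout
    while datetime.datetime.now() <= deadline:
        result = condition()
        if result:
            return result
        sleep(interval_seconds)
    return None

# Trivial usage: a condition that succeeds on the third call.
calls = {"count": 0}

def condition():
    calls["count"] += 1
    return calls["count"] >= 3

assert poll_until(condition, datetime.timedelta(minutes=2)) is True
assert calls["count"] == 3
```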
+# + +import json +from typing import List, Dict, Any + +from assertpy import fail, assert_that +from time import sleep + +from tests_e2e.tests.lib.agent_test import AgentVmTest, TestSkipped +from tests_e2e.tests.lib.agent_test_context import AgentVmTestContext +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.virtual_machine_extension_client import VirtualMachineExtensionClient +from tests_e2e.tests.lib.vm_extension_identifier import VmExtensionIds + + +class RecoverNetworkInterface(AgentVmTest): + def __init__(self, context: AgentVmTestContext): + super().__init__(context) + self._context = context + self._ssh_client = context.create_ssh_client() + self._private_ip = context.vm.get_private_ip_address() + self._vm_password = "" + + def add_vm_password(self): + # Add password to VM to help with debugging in case of failure + # REMOVE PWD FROM LOGS IF WE EVER MAKE THESE RUNS/LOGS PUBLIC + username = self._ssh_client.username + pwd = self._ssh_client.run_command("openssl rand -base64 32 | tr : .").rstrip() + self._vm_password = pwd + log.info("VM Username: {0}; VM Password: {1}".format(username, pwd)) + self._ssh_client.run_command("echo '{0}:{1}' | sudo -S chpasswd".format(username, pwd)) + + def check_agent_reports_status(self): + status_updated = False + last_agent_status_time = self._context.vm.get_instance_view().vm_agent.statuses[0].time + log.info("Agent reported status at {0}".format(last_agent_status_time)) + retries = 3 + + while retries > 0 and not status_updated: + agent_status_time = self._context.vm.get_instance_view().vm_agent.statuses[0].time + if agent_status_time != last_agent_status_time: + status_updated = True + log.info("Agent reported status at {0}".format(last_agent_status_time)) + else: + retries -= 1 + sleep(60) + + if not status_updated: + fail("Agent hasn't reported status since {0} and ssh connection failed. 
Use the serial console in portal " + "to debug".format(last_agent_status_time)) + + def run(self): + # Add password to VM and log. This allows us to debug with serial console if necessary + log.info("") + log.info("Adding password to the VM to use for debugging in case necessary...") + self.add_vm_password() + + # Skip the test if NM_CONTROLLED=n. The current recover logic does not work in this case + result = self._ssh_client.run_command("recover_network_interface-get_nm_controlled.py", use_sudo=True) + if "Interface is NOT NM controlled" in result: + raise TestSkipped("Current recover method will not work on interfaces where NM_Controlled=n") + + # Get the primary network interface name + ifname = self._ssh_client.run_command("pypy3 -c 'from azurelinuxagent.common.osutil.redhat import RedhatOSUtil; print(RedhatOSUtil().get_if_name())'").rstrip() + # The interface name needs to be in double quotes for the pypy portion of the script + formatted_ifname = f'"{ifname}"' + + # The script should bring the primary network interface down and use the agent to recover the interface. These + # commands will bring the network down, so they should be executed on the machine using CSE instead of ssh. + script = f""" + set -euxo pipefail + ifdown {ifname}; + nic_state=$(nmcli -g general.state device show {ifname}) + echo Primary network interface state before recovering: $nic_state + source /home/{self._context.username}/bin/set-agent-env; + pypy3 -c 'from azurelinuxagent.common.osutil.redhat import RedhatOSUtil; RedhatOSUtil().check_and_recover_nic_state({formatted_ifname})'; + nic_state=$(nmcli -g general.state device show {ifname}); + echo Primary network interface state after recovering: $nic_state + """ + log.info("") + log.info("Using CSE to bring the primary network interface down and call the OSUtil to bring the interface back up. 
Command to execute: {0}".format(script)) + custom_script = VirtualMachineExtensionClient(self._context.vm, VmExtensionIds.CustomScript, resource_name="CustomScript") + custom_script.enable(protected_settings={'commandToExecute': script}, settings={}) + + # Check that the interface was down and brought back up in instance view + log.info("") + log.info("Checking the instance view to confirm the primary network interface was brought down and successfully recovered by the agent...") + instance_view = custom_script.get_instance_view() + log.info("Instance view for custom script after enable is: {0}".format(json.dumps(instance_view.serialize(), indent=4))) + assert_that(len(instance_view.statuses)).described_as("Instance view should have a status for CustomScript").is_greater_than(0) + assert_that(instance_view.statuses[0].message).described_as("The primary network interface should be in a disconnected state before the attempt to recover").contains("Primary network interface state before recovering: 30 (disconnected)") + assert_that(instance_view.statuses[0].message).described_as("The primary network interface should be in a connected state after the attempt to recover").contains("Primary network interface state after recovering: 100 (connected)") + + # Check that the agent is successfully reporting status after recovering the network + log.info("") + log.info("Checking that the agent is reporting status after recovering the network...") + self.check_agent_reports_status() + + log.info("") + log.info("The primary network interface was successfully recovered by the agent.") + + def get_ignore_error_rules(self) -> List[Dict[str, Any]]: + ignore_rules = [ + # + # We may see temporary network unreachable warnings since we are bringing the network interface down + # 2024-02-01T23:40:03.563499Z ERROR ExtHandler ExtHandler Error fetching the goal state: [ProtocolError] GET vmSettings [correlation ID: ac21bdd7-1a7a-4bba-b307-b9d5bc30da33 eTag: 941323814975149980]: Request 
failed: [Errno 101] Network is unreachable + # + { + 'message': r"Error fetching the goal state: \[ProtocolError\] GET vmSettings.*Request failed: \[Errno 101\] Network is unreachable" + } + ] + return ignore_rules + + +if __name__ == "__main__": + RecoverNetworkInterface.run_from_command_line() diff --git a/tests_e2e/tests/samples/error_remote_test.py b/tests_e2e/tests/samples/error_remote_test.py new file mode 100755 index 000000000..6b52e46cd --- /dev/null +++ b/tests_e2e/tests/samples/error_remote_test.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from tests_e2e.tests.lib.agent_test import AgentVmTest + + +class ErrorRemoteTest(AgentVmTest): + """ + A trivial remote test that fails + """ + def run(self): + self._run_remote_test(self._context.create_ssh_client(), "samples-error_remote_test.py") + + +if __name__ == "__main__": + ErrorRemoteTest.run_from_command_line() diff --git a/tests_e2e/tests/error_test.py b/tests_e2e/tests/samples/error_test.py similarity index 83% rename from tests_e2e/tests/error_test.py rename to tests_e2e/tests/samples/error_test.py index cf369f7d3..e2d584c6e 100755 --- a/tests_e2e/tests/error_test.py +++ b/tests_e2e/tests/samples/error_test.py @@ -17,15 +17,15 @@ # limitations under the License. 
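The suppression rule returned by `get_ignore_error_rules` can be checked against the sample record quoted in its comment; a minimal sketch:

```python
import re

rule_pattern = (r"Error fetching the goal state: \[ProtocolError\] GET vmSettings"
                r".*Request failed: \[Errno 101\] Network is unreachable")

# The sample log line quoted in the comment above.
record = ("Error fetching the goal state: [ProtocolError] GET vmSettings "
          "[correlation ID: ac21bdd7-1a7a-4bba-b307-b9d5bc30da33 eTag: 941323814975149980]: "
          "Request failed: [Errno 101] Network is unreachable")

assert re.search(rule_pattern, record) is not None
```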
# -from tests_e2e.tests.lib.agent_test import AgentTest +from tests_e2e.tests.lib.agent_test import AgentVmTest -class ErrorTest(AgentTest): +class ErrorTest(AgentVmTest): """ A trivial test that errors out """ def run(self): - raise Exception("* ERROR *") + raise Exception("* TEST ERROR *") # simulate an unexpected error if __name__ == "__main__": diff --git a/tests_e2e/tests/samples/fail_remote_test.py b/tests_e2e/tests/samples/fail_remote_test.py new file mode 100755 index 000000000..7a05b67a9 --- /dev/null +++ b/tests_e2e/tests/samples/fail_remote_test.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +from tests_e2e.tests.lib.agent_test import AgentVmTest + + +class FailRemoteTest(AgentVmTest): + """ + A trivial remote test that fails + """ + def run(self): + self._run_remote_test(self._context.create_ssh_client(), "samples-fail_remote_test.py") + + +if __name__ == "__main__": + FailRemoteTest.run_from_command_line() diff --git a/tests_e2e/tests/fail_test.py b/tests_e2e/tests/samples/fail_test.py similarity index 87% rename from tests_e2e/tests/fail_test.py rename to tests_e2e/tests/samples/fail_test.py index e96b5bcf7..dfdecb52f 100755 --- a/tests_e2e/tests/fail_test.py +++ b/tests_e2e/tests/samples/fail_test.py @@ -18,15 +18,15 @@ # from assertpy import fail -from tests_e2e.tests.lib.agent_test import AgentTest +from tests_e2e.tests.lib.agent_test import AgentVmTest -class FailTest(AgentTest): +class FailTest(AgentVmTest): """ A trivial test that fails """ def run(self): - fail("* FAILED *") + fail("* TEST FAILED *") if __name__ == "__main__": diff --git a/tests_e2e/tests/samples/pass_remote_test.py b/tests_e2e/tests/samples/pass_remote_test.py new file mode 100755 index 000000000..609ef4d4c --- /dev/null +++ b/tests_e2e/tests/samples/pass_remote_test.py @@ -0,0 +1,32 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +from tests_e2e.tests.lib.agent_test import AgentVmTest + + +class PassRemoteTest(AgentVmTest): + """ + A trivial remote test that succeeds + """ + def run(self): + self._run_remote_test(self._context.create_ssh_client(), "samples-pass_remote_test.py") + + +if __name__ == "__main__": + PassRemoteTest.run_from_command_line() diff --git a/tests_e2e/tests/pass_test.py b/tests_e2e/tests/samples/pass_test.py similarity index 91% rename from tests_e2e/tests/pass_test.py rename to tests_e2e/tests/samples/pass_test.py index 580db2dc0..d7c85a355 100755 --- a/tests_e2e/tests/pass_test.py +++ b/tests_e2e/tests/samples/pass_test.py @@ -17,11 +17,11 @@ # limitations under the License. # -from tests_e2e.tests.lib.agent_test import AgentTest +from tests_e2e.tests.lib.agent_test import AgentVmTest from tests_e2e.tests.lib.logging import log -class PassTest(AgentTest): +class PassTest(AgentVmTest): """ A trivial test that passes. """ diff --git a/tests_e2e/tests/samples/vmss_test.py b/tests_e2e/tests/samples/vmss_test.py new file mode 100755 index 000000000..0f50dad8f --- /dev/null +++ b/tests_e2e/tests/samples/vmss_test.py @@ -0,0 +1,37 @@ +#!/usr/bin/env python3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +from tests_e2e.tests.lib.agent_test import AgentVmssTest +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.ssh_client import SshClient + + +class VmssTest(AgentVmssTest): + """ + Sample test for scale sets + """ + def run(self): + for address in self._context.vmss.get_instances_ip_address(): + ssh_client: SshClient = SshClient(ip_address=address.ip_address, username=self._context.username, identity_file=self._context.identity_file) + log.info("%s: Hostname: %s", address.instance_name, ssh_client.run_command("hostname").strip()) + log.info("* PASSED *") + + +if __name__ == "__main__": + VmssTest.run_from_command_line() diff --git a/tests_e2e/tests/scripts/agent_cgroups-check_cgroups_agent.py b/tests_e2e/tests/scripts/agent_cgroups-check_cgroups_agent.py new file mode 100755 index 000000000..2f3b877a0 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_cgroups-check_cgroups_agent.py @@ -0,0 +1,115 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +import os +import re + +from assertpy import fail + +from tests_e2e.tests.lib.agent_log import AgentLog +from tests_e2e.tests.lib.cgroup_helpers import BASE_CGROUP, AGENT_CONTROLLERS, get_agent_cgroup_mount_path, \ + AGENT_SERVICE_NAME, verify_if_distro_supports_cgroup, print_cgroups, \ + verify_agent_cgroup_assigned_correctly +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test + + +def verify_if_cgroup_controllers_are_mounted(): + """ + Checks that the CPU and memory controllers used by the agent are mounted in the system + """ + log.info("===== Verifying that the cgroup controllers used by the agent are mounted in the system") + + all_controllers_present = os.path.exists(BASE_CGROUP) + missing_controllers = [] + mounted_controllers = [] + + for controller in AGENT_CONTROLLERS: + controller_path = os.path.join(BASE_CGROUP, controller) + if not os.path.exists(controller_path): + all_controllers_present = False + missing_controllers.append(controller_path) + else: + mounted_controllers.append(controller_path) + + if not all_controllers_present: + fail('Not all of the controllers {0} are mounted in the expected cgroups.
Mounted controllers are: {1}.\n ' + 'Missing controllers are: {2} \n System mounted cgroups are:\n{3}'.format(AGENT_CONTROLLERS, mounted_controllers, missing_controllers, print_cgroups())) + + log.info('Verified all cgroup controllers are present.\n {0}'.format(mounted_controllers)) + + +def verify_agent_cgroup_created_on_file_system(): + """ + Checks agent service is running in azure.slice/{agent_service) cgroup and mounted in same system cgroup controllers mounted path + """ + log.info("===== Verifying the agent cgroup paths exist on file system") + agent_cgroup_mount_path = get_agent_cgroup_mount_path() + all_agent_cgroup_controllers_path_exist = True + missing_agent_cgroup_controllers_path = [] + verified_agent_cgroup_controllers_path = [] + + log.info("expected agent cgroup mount path: %s", agent_cgroup_mount_path) + + for controller in AGENT_CONTROLLERS: + agent_controller_path = os.path.join(BASE_CGROUP, controller, agent_cgroup_mount_path[1:]) + + if not os.path.exists(agent_controller_path): + all_agent_cgroup_controllers_path_exist = False + missing_agent_cgroup_controllers_path.append(agent_controller_path) + else: + verified_agent_cgroup_controllers_path.append(agent_controller_path) + + if not all_agent_cgroup_controllers_path_exist: + fail("Agent's cgroup paths couldn't be found on file system. Missing agent cgroups path :{0}.\n Verified agent cgroups path:{1}".format(missing_agent_cgroup_controllers_path, verified_agent_cgroup_controllers_path)) + + log.info('Verified all agent cgroup paths are present.\n {0}'.format(verified_agent_cgroup_controllers_path)) + + +def verify_agent_cgroups_tracked(): + """ + Checks if agent is tracking agent cgroups path for polling resource usage. 
This is verified by checking the agent log for the message "Started tracking cgroup"
+    """
+    log.info("===== Verifying from the log that the agent started tracking cgroups")
+
+    tracking_agent_cgroup_message_re = r'Started tracking cgroup [^\s]+\s+\[(?P<path>[^\s]+)\]'
+    tracked_cgroups = []
+
+    for record in AgentLog().read():
+        match = re.search(tracking_agent_cgroup_message_re, record.message)
+        if match is not None:
+            tracked_cgroups.append(match.group('path'))
+
+    for controller in AGENT_CONTROLLERS:
+        if not any(AGENT_SERVICE_NAME in cgroup_path and controller in cgroup_path for cgroup_path in tracked_cgroups):
+            fail("The agent's {0} cgroup is not being tracked. Tracked cgroups: {1}".format(controller, tracked_cgroups))
+
+    log.info("Agent is tracking cgroups correctly.\n%s", tracked_cgroups)
+
+
+def main():
+    verify_if_distro_supports_cgroup()
+
+    verify_if_cgroup_controllers_are_mounted()
+    verify_agent_cgroup_created_on_file_system()
+
+    verify_agent_cgroup_assigned_correctly()
+    verify_agent_cgroups_tracked()
+
+
+run_remote_test(main)
diff --git a/tests_e2e/tests/scripts/agent_cpu_quota-check_agent_cpu_quota.py b/tests_e2e/tests/scripts/agent_cpu_quota-check_agent_cpu_quota.py
new file mode 100755
index 000000000..c8aad49f5
--- /dev/null
+++ b/tests_e2e/tests/scripts/agent_cpu_quota-check_agent_cpu_quota.py
@@ -0,0 +1,215 @@
+#!/usr/bin/env pypy3
+
+# Microsoft Azure Linux Agent
+#
+# Copyright 2018 Microsoft Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
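The tracking check above scans waagent.log for "Started tracking cgroup" messages and pulls the cgroup path out with a named group. A minimal standalone sketch of that match (the log line below is illustrative, not taken from a real run):

```python
import re

# Pattern used to extract the tracked cgroup path from an agent log message
pattern = r'Started tracking cgroup [^\s]+\s+\[(?P<path>[^\s]+)\]'

# Illustrative example of the kind of line the agent writes
line = "Started tracking cgroup walinuxagent.service [/sys/fs/cgroup/cpu,cpuacct/azure.slice/walinuxagent.service]"

match = re.search(pattern, line)
print(match.group('path'))  # → /sys/fs/cgroup/cpu,cpuacct/azure.slice/walinuxagent.service
```

The test then asserts that, for each controller in AGENT_CONTROLLERS, at least one extracted path contains both the agent service name and the controller name.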
+#
+
+import datetime
+import os
+import re
+import shutil
+import time
+
+from assertpy import fail
+
+from azurelinuxagent.common.osutil import systemd
+from azurelinuxagent.common.utils import shellutil
+from azurelinuxagent.ga.cgroupconfigurator import _DROP_IN_FILE_CPU_QUOTA
+from tests_e2e.tests.lib.agent_log import AgentLog
+from tests_e2e.tests.lib.cgroup_helpers import check_agent_quota_disabled, \
+    get_agent_cpu_quota
+from tests_e2e.tests.lib.logging import log
+from tests_e2e.tests.lib.remote_test import run_remote_test
+from tests_e2e.tests.lib.retry import retry_if_false
+
+
+def prepare_agent():
+    # This function prepares the agent:
+    #    1) It modifies the service unit file to wrap the agent process with a script that starts the actual agent and then
+    #       launches an instance of the dummy process to consume the CPU. Since all these processes are in the same cgroup,
+    #       this has the same effect as the agent itself consuming the CPU.
+    #
+    #       The process tree is similar to
+    #
+    #           /usr/bin/python3 /home/azureuser/bin/agent_cpu_quota-start_service.py /usr/bin/python3 -u /usr/sbin/waagent -daemon
+    #           ├─/usr/bin/python3 -u /usr/sbin/waagent -daemon
+    #           │ └─python3 -u bin/WALinuxAgent-9.9.9.9-py3.8.egg -run-exthandlers
+    #           │   └─4*[{python3}]
+    #           ├─dd if=/dev/zero of=/dev/null
+    #           │
+    #           └─{python3}
+    #
+    #       And the agent's cgroup looks like
+    #
+    #           CGroup: /azure.slice/walinuxagent.service
+    #           ├─10507 /usr/bin/python3 /home/azureuser/bin/agent_cpu_quota-start_service.py /usr/bin/python3 -u /usr/sbin/waagent -daemon
+    #           ├─10508 /usr/bin/python3 -u /usr/sbin/waagent -daemon
+    #           ├─10516 python3 -u bin/WALinuxAgent-9.9.9.9-py3.8.egg -run-exthandlers
+    #           ├─10711 dd if=/dev/zero of=/dev/null
+    #
+    #    2) It turns on a few debug flags and restarts the agent
+    log.info("***Preparing agent for testing cpu quota")
+    #
+    # Create a drop-in file to wrap "start-service.py" around the actual agent: this will override the ExecStart line in the agent's unit file
+    #
+    #     ExecStart=
(need to be empty to clear the original ExecStart) + # ExecStart=/home/.../agent_cgroups-start-service.py /usr/bin/python3 -u /usr/sbin/waagent -daemon + # + service_file = systemd.get_agent_unit_file() + exec_start = None + with open(service_file, "r") as file_: + for line in file_: + match = re.match("ExecStart=(.+)", line) + if match is not None: + exec_start = match.group(1) + break + else: + file_.seek(0) + raise Exception("Could not find ExecStart in {0}\n:{1}".format(service_file, file_.read())) + agent_python = exec_start.split()[0] + current_directory = os.path.dirname(os.path.abspath(__file__)) + start_service_script = os.path.join(current_directory, "agent_cpu_quota-start_service.py") + drop_in_file = os.path.join(systemd.get_agent_drop_in_path(), "99-ExecStart.conf") + log.info("Creating %s...", drop_in_file) + with open(drop_in_file, "w") as file_: + file_.write(""" +[Service] +ExecStart= +ExecStart={0} {1} {2} +""".format(agent_python, start_service_script, exec_start)) + log.info("Executing daemon-reload") + shellutil.run_command(["systemctl", "daemon-reload"]) + + # Disable all checks on cgroups and enable log metrics every 20 sec + log.info("Executing script update-waagent-conf to enable agent cgroups config flag") + result = shellutil.run_command(["update-waagent-conf", "Debug.CgroupCheckPeriod=20", "Debug.CgroupLogMetrics=y", + "Debug.CgroupDisableOnProcessCheckFailure=n", "Debug.CgroupDisableOnQuotaCheckFailure=n"]) + log.info("Successfully enabled agent cgroups config flag: {0}".format(result)) + + +def verify_agent_reported_metrics(): + """ + This method verifies that the agent reports % Processor Time and Throttled Time metrics + """ + log.info("** Verifying agent reported metrics") + log.info("Parsing agent log for metrics") + processor_time = [] + throttled_time = [] + + def check_agent_log_for_metrics() -> bool: + for record in AgentLog().read(): + match = re.search(r"% Processor Time\s*\[walinuxagent.service\]\s*=\s*([0-9.]+)", 
                              record.message)
+            if match is not None:
+                processor_time.append(float(match.group(1)))
+            else:
+                match = re.search(r"Throttled Time\s*\[walinuxagent.service\]\s*=\s*([0-9.]+)", record.message)
+                if match is not None:
+                    throttled_time.append(float(match.group(1)))
+        if len(processor_time) < 1 or len(throttled_time) < 1:
+            return False
+        return True
+
+    found: bool = retry_if_false(check_agent_log_for_metrics)
+    if found:
+        log.info("%% Processor Time: %s", processor_time)
+        log.info("Throttled Time: %s", throttled_time)
+        log.info("Successfully verified agent reported resource metrics")
+    else:
+        fail(
+            "The agent doesn't seem to be collecting % Processor Time and Throttled Time metrics. Agent found Processor Time: {0}, Throttled Time: {1}".format(
+                processor_time, throttled_time))
+
+
+def wait_for_log_message(message, timeout=datetime.timedelta(minutes=5)):
+    log.info("Checking agent's log for message matching [%s]", message)
+    start_time = datetime.datetime.now()
+    while datetime.datetime.now() - start_time <= timeout:
+        for record in AgentLog().read():
+            match = re.search(message, record.message, flags=re.DOTALL)
+            if match is not None:
+                log.info("Found message:\n\t%s", record.text.replace("\n", "\n\t"))
+                return
+        time.sleep(30)
+    fail("The agent did not find [{0}] in its log within the allowed timeout".format(message))
+
+
+def verify_process_check_on_agent_cgroups():
+    """
+    This method checks that the agent detects unexpected processes in its cgroup and disables the CPUQuota
+    """
+    log.info("***Verifying process check on agent cgroups")
+    log.info("Ensuring the agent's CPUQuota is enabled and backing up the drop-in file to restore later in subsequent tests")
+    if check_agent_quota_disabled():
+        fail("The agent's CPUQuota is not enabled: {0}".format(get_agent_cpu_quota()))
+    quota_drop_in = os.path.join(systemd.get_agent_drop_in_path(), _DROP_IN_FILE_CPU_QUOTA)
+    quota_drop_in_backup = quota_drop_in + ".bk"
+    log.info("Backing up %s to %s...", quota_drop_in, quota_drop_in_backup)
+    shutil.copy(quota_drop_in, quota_drop_in_backup)
+    #
+    # Re-enable process checks on cgroups and verify that the agent detects unexpected processes in its cgroup and disables the CPUQuota when
+    # that happens
+    #
+    shellutil.run_command(["update-waagent-conf", "Debug.CgroupDisableOnProcessCheckFailure=y"])
+
+    # The log message indicating the check failed is similar to
+    #     2021-03-29T23:33:15.603530Z INFO MonitorHandler ExtHandler Disabling resource usage monitoring. Reason: Check on cgroups failed:
+    #     [CGroupsException] The agent's cgroup includes unexpected processes: ['[PID: 25826] python3\x00/home/nam/Compute-Runtime-Tux-Pipeline/dungeon_crawler/s']
+    wait_for_log_message(
+        "Disabling resource usage monitoring. Reason: Check on cgroups failed:.+The agent's cgroup includes unexpected processes")
+    disabled: bool = retry_if_false(check_agent_quota_disabled)
+    if not disabled:
+        fail("The agent did not disable its CPUQuota: {0}".format(get_agent_cpu_quota()))
+
+
+def verify_throttling_time_check_on_agent_cgroups():
+    """
+    This method checks that the agent disables its CPUQuota when it exceeds its throttling limit
+    """
+    log.info("***Verifying CPU throttling check on agent cgroups")
+    # Now disable the check on unexpected processes, enable the check on throttled time, and verify that the agent disables its CPUQuota when it exceeds its throttling limit
+    log.info("Re-enabling CPUQuota...")
+    quota_drop_in = os.path.join(systemd.get_agent_drop_in_path(), _DROP_IN_FILE_CPU_QUOTA)
+    quota_drop_in_backup = quota_drop_in + ".bk"
+    log.info("Restoring %s from %s...", quota_drop_in, quota_drop_in_backup)
+    shutil.copy(quota_drop_in_backup, quota_drop_in)
+    shellutil.run_command(["systemctl", "daemon-reload"])
+    shellutil.run_command(["update-waagent-conf", "Debug.CgroupDisableOnProcessCheckFailure=n", "Debug.CgroupDisableOnQuotaCheckFailure=y", "Debug.AgentCpuThrottledTimeThreshold=5"])
+
+    # The log message indicating the check failed is similar to
+    #     2021-04-01T20:47:55.892569Z INFO MonitorHandler ExtHandler Disabling resource usage monitoring. Reason: Check on cgroups failed:
+    #     [CGroupsException] The agent has been throttled for 121.339916938 seconds
+    #
+    # Afterwards, we need to wait a little longer for the agent to update systemd:
+    #     2021-04-14T01:51:44.399860Z INFO MonitorHandler ExtHandler Executing systemctl daemon-reload...
+    #
+    wait_for_log_message(
+        "Disabling resource usage monitoring. Reason: Check on cgroups failed:.+The agent has been throttled",
+        timeout=datetime.timedelta(minutes=10))
+    wait_for_log_message("Stopped tracking cgroup walinuxagent.service", timeout=datetime.timedelta(minutes=10))
+    wait_for_log_message("Executing systemctl daemon-reload...", timeout=datetime.timedelta(minutes=5))
+    disabled: bool = retry_if_false(check_agent_quota_disabled)
+    if not disabled:
+        fail("The agent did not disable its CPUQuota: {0}".format(get_agent_cpu_quota()))
+
+
+def main():
+    prepare_agent()
+    verify_agent_reported_metrics()
+    verify_process_check_on_agent_cgroups()
+    verify_throttling_time_check_on_agent_cgroups()
+
+
+run_remote_test(main)
diff --git a/tests_e2e/tests/scripts/agent_cpu_quota-start_service.py b/tests_e2e/tests/scripts/agent_cpu_quota-start_service.py
new file mode 100755
index 000000000..ba0f5abb2
--- /dev/null
+++ b/tests_e2e/tests/scripts/agent_cpu_quota-start_service.py
@@ -0,0 +1,96 @@
+#!/usr/bin/env pypy3
+
+# Microsoft Azure Linux Agent
+#
+# Copyright 2018 Microsoft Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# This script starts the actual agent and then periodically launches an instance of a dummy process to consume the CPU
+#
+import signal
+import subprocess
+import sys
+import threading
+import time
+import traceback
+
+from azurelinuxagent.common import logger
+
+
+class CpuConsumer(threading.Thread):
+    def __init__(self):
+        threading.Thread.__init__(self)
+        self._stopped = False
+
+    def run(self):
+        threading.current_thread().setName("*Stress*")
+
+        while not self._stopped:
+            try:
+                # Dummy operation (copies /dev/zero to /dev/null) that creates load on the CPU
+                dd_command = ["dd", "if=/dev/zero", "of=/dev/null"]
+                logger.info("Starting dummy dd command: {0} to stress CPU", ' '.join(dd_command))
+                subprocess.Popen(dd_command)
+                logger.info("dd command started; sleeping...")
+                i = 0
+                while i < 30 and not self._stopped:
+                    time.sleep(1)
+                    i += 1
+            except Exception as exception:
+                logger.error("{0}:\n{1}", exception, traceback.format_exc())
+
+    def stop(self):
+        self._stopped = True
+
+
+try:
+    threading.current_thread().setName("*StartService*")
+    logger.set_prefix("E2ETest")
+    logger.add_logger_appender(logger.AppenderType.FILE, logger.LogLevel.INFO, "/var/log/waagent.log")
+
+    agent_command_line = sys.argv[1:]
+
+    logger.info("Starting Agent: {0}", ' '.join(agent_command_line))
+    agent_process = subprocess.Popen(agent_command_line)
+
+    # sleep a little to give the agent a chance to initialize
+    time.sleep(15)
+
+    cpu_consumer = CpuConsumer()
+    cpu_consumer.start()
+
+
+    def forward_signal(signum, _):
+        if signum == signal.SIGTERM:
+            logger.info("Stopping stress thread...")
+            cpu_consumer.stop()
+        logger.info("Forwarding signal {0} to Agent", signum)
+        agent_process.send_signal(signum)
+
+
+    signal.signal(signal.SIGTERM, forward_signal)
+
+    agent_process.wait()
+    logger.info("Agent completed")
+
+    cpu_consumer.stop()
+    cpu_consumer.join()
+    logger.info("Stress completed")
+
+    logger.info("Exiting...")
+    sys.exit(agent_process.returncode)
+
+except Exception as exception:
+    logger.error("Unexpected error occurred while starting agent service: {0}", exception)
+    raise
diff --git a/tests_e2e/tests/scripts/agent_ext_workflow-assert_operation_sequence.py b/tests_e2e/tests/scripts/agent_ext_workflow-assert_operation_sequence.py
new file mode 100755
index 000000000..d01d27799
--- /dev/null
+++ b/tests_e2e/tests/scripts/agent_ext_workflow-assert_operation_sequence.py
@@ -0,0 +1,183 @@
+#!/usr/bin/env pypy3
+
+# Microsoft Azure Linux Agent
+#
+# Copyright 2018 Microsoft Corporation
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# The DcrTestExtension maintains an `operations-<version>.log` for every operation that the agent executes on that
+# extension. This script asserts that the operations sequence in the log file matches the expected operations given as
+# input to this script. We do this to confirm that the agent executed the correct sequence of operations.
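The assertion script parses each operations-log entry by splitting on the `;` delimiter. A minimal standalone version of that parsing step (the helper name is ours; the real script does this inline in `parse_ops_log`):

```python
from datetime import datetime

def parse_op_line(line: str) -> dict:
    # Splits 'Date:...; Operation:...; SeqNo:...' into its three fields
    date_part, op_part, seq_part = line.split(";")
    return {
        'date': datetime.strptime(date_part.split("Date:")[1], "%Y-%m-%dT%H:%M:%SZ"),
        'op': op_part.split("Operation:")[1],
        'seq_no': seq_part.split("SeqNo:")[1].strip()
    }

parsed = parse_op_line("Date:2019-07-30T21:54:03Z; Operation:install; SeqNo:0")
print(parsed['op'], parsed['seq_no'])  # → install 0
```

Entries older than the test's start time are then filtered out before the sequence comparison.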
+#
+# Sample operations-<version>.log file snippet -
+#     Date:2019-07-30T21:54:03Z; Operation:install; SeqNo:0
+#     Date:2019-07-30T21:54:05Z; Operation:enable; SeqNo:0
+#     Date:2019-07-30T21:54:37Z; Operation:enable; SeqNo:1
+#     Date:2019-07-30T21:55:20Z; Operation:disable; SeqNo:1
+#     Date:2019-07-30T21:55:22Z; Operation:uninstall; SeqNo:1
+#
+import argparse
+import os
+import sys
+import time
+from datetime import datetime
+from typing import Any, Dict, List
+
+DELIMITER = ";"
+OPS_FILE_DIR = "/var/log/azure/Microsoft.Azure.TestExtensions.Edp.GuestAgentDcrTest/"
+OPS_FILE_PATTERN = ["operations-%s.log", "%s/operations-%s.log"]
+MAX_RETRY = 5
+SLEEP_TIMER = 30
+
+
+def parse_ops_log(ops_version: str, input_ops: List[str], start_time: datetime):
+    # input_ops are the operations that we expect to see in the operations log file
+    ver = (ops_version,)
+    ops_file_name = None
+    for file_pat in OPS_FILE_PATTERN:
+        ops_file_name = os.path.join(OPS_FILE_DIR, file_pat % ver)
+        if not os.path.exists(ops_file_name):
+            ver = ver + (ops_version,)
+            ops_file_name = None
+            continue
+        break
+
+    if not ops_file_name:
+        raise IOError("Operations File %s not found" % os.path.join(OPS_FILE_DIR, OPS_FILE_PATTERN[0] % ops_version))
+
+    ops = []
+    with open(ops_file_name, 'r') as ops_log:
+        # we take the last len(input_ops) entries from the log file and ensure they match the input_ops
+        # Example of a line in the log file - `Date:2019-07-30T21:54:03Z; Operation:install; SeqNo:0`
+        content = ops_log.readlines()[-len(input_ops):]
+        for op_log in content:
+            data = op_log.split(DELIMITER)
+            date = datetime.strptime(data[0].split("Date:")[1], "%Y-%m-%dT%H:%M:%SZ")
+            op = data[1].split("Operation:")[1]
+            seq_no = data[2].split("SeqNo:")[1].strip('\n')
+
+            # We only capture the operations that occurred after the start_time of the test
+            if start_time > date:
+                continue
+
+            ops.append({'date': date, 'op': op, 'seq_no': seq_no})
+    return ops
+
+
+def assert_ops_in_sequence(actual_ops: List[Dict[str, Any]], expected_ops:
List[str]): + exit_code = 0 + if len(actual_ops) != len(expected_ops): + print("Operation sequence length doesn't match, exit code 2") + exit_code = 2 + + last_date = datetime(70, 1, 1) + for idx, val in enumerate(actual_ops): + if exit_code != 0: + break + + if val['date'] < last_date or val['op'] != expected_ops[idx]: + print("Operation sequence doesn't match, exit code 2") + exit_code = 2 + + last_date = val['date'] + + return exit_code + + +def check_update_sequence(args): + # old_ops_file_name = OPS_FILE_PATTERN % args.old_version + # new_ops_file_name = OPS_FILE_PATTERN % args.new_version + + actual_ops = parse_ops_log(args.old_version, args.old_ops, args.start_time) + actual_ops.extend(parse_ops_log(args.new_version, args.new_ops, args.start_time)) + actual_ops = sorted(actual_ops, key=lambda op: op['date']) + + exit_code = assert_ops_in_sequence(actual_ops, args.ops) + + return exit_code, actual_ops + + +def check_operation_sequence(args): + # ops_file_name = OPS_FILE_PATTERN % args.version + + actual_ops = parse_ops_log(args.version, args.ops, args.start_time) + exit_code = assert_ops_in_sequence(actual_ops, args.ops) + + return exit_code, actual_ops + + +def main(): + # There are 2 main ways you can call this file - normal_ops_sequence or update_sequence + parser = argparse.ArgumentParser() + cmd_parsers = parser.add_subparsers(help="sub-command help", dest="command") + + # We use start_time to make sure we're testing the correct test and not some other test + parser.add_argument("--start-time", dest='start_time', required=True) + + # Normal_ops_sequence gets the version of the ext and parses the corresponding operations file to get the operation + # sequence that were run on the extension + normal_ops_sequence_parser = cmd_parsers.add_parser("normal_ops_sequence", help="Test the normal operation sequence") + normal_ops_sequence_parser.add_argument('--version', dest='version') + normal_ops_sequence_parser.add_argument('--ops', nargs='*', dest='ops', 
                                            default=argparse.SUPPRESS)
+
+    # Update_sequence mode is used to check the update scenario. We get the expected old operations, the expected
+    # new operations and the final operation list, and verify that the expected operations match the actual ones
+    update_sequence_parser = cmd_parsers.add_parser("update_sequence", help="Test the update operation sequence")
+    update_sequence_parser.add_argument("--old-version", dest="old_version")
+    update_sequence_parser.add_argument("--new-version", dest="new_version")
+    update_sequence_parser.add_argument("--old-ver-ops", nargs="*", dest="old_ops", default=argparse.SUPPRESS)
+    update_sequence_parser.add_argument("--new-ver-ops", nargs="*", dest="new_ops", default=argparse.SUPPRESS)
+    update_sequence_parser.add_argument("--final-ops", nargs="*", dest="ops", default=argparse.SUPPRESS)
+
+    args, unknown = parser.parse_known_args()
+
+    if unknown:
+        # Print any unknown arguments passed to this script and fix them with low priority
+        print("[Low Priority][To-Fix] Found unknown args: %s" % ', '.join(unknown))
+
+    args.start_time = datetime.strptime(args.start_time, "%Y-%m-%dT%H:%M:%SZ")
+
+    exit_code = 999
+    actual_ops = []
+
+    for i in range(0, MAX_RETRY):
+        if args.command == "update_sequence":
+            exit_code, actual_ops = check_update_sequence(args)
+        elif args.command == "normal_ops_sequence":
+            exit_code, actual_ops = check_operation_sequence(args)
+        else:
+            print("No such command %s, exit code 5\n" % args.command)
+            exit_code, actual_ops = 5, []
+            break
+
+        if exit_code == 0:
+            break
+
+        print("{0} test failed with exit code: {1}; Retry attempt: {2}; Retrying in {3} secs".format(args.command,
+                                                                                                     exit_code, i,
+                                                                                                     SLEEP_TIMER))
+        time.sleep(SLEEP_TIMER)
+
+    if exit_code != 0:
+        print("Expected Operations: %s" % ", ".join(args.ops))
+        print("Actual Operations: %s" %
+              ','.join(["[%s, Date: %s]" % (op['op'], op['date'].strftime("%Y-%m-%dT%H:%M:%SZ")) for op in actual_ops]))
+
+    print("Assertion completed, exiting
with code: %s" % exit_code) + sys.exit(exit_code) + + +if __name__ == "__main__": + print("Asserting operations\n") + main() diff --git a/tests_e2e/tests/scripts/agent_ext_workflow-check_data_in_agent_log.py b/tests_e2e/tests/scripts/agent_ext_workflow-check_data_in_agent_log.py new file mode 100755 index 000000000..867c9b67d --- /dev/null +++ b/tests_e2e/tests/scripts/agent_ext_workflow-check_data_in_agent_log.py @@ -0,0 +1,49 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
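The assertion script above wraps its checks in a retry loop (up to MAX_RETRY attempts, SLEEP_TIMER seconds apart) so transient lag in extension processing does not fail the test. The control flow, reduced to a generic helper (names are ours, not the script's):

```python
import time

def retry_check(check, max_retry=5, sleep_secs=0):
    """Calls check() until it returns exit code 0 or the attempts are exhausted."""
    exit_code = 999  # sentinel, as in the script above
    for _ in range(max_retry):
        exit_code = check()
        if exit_code == 0:
            break
        time.sleep(sleep_secs)
    return exit_code

attempts = []

def flaky_check():
    attempts.append(1)
    return 0 if len(attempts) >= 3 else 2  # fails twice, then succeeds

print(retry_check(flaky_check))  # → 0
print(len(attempts))             # → 3
```

A check that never succeeds simply returns its last non-zero exit code after the final attempt, which the caller then passes to `sys.exit`.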
+# +# Checks that the input data is found in the agent log +# +import argparse +import sys + +from pathlib import Path +from tests_e2e.tests.lib.agent_log import AgentLog + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument("--data", dest='data', required=True) + args, _ = parser.parse_known_args() + + print("Verifying data: {0} in waagent.log".format(args.data)) + found = False + + try: + found = AgentLog(Path('/var/log/waagent.log')).agent_log_contains(args.data) + if found: + print("Found data: {0} in agent log".format(args.data)) + else: + print("Did not find data: {0} in agent log".format(args.data)) + except Exception as e: + print("Error thrown when searching for test data in agent log: {0}".format(str(e))) + + sys.exit(0 if found else 1) + + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/tests_e2e/tests/scripts/agent_ext_workflow-validate_no_lag_between_agent_start_and_gs_processing.py b/tests_e2e/tests/scripts/agent_ext_workflow-validate_no_lag_between_agent_start_and_gs_processing.py new file mode 100755 index 000000000..7f328398b --- /dev/null +++ b/tests_e2e/tests/scripts/agent_ext_workflow-validate_no_lag_between_agent_start_and_gs_processing.py @@ -0,0 +1,117 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
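The no-lag script that follows subtracts the reported goal-state processing duration (in milliseconds) from the gap between agent start and the "ProcessExtensionsGoalState completed" timestamp, then flags anything over 15 seconds. The arithmetic can be sketched as (the helper name is ours; like the script, it compares `diff.seconds` rather than `total_seconds()`):

```python
from datetime import datetime, timedelta

TIME_DIFF_MAX_SECS = 15

def lag_exceeded(agent_start: datetime, gs_complete: datetime, duration_ms: int) -> bool:
    # Subtract the time spent processing the goal state itself, so only the
    # delay between agent start and the beginning of processing is measured
    diff = (gs_complete - agent_start) - timedelta(milliseconds=duration_ms)
    return diff.seconds > TIME_DIFF_MAX_SECS

start = datetime(2021, 1, 1, 0, 0, 0)
print(lag_exceeded(start, start + timedelta(seconds=20), 10_000))  # 10s lag → False
print(lag_exceeded(start, start + timedelta(seconds=40), 10_000))  # 30s lag → True
```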
+#
+# Asserts that goal state processing completed no more than 15 seconds after agent start
+#
+from datetime import timedelta
+import re
+import sys
+import time
+
+from pathlib import Path
+from tests_e2e.tests.lib.agent_log import AgentLog
+
+
+def main():
+    success = True
+    needs_retry = True
+    retry = 3
+
+    while retry >= 0 and needs_retry:
+        success = True
+        needs_retry = False
+
+        agent_started_time = []
+        agent_msg = []
+        time_diff_max_secs = 15
+        last_agent_log_timestamp = None
+
+        # Example: Azure Linux Agent (Goal State Agent version 2.9.1.1)
+        agent_started_regex = r"Azure Linux Agent \(Goal State Agent version [0-9.]+\)"
+        gs_completed_regex = r"ProcessExtensionsGoalState completed\s\[(?P<incarnation>[a-z]+_\d+)\s(?P<duration>\d+)\sms\]"
+
+        verified_atleast_one_log_line = False
+        verified_atleast_one_agent_started_log_line = False
+        verified_atleast_one_gs_complete_log_line = False
+
+        agent_log = AgentLog(Path('/var/log/waagent.log'))
+
+        try:
+            for agent_record in agent_log.read():
+                last_agent_log_timestamp = agent_record.timestamp
+                verified_atleast_one_log_line = True
+
+                agent_started = re.match(agent_started_regex, agent_record.message)
+                verified_atleast_one_agent_started_log_line = verified_atleast_one_agent_started_log_line or agent_started
+                if agent_started:
+                    agent_started_time.append(agent_record.timestamp)
+                    agent_msg.append(agent_record.text)
+
+                gs_complete = re.match(gs_completed_regex, agent_record.message)
+                verified_atleast_one_gs_complete_log_line = verified_atleast_one_gs_complete_log_line or gs_complete
+                if agent_started_time and gs_complete:
+                    duration = gs_complete.group('duration')
+                    diff = agent_record.timestamp - agent_started_time.pop()
+                    # Subtract the time it took to process the goal state itself; we only care about how long
+                    # the agent took after start/restart to begin processing the GS
+                    diff -= timedelta(milliseconds=int(duration))
+                    agent_msg_line = agent_msg.pop()
+                    if diff.seconds > time_diff_max_secs:
+                        success
= False + print("Found delay between agent start and GoalState Processing > {0}secs: " + "Messages: \n {1} {2}".format(time_diff_max_secs, agent_msg_line, agent_record.text)) + + except IOError as e: + print("Unable to validate no lag time: {0}".format(str(e))) + + if not verified_atleast_one_log_line: + success = False + print("Didn't parse a single log line, ensure the log_parser is working fine and verify log regex") + + if not verified_atleast_one_agent_started_log_line: + success = False + print("Didn't parse a single agent started log line, ensure the Regex is working fine: {0}" + .format(agent_started_regex)) + + if not verified_atleast_one_gs_complete_log_line: + success = False + print("Didn't parse a single GS completed log line, ensure the Regex is working fine: {0}" + .format(gs_completed_regex)) + + if agent_started_time or agent_msg: + # If agent_started_time or agent_msg is not empty, there is a mismatch in the number of agent start messages + # and GoalState Processing messages + # If another check hasn't already failed, and the last parsed log is less than 15 seconds after the + # mismatched agent start log, we should retry after sleeping for 5s to give the agent time to finish + # GoalState processing + if success and last_agent_log_timestamp < (agent_started_time[-1] + timedelta(seconds=15)): + needs_retry = True + print("Sleeping for 5 seconds to allow goal state processing to complete...") + time.sleep(5) + else: + success = False + print("Mismatch between number of agent start messages and number of GoalState Processing messages\n " + "Agent Start Messages: \n {0}".format('\n'.join(agent_msg))) + + retry -= 1 + + sys.exit(0 if success else 1) + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/agent_firewall-verify_all_firewall_rules.py b/tests_e2e/tests/scripts/agent_firewall-verify_all_firewall_rules.py new file mode 100755 index 000000000..2d165bc17 --- /dev/null +++ 
b/tests_e2e/tests/scripts/agent_firewall-verify_all_firewall_rules.py @@ -0,0 +1,372 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# This script checks all agent firewall rules added properly and working as expected +# +import argparse +import os +import pwd +import socket +from typing import List + + +from azurelinuxagent.common.utils import shellutil +from azurelinuxagent.common.utils.textutil import format_exception +from tests_e2e.tests.lib.firewall_helpers import get_root_accept_rule_command, get_non_root_accept_rule_command, \ + get_non_root_drop_rule_command, print_current_iptable_rules, get_wireserver_ip, get_all_iptable_rule_commands, \ + check_if_iptable_rule_is_available, IPTableRules, verify_all_rules_exist, FIREWALL_PERIOD, execute_cmd +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test +import http.client as httpclient + +from tests_e2e.tests.lib.retry import retry + +ROOT_USER = 'root' +VERSIONS_PATH = '/?comp=versions' + + +def switch_user(user: str) -> None: + """ + This function switches the function to a given user + """ + try: + uid = pwd.getpwnam(user)[2] + log.info("uid:%s and user name:%s", uid, user) + os.seteuid(uid) + except Exception as e: + raise Exception("Error -- failed to switch user to {0} : Failed with exception {1}".format(user, e)) + + +def 
verify_rules_deleted_successfully(commands: List[List[str]] = None) -> None: + """ + This function verifies that the provided rules (or, if not specified, all iptable rules) were deleted successfully. + """ + log.info("-----Verifying requested rules deleted successfully") + + if commands is None: + commands = [] + + if not commands: + root_accept, non_root_accept, non_root_drop = get_all_iptable_rule_commands(IPTableRules.CHECK_COMMAND) + commands.extend([root_accept, non_root_accept, non_root_drop]) + + # "-C" returns exit code 1 when the rule is not present, which is expected after deletion + for command in commands: + if check_if_iptable_rule_is_available(command): + raise Exception("Deletion of ip table rules not successful.\nCurrent ip table rules:\n" + print_current_iptable_rules()) + + log.info("ip table rules deleted successfully \n %s", commands) + + +def delete_iptable_rules(commands: List[List[str]] = None) -> None: + """ + This function deletes the provided rules (or, if not specified, all iptable rules) + """ + if commands is None: + commands = [] + if not commands: + root_accept, non_root_accept, non_root_drop = get_all_iptable_rule_commands(IPTableRules.DELETE_COMMAND) + commands.extend([root_accept, non_root_accept, non_root_drop]) + + log.info("-----Deleting ip table rules \n %s", commands) + + try: + cmd = None + for command in commands: + cmd = command + retry(lambda: execute_cmd(cmd=cmd), attempts=3) + except Exception as e: + raise Exception("Error -- failed to delete the ip table rule set {0}".format(e)) + + log.info("Success -- deleted ip table rules") + + +def verify_dns_tcp_to_wireserver_is_allowed(user: str) -> None: + """ + This function verifies that DNS over TCP to wireserver is allowed for the given user + """ + log.info("-----Verifying DNS tcp to wireserver is allowed") + switch_user(user) + try: + socket.create_connection((get_wireserver_ip(), 53), timeout=30) + except Exception as e: + raise Exception( + "Error -- while
using DNS TCP request as user:({0}), make sure the firewall rules are set correctly {1}".format(user, + e)) + + log.info("Success -- can connect to wireserver port 53 using TCP as user:(%s)", user) + + +def verify_dns_tcp_to_wireserver_is_blocked(user: str) -> None: + """ + This function verifies that DNS over TCP to wireserver is blocked for the given user + """ + log.info("-----Verifying DNS tcp to wireserver is blocked") + switch_user(user) + try: + socket.create_connection((get_wireserver_ip(), 53), timeout=10) + except socket.timeout: + # The timeout is expected: the unprivileged user cannot reach wireserver + log.info("Success -- unprivileged user:(%s) access to wireserver port 53 using TCP is blocked", user) + except Exception as e: + raise Exception("Unexpected error while connecting to wireserver: {0}".format(format_exception(e))) + else: + raise Exception("Error -- unprivileged user:({0}) could connect to wireserver port 53 using TCP".format(user)) + + +def verify_http_to_wireserver_blocked(user: str) -> None: + """ + This function verifies that http to wireserver is blocked for the given user + """ + log.info("-----Verifying http request to wireserver is blocked") + switch_user(user) + try: + client = httpclient.HTTPConnection(get_wireserver_ip(), timeout=10) + except Exception as e: + raise Exception("Error -- failed to create HTTP connection with user: {0} \n {1}".format(user, e)) + + blocked = False + try: + client.request('GET', VERSIONS_PATH) + except Exception as e: + # a timeout exception means the request was blocked + if isinstance(e, socket.timeout): + blocked = True + else: + raise Exception("Unexpected error while connecting to wireserver: {0}".format(format_exception(e))) + + if not blocked: + raise Exception("Error -- unprivileged user:({0}) could connect to wireserver, make sure the firewall rules are set correctly".format(user)) + + log.info("Success -- unprivileged user:(%s) access to wireserver is blocked", user) + + +def
verify_http_to_wireserver_allowed(user: str) -> None: + """ + This function verifies that http to wireserver is allowed for the given user + """ + log.info("-----Verifying http request to wireserver is allowed") + switch_user(user) + try: + client = httpclient.HTTPConnection(get_wireserver_ip(), timeout=30) + except Exception as e: + raise Exception("Error -- failed to create HTTP connection with user:{0} \n {1}".format(user, e)) + + try: + client.request('GET', VERSIONS_PATH) + except Exception as e: + # if we get an exception, the request failed to connect + raise Exception("Error -- user:({0}) access to wireserver failed:\n {1}".format(user, e)) + + log.info("Success -- user:(%s) access to wireserver is allowed", user) + + +def verify_non_root_accept_rule(): + """ + This function verifies the non root accept rule and makes sure it is re-added by the agent after deletion + """ + log.info("-----Verifying non root accept rule behavior") + log.info("Before deleting the non root accept rule, ensure a non root user can do a tcp request to wireserver but cannot do an http request") + verify_dns_tcp_to_wireserver_is_allowed(NON_ROOT_USER) + verify_http_to_wireserver_blocked(NON_ROOT_USER) + + # switching to the root user is required to stop the agent + switch_user(ROOT_USER) + # stop the agent, so that it won't re-add rules while checking + log.info("Stop Guest Agent service") + # agent-service is the script name and stop is the argument + stop_agent = ["agent-service", "stop"] + shellutil.run_command(stop_agent) + + # deleting non root accept rule + non_root_accept_delete_cmd = get_non_root_accept_rule_command(IPTableRules.DELETE_COMMAND) + delete_iptable_rules([non_root_accept_delete_cmd]) + # verifying deletion successful + non_root_accept_check_cmd = get_non_root_accept_rule_command(IPTableRules.CHECK_COMMAND) + verify_rules_deleted_successfully([non_root_accept_check_cmd]) + + log.info("** Current IP table rules\n") + print_current_iptable_rules() + +
log.info("After deleting the non root accept rule, ensure a non root user cannot do a tcp request to wireserver") + verify_dns_tcp_to_wireserver_is_blocked(NON_ROOT_USER) + + switch_user(ROOT_USER) + # restart the agent to re-add the deleted rules + log.info("Restart Guest Agent service to re-add the deleted rules") + # agent-service is the script name and start is the argument + start_agent = ["agent-service", "start"] + shellutil.run_command(start_agent) + + verify_all_rules_exist() + log.info("** Current IP table rules \n") + print_current_iptable_rules() + + log.info("After appending the rule back, ensure a non root user can do a tcp request to wireserver but cannot do an http request\n") + verify_dns_tcp_to_wireserver_is_allowed(NON_ROOT_USER) + verify_http_to_wireserver_blocked(NON_ROOT_USER) + + log.info("Ensuring missing rules are re-added by the running agent") + # deleting non root accept rule + non_root_accept_delete_cmd = get_non_root_accept_rule_command(IPTableRules.DELETE_COMMAND) + delete_iptable_rules([non_root_accept_delete_cmd]) + + verify_all_rules_exist() + log.info("** Current IP table rules \n") + print_current_iptable_rules() + + log.info("non root accept rule verified successfully\n") + + +def verify_root_accept_rule(): + """ + This function verifies the root accept rule and makes sure it is re-added by the agent after deletion + """ + log.info("-----Verifying root accept rule behavior") + log.info("Before deleting the root accept rule, ensure a root user can do an http request but a non root user cannot") + verify_http_to_wireserver_allowed(ROOT_USER) + verify_http_to_wireserver_blocked(NON_ROOT_USER) + + # switching to the root user is required to stop the agent + switch_user(ROOT_USER) + # stop the agent, so that it won't re-add rules while checking + log.info("Stop Guest Agent service") + # agent-service is the script name and stop is the argument + stop_agent = ["agent-service", "stop"] + shellutil.run_command(stop_agent) + + # deleting root accept rule +
root_accept_delete_cmd = get_root_accept_rule_command(IPTableRules.DELETE_COMMAND) + # also delete the drop rule; otherwise, after restart the daemon goes into a loop since it cannot connect to wireserver, which would block agent initialization + drop_delete_cmd = get_non_root_drop_rule_command(IPTableRules.DELETE_COMMAND) + delete_iptable_rules([root_accept_delete_cmd, drop_delete_cmd]) + # verifying deletion successful + root_accept_check_cmd = get_root_accept_rule_command(IPTableRules.CHECK_COMMAND) + drop_check_cmd = get_non_root_drop_rule_command(IPTableRules.CHECK_COMMAND) + verify_rules_deleted_successfully([root_accept_check_cmd, drop_check_cmd]) + + log.info("** Current IP table rules\n") + print_current_iptable_rules() + + # restart the agent to re-add the deleted rules + log.info("Restart Guest Agent service to re-add the deleted rules") + # agent-service is the script name and start is the argument + start_agent = ["agent-service", "start"] + shellutil.run_command(start_agent) + + verify_all_rules_exist() + log.info("** Current IP table rules \n") + print_current_iptable_rules() + + log.info("After appending the rule back, ensure a root user can do an http request but a non root user cannot") + verify_dns_tcp_to_wireserver_is_allowed(NON_ROOT_USER) + verify_http_to_wireserver_blocked(NON_ROOT_USER) + verify_http_to_wireserver_allowed(ROOT_USER) + + log.info("Ensuring missing rules are re-added by the running agent") + # deleting root accept rule + root_accept_delete_cmd = get_root_accept_rule_command(IPTableRules.DELETE_COMMAND) + delete_iptable_rules([root_accept_delete_cmd]) + + verify_all_rules_exist() + log.info("** Current IP table rules \n") + print_current_iptable_rules() + + log.info("root accept rule verified successfully\n") + + +def verify_non_root_drop_rule(): + """ + This function verifies the drop rule and makes sure it is re-added by the agent after deletion + """ + log.info("-----Verifying non root drop rule behavior") + # switching to the root user is required to
stop the agent + switch_user(ROOT_USER) + # stop the agent, so that it won't re-add rules while checking + log.info("Stop Guest Agent service") + # agent-service is the script name and stop is the argument + stop_agent = ["agent-service", "stop"] + shellutil.run_command(stop_agent) + + # deleting non root drop rule + non_root_drop_delete_cmd = get_non_root_drop_rule_command(IPTableRules.DELETE_COMMAND) + delete_iptable_rules([non_root_drop_delete_cmd]) + # verifying deletion successful + non_root_drop_check_cmd = get_non_root_drop_rule_command(IPTableRules.CHECK_COMMAND) + verify_rules_deleted_successfully([non_root_drop_check_cmd]) + + log.info("** Current IP table rules\n") + print_current_iptable_rules() + + log.info("After deleting the non root drop rule, ensure a non root user can do an http request to wireserver") + verify_http_to_wireserver_allowed(NON_ROOT_USER) + + # restart the agent to re-add the deleted rules + log.info("Restart Guest Agent service to re-add the deleted rules") + # agent-service is the script name and start is the argument + start_agent = ["agent-service", "start"] + shellutil.run_command(start_agent) + + verify_all_rules_exist() + log.info("** Current IP table rules\n") + print_current_iptable_rules() + + log.info("After appending the rule back, ensure a non root user can do a tcp request to wireserver but cannot do an http request") + verify_dns_tcp_to_wireserver_is_allowed(NON_ROOT_USER) + verify_http_to_wireserver_blocked(NON_ROOT_USER) + verify_http_to_wireserver_allowed(ROOT_USER) + + log.info("Ensuring missing rules are re-added by the running agent") + # deleting non root drop rule + non_root_drop_delete_cmd = get_non_root_drop_rule_command(IPTableRules.DELETE_COMMAND) + delete_iptable_rules([non_root_drop_delete_cmd]) + + verify_all_rules_exist() + log.info("** Current IP table rules\n") + print_current_iptable_rules() + + log.info("non root drop rule verified successfully\n") + + +def prepare_agent(): + log.info("Executing script update-waagent-conf
to enable agent firewall config flag") + # Changing the firewall period from default 5 mins to 1 min, so that test won't wait for that long to verify rules + shellutil.run_command(["update-waagent-conf", "OS.EnableFirewall=y", f"OS.EnableFirewallPeriod={FIREWALL_PERIOD}"]) + log.info("Successfully enabled agent firewall config flag") + + +def main(): + prepare_agent() + log.info("** Current IP table rules\n") + print_current_iptable_rules() + + verify_all_rules_exist() + + verify_non_root_accept_rule() + verify_root_accept_rule() + verify_non_root_drop_rule() + + +parser = argparse.ArgumentParser() +parser.add_argument('-u', '--user', required=True, help="Non root user") +args = parser.parse_args() +NON_ROOT_USER = args.user +run_remote_test(main) + diff --git a/tests_e2e/tests/scripts/agent_persist_firewall-access_wireserver b/tests_e2e/tests/scripts/agent_persist_firewall-access_wireserver new file mode 100755 index 000000000..c38e0a570 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_persist_firewall-access_wireserver @@ -0,0 +1,85 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Helper script which tries to access Wireserver on system reboot. 
Also prints out iptable rules if non-root and still +# able to access Wireserver + +USER=$(whoami) +echo "$(date --utc +%FT%T.%3NZ): Running as user: $USER" + +function check_online +{ + # -W 30: wait up to 30 seconds for the reply (-t would set the TTL, not a timeout) + ping 8.8.8.8 -c 1 -i .2 -W 30 > /dev/null 2>&1 && echo 0 || echo 1 +} + +# Check more, sleep less +MAX_CHECKS=10 +# Initial starting value for checks +CHECKS=0 +IS_ONLINE=$(check_online) + +# Loop while we're not online. +while [ "$IS_ONLINE" -eq 1 ]; do + + CHECKS=$((CHECKS + 1)) + if [ $CHECKS -gt $MAX_CHECKS ]; then + break + fi + + echo "$(date --utc +%FT%T.%3NZ): Network still not accessible" + # We're offline. Sleep for a bit, then check again + sleep 1; + IS_ONLINE=$(check_online) + +done + +if [ "$IS_ONLINE" -eq 1 ]; then + # We will never be able to get online. Kill script. + echo "Unable to connect to network, exiting now" + echo "ExitCode: 1" + exit 1 +fi + +echo "Finally online, Time: $(date --utc +%FT%T.%3NZ)" +echo "Trying to contact Wireserver as $USER to see if accessible" + +echo "" +echo "IPTables before accessing Wireserver" +sudo iptables -t security -L -nxv +echo "" + +# Group the fallback with the cat so that tr strips whitespace from either result +WIRE_IP=$( (cat /var/lib/waagent/WireServerEndpoint 2>/dev/null || echo '168.63.129.16') | tr -d '[:space:]') +if command -v wget >/dev/null 2>&1; then + wget --tries=3 "http://$WIRE_IP/?comp=versions" --timeout=5 -O "/tmp/wire-versions-$USER.xml" +else + curl --retry 3 --retry-delay 5 --connect-timeout 5 "http://$WIRE_IP/?comp=versions" -o "/tmp/wire-versions-$USER.xml" +fi +WIRE_EC=$? +echo "ExitCode: $WIRE_EC" + +if [[ "$USER" != "root" && "$WIRE_EC" == 0 ]]; then + echo "Wireserver should not be accessible for non-root user ($USER)" +fi + +if [[ "$USER" != "root" ]]; then +echo "" +echo "checking tcp traffic to wireserver port 53 for non-root user ($USER)" +# Establish a TCP connection to port 53 in a subshell so the redirection's exit code is captured +(echo -n > "/dev/tcp/$WIRE_IP/53") 2>/dev/null +TCP_EC=$?
+echo "TCP 53 Connection ExitCode: $TCP_EC" +fi \ No newline at end of file diff --git a/tests_e2e/tests/scripts/agent_persist_firewall-test_setup b/tests_e2e/tests/scripts/agent_persist_firewall-test_setup new file mode 100755 index 000000000..a157e58cb --- /dev/null +++ b/tests_e2e/tests/scripts/agent_persist_firewall-test_setup @@ -0,0 +1,30 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# +# Script adds a cron job on reboot to make sure the iptables rules allowing access to Wireserver are added, and also enables the firewall config flag +# + +if [[ $# -ne 1 ]]; then + echo "Usage: agent_persist_firewall-test_setup <username>" + exit 1 +fi + +echo "@reboot /home/$1/bin/agent_persist_firewall-access_wireserver > /tmp/reboot-cron-root.log 2>&1" | crontab -u root - +echo "@reboot /home/$1/bin/agent_persist_firewall-access_wireserver > /tmp/reboot-cron-$1.log 2>&1" | crontab -u "$1" - +update-waagent-conf OS.EnableFirewall=y \ No newline at end of file diff --git a/tests_e2e/tests/scripts/agent_persist_firewall-verify_firewall_rules_on_boot.py b/tests_e2e/tests/scripts/agent_persist_firewall-verify_firewall_rules_on_boot.py new file mode 100755 index 000000000..549e368b2 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_persist_firewall-verify_firewall_rules_on_boot.py @@ -0,0 +1,176 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +#
+# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# This script checks that firewall rules are set on boot through the cron job logs, and also captures the logs for debugging purposes. +# +import argparse +import os +import re +import shutil + +from assertpy import fail + +from azurelinuxagent.common.utils import shellutil +from tests_e2e.tests.lib.firewall_helpers import verify_all_rules_exist +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import retry + + +def move_cron_logs_to_var_log(): + # Move the cron logs to /var/log + log.info("Moving cron logs to /var/log for debugging purposes") + for cron_log in [ROOT_CRON_LOG, NON_ROOT_CRON_LOG, NON_ROOT_WIRE_XML, ROOT_WIRE_XML]: + try: + shutil.move(src=cron_log, dst=os.path.join("/var", "log", + "{0}.{1}".format(os.path.basename(cron_log), + BOOT_NAME))) + except Exception as e: + log.info("Unable to move cron log to /var/log; {0}".format(e)) + + +def check_wireserver_versions_file_exist(wire_version_file): + log.info("Checking wire-versions file exists and is not empty: {0}".format(wire_version_file)) + if not os.path.exists(wire_version_file): + log.info("File: {0} not found".format(wire_version_file)) + return False + + return os.stat(wire_version_file).st_size > 0 + + +def verify_data_in_cron_logs(cron_log, verify, err_msg): + log.info("Verifying Cron logs") + + def cron_log_checks(): + + if not os.path.exists(cron_log): + raise Exception("Cron log file not found: {0}".format(cron_log)) +
with open(cron_log) as f: + cron_logs_lines = list(map(lambda _: _.strip(), f.readlines())) + if not cron_logs_lines: + raise Exception("Empty cron log file, looks like the cron job didn't run") + + if any("Unable to connect to network, exiting now" in line for line in cron_logs_lines): + raise Exception("VM was unable to connect to network on startup. Skipping test validation") + + if not any("ExitCode" in line for line in cron_logs_lines): + raise Exception("Cron logs still incomplete, will try again in a minute") + + if not any(verify(line) for line in cron_logs_lines): + fail("Verification failed! (UNEXPECTED): {0}".format(err_msg)) + + log.info("Verification succeeded. Cron logs as expected") + + retry(cron_log_checks) + + +def verify_wireserver_ip_reachable_for_root(): + """ + For root - + Ensure the /tmp/wire-versions-root.xml is not empty (generated by the cron job) + Ensure the exit code in the /tmp/reboot-cron-root.log file is 0 + """ + log.info("Verifying Wireserver IP is reachable from root user") + + def check_exit_code(line): + match = re.match("ExitCode:\\s(\\d+)", line) + return match is not None and int(match.groups()[0]) == 0 + + verify_data_in_cron_logs(cron_log=ROOT_CRON_LOG, verify=check_exit_code, + err_msg="Exit Code should be 0 for root based cron job!") + + if not check_wireserver_versions_file_exist(ROOT_WIRE_XML): + fail("Wire version file should not be empty for root user!") + + +def verify_wireserver_ip_unreachable_for_non_root(): + """ + For non-root - + Ensure the /tmp/wire-versions-non-root.xml is empty (generated by the cron job) + Ensure the exit code in the /tmp/reboot-cron-non-root.log file is non-0 + """ + log.info("Verifying WireServer IP is unreachable from non-root user") + + def check_exit_code(line): + match = re.match("ExitCode:\\s(\\d+)", line) + return match is not None and int(match.groups()[0]) != 0 + + verify_data_in_cron_logs(cron_log=NON_ROOT_CRON_LOG, verify=check_exit_code, + err_msg="Exit Code should be non-0
for non-root cron job!") + + if check_wireserver_versions_file_exist(NON_ROOT_WIRE_XML): + fail("Wire version file should be empty for non-root user!") + + +def verify_tcp_connection_to_wireserver_for_non_root(): + """ + For non-root - + Ensure the TCP 53 Connection exit code in the /tmp/reboot-cron-non-root.log file is 0 + """ + log.info("Verifying TCP connection to Wireserver port for non-root user") + + def check_exit_code(line): + match = re.match("TCP 53 Connection ExitCode:\\s(\\d+)", line) + return match is not None and int(match.groups()[0]) == 0 + + verify_data_in_cron_logs(cron_log=NON_ROOT_CRON_LOG, verify=check_exit_code, + err_msg="TCP 53 Connection Exit Code should be 0 for non-root cron job!") + + +def generate_svg(): + """ + This is good to have but not a must-have; the test is not failed if we're unable to generate an SVG + """ + log.info("Running systemd-analyze plot command to get the svg for boot execution order") + dest_dir = os.path.join("/var", "log", "svgs") + if not os.path.exists(dest_dir): + os.makedirs(dest_dir) + svg_name = os.path.join(dest_dir, "{0}.svg".format(BOOT_NAME)) + cmd = ["systemd-analyze plot > {0}".format(svg_name)] + err_code, stdout = shellutil.run_get_output(cmd) + if err_code != 0: + log.info("Unable to generate svg: {0}".format(stdout)) + else: + log.info("SVG generated successfully") + + +def main(): + try: + # Verify firewall rules are set on boot through cron job logs + verify_wireserver_ip_unreachable_for_non_root() + verify_wireserver_ip_reachable_for_root() + verify_tcp_connection_to_wireserver_for_non_root() + verify_all_rules_exist() + finally: + # save the logs to /var/log so they are captured by collect-logs; this might be useful for debugging + move_cron_logs_to_var_log() + generate_svg() + + +parser = argparse.ArgumentParser() +parser.add_argument('-u', '--user', required=True, help="Non root user") +parser.add_argument('-bn', '--boot_name', required=True, help="Boot Name") +args = parser.parse_args() +NON_ROOT_USER = args.user
+BOOT_NAME = args.boot_name +ROOT_CRON_LOG = "/tmp/reboot-cron-root.log" +NON_ROOT_CRON_LOG = f"/tmp/reboot-cron-{NON_ROOT_USER}.log" +NON_ROOT_WIRE_XML = f"/tmp/wire-versions-{NON_ROOT_USER}.xml" +ROOT_WIRE_XML = "/tmp/wire-versions-root.xml" +main() diff --git a/tests_e2e/tests/scripts/agent_persist_firewall-verify_firewalld_rules_readded.py b/tests_e2e/tests/scripts/agent_persist_firewall-verify_firewalld_rules_readded.py new file mode 100755 index 000000000..5cec654a1 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_persist_firewall-verify_firewalld_rules_readded.py @@ -0,0 +1,170 @@ +#!/usr/bin/env pypy3 +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# This script deletes the firewalld rules and ensures the deleted rules are added back to the firewalld rule set after the agent starts +# + +from azurelinuxagent.common.osutil import get_osutil +from tests_e2e.tests.lib.firewall_helpers import firewalld_service_running, print_current_firewalld_rules, \ + get_non_root_accept_tcp_firewalld_rule, get_all_firewalld_rule_commands, FirewalldRules, execute_cmd, \ + check_if_firewalld_rule_is_available, verify_all_firewalld_rules_exist, get_root_accept_firewalld_rule, \ + get_non_root_drop_firewalld_rule +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import retry + + +def delete_firewalld_rules(commands=None): + """ + This function deletes the provided rules (or, if not specified, all rules) from the firewalld rule set + """ + if commands is None: + commands = [] + if not commands: + root_accept, non_root_accept, non_root_drop = get_all_firewalld_rule_commands(FirewalldRules.REMOVE_PASSTHROUGH) + commands.extend([root_accept, non_root_accept, non_root_drop]) + + log.info("Deleting firewalld rules \n %s", commands) + + try: + cmd = None + for command in commands: + cmd = command + retry(lambda: execute_cmd(cmd=cmd), attempts=3) + except Exception as e: + raise Exception("Error -- failed to delete the firewalld rule set {0}".format(e)) + + log.info("Success -- deleted firewalld rules") + + +def verify_rules_deleted_successfully(commands=None): + """ + This function verifies that the provided rules (or, if not specified, all rules) were deleted successfully.
+ """ + log.info("Verifying requested rules deleted successfully") + + if commands is None: + commands = [] + + if not commands: + root_accept, non_root_accept, non_root_drop = get_all_firewalld_rule_commands(FirewalldRules.QUERY_PASSTHROUGH) + commands.extend([root_accept, non_root_accept, non_root_drop]) + + # "--query-passthrough" returns exit code 1 when the rule is not present, which is expected after deletion + for command in commands: + if check_if_firewalld_rule_is_available(command): + raise Exception("Deletion of firewalld rules not successful.\nCurrent firewalld rules:\n" + print_current_firewalld_rules()) + + log.info("firewalld rules deleted successfully \n %s", commands) + + +def verify_non_root_accept_rule(): + """ + This function verifies the non root accept rule and makes sure it is re-added by the agent after deletion + """ + log.info("verifying non root accept rule") + agent_name = get_osutil().get_service_name() + # stop the agent, so that it won't re-add rules while checking + log.info("stop the agent, so that it won't re-add rules while checking") + cmd = ["systemctl", "stop", agent_name] + execute_cmd(cmd) + + # deleting tcp rule + accept_tcp_rule_with_delete = get_non_root_accept_tcp_firewalld_rule(FirewalldRules.REMOVE_PASSTHROUGH) + delete_firewalld_rules([accept_tcp_rule_with_delete]) + + # verifying deletion successful + accept_tcp_rule_with_check = get_non_root_accept_tcp_firewalld_rule(FirewalldRules.QUERY_PASSTHROUGH) + verify_rules_deleted_successfully([accept_tcp_rule_with_check]) + + # restart the agent to re-add the deleted rules + log.info("restart the agent to re-add the deleted rules") + cmd = ["systemctl", "restart", agent_name] + execute_cmd(cmd=cmd) + + verify_all_firewalld_rules_exist() + + +def verify_root_accept_rule(): + """ + This function verifies the root accept rule and makes sure it is re-added by the agent after deletion + """ + log.info("Verifying root accept rule") + agent_name = get_osutil().get_service_name() + #
stop the agent, so that it won't re-add rules while checking + log.info("stop the agent, so that it won't re-add rules while checking") + cmd = ["systemctl", "stop", agent_name] + execute_cmd(cmd) + + # deleting root accept rule + root_accept_rule_with_delete = get_root_accept_firewalld_rule(FirewalldRules.REMOVE_PASSTHROUGH) + delete_firewalld_rules([root_accept_rule_with_delete]) + + # verifying deletion successful + root_accept_rule_with_check = get_root_accept_firewalld_rule(FirewalldRules.QUERY_PASSTHROUGH) + verify_rules_deleted_successfully([root_accept_rule_with_check]) + + # restart the agent to re-add the deleted rules + log.info("restart the agent to re-add the deleted rules") + cmd = ["systemctl", "restart", agent_name] + execute_cmd(cmd=cmd) + + verify_all_firewalld_rules_exist() + + +def verify_non_root_drop_rule(): + """ + This function verifies the drop rule and makes sure it is re-added by the agent after deletion + """ + log.info("Verifying non root drop rule") + agent_name = get_osutil().get_service_name() + # stop the agent, so that it won't re-add rules while checking + log.info("stop the agent, so that it won't re-add rules while checking") + cmd = ["systemctl", "stop", agent_name] + execute_cmd(cmd) + + # deleting non-root drop rule + non_root_drop_with_delete = get_non_root_drop_firewalld_rule(FirewalldRules.REMOVE_PASSTHROUGH) + delete_firewalld_rules([non_root_drop_with_delete]) + + # verifying deletion successful + non_root_drop_with_check = get_non_root_drop_firewalld_rule(FirewalldRules.QUERY_PASSTHROUGH) + verify_rules_deleted_successfully([non_root_drop_with_check]) + + # restart the agent to re-add the deleted rules + log.info("restart the agent to re-add the deleted rules") + cmd = ["systemctl", "restart", agent_name] + execute_cmd(cmd=cmd) + + verify_all_firewalld_rules_exist() + + +def main(): + + if firewalld_service_running(): + log.info("Displaying current firewalld rules") + print_current_firewalld_rules() +
verify_non_root_accept_rule() + verify_root_accept_rule() + verify_non_root_drop_rule() + else: + log.info("firewalld.service is not running and skipping test") + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/agent_persist_firewall-verify_persist_firewall_service_running.py b/tests_e2e/tests/scripts/agent_persist_firewall-verify_persist_firewall_service_running.py new file mode 100755 index 000000000..87e1e29e1 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_persist_firewall-verify_persist_firewall_service_running.py @@ -0,0 +1,70 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# This script verifies the firewalld rules set on the VM if the firewalld service is running; if it is not running, it verifies that the network-setup service is enabled by the agent +# +from assertpy import fail + +from azurelinuxagent.common.osutil import get_osutil +from azurelinuxagent.common.utils import shellutil +from tests_e2e.tests.lib.firewall_helpers import execute_cmd_return_err_code, \ + firewalld_service_running, verify_all_firewalld_rules_exist +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import retry_if_false + + +def verify_network_setup_service_enabled(): + """ + Checks if the network-setup service is enabled on the vm + """ + agent_name = get_osutil().get_service_name() + service_name = "{0}-network-setup.service".format(agent_name) + cmd = ["systemctl", "is-enabled", service_name] + + def op(cmd): + exit_code, output = execute_cmd_return_err_code(cmd) + return exit_code == 0 and output.rstrip() == "enabled" + + try: + status = retry_if_false(lambda: op(cmd), attempts=5, delay=30) + except Exception as e: + log.warning("Error -- while checking network-setup service is-enabled status {0}".format(e)) + status = False + if not status: + cmd = ["systemctl", "status", service_name] + fail("network-setup.service is not enabled!
Current status: {0}".format(shellutil.run_command(cmd))) + + log.info("network-setup.service is enabled") + + +def verify_firewall_service_running(): + log.info("Ensure test agent initialize the firewalld/network service setup") + + # Check if firewall active on the Vm + log.info("Checking if firewall service is active on the VM") + if firewalld_service_running(): + # Checking if firewalld rules are present in rule set if firewall service is active + verify_all_firewalld_rules_exist() + else: + # Checking if network-setup service is enabled if firewall service is not active + log.info("Checking if network-setup service is enabled by the agent since firewall service is not active") + verify_network_setup_service_enabled() + + +if __name__ == "__main__": + verify_firewall_service_running() diff --git a/tests_e2e/tests/scripts/agent_publish-check_update.py b/tests_e2e/tests/scripts/agent_publish-check_update.py new file mode 100755 index 000000000..38ae00a90 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_publish-check_update.py @@ -0,0 +1,111 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
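The `systemctl is-enabled` probe above is wrapped in a retry-until-true loop (`retry_if_false` with `attempts` and `delay`). A minimal sketch of that pattern, assuming the same keyword arguments as the helper in `tests_e2e.tests.lib.retry` (the flaky-probe simulation below is purely illustrative):

```python
import time

def retry_if_false(operation, attempts=5, delay=30):
    """Retry a boolean operation until it returns True or attempts run out."""
    result = False
    for i in range(attempts):
        result = operation()
        if result:
            break
        if i < attempts - 1:
            time.sleep(delay)
    return result

# Simulate a service that only reports "enabled" on the third probe.
probes = iter(["disabled", "disabled", "enabled"])
status = retry_if_false(lambda: next(probes) == "enabled", attempts=5, delay=0)
print(status)  # True
```

The loop returns early on the first success, so a transiently slow `systemd` unit does not fail the test outright.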
+# +import re + +from assertpy import fail + +from tests_e2e.tests.lib.agent_log import AgentLog +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test +from tests_e2e.tests.lib.retry import retry_if_false + + +# pylint: disable=W0105 +""" +Post the _LOG_PATTERN_00 changes, the last group sometimes might not have the 'Agent' part at the start of the sentence; thus making it optional. + +> WALinuxAgent-2.2.18 discovered WALinuxAgent-2.2.47 as an update and will exit +(None, 'WALinuxAgent-2.2.18', '2.2.47') +""" +_UPDATE_PATTERN_00 = re.compile(r'(.*Agent\s)?(\S*)\sdiscovered\sWALinuxAgent-(\S*)\sas an update and will exit') + +""" +> Agent WALinuxAgent-2.2.45 discovered update WALinuxAgent-2.2.47 -- exiting +('Agent', 'WALinuxAgent-2.2.45', '2.2.47') +""" +_UPDATE_PATTERN_01 = re.compile(r'(.*Agent)?\s(\S*) discovered update WALinuxAgent-(\S*) -- exiting') + +""" +> Normal Agent upgrade discovered, updating to WALinuxAgent-2.9.1.0 -- exiting +('Normal Agent', WALinuxAgent, '2.9.1.0 ') +""" +_UPDATE_PATTERN_02 = re.compile(r'(.*Agent) upgrade discovered, updating to (WALinuxAgent)-(\S*) -- exiting') + +""" +> Agent update found, exiting current process to downgrade to the new Agent version 1.3.0.0 +(Agent, 'downgrade', '1.3.0.0') +""" +_UPDATE_PATTERN_03 = re.compile(r'(.*Agent) update found, exiting current process to (\S*) to the new Agent version (\S*)') + +""" +> Agent WALinuxAgent-2.2.47 is running as the goal state agent +('2.2.47',) +""" +_RUNNING_PATTERN_00 = re.compile(r'.*Agent\sWALinuxAgent-(\S*)\sis running as the goal state agent') + + +def verify_agent_update_from_log(): + + exit_code = 0 + detected_update = False + update_successful = False + update_version = '' + + agentlog = AgentLog() + + for record in agentlog.read(): + if 'TelemetryData' in record.text: + continue + + for p in [_UPDATE_PATTERN_00, _UPDATE_PATTERN_01, _UPDATE_PATTERN_02, _UPDATE_PATTERN_03]: + update_match = re.match(p, 
record.text) + if update_match: + detected_update = True + update_version = update_match.groups()[2] + log.info('found the agent update log: %s', record.text) + break + + if detected_update: + running_match = re.match(_RUNNING_PATTERN_00, record.text) + if running_match and update_version == running_match.groups()[0]: + update_successful = True + log.info('found the agent started new version log: %s', record.text) + + if detected_update: + log.info('update was detected: %s', update_version) + if update_successful: + log.info('update was successful') + else: + log.warning('update was not successful') + exit_code = 1 + else: + log.warning('update was not detected') + exit_code = 1 + + return exit_code == 0 + + +# This method will trace agent update messages in the agent log and determine if the update was successful or not. +def main(): + found: bool = retry_if_false(verify_agent_update_from_log) + if not found: + fail('update was not found in the logs') + + +run_remote_test(main) diff --git a/tests_e2e/tests/scripts/agent_publish-get_agent_log_record_timestamp.py b/tests_e2e/tests/scripts/agent_publish-get_agent_log_record_timestamp.py new file mode 100755 index 000000000..d055fc6c2 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_publish-get_agent_log_record_timestamp.py @@ -0,0 +1,75 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
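The update patterns above always capture the target version in the third group, which is why `update_match.groups()[2]` is compared against the running version. A quick check of `_UPDATE_PATTERN_00` against the sample log line from its docstring:

```python
import re

# Pattern copied from the script; the leading group is optional because very
# old agents omit the 'Agent' prefix.
_UPDATE_PATTERN_00 = re.compile(r'(.*Agent\s)?(\S*)\sdiscovered\sWALinuxAgent-(\S*)\sas an update and will exit')

line = 'WALinuxAgent-2.2.18 discovered WALinuxAgent-2.2.47 as an update and will exit'
m = re.match(_UPDATE_PATTERN_00, line)
print(m.groups())  # (None, 'WALinuxAgent-2.2.18', '2.2.47')
```

The version extracted from group 3 (`'2.2.47'`) is what gets matched against `_RUNNING_PATTERN_00` to confirm the new agent actually started.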
+# + +import re +from datetime import datetime + +from tests_e2e.tests.lib.agent_log import AgentLog + +# pylint: disable=W0105 +""" +> WALinuxAgent-2.2.18 discovered WALinuxAgent-2.2.47 as an update and will exit +(None, 'WALinuxAgent-2.2.18', '2.2.47') +""" +_UPDATE_PATTERN_00 = re.compile(r'(.*Agent\s)?(\S*)\sdiscovered\sWALinuxAgent-(\S*)\sas an update and will exit') + +""" +> Agent WALinuxAgent-2.2.45 discovered update WALinuxAgent-2.2.47 -- exiting +('Agent', 'WALinuxAgent-2.2.45', '2.2.47') +""" +_UPDATE_PATTERN_01 = re.compile(r'(.*Agent)?\s(\S*) discovered update WALinuxAgent-(\S*) -- exiting') + +""" +> Normal Agent upgrade discovered, updating to WALinuxAgent-2.9.1.0 -- exiting +('Normal Agent', WALinuxAgent, '2.9.1.0 ') +""" +_UPDATE_PATTERN_02 = re.compile(r'(.*Agent) upgrade discovered, updating to (WALinuxAgent)-(\S*) -- exiting') + +""" +> Agent update found, exiting current process to downgrade to the new Agent version 1.3.0.0 +(Agent, 'downgrade', '1.3.0.0') +""" +_UPDATE_PATTERN_03 = re.compile( + r'(.*Agent) update found, exiting current process to (\S*) to the new Agent version (\S*)') + + +""" +This script return timestamp of update message in the agent log +""" + + +def main(): + try: + agentlog = AgentLog() + + for record in agentlog.read(): + + for p in [_UPDATE_PATTERN_00, _UPDATE_PATTERN_01, _UPDATE_PATTERN_02, _UPDATE_PATTERN_03]: + update_match = re.match(p, record.text) + if update_match: + return record.timestamp + + return datetime.min + except Exception as e: + raise Exception("Error thrown when searching for update pattern in agent log to get record timestamp: {0}".format(str(e))) + + +if __name__ == "__main__": + timestamp = main() + print(timestamp) diff --git a/tests_e2e/tests/scripts/agent_status-get_last_gs_processed.py b/tests_e2e/tests/scripts/agent_status-get_last_gs_processed.py new file mode 100755 index 000000000..8bbe598f1 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_status-get_last_gs_processed.py @@ -0,0 +1,47 
@@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Writes the last goal state processed line in the log to stdout +# +import re +import sys + +from tests_e2e.tests.lib.agent_log import AgentLog + + +def main(): + gs_completed_regex = r"ProcessExtensionsGoalState completed\s\[[a-z_\d]{13,14}\s\d+\sms\]" + last_gs_processed = None + agent_log = AgentLog() + + try: + for agent_record in agent_log.read(): + gs_complete = re.match(gs_completed_regex, agent_record.message) + + if gs_complete is not None: + last_gs_processed = agent_record.text + + except IOError as e: + print("Unable to get last goal state processed: {0}".format(str(e))) + + print(last_gs_processed) + sys.exit(0) + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/agent_update-modify_agent_version b/tests_e2e/tests/scripts/agent_update-modify_agent_version new file mode 100755 index 000000000..68cb017d5 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_update-modify_agent_version @@ -0,0 +1,35 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# This script updates the flags necessary to make the agent ready for RSM updates +# +set -euo pipefail + + +if [[ $# -ne 1 ]]; then + echo "Usage: agent_update-modify_agent_version <version>" + exit 1 +fi + +version=$1 +PYTHON=$(get-agent-python) +echo "Agent's Python: $PYTHON" +# some distros return the compiled .pyc file instead of the .py source file, so retrieve the parent directory first +version_file_dir=$($PYTHON -c 'import azurelinuxagent.common.version as v; import os; print(os.path.dirname(v.__file__))') +version_file_full_path="$version_file_dir/version.py" +sed -E -i "s/^AGENT_VERSION\s+=\s+'[0-9.]+'/AGENT_VERSION = '$version'/" $version_file_full_path \ No newline at end of file diff --git a/tests_e2e/tests/scripts/agent_update-self_update_check.py b/tests_e2e/tests/scripts/agent_update-self_update_check.py new file mode 100755 index 000000000..b205c94ab --- /dev/null +++ b/tests_e2e/tests/scripts/agent_update-self_update_check.py @@ -0,0 +1,62 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. +# +# Script verifies agent update was done by test agent +# +import argparse +import re + +from assertpy import fail + +from tests_e2e.tests.lib.agent_log import AgentLog +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.retry import retry_if_false + + +#2023-12-28T04:34:23.535652Z INFO ExtHandler ExtHandler Current Agent 2.8.9.9 completed all update checks, exiting current process to upgrade to the new Agent version 2.10.0.7 +_UPDATE_PATTERN = re.compile(r'Current Agent (\S*) completed all update checks, exiting current process to upgrade to the new Agent version (\S*)') + + +def verify_agent_update_from_log(latest_version, current_version) -> bool: + """ + Checks if the agent updated to the latest version from current version + """ + agentlog = AgentLog() + + for record in agentlog.read(): + update_match = re.match(_UPDATE_PATTERN, record.message) + if update_match: + log.info('found the agent update log: %s', record.text) + if update_match.groups()[0] == current_version and update_match.groups()[1] == latest_version: + return True + return False + + +def main() -> None: + parser = argparse.ArgumentParser() + parser.add_argument('-l', '--latest-version', required=True) + parser.add_argument('-c', '--current-version', required=True) + args = parser.parse_args() + + found: bool = retry_if_false(lambda: verify_agent_update_from_log(args.latest_version, args.current_version)) + if not found: + fail('agent update was not found in the logs for latest version {0} from current version {1}'.format(args.latest_version, args.current_version)) + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/agent_update-self_update_latest_version.py b/tests_e2e/tests/scripts/agent_update-self_update_latest_version.py new file mode 100755 index 000000000..4be0f0dc3 --- /dev/null +++ 
b/tests_e2e/tests/scripts/agent_update-self_update_latest_version.py @@ -0,0 +1,69 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# returns the agent latest version published +# + +from azurelinuxagent.common.protocol.goal_state import GoalStateProperties +from azurelinuxagent.common.protocol.util import get_protocol_util +from azurelinuxagent.common.utils.flexible_version import FlexibleVersion +from tests_e2e.tests.lib.retry import retry + + +def get_agent_family_manifest(goal_state): + """ + Get the agent_family from last GS for Test Family + """ + agent_families = goal_state.extensions_goal_state.agent_families + agent_family_manifests = [] + for m in agent_families: + if m.name == 'Test': + if len(m.uris) > 0: + agent_family_manifests.append(m) + return agent_family_manifests[0] + + +def get_largest_version(agent_manifest): + """ + Get the largest version from the agent manifest + """ + largest_version = FlexibleVersion("0.0.0.0") + for pkg in agent_manifest.pkg_list.versions: + pkg_version = FlexibleVersion(pkg.version) + if pkg_version > largest_version: + largest_version = pkg_version + return largest_version + + +def main(): + + try: + protocol = get_protocol_util().get_protocol(init_goal_state=False) + retry(lambda: protocol.client.reset_goal_state( + goal_state_properties=GoalStateProperties.ExtensionsGoalState)) + goal_state = 
protocol.client.get_goal_state() + agent_family = get_agent_family_manifest(goal_state) + agent_manifest = goal_state.fetch_agent_manifest(agent_family.name, agent_family.uris) + largest_version = get_largest_version(agent_manifest) + print(str(largest_version)) + except Exception as e: + raise Exception("Unable to verify agent updated to latest version since the test failed to get the latest version from the agent manifest: {0}".format(e)) + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/agent_update-self_update_test_setup b/tests_e2e/tests/scripts/agent_update-self_update_test_setup new file mode 100755 index 000000000..512beb322 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_update-self_update_test_setup @@ -0,0 +1,74 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
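`get_largest_version` above folds over the manifest with `FlexibleVersion`, which compares dotted versions numerically. A minimal stand-in using tuple comparison shows why plain string comparison is not enough (this assumes purely numeric versions; the real `FlexibleVersion` also handles suffixes and mixed fields):

```python
def version_key(version):
    # "2.10.0.7" -> (2, 10, 0, 7); numeric fields compare correctly,
    # whereas as strings "2.9..." would sort *after* "2.10...".
    return tuple(int(part) for part in version.split('.'))

def get_largest_version(versions):
    # Same fold as the script, seeded with the smallest possible version.
    largest = "0.0.0.0"
    for v in versions:
        if version_key(v) > version_key(largest):
            largest = v
    return largest

print(get_largest_version(["2.9.1.1", "2.10.0.7", "2.2.47"]))  # 2.10.0.7
```

String comparison would have picked "2.9.1.1" here, which is exactly the bug structured version types exist to prevent.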
+# +# This script prepares the new agent and install it on the vm +# + +set -euo pipefail + +usage() ( + echo "Usage: agent_update-self_update_test_setup -p|--package -v|--version -u|--update_to_latest_version " + exit 1 +) + +while [[ $# -gt 0 ]]; do + case $1 in + -p|--package) + shift + if [ "$#" -lt 1 ]; then + usage + fi + package=$1 + shift + ;; + -v|--version) + shift + if [ "$#" -lt 1 ]; then + usage + fi + version=$1 + shift + ;; + -u|--update_to_latest_version) + shift + if [ "$#" -lt 1 ]; then + usage + fi + update_to_latest_version=$1 + shift + ;; + *) + usage + esac +done +if [ "$#" -ne 0 ] || [ -z ${package+x} ] || [ -z ${version+x} ]; then + usage +fi + +echo "updating the related to self-update flags" +update-waagent-conf AutoUpdate.UpdateToLatestVersion=$update_to_latest_version Debug.EnableGAVersioning=n Debug.SelfUpdateHotfixFrequency=120 Debug.SelfUpdateRegularFrequency=120 Autoupdate.Frequency=120 +agent-service stop +mv /var/log/waagent.log /var/log/waagent.$(date --iso-8601=seconds).log + +echo "Cleaning up the existing agents" +rm -rf /var/lib/waagent/WALinuxAgent-* + +echo "Installing $package as version $version..." +unzip.py $package /var/lib/waagent/WALinuxAgent-$version +agent-service restart + diff --git a/tests_e2e/tests/scripts/agent_update-verify_agent_reported_update_status.py b/tests_e2e/tests/scripts/agent_update-verify_agent_reported_update_status.py new file mode 100755 index 000000000..6f1247861 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_update-verify_agent_reported_update_status.py @@ -0,0 +1,61 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Verify if the agent reported update status to CRP via status file +# +import argparse +import glob +import json + +from assertpy import fail + +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test +from tests_e2e.tests.lib.retry import retry_if_false + + +def check_agent_reported_update_status(expected_version: str) -> bool: + agent_status_file = "/var/lib/waagent/history/*/waagent_status.json" + file_paths = glob.glob(agent_status_file, recursive=True) + for file in file_paths: + with open(file, 'r') as f: + data = json.load(f) + log.info("Agent status file is %s and it's content %s", file, data) + guest_agent_status = data["aggregateStatus"]["guestAgentStatus"] + if "updateStatus" in guest_agent_status.keys(): + if guest_agent_status["updateStatus"]["expectedVersion"] == expected_version: + log.info("we found the expected version %s in agent status file", expected_version) + return True + log.info("we did not find the expected version %s in agent status file", expected_version) + return False + + +def main(): + + parser = argparse.ArgumentParser() + parser.add_argument('-v', '--version', required=True) + args = parser.parse_args() + + log.info("checking agent status file to verify if agent reported update status") + found: bool = retry_if_false(lambda: check_agent_reported_update_status(args.version)) + if not found: + fail("Agent failed to report update status, so skipping rest of the agent update validations") + + +run_remote_test(main) + diff --git 
a/tests_e2e/tests/scripts/agent_update-verify_versioning_supported_feature.py b/tests_e2e/tests/scripts/agent_update-verify_versioning_supported_feature.py new file mode 100755 index 000000000..d876033b6 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_update-verify_versioning_supported_feature.py @@ -0,0 +1,54 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Verify if the agent reported supportedfeature VersioningGovernance flag to CRP via status file +# +import glob +import json + +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test +from tests_e2e.tests.lib.retry import retry_if_false + + +def check_agent_supports_versioning() -> bool: + agent_status_file = "/var/lib/waagent/history/*/waagent_status.json" + file_paths = glob.glob(agent_status_file, recursive=True) + for file in file_paths: + with open(file, 'r') as f: + data = json.load(f) + log.info("Agent status file is %s and it's content %s", file, data) + supported_features = data["supportedFeatures"] + for supported_feature in supported_features: + if supported_feature["Key"] == "VersioningGovernance": + return True + return False + + +def main(): + log.info("checking agent status file for VersioningGovernance supported feature flag") + found: bool = retry_if_false(check_agent_supports_versioning) + if not found: + raise Exception("Agent failed to 
report supported feature flag. So, skipping agent update validations " + "since CRP will not send RSM requested version in GS if feature flag not found in status") + + +run_remote_test(main) + + + diff --git a/tests_e2e/tests/scripts/agent_update-wait_for_rsm_gs.py b/tests_e2e/tests/scripts/agent_update-wait_for_rsm_gs.py new file mode 100755 index 000000000..832e0fd64 --- /dev/null +++ b/tests_e2e/tests/scripts/agent_update-wait_for_rsm_gs.py @@ -0,0 +1,71 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
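The goal-state check that follows selects the "Test" agent family manifest (non-empty URI list) and returns its requested version. A sketch of that filtering, with a hypothetical `AgentFamily` namedtuple standing in for the objects actually returned by `goal_state.extensions_goal_state.agent_families`:

```python
from collections import namedtuple

# Stand-in for the real agent-family objects in the goal state.
AgentFamily = namedtuple('AgentFamily', ['name', 'uris', 'version'])

def get_requested_version(agent_families):
    # Keep only the Test family entries that carry manifest links.
    manifests = [m for m in agent_families if m.name == "Test" and len(m.uris) > 0]
    if not manifests:
        raise Exception("No manifest links found for agent family Test")
    manifest = manifests[0]
    return str(manifest.version) if manifest.version is not None else ""

families = [
    AgentFamily(name="Prod", uris=["https://example.invalid/prod"], version="2.9.1.1"),
    AgentFamily(name="Test", uris=["https://example.invalid/test"], version="2.10.0.7"),
]
print(get_requested_version(families))  # 2.10.0.7
```

An entry without URIs is treated the same as a missing family, since there is no manifest to fetch from it.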
+# +# Verify the latest goal state included rsm requested version and if not, retry +# +import argparse + +from azurelinuxagent.common.protocol.util import get_protocol_util +from azurelinuxagent.common.protocol.goal_state import GoalState, GoalStateProperties +from azurelinuxagent.common.protocol.wire import WireProtocol +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test +from tests_e2e.tests.lib.retry import retry_if_false, retry + + +def get_requested_version(gs: GoalState) -> str: + agent_families = gs.extensions_goal_state.agent_families + agent_family_manifests = [m for m in agent_families if m.name == "Test" and len(m.uris) > 0] + if len(agent_family_manifests) == 0: + raise Exception( + u"No manifest links found for agent family Test, skipping agent update verification") + manifest = agent_family_manifests[0] + if manifest.version is not None: + return str(manifest.version) + return "" + + +def verify_rsm_requested_version(wire_protocol: WireProtocol, expected_version: str) -> bool: + log.info("fetching the goal state to check if it includes rsm requested version") + wire_protocol.client.update_goal_state() + goal_state = wire_protocol.client.get_goal_state() + requested_version = get_requested_version(goal_state) + if requested_version == expected_version: + return True + else: + return False + + +def main(): + parser = argparse.ArgumentParser() + parser.add_argument('-v', '--version', required=True) + args = parser.parse_args() + + protocol = get_protocol_util().get_protocol(init_goal_state=False) + retry(lambda: protocol.client.reset_goal_state( + goal_state_properties=GoalStateProperties.ExtensionsGoalState)) + + found: bool = retry_if_false(lambda: verify_rsm_requested_version(protocol, args.version)) + + if not found: + raise Exception("The latest goal state didn't contain requested version after we submit the rsm request for: {0}.".format(args.version)) + else: + log.info("Successfully verified 
that latest GS contains rsm requested version : %s", args.version) + + +run_remote_test(main) diff --git a/tests_e2e/tests/scripts/ext_cgroups-check_cgroups_extensions.py b/tests_e2e/tests/scripts/ext_cgroups-check_cgroups_extensions.py new file mode 100755 index 000000000..48bd3f902 --- /dev/null +++ b/tests_e2e/tests/scripts/ext_cgroups-check_cgroups_extensions.py @@ -0,0 +1,224 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
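The cgroup checks that follow scan `controllers:path` lines (cgroup v1 style, as in `/proc/<pid>/cgroup` without the hierarchy IDs) for the extension's slice. A minimal sketch with hypothetical sample lines; the real script reads them from `/var/lib/waagent/tmp/custom_script_check`:

```python
extension_path = "/azure.slice/azure-vmextensions.slice/azure-vmextensions-Microsoft.Azure.Extensions.CustomScript"

# Hypothetical capture of the extension process's cgroup lines; the controller
# pair can appear in either order depending on the distro.
controllers = [
    "memory:" + extension_path,
    "cpu,cpuacct:" + extension_path,
]

cpu_mounted = any(
    line.startswith(("cpu,cpuacct:", "cpuacct,cpu:")) and extension_path in line
    for line in controllers
)
memory_mounted = any(
    line.startswith("memory:") and extension_path in line
    for line in controllers
)
print(cpu_mounted, memory_mounted)  # True True
```

Checking both controller orderings mirrors the `correct_cpu_mount_v1` / `correct_cpu_mount_v2` pair in the verification below.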
+# + +import os + +from assertpy import fail + +from tests_e2e.tests.lib.agent_log import AgentLog +from tests_e2e.tests.lib.cgroup_helpers import verify_if_distro_supports_cgroup, \ + verify_agent_cgroup_assigned_correctly, BASE_CGROUP, EXT_CONTROLLERS, get_unit_cgroup_mount_path, \ + GATESTEXT_SERVICE, AZUREMONITORAGENT_SERVICE, MDSD_SERVICE, check_agent_quota_disabled, \ + check_cgroup_disabled_with_unknown_process, CGROUP_TRACKED_PATTERN, AZUREMONITOREXT_FULL_NAME, GATESTEXT_FULL_NAME, \ + print_cgroups +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test + + +def verify_custom_script_cgroup_assigned_correctly(): + """ + This method verifies that the CSE script created the expected folder after install and also checks whether CSE ran under the expected cgroups + """ + log.info("===== Verifying custom script was assigned to the correct cgroups") + + # CSE creates this folder to save the cgroup information of the process the CSE script was executed in. Since the CSE process exits after execution + # and its cgroup paths get cleaned up by the system, this information is saved at run time, while the extension is executing. 
+ check_temporary_folder_exists() + + cpu_mounted = False + memory_mounted = False + + log.info("custom script cgroup mounts:") + + with open('/var/lib/waagent/tmp/custom_script_check') as fh: + controllers = fh.read() + log.info("%s", controllers) + + extension_path = "/azure.slice/azure-vmextensions.slice/azure-vmextensions-Microsoft.Azure.Extensions.CustomScript" + + correct_cpu_mount_v1 = "cpu,cpuacct:{0}".format(extension_path) + correct_cpu_mount_v2 = "cpuacct,cpu:{0}".format(extension_path) + + correct_memory_mount = "memory:{0}".format(extension_path) + + for mounted_controller in controllers.split("\n"): + if correct_cpu_mount_v1 in mounted_controller or correct_cpu_mount_v2 in mounted_controller: + log.info('Custom script extension mounted under correct cgroup ' + 'for CPU: %s', mounted_controller) + cpu_mounted = True + elif correct_memory_mount in mounted_controller: + log.info('Custom script extension mounted under correct cgroup ' + 'for Memory: %s', mounted_controller) + memory_mounted = True + + if not cpu_mounted: + fail('Custom script not mounted correctly for CPU! Expected {0} or {1}'.format(correct_cpu_mount_v1, correct_cpu_mount_v2)) + + if not memory_mounted: + fail('Custom script not mounted correctly for Memory! 
Expected {0}'.format(correct_memory_mount)) + + +def check_temporary_folder_exists(): + tmp_folder = "/var/lib/waagent/tmp" + if not os.path.exists(tmp_folder): + fail("Temporary folder {0} was not created, which means the CSE script did not run!".format(tmp_folder)) + + +def verify_ext_cgroup_controllers_created_on_file_system(): + """ + This method ensures that the extension cgroup controllers are created on the file system after extension install + """ + log.info("===== Verifying ext cgroup controllers exist on file system") + + all_controllers_present = os.path.exists(BASE_CGROUP) + missing_controllers_path = [] + verified_controllers_path = [] + + for controller in EXT_CONTROLLERS: + controller_path = os.path.join(BASE_CGROUP, controller) + if not os.path.exists(controller_path): + all_controllers_present = False + missing_controllers_path.append(controller_path) + else: + verified_controllers_path.append(controller_path) + + if not all_controllers_present: + fail('Expected all of the extension controller paths {0} to be present in the file system after extension install. 
But the missing cgroup paths are: {1}\n' + 'and the verified cgroup paths are: {2}\nSystem mounted cgroups are:\n{3}'.format(EXT_CONTROLLERS, missing_controllers_path, verified_controllers_path, print_cgroups())) + + log.info('Verified all extension cgroup controller paths are present and they are: \n {0}'.format(verified_controllers_path)) + + +def verify_extension_service_cgroup_created_on_file_system(): + """ + This method ensures that the extension service cgroup paths are created on the file system after running the extension + """ + log.info("===== Verifying the extension service cgroup paths exist on file system") + + # GA Test Extension Service + gatestext_cgroup_mount_path = get_unit_cgroup_mount_path(GATESTEXT_SERVICE) + verify_extension_service_cgroup_created(GATESTEXT_SERVICE, gatestext_cgroup_mount_path) + + # Azure Monitor Extension Service + azuremonitoragent_cgroup_mount_path = get_unit_cgroup_mount_path(AZUREMONITORAGENT_SERVICE) + azuremonitoragent_service_name = AZUREMONITORAGENT_SERVICE + # Old versions of the AMA extension have a different service name + if azuremonitoragent_cgroup_mount_path is None: + azuremonitoragent_cgroup_mount_path = get_unit_cgroup_mount_path(MDSD_SERVICE) + azuremonitoragent_service_name = MDSD_SERVICE + verify_extension_service_cgroup_created(azuremonitoragent_service_name, azuremonitoragent_cgroup_mount_path) + + log.info('Verified all extension service cgroup paths were created in the file system.\n') + + +def verify_extension_service_cgroup_created(service_name, cgroup_mount_path): + log.info("expected extension service cgroup mount path: %s", cgroup_mount_path) + + all_controllers_present = True + missing_cgroups_path = [] + verified_cgroups_path = [] + + for controller in EXT_CONTROLLERS: + # cgroup_mount_path is similar to /azure.slice/walinuxagent.service + # cgroup_mount_path[1:] = azure.slice/walinuxagent.service + # expected extension_service_controller_path similar to /sys/fs/cgroup/cpu/azure.slice/walinuxagent.service + 
extension_service_controller_path = os.path.join(BASE_CGROUP, controller, cgroup_mount_path[1:]) + + if not os.path.exists(extension_service_controller_path): + all_controllers_present = False + missing_cgroups_path.append(extension_service_controller_path) + else: + verified_cgroups_path.append(extension_service_controller_path) + + if not all_controllers_present: + fail("Extension service: [{0}] cgroup paths couldn't be found on file system. Missing cgroup paths are: {1} \n Verified cgroup paths are: {2} \n " + "System mounted cgroups are \n{3}".format(service_name, missing_cgroups_path, verified_cgroups_path, print_cgroups())) + + +def verify_ext_cgroups_tracked(): + """ + Checks if ext cgroups are tracked by the agent. This is verified by checking the agent log for the message "Started tracking cgroup {extension_name}" + """ + log.info("===== Verifying ext cgroups tracked") + + cgroups_added_for_telemetry = [] + gatestext_cgroups_tracked = False + azuremonitoragent_cgroups_tracked = False + gatestext_service_cgroups_tracked = False + azuremonitoragent_service_cgroups_tracked = False + + for record in AgentLog().read(): + + # Cgroup tracking logged as + # 2021-11-14T13:09:59.351961Z INFO ExtHandler ExtHandler Started tracking cgroup Microsoft.Azure.Extensions.Edp.GATestExtGo-1.0.0.2 + # [/sys/fs/cgroup/cpu,cpuacct/azure.slice/azure-vmextensions.slice/azure-vmextensions-Microsoft.Azure.Extensions.Edp.GATestExtGo_1.0.0.2.slice] + cgroup_tracked_match = CGROUP_TRACKED_PATTERN.findall(record.message) + if len(cgroup_tracked_match) != 0: + name, path = cgroup_tracked_match[0][0], cgroup_tracked_match[0][1] + if name.startswith(GATESTEXT_FULL_NAME): + gatestext_cgroups_tracked = True + elif name.startswith(AZUREMONITOREXT_FULL_NAME): + azuremonitoragent_cgroups_tracked = True + elif name.startswith(GATESTEXT_SERVICE): + gatestext_service_cgroups_tracked = True + elif name.startswith(AZUREMONITORAGENT_SERVICE) or name.startswith(MDSD_SERVICE): + 
azuremonitoragent_service_cgroups_tracked = True + cgroups_added_for_telemetry.append((name, path)) + + # agent, gatest extension, azuremonitor extension and extension service cgroups + if len(cgroups_added_for_telemetry) < 1: + fail('Expected cgroups were not tracked, according to the agent log. ' + 'Pattern searched for: {0} and found \n{1}'.format(CGROUP_TRACKED_PATTERN.pattern, cgroups_added_for_telemetry)) + + if not gatestext_cgroups_tracked: + fail('Expected gatestext cgroups were not tracked, according to the agent log. ' + 'Pattern searched for: {0} and found \n{1}'.format(CGROUP_TRACKED_PATTERN.pattern, cgroups_added_for_telemetry)) + + if not azuremonitoragent_cgroups_tracked: + fail('Expected azuremonitoragent cgroups were not tracked, according to the agent log. ' + 'Pattern searched for: {0} and found \n{1}'.format(CGROUP_TRACKED_PATTERN.pattern, cgroups_added_for_telemetry)) + + if not gatestext_service_cgroups_tracked: + fail('Expected gatestext service cgroups were not tracked, according to the agent log. ' + 'Pattern searched for: {0} and found \n{1}'.format(CGROUP_TRACKED_PATTERN.pattern, cgroups_added_for_telemetry)) + + if not azuremonitoragent_service_cgroups_tracked: + fail('Expected azuremonitoragent service cgroups were not tracked, according to the agent log. 
' + 'Pattern searched for: {0} and found \n{1}'.format(CGROUP_TRACKED_PATTERN.pattern, cgroups_added_for_telemetry)) + + log.info("Extension cgroups tracked as expected\n%s", cgroups_added_for_telemetry) + + +def main(): + verify_if_distro_supports_cgroup() + verify_ext_cgroup_controllers_created_on_file_system() + verify_custom_script_cgroup_assigned_correctly() + verify_agent_cgroup_assigned_correctly() + verify_extension_service_cgroup_created_on_file_system() + verify_ext_cgroups_tracked() + + +try: + run_remote_test(main) +except Exception: + # The agent cgroup can be disabled due to an UNKNOWN process, or throttled, before this check runs; in that case, ignore the validation + if check_agent_quota_disabled() and check_cgroup_disabled_with_unknown_process(): + log.info("Cgroup is disabled due to UNKNOWN process, ignoring ext cgroups validations") + else: + raise diff --git a/tests_e2e/tests/scripts/ext_sequencing-get_ext_enable_time.py b/tests_e2e/tests/scripts/ext_sequencing-get_ext_enable_time.py new file mode 100755 index 000000000..f65da676b --- /dev/null +++ b/tests_e2e/tests/scripts/ext_sequencing-get_ext_enable_time.py @@ -0,0 +1,89 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
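The controller-path construction in `verify_extension_service_cgroup_created` above strips the leading slash from the unit's cgroup path (`cgroup_mount_path[1:]`) before joining. That step matters because `os.path.join` discards all earlier components when a later component is absolute. A minimal sketch (the `BASE_CGROUP` value is assumed to match the test suite's constant):

```python
import os

BASE_CGROUP = "/sys/fs/cgroup"  # assumed to mirror the test suite's constant

def extension_service_controller_path(controller, cgroup_mount_path):
    # cgroup_mount_path is absolute (e.g. "/azure.slice/walinuxagent.service");
    # passing it to os.path.join unchanged would discard BASE_CGROUP entirely,
    # so the leading "/" is stripped first, as the test does with [1:].
    return os.path.join(BASE_CGROUP, controller, cgroup_mount_path[1:])

# Joining with the absolute path would silently drop the base:
print(os.path.join(BASE_CGROUP, "cpu", "/azure.slice/walinuxagent.service"))
# -> /azure.slice/walinuxagent.service
# With the slash stripped, the expected controller path comes out:
print(extension_service_controller_path("cpu", "/azure.slice/walinuxagent.service"))
# -> /sys/fs/cgroup/cpu/azure.slice/walinuxagent.service
```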
+# +# Gets the timestamp for when the provided extension was enabled +# + +import argparse +import json +import os +import sys + +from pathlib import Path + + +def main(): + """ + Returns the timestamp of when the provided extension was enabled + """ + parser = argparse.ArgumentParser() + parser.add_argument("--ext", dest='ext', required=True) + args, _ = parser.parse_known_args() + + # Extension enabled time is in the extension status file + ext_dirs = [item for item in os.listdir(Path('/var/lib/waagent')) if item.startswith(args.ext)] + if not ext_dirs: + print("Extension {0} directory does not exist".format(args.ext), file=sys.stderr) + sys.exit(1) + ext_status_path = Path('/var/lib/waagent/' + ext_dirs[0] + '/status') + ext_status_files = os.listdir(ext_status_path) + ext_status_files.sort() + if not ext_status_files: + # Extension did not report a status + print("Extension {0} did not report a status".format(args.ext), file=sys.stderr) + sys.exit(1) + latest_ext_status_path = os.path.join(ext_status_path, ext_status_files[-1]) + with open(latest_ext_status_path, 'r') as ext_status_file: + ext_status = json.loads(ext_status_file.read()) + + # Example status file + # [ + # { + # "status": { + # "status": "success", + # "formattedMessage": { + # "lang": "en-US", + # "message": "Enable succeeded" + # }, + # "operation": "Enable", + # "code": "0", + # "name": "Microsoft.Azure.Monitor.AzureMonitorLinuxAgent" + # }, + # "version": "1.0", + # "timestampUTC": "2023-12-12T23:14:45Z" + # } + # ] + msg = "" + if len(ext_status) == 0 or not ext_status[0]['status']: + msg = "Extension {0} did not report a status".format(args.ext) + elif not ext_status[0]['status']['operation'] or ext_status[0]['status']['operation'] != 'Enable': + msg = "Extension {0} did not report a status for enable operation".format(args.ext) + elif ext_status[0]['status']['status'] != 'success': + msg = "Extension {0} did not report success for the enable operation".format(args.ext) + elif not 
ext_status[0]['timestampUTC']: + msg = "Extension {0} did not report the time the enable operation succeeded".format(args.ext) + else: + print(ext_status[0]['timestampUTC']) + sys.exit(0) + + print(msg, file=sys.stderr) + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/ext_telemetry_pipeline-add_extension_events.py b/tests_e2e/tests/scripts/ext_telemetry_pipeline-add_extension_events.py new file mode 100755 index 000000000..2e5776c71 --- /dev/null +++ b/tests_e2e/tests/scripts/ext_telemetry_pipeline-add_extension_events.py @@ -0,0 +1,224 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
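The validation chain in `ext_sequencing-get_ext_enable_time.py` above can be exercised in isolation against a status document shaped like the example in its comments. A sketch (the helper name and trimmed document are illustrative, not part of the script):

```python
import json

# A trimmed status document shaped like the example in the script's comments
# (values are hypothetical).
STATUS_DOC = json.loads(
    '[{"status": {"status": "success", "operation": "Enable"}, '
    '"version": "1.0", "timestampUTC": "2023-12-12T23:14:45Z"}]'
)

def get_enable_time(ext_status):
    # Mirrors the script's checks: a status entry must exist, belong to the
    # Enable operation, report success, and carry a timestamp.
    if not ext_status or not ext_status[0].get('status'):
        raise ValueError("extension did not report a status")
    status = ext_status[0]['status']
    if status.get('operation') != 'Enable':
        raise ValueError("no status reported for the enable operation")
    if status.get('status') != 'success':
        raise ValueError("enable operation did not succeed")
    if not ext_status[0].get('timestampUTC'):
        raise ValueError("no timestamp reported for the enable operation")
    return ext_status[0]['timestampUTC']

print(get_enable_time(STATUS_DOC))  # -> 2023-12-12T23:14:45Z
```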
+# +# Adds extension events for each provided extension and verifies the TelemetryEventsCollector collected or dropped them +# + +import argparse +import json +import os +import sys +import time +import uuid + +from assertpy import fail +from datetime import datetime, timedelta +from random import choice +from typing import List + +from tests_e2e.tests.lib.agent_log import AgentLog +from tests_e2e.tests.lib.logging import log + + +def add_extension_events(extensions: List[str], bad_event_count=0, no_of_events_per_extension=50): + def missing_key(bad_event): + key = choice(list(bad_event.keys())) + del bad_event[key] + return "MissingKeyError: {0}".format(key) + + def oversize_error(bad_event): + bad_event["EventLevel"] = "ThisIsAnOversizeError\n" * 300 + return "OversizeEventError" + + def empty_message(bad_event): + bad_event["Message"] = "" + return "EmptyMessageError" + + errors = [ + missing_key, + oversize_error, + empty_message + ] + + sample_ext_event = { + "EventLevel": "INFO", + "Message": "Starting IaaS ScriptHandler Extension v1", + "Version": "1.0", + "TaskName": "Extension Info", + "EventPid": "3228", + "EventTid": "1", + "OperationId": "519e4beb-018a-4bd9-8d8e-c5226cf7f56e", + "TimeStamp": "2019-12-12T01:20:05.0950244Z" + } + + sample_messages = [ + "Starting IaaS ScriptHandler Extension v1", + "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.", + "The quick brown fox jumps over the lazy dog", + "Cursus risus at ultrices mi.", + "Doing Something", + "Iaculis eu non diam phasellus.", + "Doing other thing", + "Look ma, lemons", + "Pretium quam vulputate dignissim suspendisse.", + "Man this is insane", + "I wish it worked as it should and not as it ain't", + "Ut faucibus pulvinar elementum integer enim neque volutpat ac tincidunt.", + "Did you get any of that?", + "Non-English message - 此文字不是英文的", + "κόσμε", + "�", + "Quizdeltagerne spiste jordbær med fløde, mens cirkusklovnen Wolther spillede på xylofon.", + "Falsches Üben von Xylophonmusik quält jeden größeren Zwerg", + "Zwölf Boxkämpfer jagten Eva quer über den Sylter Deich", + "Heizölrückstoßabdämpfung", + "Γαζέες καὶ μυρτιὲς δὲν θὰ βρῶ πιὰ στὸ χρυσαφὶ ξέφωτο", + "Ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία", + "El pingüino Wenceslao hizo kilómetros bajo exhaustiva lluvia y frío, añoraba a su querido cachorro.", + "Portez ce vieux whisky au juge blond qui fume sur son île intérieure, à côté de l'alcôve ovoïde, où les bûches", + "se consument dans l'âtre, ce qui lui permet de penser à la cænogenèse de l'être dont il est question", + "dans la cause ambiguë entendue à Moÿ, dans un capharnaüm qui, pense-t-il, diminue çà et là la qualité de son œuvre.", + "D'fhuascail Íosa, Úrmhac na hÓighe Beannaithe, pór Éava agus Ádhaimh", + "Árvíztűrő tükörfúrógép", + "Kæmi ný öxi hér ykist þjófum nú bæði víl og ádrepa", + "Sævör grét áðan því úlpan var ónýt", + "いろはにほへとちりぬるを わかよたれそつねならむ うゐのおくやまけふこえて あさきゆめみしゑひもせす", + "イロハニホヘト チリヌルヲ ワカヨタレソ ツネナラム ウヰノオクヤマ ケフコエテ アサキユメミシ ヱヒモセスン", + "? דג סקרן שט בים מאוכזב ולפתע מצא לו חברה איך הקליטה", + "Pchnąć w tę łódź jeża lub ośm skrzyń fig", + "В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!", + "๏ เป็นมนุษย์สุดประเสริฐเลิศคุณค่า กว่าบรรดาฝูงสัตว์เดรัจฉาน", + "Pijamalı hasta, yağız şoföre çabucak güvendi." + ] + + for ext in extensions: + bad_count = bad_event_count + event_dir = os.path.join("/var/log/azure/", ext, "events") + if not os.path.isdir(event_dir): + fail(f"Expected events dir: {event_dir} does not exist") + + log.info("") + log.info("Expected dir: {0} exists".format(event_dir)) + log.info("Creating random extension events for {0}. 
No of Good Events: {1}, No of Bad Events: {2}".format( + ext, no_of_events_per_extension - bad_event_count, bad_event_count)) + + new_opr_id = str(uuid.uuid4()) + event_list = [] + + for _ in range(no_of_events_per_extension): + event = sample_ext_event.copy() + event["OperationId"] = new_opr_id + event["TimeStamp"] = datetime.utcnow().strftime(u'%Y-%m-%dT%H:%M:%S.%fZ') + event["Message"] = choice(sample_messages) + + if bad_count != 0: + # Make this event a bad event + reason = choice(errors)(event) + bad_count -= 1 + + # Missing key error might delete the TaskName key from the event + if "TaskName" in event: + event["TaskName"] = "{0}. This is a bad event: {1}".format(event["TaskName"], reason) + else: + event["EventLevel"] = "{0}. This is a bad event: {1}".format(event["EventLevel"], reason) + + event_list.append(event) + + file_name = os.path.join(event_dir, '{0}.json'.format(int(time.time() * 1000000))) + log.info("Create json with extension events in event directory: {0}".format(file_name)) + with open("{0}.tmp".format(file_name), 'w+') as f: + json.dump(event_list, f) + os.rename("{0}.tmp".format(file_name), file_name) + + +def wait_for_extension_events_dir_empty(extensions: List[str]): + # By ensuring events dir to be empty, we verify that the telemetry events collector has completed its run + start_time = datetime.now() + timeout = timedelta(minutes=2) + ext_event_dirs = [os.path.join("/var/log/azure/", ext, "events") for ext in extensions] + + while (start_time + timeout) >= datetime.now(): + log.info("") + log.info("Waiting for extension event directories to be empty...") + all_dir_empty = True + for event_dir in ext_event_dirs: + if not os.path.exists(event_dir) or len(os.listdir(event_dir)) != 0: + log.info("Dir: {0} is not yet empty".format(event_dir)) + all_dir_empty = False + + if all_dir_empty: + log.info("Extension event directories are empty: \n{0}".format(ext_event_dirs)) + return + + time.sleep(20) + + fail("Extension events dir not empty 
before the 2-minute timeout") + + +def main(): + # This test is a best-effort test to ensure that the agent does not throw any errors while trying to transmit + # events to wireserver. We're not validating whether the events actually make it to wireserver. + + parser = argparse.ArgumentParser() + parser.add_argument("--extensions", dest='extensions', type=str, required=True) + parser.add_argument("--num_events_total", dest='num_events_total', type=int, required=True) + parser.add_argument("--num_events_bad", dest='num_events_bad', type=int, required=False, default=0) + args, _ = parser.parse_known_args() + + extensions = args.extensions.split(',') + add_extension_events(extensions=extensions, bad_event_count=args.num_events_bad, + no_of_events_per_extension=args.num_events_total) + + # Ensure that the event collector ran after adding the events + wait_for_extension_events_dir_empty(extensions=extensions) + + # Sleep for a minute to ensure that the TelemetryService has enough time to send events and report errors, if any + time.sleep(60) + found_error = False + agent_log = AgentLog() + + log.info("") + log.info("Check that the TelemetryEventsCollector did not emit any errors while collecting and reporting events...") + telemetry_event_collector_name = "TelemetryEventsCollector" + for agent_record in agent_log.read(): + if agent_record.thread == telemetry_event_collector_name and agent_record.level == "ERROR": + found_error = True + log.info("waagent.log contains the following errors emitted by the {0} thread: \n{1}".format(telemetry_event_collector_name, agent_record)) + + if found_error: + fail("Found error(s) emitted by the TelemetryEventsCollector, but none were expected.") + log.info("The TelemetryEventsCollector did not emit any errors while collecting and reporting events") + + for ext in extensions: + good_count = args.num_events_total - args.num_events_bad + log.info("") + if not agent_log.agent_log_contains("Collected {0} events for extension: {1}".format(good_count, 
ext)): + fail("The TelemetryEventsCollector did not collect the expected number of events: {0} for {1}".format(good_count, ext)) + log.info("All {0} good events for {1} were collected by the TelemetryEventsCollector".format(good_count, ext)) + + if args.num_events_bad != 0: + log.info("") + if not agent_log.agent_log_contains("Dropped events for Extension: {0}".format(ext)): + fail("The TelemetryEventsCollector did not drop bad events for {0} as expected".format(ext)) + log.info("The TelemetryEventsCollector dropped bad events for {0} as expected".format(ext)) + + sys.exit(0) + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/fips-check_fips_mariner b/tests_e2e/tests/scripts/fips-check_fips_mariner new file mode 100755 index 000000000..e5a7730be --- /dev/null +++ b/tests_e2e/tests/scripts/fips-check_fips_mariner @@ -0,0 +1,56 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
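The `wait_for_extension_events_dir_empty` loop above is an instance of a generic poll-until-true-with-deadline pattern. A sketch of that pattern in isolation (the function name and defaults are illustrative, not part of the test library):

```python
import time
from datetime import datetime, timedelta

def wait_until(condition, timeout=timedelta(minutes=2), interval=0.1):
    # Poll `condition` until it returns True or the deadline passes.
    # Returns True on success and False on timeout, letting the caller
    # decide whether a timeout is fatal (the test above calls fail()).
    deadline = datetime.now() + timeout
    while datetime.now() <= deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Example: an already-empty list satisfies the condition immediately
# (stands in for os.listdir(event_dir) returning no files).
pending = []
print(wait_until(lambda: len(pending) == 0, timeout=timedelta(seconds=1)))  # -> True
```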
+ +# +# Verifies whether FIPS is enabled on Mariner 2.0 +# + +set -euo pipefail + +# Check if FIPS mode is enabled by the kernel (returns 1 if enabled) +fips_enabled=$(sudo cat /proc/sys/crypto/fips_enabled) +if [ "$fips_enabled" != "1" ]; then + echo "FIPS is not enabled by the kernel: $fips_enabled" + exit 1 +fi + +# Check if sysctl is configured (returns crypto.fips_enabled = 1 if enabled) +sysctl_configured=$(sudo sysctl crypto.fips_enabled) +if [ "$sysctl_configured" != "crypto.fips_enabled = 1" ]; then + echo "sysctl is not configured for FIPS: $sysctl_configured" + exit 1 +fi + +# Check if openssl library is running in FIPS mode +# MD5 should fail; the command's output should be similar to: +# Error setting digest +# 131590634539840:error:060800C8:digital envelope routines:EVP_DigestInit_ex:disabled for FIPS:crypto/evp/digest.c:135: +openssl=$(openssl md5 < /dev/null 2>&1 || true) +if [[ "$openssl" != *"disabled for FIPS"* ]]; then + echo "openssl is not running in FIPS mode: $openssl" + exit 1 +fi + +# Check if dracut-fips is installed (rpm -qa output should contain a dracut-fips package) +dracut_fips=$( (rpm -qa | grep dracut-fips) || true ) +if [[ "$dracut_fips" != *"dracut-fips"* ]]; then + echo "dracut-fips is not installed: $dracut_fips" + exit 1 +fi + +echo "FIPS mode is enabled." \ No newline at end of file diff --git a/tests_e2e/tests/scripts/fips-enable_fips_mariner b/tests_e2e/tests/scripts/fips-enable_fips_mariner new file mode 100755 index 000000000..8259b8d6c --- /dev/null +++ b/tests_e2e/tests/scripts/fips-enable_fips_mariner @@ -0,0 +1,53 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# +# Enables FIPS on Mariner 2.0 +# + +set -euo pipefail + +echo "Installing packages required to enable FIPS..." +sudo tdnf install -y grubby dracut-fips + +# +# Set boot_uuid variable for the boot partition if different from the root +# +boot_dev="$(df /boot/ | tail -1 | cut -d' ' -f1)" +echo "Boot partition: $boot_dev" + +root_dev="$(df / | tail -1 | cut -d' ' -f1)" +echo "Root partition: $root_dev" + +boot_uuid="" +if [ "$boot_dev" != "$root_dev" ]; then + boot_uuid="boot=UUID=$(blkid "$boot_dev" -s UUID -o value)" + echo "Boot UUID: $boot_uuid" +fi + +# +# Enable FIPS and set boot= parameter +# +echo "Enabling FIPS..." +if sudo grub2-editenv - list | grep -q kernelopts; then + set -x + sudo grub2-editenv - set "$(sudo grub2-editenv - list | grep kernelopts) fips=1 $boot_uuid" +else + set -x + sudo grubby --update-kernel=ALL --args="fips=1 $boot_uuid" +fi \ No newline at end of file diff --git a/tests_e2e/tests/scripts/get-waagent-conf-value b/tests_e2e/tests/scripts/get-waagent-conf-value new file mode 100755 index 000000000..663ca1811 --- /dev/null +++ b/tests_e2e/tests/scripts/get-waagent-conf-value @@ -0,0 +1,41 @@ +#!/usr/bin/env bash + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# Echoes the value in waagent.conf for the specified setting, if it exists. +# + +set -euo pipefail + +if [[ $# -lt 1 ]]; then + echo "Usage: get-waagent-conf-value <setting>" + exit 1 +fi + +PYTHON=$(get-agent-python) +waagent_conf=$($PYTHON -c 'from azurelinuxagent.common.osutil import get_osutil; print(get_osutil().agent_conf_file_path)') + +# Read the file directly (not via a pipeline) so that "exit 0" terminates the script rather than a subshell +while read -r line +do + if [[ $line == $1* ]]; then + IFS='=' read -r -a values <<< "$line" + echo "${values[1]}" + exit 0 + fi +done < "$waagent_conf" diff --git a/tests_e2e/tests/scripts/get_distro.py b/tests_e2e/tests/scripts/get_distro.py new file mode 100755 index 000000000..e9151f653 --- /dev/null +++ b/tests_e2e/tests/scripts/get_distro.py @@ -0,0 +1,35 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
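The prefix-match-and-split logic in `get-waagent-conf-value` can be expressed compactly in Python. A sketch (the function name and sample settings are illustrative):

```python
def get_conf_value(conf_text, setting):
    # Return the value of the first "Name=Value" line whose name starts with
    # `setting`, mirroring the shell script's `[[ $line == $1* ]]` prefix
    # match; None when no line matches.
    for line in conf_text.splitlines():
        if line.startswith(setting) and "=" in line:
            return line.split("=", 1)[1]
    return None

sample_conf = "# sample waagent.conf fragment (hypothetical values)\nResourceDisk.Format=y\nAutoUpdate.Enabled=n\n"
print(get_conf_value(sample_conf, "AutoUpdate.Enabled"))  # -> n
```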
+# +# Prints the distro and version of the machine +# + +import sys + +from azurelinuxagent.common.version import get_distro + + +def main(): + # Prints '<distro>_<version>' + distro = get_distro() + print(distro[0] + "_" + distro[1].replace('.', '')) + sys.exit(0) + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/recover_network_interface-get_nm_controlled.py b/tests_e2e/tests/scripts/recover_network_interface-get_nm_controlled.py new file mode 100755 index 000000000..32ca378d8 --- /dev/null +++ b/tests_e2e/tests/scripts/recover_network_interface-get_nm_controlled.py @@ -0,0 +1,39 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +# +import sys + +from azurelinuxagent.common.osutil import get_osutil + + +def main(): + os_util = get_osutil() + ifname = os_util.get_if_name() + nm_controlled = os_util.get_nm_controlled(ifname) + + if nm_controlled: + print("Interface is NM controlled") + else: + print("Interface is NOT NM controlled") + + sys.exit(0) + + +if __name__ == "__main__": + main() diff --git a/tests_e2e/tests/scripts/samples-error_remote_test.py b/tests_e2e/tests/scripts/samples-error_remote_test.py new file mode 100755 index 000000000..fd7c3810f --- /dev/null +++ b/tests_e2e/tests/scripts/samples-error_remote_test.py @@ -0,0 +1,36 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# +# A sample remote test that simulates an unexpected error +# + +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test + + +def main(): + log.info("Setting up test") + log.info("Doing some operation") + log.warning("Something went wrong, but the test can continue") + log.info("Doing some other operation") + raise Exception("Something went wrong") # simulate an unexpected error + + +run_remote_test(main) diff --git a/tests_e2e/tests/scripts/samples-fail_remote_test.py b/tests_e2e/tests/scripts/samples-fail_remote_test.py new file mode 100755 index 000000000..2e2cbae69 --- /dev/null +++ b/tests_e2e/tests/scripts/samples-fail_remote_test.py @@ -0,0 +1,37 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +# +# A sample remote test that fails +# + +from assertpy import fail +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test + + +def main(): + log.info("Setting up test") + log.info("Doing some operation") + log.warning("Something went wrong, but the test can continue") + log.info("Doing some other operation") + fail("Verification of the operation failed") + + +run_remote_test(main) diff --git a/tests_e2e/tests/scripts/samples-pass_remote_test.py b/tests_e2e/tests/scripts/samples-pass_remote_test.py new file mode 100755 index 000000000..1c65f5332 --- /dev/null +++ b/tests_e2e/tests/scripts/samples-pass_remote_test.py @@ -0,0 +1,36 @@ +#!/usr/bin/env pypy3 + +# Microsoft Azure Linux Agent +# +# Copyright 2018 Microsoft Corporation +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# +# A sample remote test that passes +# + +from tests_e2e.tests.lib.logging import log +from tests_e2e.tests.lib.remote_test import run_remote_test + + +def main(): + log.info("Setting up test") + log.info("Doing some operation") + log.warning("Something went wrong, but the test can continue") + log.info("Doing some other operation") + log.info("All verifications succeeded") + + +run_remote_test(main)
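The three sample scripts exercise the pass, fail, and error paths of `run_remote_test`. The harness itself is not part of this diff; purely as an illustration, its contract might be sketched as follows (the exit codes and behavior here are assumptions, not the real `tests_e2e.tests.lib.remote_test` implementation):

```python
import sys

def run_remote_test_sketch(test_main):
    # Hypothetical harness: map the three outcomes the samples demonstrate
    # onto distinct return codes. The real run_remote_test may differ.
    try:
        test_main()
        return 0  # the "pass" sample's path
    except AssertionError as e:
        # assertpy's fail() raises an AssertionError: the "fail" sample's path
        print("Test failed: {0}".format(e), file=sys.stderr)
        return 1
    except Exception as e:
        # any other exception: the "error" sample's path
        print("Unexpected error: {0}".format(e), file=sys.stderr)
        return 2
```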