VSC Reframe Meeting 2022 03 10

Michele Pugno edited this page Mar 10, 2022 · 2 revisions

VSC ReFrame meeting 2022-03-10

Attendees

  • Franky
  • Kenneth
  • Maxime
  • Michele
  • Robin
  • Sam
  • Steven

Reached Milestones

  • folder structure and recursive launch file
  • tools availability and version test
  • basic environment variable availability and correctness test
  • shared FSs test (Not merged yet)
  • mpi hello world
    • works on: Hydra (VUB), Vaughan, Hortense, BrENIAC?

Notes

  • best practices

    • create a small readme.md inside each test folder.
  • naming conventions

    • tests that require loading specific modules
    • difference in module naming
    • example: module load Python/3 (or Python3) should work everywhere
      • should load latest version in production
      • via symlink
    • get_module_name function to get module name for OpenFOAM (which could be tweaked per site)
      • could give flexibility to run tests with different modules (like MPI tests with a range of toolchains)
      • get_module_name function could pick up on environment variables like VSC_TEST_SUITE_MPI_MODULE=...
    • custom "partition" in ReFrame configuration
      • also specifies environment, resource manager, MPI launcher
    • wrong module names only trigger warnings (written to stderr) in ReFrame (to test failures)
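The get_module_name idea discussed above could look roughly like the sketch below. It is only an illustration: the SITE_MODULES table, the site names, the module names and the exact environment variable pattern (VSC_TEST_SUITE_<NAME>_MODULE) are assumptions, not something agreed in the meeting.

```python
import os

# Hypothetical per-site module names, for illustration only.
SITE_MODULES = {
    "default": {"OpenFOAM": "OpenFOAM", "MPI": "OpenMPI"},
    "site_a": {"OpenFOAM": "OpenFOAM/site-a-build"},
}


def get_module_name(software, site="default", environ=None):
    """Return the module to load for `software` on `site`."""
    environ = os.environ if environ is None else environ
    # 1. An environment variable override wins (name pattern is an assumption).
    override = environ.get(f"VSC_TEST_SUITE_{software.upper()}_MODULE")
    if override:
        return override
    # 2. Otherwise use the site-specific name, if one is registered.
    site_specific = SITE_MODULES.get(site, {}).get(software)
    if site_specific:
        return site_specific
    # 3. Fall back to the shared default, or to the bare software name.
    return SITE_MODULES["default"].get(software, software)
```

With this layout, a test could be run against a range of toolchains by exporting the override variable before launching the suite, while sites that need a different module name only touch the lookup table.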
  • system tools

    • which tools should be available system-wide vs through modules?
    • for some tools (git, tmux) more recent versions can be useful
    • Singularity should never be installed through modules (create a test that checks Singularity is not installed as a module)
    • EasyBuild should always be installed as a module
    • Mercurial installed as a module requires python-devel OS package?
    • tools installed as modules should be installed with system toolchain (so they can be used along with any other modules)
    • no distinction between installed on login node vs workernodes
      • should also be clear from CUE working group
    • columns for common set of tools:
      • list of tools, system-wide vs module (w/ system toolchain), login vs workernode, version req.
    • minimal set of software per toolchain generation: Python, ...
    • (Kenneth) Apptainer vs Singularity
      • Apptainer will eventually replace Singularity in EPEL package repo
      • Apptainer will still provide singularity command
      • update to Apptainer will require renaming configuration file from /etc/singularity.conf to /etc/apptainer.conf
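A test that Singularity is not provided as a module could be built around a small helper like the one below. This is a sketch under assumptions: it expects the terse listing of available modules (for example captured from `module -t avail`) as plain text, and the helper name and default list of forbidden names are hypothetical.

```python
def find_forbidden_modules(avail_output, forbidden=("singularity",)):
    """Return module entries whose software name is in `forbidden`.

    `avail_output` is expected to hold one module per line, as in a
    terse module listing; names are compared case-insensitively.
    """
    hits = []
    for line in avail_output.splitlines():
        entry = line.strip()
        # Skip blank lines and directory headers such as "/apps/modules/all:".
        if not entry or entry.endswith(":"):
            continue
        # The software name is the part before the first "/".
        name = entry.split("/", 1)[0].lower()
        if name in forbidden:
            hits.append(entry)
    return hits
```

The test would pass when the returned list is empty. Keeping the parsing separate from the command that produces the listing makes the check easy to reuse across sites with different module tools.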
  • env var tests

    • How should variables like $VSC_DATA_VO be handled?
      • Test should have logic to ensure it is set (correctly) when it should be set (like UGent VSC account in non-default VO)
    • User based env test according to the site where the user is logged in
    • $VSC_DATA_VO should always be set to /does/not/exist for sites that don't use VO concept
      • having it set to an incorrect value is better than not defining it (think use of $VSC_DATA_VO in job scripts that are shared across sites)
    • $VSC_ARCH_LOCAL: how should this be checked?
      • via archspec
      • how should archspec be available?
        • as a part of ReFrame installation?
          • this makes most sense?
          • archspec is an optional dependency for ReFrame
          • the reframe --detect-host-topology command uses archspec (https://github.com/archspec/archspec)
          • should be easy to change this in ReFrame easyconfig file
          • isn't that installed automatically while executing the bootstrap during installation?
        • as system-wide installation (part of CUE)?
        • as a module (part of CUE)?
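The two environment variable checks discussed above could be sketched as follows. This is not an agreed implementation: the function names are hypothetical, the behaviour for a user who is not in a VO on a VO-enabled site is an assumption (no requirement), and whether $VSC_ARCH_LOCAL values line up with archspec names (or with the names of ancestor microarchitectures) is also an assumption that would need checking per site.

```python
NO_VO_PLACEHOLDER = "/does/not/exist"


def check_data_vo(environ, site_uses_vo, user_in_vo=False):
    """Check $VSC_DATA_VO against the rules noted in the meeting."""
    value = environ.get("VSC_DATA_VO")
    if not site_uses_vo:
        # Sites without the VO concept: must be set to the placeholder.
        return value == NO_VO_PLACEHOLDER
    if user_in_vo:
        # User in a non-default VO: must be set to something real.
        return bool(value) and value != NO_VO_PLACEHOLDER
    # Assumption: no requirement when the user is not in a VO.
    return True


def arch_matches(expected, host_name, ancestor_names=()):
    """True if `expected` names the host microarchitecture or an ancestor."""
    return expected == host_name or expected in ancestor_names


def detect_host_arch():
    """Return (name, ancestor names) of the host CPU according to archspec."""
    import archspec.cpu  # optional dependency, imported only when needed

    host = archspec.cpu.host()
    return host.name, [a.name for a in host.ancestors]
```

On a system where archspec is available, the architecture check would be `arch_matches(os.environ.get("VSC_ARCH_LOCAL"), *detect_host_arch())`.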
  • Stick to ReFrame 3.10.1 VSC wide

    • goal should be to clean up run.sh script to avoid site-specific things in there
    • ReFrame installed as a module should be part of CUE
    • keep up with ReFrame versions as they are released: pull request to bump ReFrame version in run.sh script to agree on version bump
    • the need to set $SBATCH_ACCOUNT on Tier-1 Hortense can be avoided in several ways:
      • let the script that sets up the environment do that automatically (when you only have 1 active project)
      • describe it via environments inside the config file; be careful to specify the correct target systems in order to avoid overlap between environs with the same name (e.g. single-node)
      • configure the environment variable inside a partition in the config file (automatically works for the correct system)
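Setting the variable inside a partition could look roughly like the configuration fragment below, written for the ReFrame 3.x configuration format. It is a sketch, not a complete configuration: the hostname pattern, partition name, scheduler and launcher choices, and the account value are placeholders chosen for illustration.

```python
# Fragment of a ReFrame 3.x settings file (incomplete on purpose).
site_configuration = {
    "systems": [
        {
            "name": "hortense",
            "descr": "Tier-1 Hortense (illustrative entry)",
            "hostnames": ["login.*"],  # assumption: placeholder pattern
            "modules_system": "lmod",
            "partitions": [
                {
                    "name": "cpu",  # hypothetical partition name
                    "scheduler": "slurm",
                    "launcher": "srun",
                    "environs": ["single-node"],
                    # Variables set here apply only to this system's
                    # partition; the account name is a placeholder.
                    "variables": [["SBATCH_ACCOUNT", "my_project"]],
                }
            ],
        }
    ],
    "environments": [
        {
            "name": "single-node",
            # Restrict the environment to one system so that environs
            # with the same name on other systems do not overlap.
            "target_systems": ["hortense"],
        }
    ],
}
```

Note that the partition-level key is `variables` in ReFrame 3.x; this would need revisiting when moving to a newer major version.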
  • shared FS test is WIP

    • PR is open by Robin
    • PR is being discussed
      • should permissions also be checked?
      • maybe first version of this test should only check dir existence, permission check can be added later
    • policy for directory permissions should be agreed in CUE?
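A first version of the shared FS test that only checks directory existence, as suggested above, might be built on a helper like this one. It is a minimal sketch and not the code in the open PR; the helper name is hypothetical, and the list of environment variables to check is left to the caller.

```python
import os


def missing_directories(var_names, environ=None):
    """Map each problematic variable name to a short description.

    A variable is reported when it is unset/empty or when its value
    does not point to an existing directory. Permissions are not checked.
    """
    environ = os.environ if environ is None else environ
    problems = {}
    for name in var_names:
        path = environ.get(name)
        if not path:
            problems[name] = "variable not set"
        elif not os.path.isdir(path):
            problems[name] = f"{path} is not a directory"
    return problems
```

An empty result means every listed directory exists. A permission check could later be added as a third branch without changing the interface.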
  • putting test suite in hands of users

    • especially relevant for CUE tests
    • let users check whether their account adheres to the current environment
  • can CUE working group agree on a common way to check storage quota?

    • myquota (GJB)
    • myquota --onlyscratch
    • Steven only sees scratch quota on Vaughan with his VSC account
    • my_dodrio_quota (Hortense)
  • (Maintainability) tests with parameters have long folder names, can we fix this?

    • changing name via attribute of tests is deprecated, will probably no longer be possible in ReFrame 4.x
    • reach out to ReFrame developers on this via Slack or community meetup?
  • Use branches or forks?

    • open PRs from a branch in your own fork
    • allow pushes in your fork from VSC team members
  • Open Issues on GitHub?

Second Sprint

  • Finish first iteration over FS test (Robin, Michele)
  • Upgrade to ReFrame 3.10.1, better run.sh script, add env var to partitions in the config file. (Everyone)
  • Second iteration over env var test (Steven, Samuel)
  • Add one app test (e.g. numpy) (Michele, Samuel)
  • next meeting: Thu 21st April 2022