diff --git a/docs/no_toc/01-intro.md b/docs/no_toc/01-intro.md index ddd84b2..92747ca 100644 --- a/docs/no_toc/01-intro.md +++ b/docs/no_toc/01-intro.md @@ -12,8 +12,8 @@ The course is intended for students in the biomedical sciences and researchers w _This course is written for individuals who:_ - Are comfortable with GitHub and know how to make a pull request -- Wish to save time and enhance their scientific projects using automation -- Perhaps previously tried to learn about GitHub Actions but felt overwhelmed on how to get started +- Wish to save time and enhance their scientific projects using automation +- Have perhaps tried to learn about GitHub Actions before but felt overwhelmed about how to start @@ -56,4 +56,4 @@ This course is meant to teach learners how to create sophisticated GitHub Action ## How to use the course -Ideally you should follow along with the chapters and perform they activities as they are described. These activities involve using GitHub and GitHub actions. You will need a GitHub account and basic familiarity with GitHub. +Ideally you should follow along with the chapters and perform the activities as they are described. These activities involve using GitHub and GitHub Actions. You will need a GitHub account and basic familiarity with GitHub. diff --git a/docs/no_toc/02-programming-practices.md b/docs/no_toc/02-programming-practices.md index 2911961..9bf5d6c 100644 --- a/docs/no_toc/02-programming-practices.md +++ b/docs/no_toc/02-programming-practices.md @@ -50,7 +50,7 @@ We will take a closer look at two concrete examples, one on each end of the soft Imagine that you have two sampling distributions (lists/arrays of numbers) and you want to test whether the means of the distributions are statistically equivalent or not. This is the setup for a _t_-test. 
-_t_-tests are implemented in standard functions in R ([`base` library](https://rdrr.io/r/base/base-package.html)) and Python ([`scipy` library](https://scipy.org/)), as well as most other commonly used programming languages. +_t_-tests are implemented in standard functions in [base R](https://rdrr.io/r/base/base-package.html) and the Python [`scipy` library](https://scipy.org/), as well as most other commonly used programming languages. @@ -62,7 +62,7 @@ In your own software, you might need to do some verification of your input (for Imagine that you have a list of reads from a [DNA sequencing](https://en.wikipedia.org/wiki/DNA_sequencing#High-throughput_methods) machine, and you want to use these data to answer a biological question, or to make a plot/visualization to communicate a biological insight. This is a much less well-defined problem than our previous example, with many more independently operating components, and many more subjective decisions that a researcher must make along the way. -Complex data analyses means complex decisions! This often means that decisions made are not so cut and dry and should rely on the scientific context of the data. In other words, analyses often are tailored to reflect the biology (or other science) and or perhaps the experimental goals. +Complex data analyses mean complex decisions! This often means that decisions made are not so cut and dry and should rely on the scientific context of the data. In other words, analyses are often tailored to reflect the biology (or other science) and/or the experimental goals. @@ -96,4 +96,3 @@ This eliminates the need for you to remember to run tests, to clean up your code Later in the course, we will talk more specifically about what exactly automation via continuous integration looks like, and go into more depth as to its uses and benefits. 
- diff --git a/docs/no_toc/03-why-automation.md b/docs/no_toc/03-why-automation.md index 1c80ec2..3f18ae1 100644 --- a/docs/no_toc/03-why-automation.md +++ b/docs/no_toc/03-why-automation.md @@ -41,9 +41,9 @@ Returning to those 10 researchers, if instead of having those 10 people manually -## Continuous integration / Continous deployment +## Continuous integration / Continuous deployment -Before we discuss the concept of Continous integration / Continuous deployment (often abbreviated CI/CD), let's use an analogy. +Before we discuss the concept of Continuous integration / Continuous deployment (often abbreviated CI/CD), let's use an analogy. @@ -68,7 +68,7 @@ Let's assume over the course of developing a project, bugs are introduced at a c Without using CI/CD you may find yourself trying to fix many bugs at once! This will make the bugs harder to isolate and harder to fix and pinpoint. The amount of time it will take to fix 3 bugs at once may be exponentially higher than if you caught these bugs one at a time. Additionally, the longer amount of time that goes on before you catch a bug, it may be more likely it will get accidentally incorporated into your published results -- this will be a lot more work for you and others to rectify. -However with CI/CD you will likely catch these bugs earlier and have an easier time fixing them before they truly run a muck! A good continuous integration / continuous deployment pipeline will help you identify these bugs early and save time and stress! +However, with CI/CD you will likely catch these bugs earlier and have an easier time fixing them before they truly run amok! A good continuous integration / continuous deployment pipeline will help you identify these bugs early and save time and stress! 
@@ -76,7 +76,7 @@ This is not only true for classic "my script won't run" bugs but also "silent" b -### Continous Integration / Continuous Deployment +### Continuous Integration / Continuous Deployment A workflow that uses CI/CD principles may look like this: diff --git a/docs/no_toc/04-gha-basics.md b/docs/no_toc/04-gha-basics.md index d91192d..c8c4fba 100644 --- a/docs/no_toc/04-gha-basics.md +++ b/docs/no_toc/04-gha-basics.md @@ -16,7 +16,7 @@ All GitHub Actions involve answering three questions: 2. What should be run? 3. With what environment should the thing be run? -These questions and other specifications are set by writing a [YAML file](https://www.redhat.com/en/topics/automation/what-is-yaml). YAML files are human readable markup language files. Basically its a list that is easy for humans to read and write and computers can read them too. This makes it good for writing a GitHub Action. Essentially, we're going to write a YAML file to make a recipe that GitHub will read to know what/when/with what we are trying to do. +These questions and other specifications are set by writing a [YAML file](https://www.redhat.com/en/topics/automation/what-is-yaml). YAML files are human readable markup language files. Basically it's a list that is easy for humans to read and write and computers can read them too. This makes it good for writing a GitHub Action. Essentially, we're going to write a YAML file to make a recipe that GitHub will read to know what/when/with what we are trying to do. @@ -27,7 +27,7 @@ The headlines about working with YAML files: 1. Spacing is VERY specific! -- incorrect spacing will definitely result in errors for your GitHub Action run. Let's take a look at an example YAML. -Note that the what comes before a `:` is generally a name and indent indicate subsets of a list. So in the overall list of `food` we have sublists of `vegetables` and `fruits`. `#` can be used as a comment and will not be treated as code. 
+Note that what comes before a `:` is generally a name and indentations indicate subsets of a list. So in the overall list of `food` we have sublists of `vegetables` and `fruits`. `#` can be used as a comment and will not be treated as code. Additionally, `:` are often names. So `citrus` is the name for the item `oranges` and etc. ``` @@ -107,7 +107,7 @@ So for example, there are built in operating systems like `windows-latest`, `mac -But just like a Windows machine straight out of a box is unlikely to have everything you need to run some code, you may need a more specific computing environment. You can also create custom environments using `containerization`. +Similarly to how a new Windows computer may not come equipped with all the software you need to execute certain code, you might require a more tailored computing environment. You can also create custom environments using `containerization`. ### Containerization @@ -115,7 +115,7 @@ A "virtual machine" is basically when your computer creates its own fake compute -Containers aren't virtual machines, but they do a similar thing, which is they spin up a computing environment where you can do things. They are called containers because they are isolated from the rest of your computer. +Containers, while not virtual machines, serve a similar purpose by creating an isolated computing environment where tasks can be performed. They are named "containers" due to their separation from the rest of your computer. Containerization is useful because it allows us to share our computing environments with others. This is useful because it can be a powerful tool for reproducing analyses if we are controlling our computing environments. @@ -151,7 +151,7 @@ Docker is a whole other world. There's whole conferences, hackathons, and etc de
-Super important side note: DO NOT put data that needs to be secured like Personally Identifiable Information (PII) and Personal Health Information (PHI) data on your Docker images! Especially when you share them! They are not meant for this purpose and this data would be exposed! +Super important side note: DO NOT put data that needs to be secured like Personally Identifiable Information (PII) and Personal Health Information (PHI) data in your Docker images! Especially when you share them! They are not meant for this purpose and this data would be exposed!
 diff --git a/docs/no_toc/05-gha-run-analysis.md b/docs/no_toc/05-gha-run-analysis.md index 8a2e86f..7ad291d 100644 --- a/docs/no_toc/05-gha-run-analysis.md +++ b/docs/no_toc/05-gha-run-analysis.md @@ -79,11 +79,12 @@ on: In our `jobs:` we've named this job `R run analysis`. -Additionally we are running this on a `ubuntu-latest` operating system, but as opposed to our first GitHub Action workflow from the previous chapter, where we didn't need any additional packages or software to run our job, this job, the analysis script we are running, requires things like R, python, and some specific packages. +Additionally we are running this on a `ubuntu-latest` operating system, but as opposed to our first GitHub Action workflow from the previous chapter, where we didn't need any additional packages or software to run our job, this job, the analysis script we are running, requires things like R, Python, and some specific packages. -We could, attempt to write a script that installs everything we need. However, that would likely be a lot of work, may not work reliably, and would be hard to track changes. Instead, we are using a custom made docker image that has R, python, and other packages we need already installed. +We could attempt to write a script that installs everything we need. However, that would likely be a lot of work, may not work reliably, and would be hard to track changes. Instead, we are using a custom made Docker image that has R, Python, and other packages we need already installed. + +This custom made Docker image is pulled from [Dockerhub](https://hub.docker.com/r/jhudsl/ottr_python). If you wish to make a custom Docker image to use in your analysis, the easiest way to do this is to make a Dockerfile, build a Docker image from this file, and then push it to Dockerhub. We have some Dockerfiles for this image and others [managed and version controlled on this GitHub repository](https://github.com/jhudsl/ottr_docker). 
You may note we use GitHub Actions on this repository to help us manage these Docker images. -This custom made docker image is pulled from [Dockerhub and it exists here](https://hub.docker.com/r/jhudsl/ottr_python). If you wish to make a custom Docker image to use in your analysis, easiest way to do this is to make a Dockerfile, build a Docker image from this file and then push it to Dockerhub. We have some Dockerfiles for this image and others [managed and version controlled here on this GitHub repository](https://github.com/jhudsl/ottr_docker). You may note we use GitHub Actions on this repository to help us manage these Docker images. ``` jobs: re-run: @@ -96,7 +97,7 @@ jobs: #### actions/checkout -One of the most frequently use GitHub Actions [from the GitHub Action Marketplace is `actions/checkout`](https://github.com/actions/checkout). This action will grab all the files from a GitHub repository so you can do things with those files in your workflow. (Recall that when you spin up a GitHub Action Environment it is a blank slate, so we have to put our files there too if we want to use them). +One of the most frequently used GitHub Actions from the GitHub Action Marketplace is [`actions/checkout`](https://github.com/actions/checkout). This action will grab all the files from a GitHub repository so you can do things with those files in your workflow. (Recall that when you spin up a GitHub Action Environment it is a blank slate, so we have to put our files there too if we want to use them). ``` steps: @@ -109,7 +110,7 @@ steps: By default, it will checkout the files from the repository where this action is being run, but we could specify other repository and other branches. -`fetch-depth: 0` means we will grab all the file. +`fetch-depth: 0` means we will grab the full commit history for all branches and tags, rather than only the most recent commit. #### sh run_analysis.sh @@ -124,7 +125,7 @@ Additionally the `|` tells `run:` to expect multiple lines of a command. 
We didn run: | sh run_analysis.sh ``` -We have three steps in this fake analysis and the files are numbered in which order they are run. If you open up the [run_analysis.sh](https://github.com/fhdsl/github-actions-workshop/blob/main/run_analysis.sh) file, you will see its basically simple workflow step calling file. +We have three steps in this fake analysis and the files are numbered in which order they are run. If you open up the [run_analysis.sh](https://github.com/fhdsl/github-actions-workshop/blob/main/run_analysis.sh) file, you will see it is basically a simple workflow step calling file. It looks like this: ``` diff --git a/docs/no_toc/06-gha-variables.md b/docs/no_toc/06-gha-variables.md index c8520e1..4b19d32 100644 --- a/docs/no_toc/06-gha-variables.md +++ b/docs/no_toc/06-gha-variables.md @@ -96,7 +96,8 @@ To do this we can use this sort of set up: -Step that sets a variable depending on some output +Step that sets a variable depending on some output: + ``` # How to export a variable to a next step - name: Setting output to the environment at large diff --git a/docs/no_toc/07-gha-troubleshooting.md b/docs/no_toc/07-gha-troubleshooting.md index 514ca68..70acb01 100644 --- a/docs/no_toc/07-gha-troubleshooting.md +++ b/docs/no_toc/07-gha-troubleshooting.md @@ -4,7 +4,7 @@ -Many of your standard programming troubleshooting skills are applicable with GitHub actions. In this chapter we'll five you a few more tips for what might be the most common ways that GitHub Actions can break and what those error messages might look like. +Many of your standard programming troubleshooting skills are applicable with GitHub Actions. In this chapter we'll give you a few more tips for what might be the most common ways that GitHub Actions can break and what those error messages might look like. ## Tips @@ -42,7 +42,7 @@ In order to design these steps you are going to need to look closely at your log ### Look at the logs closely! 
-Whether you GitHub Action fails or not, go to the logs to see how they ran. You can get there by going to Actions tab and clicking on the workflow you want to check on. +Whether your GitHub Action fails or not, go to the logs to see how they ran. You can get there by going to Actions tab and clicking on the workflow you want to check on. You should start by scrolling down on the Actions page to look at the `Annotations`. This is GitHub Action's summary of how the workflow ran. However, the summary is often unlikely to give you enough information to troubleshoot a failed action. @@ -60,7 +60,7 @@ The `pull_request` trigger is helpful for development so that every time you pus The `workflow_dispatch` trigger is useful so you can re-trigger the workflow run whenever you need to test the next thing you tried for troubleshooting purposes. You might consider doing this if your testing as a pull request isn't appropriate. You can also initiate `workflow_dispatch` workflow runs from any branch -- so you don't have to merge it before you've polished it. -There's two caveats to this strategy: +There are two caveats to this strategy: 1. If you don't want these triggers long term, make sure you delete them before you merge to main. 2. Recall that if you are using default environment variables in your workflow runs that those change depending on the triggers, so those may not always be representative of the workflow run as you it will be run with the final version. @@ -77,7 +77,7 @@ run: | This may help you figure out whether a variable or file isn't in the place you think it is. -### Use Marketplace actions +### Use Marketplace Actions The great part about using GitHub Actions is that you can use other people's actions from the marketplace so you don't have to write everything from scratch! @@ -87,7 +87,7 @@ There are three tips for troubleshooting a problematic GitHub Action that is bor 1. Read their docs carefully and make sure you are using it as specified. 2. 
Try bumping up to a later version if it looks like there's a bug that may have been addressed. -3. Try not to use Marketplace actions that don't show evidence of being maintained or don't have fully fledged documentation! +3. Try not to use Marketplace Actions that don't show evidence of being maintained or don't have fully fledged documentation! ## Activity: Troubleshooting GitHub Actions diff --git a/docs/no_toc/404.html b/docs/no_toc/404.html index 8421b76..c7c9cc9 100644 --- a/docs/no_toc/404.html +++ b/docs/no_toc/404.html @@ -131,9 +131,9 @@
  • 3.1.1 Why reproducibility is so important.
  • 3.1.2 Automation as a reproducibility tool
  • -
  • 3.2 Continuous integration / Continous deployment +
  • 3.2 Continuous integration / Continuous deployment
  • @@ -182,7 +182,7 @@
  • 7.1.2 Look at the logs closely!
  • 7.1.3 Use workflow_dispatch/pull_request triggers for development
  • 7.1.4 Print out things to test your assumptions
  • -
  • 7.1.5 Use Marketplace actions
  • +
  • 7.1.5 Use Marketplace Actions
  • 7.2 Activity: Troubleshooting GitHub Actions
      @@ -249,7 +249,7 @@

      4.1 GHA structure

    • What should be run?
    • With what environment should the thing be run?
    • -

      These questions and other specifications are set by writing a YAML file. YAML files are human readable markup language files. Basically its a list that is easy for humans to read and write and computers can read them too. This makes it good for writing a GitHub Action. Essentially, we’re going to write a YAML file to make a recipe that GitHub will read to know what/when/with what we are trying to do.

      +

      These questions and other specifications are set by writing a YAML file. YAML files are human readable markup language files. Basically it’s a list that is easy for humans to read and write and computers can read them too. This makes it good for writing a GitHub Action. Essentially, we’re going to write a YAML file to make a recipe that GitHub will read to know what/when/with what we are trying to do.

      The headlines about working with YAML files:

        @@ -258,7 +258,7 @@

        4.1 GHA structure

      1. Spacing is VERY specific! – incorrect spacing will definitely result in errors for your GitHub Action run.

      Let’s take a look at an example YAML. -Note that the what comes before a : is generally a name and indent indicate subsets of a list. So in the overall list of food we have sublists of vegetables and fruits. # can be used as a comment and will not be treated as code.

      +Note that what comes before a : is generally a name and indentations indicate subsets of a list. So in the overall list of food we have sublists of vegetables and fruits. # can be used as a comment and will not be treated as code.

      Additionally, : are often names. So citrus is the name for the item oranges and etc.

      # A comment here which is ignored
       food:
      @@ -340,13 +340,13 @@ 

      4.1.3 runs-on: with what:

      What do we mean by a computing environment? As just like when you work on your personal computer, you install, update, and sometimes delete software in order to run different things, the GitHub Actions computers need to do the same in order to run your code. Although some person from Microsoft isn’t setting up a new physical computer and manually installing software, the specs you give underneath runs-on: tell GitHub Actions what kind of set up to use.

      So for example, there are built in operating systems like windows-latest, mac-latest, and ubuntu-latest. You can see more about the default GitHub runners here.

      -

      But just like a Windows machine straight out of a box is unlikely to have everything you need to run some code, you may need a more specific computing environment. You can also create custom environments using containerization.

      +

      Similarly to how a new Windows computer may not come equipped with all the software you need to execute certain code, you might require a more tailored computing environment. You can also create custom environments using containerization.

      4.1.4 Containerization

      A “virtual machine” is basically when your computer creates its own fake computer inside of it. It’s acting like a different computer but it doesn’t have any additional physical parts.

      -

      Containers aren’t virtual machines, but they do a similar thing, which is they spin up a computing environment where you can do things. They are called containers because they are isolated from the rest of your computer.

      +

      Containers, while not virtual machines, serve a similar purpose by creating an isolated computing environment where tasks can be performed. They are named “containers” due to their separation from the rest of your computer.

      Containerization is useful because it allows us to share our computing environments with others. This is useful because it can be a powerful tool for reproducing analyses if we are controlling our computing environments.

      The software you use, and the versions of the software you use can affect the results from an analysis (Beaulieu-Jones and Greene 2017).

      @@ -363,7 +363,7 @@

      4.1.4 Containerization

      Docker is a whole other world. There’s whole conferences, hackathons, and etc devoted to Docker and other containerization software. It can be a lot to learn. To start, we recommend borrowing other people’s Docker images as much as possible instead of trying to build your own And then install the few packages you need. (more on this in a future chapter)

      -

      Super important side note: DO NOT put data that needs to be secured like Personally Identifiable Information (PII) and Personal Health Information (PHI) data on your Docker images! Especially when you share them! They are not meant for this purpose and this data would be exposed!

      +

      Super important side note: DO NOT put data that needs to be secured like Personally Identifiable Information (PII) and Personal Health Information (PHI) data in your Docker images! Especially when you share them! They are not meant for this purpose and this data would be exposed!

      4.1.4.1 More resources about Docker

      diff --git a/docs/no_toc/index.html b/docs/no_toc/index.html index 6aeca43..5a35333 100644 --- a/docs/no_toc/index.html +++ b/docs/no_toc/index.html @@ -131,9 +131,9 @@
    • 3.1.1 Why reproducibility is so important.
    • 3.1.2 Automation as a reproducibility tool
  • -
  • 3.2 Continuous integration / Continous deployment +
  • 3.2 Continuous integration / Continuous deployment
  • @@ -182,7 +182,7 @@
  • 7.1.2 Look at the logs closely!
  • 7.1.3 Use workflow_dispatch/pull_request triggers for development
  • 7.1.4 Print out things to test your assumptions
  • -
  • 7.1.5 Use Marketplace actions
  • +
  • 7.1.5 Use Marketplace Actions
  • 7.2 Activity: Troubleshooting GitHub Actions
      @@ -239,7 +239,7 @@

      About this Course

      diff --git a/docs/no_toc/index.md b/docs/no_toc/index.md index b349361..a2c2414 100644 --- a/docs/no_toc/index.md +++ b/docs/no_toc/index.md @@ -1,6 +1,6 @@ --- title: "GitHub Automation for Scientists" -date: "February, 2024" +date: "March, 2024" site: bookdown::bookdown_site documentclass: book bibliography: [book.bib, packages.bib] diff --git a/docs/no_toc/introduction.html b/docs/no_toc/introduction.html index 8623704..d0545d6 100644 --- a/docs/no_toc/introduction.html +++ b/docs/no_toc/introduction.html @@ -131,9 +131,9 @@
    • 3.1.1 Why reproducibility is so important.
    • 3.1.2 Automation as a reproducibility tool
  • -
  • 3.2 Continuous integration / Continous deployment +
  • 3.2 Continuous integration / Continuous deployment
  • @@ -182,7 +182,7 @@
  • 7.1.2 Look at the logs closely!
  • 7.1.3 Use workflow_dispatch/pull_request triggers for development
  • 7.1.4 Print out things to test your assumptions
  • -
  • 7.1.5 Use Marketplace actions
  • +
  • 7.1.5 Use Marketplace Actions
  • 7.2 Activity: Troubleshooting GitHub Actions
      @@ -248,7 +248,7 @@

      1.1 Target Audience

    • Are comfortable with GitHub and know how to make a pull request
    • Wish to save time and enhance their scientific projects using automation
    • -
    • Perhaps previously tried to learn about GitHub Actions but felt overwhelmed on how to get started
    • +
    • Have perhaps tried to learn about GitHub Actions before but felt overwhelmed about how to start

    @@ -283,7 +283,7 @@

    1.4 Curriculum

    1.5 How to use the course

    -

    Ideally you should follow along with the chapters and perform they activities as they are described. These activities involve using GitHub and GitHub actions. You will need a GitHub account and basic familiarity with GitHub.

    +

    Ideally you should follow along with the chapters and perform the activities as they are described. These activities involve using GitHub and GitHub actions. You will need a GitHub account and basic familiarity with GitHub.

    diff --git a/docs/no_toc/reference-keys.txt b/docs/no_toc/reference-keys.txt index 9b6abe0..1bcd357 100644 --- a/docs/no_toc/reference-keys.txt +++ b/docs/no_toc/reference-keys.txt @@ -16,8 +16,8 @@ why-automation automation-as-an-aid-for-reproducibility why-reproducibility-is-so-important. automation-as-a-reproducibility-tool -continuous-integration-continous-deployment -continous-integration-continuous-deployment +continuous-integration-continuous-deployment +continuous-integration-continuous-deployment-1 a-real-world-example other-cicd-services github-actions-fundamentals diff --git a/docs/no_toc/references.html b/docs/no_toc/references.html index a5c7e19..069574b 100644 --- a/docs/no_toc/references.html +++ b/docs/no_toc/references.html @@ -131,9 +131,9 @@
  • 3.1.1 Why reproducibility is so important.
  • 3.1.2 Automation as a reproducibility tool
  • -
  • 3.2 Continuous integration / Continous deployment +
  • 3.2 Continuous integration / Continuous deployment
  • @@ -182,7 +182,7 @@
  • 7.1.2 Look at the logs closely!
  • 7.1.3 Use workflow_dispatch/pull_request triggers for development
  • 7.1.4 Print out things to test your assumptions
  • -
  • 7.1.5 Use Marketplace actions
  • +
  • 7.1.5 Use Marketplace Actions
  • 7.2 Activity: Troubleshooting GitHub Actions
      diff --git a/docs/no_toc/scientific-software-development-best-practices.html b/docs/no_toc/scientific-software-development-best-practices.html index b0f53ab..82dd1b8 100644 --- a/docs/no_toc/scientific-software-development-best-practices.html +++ b/docs/no_toc/scientific-software-development-best-practices.html @@ -131,9 +131,9 @@
    • 3.1.1 Why reproducibility is so important.
    • 3.1.2 Automation as a reproducibility tool
  • -
  • 3.2 Continuous integration / Continous deployment +
  • 3.2 Continuous integration / Continuous deployment
  • @@ -182,7 +182,7 @@
  • 7.1.2 Look at the logs closely!
  • 7.1.3 Use workflow_dispatch/pull_request triggers for development
  • 7.1.4 Print out things to test your assumptions
  • -
  • 7.1.5 Use Marketplace actions
  • +
  • 7.1.5 Use Marketplace Actions
  • 7.2 Activity: Troubleshooting GitHub Actions
      @@ -282,7 +282,7 @@

      2.4 Examples

      2.4.1 t-test

      Imagine that you have two sampling distributions (lists/arrays of numbers) and you want to test whether the means of the distributions are statistically equivalent or not. This is the setup for a t-test. -t-tests are implemented in standard functions in R (base library) and Python (scipy library), as well as most other commonly used programming languages.

      +t-tests are implemented in standard functions in base R and Python scipy library, as well as most other commonly used programming languages.

      In both R and Python, a t-test is a very well-defined, specific function that takes two lists of numbers and returns the t-statistic and p-value. Since this is a part of a standard, widely used library in each language, it is already tested as part of those libraries. diff --git a/docs/no_toc/search_index.json b/docs/no_toc/search_index.json index 63dfd4d..9bf4ad1 100644 --- a/docs/no_toc/search_index.json +++ b/docs/no_toc/search_index.json @@ -1 +1 @@ -[["index.html", "GitHub Automation for Scientists About this Course", " GitHub Automation for Scientists February, 2024 About this Course This course is part of a series of courses for the Informatics Technology for Cancer Research (ITCR) called the Informatics Technology for Cancer Research Education Resource. This material was created by the ITCR Training Network (ITN) which is a collaborative effort of researchers around the United States to support cancer informatics and data science training through resources, technology, and events. This initiative is funded by the following grant: National Cancer Institute (NCI) UE5 CA254170. Our courses feature tools developed by ITCR Investigators and make it easier for principal investigators, scientists, and analysts to integrate cancer informatics into their workflows. Please see our website at www.itcrtraining.org for more information. 
"],["introduction.html", "Chapter 1 Introduction 1.1 Target Audience 1.2 Topics covered 1.3 Motivation 1.4 Curriculum 1.5 How to use the course", " Chapter 1 Introduction 1.1 Target Audience The course is intended for students in the biomedical sciences and researchers who use informatics tools in their research This course is written for individuals who: Are comfortable with GitHub and know how to make a pull request Wish to save time and enhance their scientific projects using automation Perhaps previously tried to learn about GitHub Actions but felt overwhelmed on how to get started 1.2 Topics covered This course covers how to use GitHub actions for scientific software development. We encourage the recognition that scientific software can take many forms that can all benefit from the concepts of continuous integration and continuous deployment. This course builds on concepts introduced in the Reproducibility and Advanced Reproducibility courses from the ITCR Training Network. If you are unfamiliar with GitHub and/or do not have an account, we’d suggest you start with those courses by using the links or QR codes below. 1.3 Motivation Cancer datasets are plentiful, complicated, and hold untold amounts of information regarding cancer biology. Cancer researchers are working to apply their expertise to the analysis of these vast amounts of data but training opportunities to properly equip them in these efforts can be sparse. This includes training in reproducible data analysis methods. Data analyses are generally not reproducible without direct contact with the original researchers and a substantial amount of time and effort (Beaulieu-Jones and Greene 2017). Reproducibility in cancer informatics (as with other fields) is still not monitored or incentivized despite that it is fundamental to the scientific method. Despite the lack of incentive, many researchers strive for reproducibility in their own work but often lack the skills or training to do so effectively. 
Equipping researchers with the skills to create reproducible data analyses increases the efficiency of everyone involved. One tool among many for creating reproducible analyses is utilizing automation. Many individuals performing analyses on cancer data may not have formal training in software development and may be unfamiliar with the ideas of continuous integration and continuous deployment. By recognizing that biological data analysis code is a form of software development, we can try to adapt good development practices in scientific analyses and software contexts. Scientific software projects may include (but aren’t limited to): Software that built as tools to be utilized by others to analyze biologically derived data. Code that is built primarily for analyzing one project’s data. Code that is built as a workflow for a series of steps and analyses that might be reused among collaborators or within a lab. Any scripts and code that are built to handle data in a research setting. Any scripts and code a researcher might interact with. 1.4 Curriculum The course includes hands-on exercises for how to understand, build, and troubleshoot GitHub Actions as a continuous integration/continuous deployment tool for scientific software projects. Goal of this course: Equip learners with basics skills and confidence to utilize the concepts of continuous integration in the context of scientific software. What is not the goal This course is meant to teach learners how to create sophisticated GitHub Actions, but instead introduce learners to basic fundamentals of continuous integration and continuous deployment. This course focuses on GitHub Actions and will not cover any other (perfectly fine) tools for CI/CD. 1.5 How to use the course Ideally you should follow along with the chapters and perform they activities as they are described. These activities involve using GitHub and GitHub actions. You will need a GitHub account and basic familiarity with GitHub. 
References "],["scientific-software-development-best-practices.html", "Chapter 2 Scientific software development best practices 2.1 Learning Objectives 2.2 Science and software as iterative processes 2.3 Software complexity as a spectrum 2.4 Examples 2.5 Automation for scientific software", " Chapter 2 Scientific software development best practices 2.1 Learning Objectives 2.2 Science and software as iterative processes Scientific papers are often arranged as a list of methods and results, building on themselves more or less sequentially. Each figure follows from the previous figure or text description, to describe the data that support a hypothesis or illustrate a conclusion in a linear, “story”-like order. However, the modern process of doing science, itself, is rarely linear. It is not realistic to do an experiment, and write a manuscript, and publish the paper, in that order and with no other complications – usually, there is some amount of iteration involved on one or more of these steps: You might do an experiment, then summarize it, then run more experiments based on the results to confirm/test/extend your findings You might do an experiment, write a manuscript, then revise the manuscript based on feedback from other scientists You might submit a manuscript, then a reviewer may request revisions or additional experiments, which will require you to go back and revisit your experimental setup and conclusions As scientists, we don’t generally expect science to be a static, “write once and forget” process. The same idea applies to developing research software! Rarely, you might be able to write a script or program for a scientific study and use it once, for a single well-defined purpose. But more often, you’ll write a script (or join several of them together in a more complex pipeline) and reuse it, possibly with changes or extensions as the project progresses. 
In this course, through the lens of automation, we hope to familiarize you with some of the skills necessary to think about research software in an iterative way, from the beginning of a research project. Although software development is not generally rewarded directly in academia, it turns out that writing good software does have less obvious rewards, even within the traditional academic structure. For example, software that is easy to install tends to be cited more often (Mangul et al. 2019), and software that is more consistently maintained tends to be more accurate (Gardner et al. 2022). 2.3 Software complexity as a spectrum Not all software is complex, and not all software requires complex infrastructure (or automation, for that matter)! It can be useful to think about the complexity of software engineering infrastructure necessary for a project proportionally to the complexity of the software itself: Simple software (math, data transformations, procedural/rule-based scripts) requires simpler infrastructure. Complex software (e.g. “pipelines” composed of many commands/software packages chained together, “libraries” that are intended to be reused in many different applications) requires more complex infrastructure, to check assumptions and test reproducibility at each step. We will take a closer look at two concrete examples, one on each end of the software complexity spectrum, in the next section. 2.4 Examples 2.4.1 t-test Imagine that you have two sampling distributions (lists/arrays of numbers) and you want to test whether the means of the distributions are statistically equivalent or not. This is the setup for a t-test. t-tests are implemented in standard functions in R (base library) and Python (scipy library), as well as most other commonly used programming languages. In both R and Python, a t-test is a very well-defined, specific function that takes two lists of numbers and returns the t-statistic and p-value. 
Since this is a part of a standard, widely used library in each language, it is already tested as part of those libraries. In your own software, you might need to do some verification of your input (for instance, what happens if you pass an empty list of numbers?) but probably not too much, since you can be fairly confident that the t-test function does what it is documented to do in the programming language you choose to use. 2.4.2 Sequencing analysis Imagine that you have a list of reads from a DNA sequencing machine, and you want to use these data to answer a biological question, or to make a plot/visualization to communicate a biological insight. This is a much less well-defined problem than our previous example, with many more independently operating components, and many more subjective decisions that a researcher must make along the way. Complex data analyses means complex decisions! This often means that decisions made are not so cut and dry and should rely on the scientific context of the data. In other words, analyses often are tailored to reflect the biology (or other science) and or perhaps the experimental goals. Most sequencing analyses require multiple steps (i.e. different programs or scripts), and generate multiple intermediate files (e.g. read counts, normalized counts, quality information) that can be checked to verify that the pipeline is proceeding as expected. Sequencing analyses can also take hours or days to run, as compared to the t-test example which runs effectively instantaneously. This means that: The set of steps that need to take place is more complex than our previous example, and each step in the analysis likely builds from previous steps. Finding errors early in the process can save a lot of time and effort in later steps. A longer or more complex set of steps often means there are more ambiguous/“gray area” decisions that need to be made along the way. 
This usually means more iterations or experiments, to explore what works and what doesn’t. Introducing reproducible software practices from the ground up will help to make this exploratory process easier and clearer. 2.5 Automation for scientific software Good software practices do not necessarily have to rely on automation. However, complex projects can be unwieldy to check and revise in the absence of some sort of automated process to kick them off automatically, without too much human intervention. Steps that are involved might include: Rerunning the software itself (often on new or modified input data) Software testing Code style linting Rebuilding figures or processed datasets And many more! Each of these steps could be individually run by hand. Alternatively, they could be combined in a central script that runs all the steps in order or in parallel, which can also be triggered manually. Such a central script can itself be considered a form of automation. Automation like that of GitHub Actions, in contrast, can provide a “single point of truth”: a single central script to run these steps, and a single set of (automated) criteria for when to run them. This eliminates the need for you to remember to run tests, to clean up your code, to rebuild figures, or to kick off similar standard processes or commands on your own. Later in the course, we will talk more specifically about what exactly automation via continuous integration looks like, and go into more depth as to its uses and benefits. References "],["why-automation.html", "Chapter 3 Why Automation 3.1 Automation as an aid for reproducibility 3.2 Continuous integration / Continous deployment", " Chapter 3 Why Automation 3.1 Automation as an aid for reproducibility All of science is built on results being reliable and continually working toward identifying more true/less wrong explanations about the world. 
The process of science is first repeatability – can the same researcher with the same data get the same results? Undoubtedly in the early stages of an analysis, sometimes the results and output can be in flux. But as the analysis gets further polished and decisions are made, it should be that the same results can be obtained no matter how many times an analysis is run or re-run by the same researcher. This brings us to the critical but previously historically overlooked part of the pyramid known as reproducibility. Reproducibility is what happens when another researcher can take the same data as the research #1 and get the same results. This is more difficult than it sounds at face value because data analysis requires so many decisions and variables. The order and ease of which something is re-run and the computing environment used to run the analysis are two such factors. Keep in mind that consistent results (like those seen with reproducible work) are not automatically true but inconsistent results (like those seen with irreproducible work) cannot be true. In other words, correctness is not the same as reproducibility but reproducibility is a necessary aspect of correctness. Reproducibility is the overlooked but critical step that allows replicability to happen. Replicability is when new data is collected that extends the findings of the first study. With this new data, hopefully the same type of analysis can be done that helps the field learn even more about the concepts that were learned in the first study. 3.1.1 Why reproducibility is so important. Reproducibility is not only important because all of science is built upon it but it also saves everyone time! We can often underestimate the extent to which our work, code, and data are being used and reused by others in the scientific community. 
The extent to which our work is reproducible then, not only affect us and our immediate collaborates but could aid or hinder other researcher’s work in an exponential scale. In other words, if 10 researchers reuse your work and all 10 of them spent 100 hours trying to get it to work without success, that’s a lot of time to waste (10,000 hours)! But conversely, if your work was made with reproducibility aiding tools and skillsets (like automation that we are discussing in this course) then you could save other researchers loads of time! Let’s say instead 9 out of 10 of the researchers who try to reproduce your work (as opposed to running their own analysis from scratch) are able to do so in that time allotment, that saves them an insane amount of time and stress! 3.1.2 Automation as a reproducibility tool Automation is just one of many tools and skillsets that can aid the reproducibility of your work! Returning to those 10 researchers, if instead of having those 10 people manually try to reproduce our work every time we change it, what if we had robots do that work instead? That would not only help us re-run our results more quickly (because researchers are often busy) but also robots are much better at repetitive work. In other words, your human collaborator is great at many things but even your most reliable collaborator will not be as punctual as a robot who is programmed to do the job. 3.2 Continuous integration / Continous deployment Before we discuss the concept of Continous integration / Continuous deployment (often abbreviated CI/CD), let’s use an analogy. Obviously we are getting at here, that generally its a good idea to check work along the way instead of waiting until something is completely finished to test it. Software is no exception to this idea. Often if we send a collaborator an enormous amount of code to review; they are likely to feel overwhelmed and may not be able to give useful feedback. 
But if you send a manageable, small chunk of code to review, they are likely to give more feedback. Continuous Integration / Continuous Deployment then is a manner of working that means we will have changes checked as they are being integrated and before the changes are deployed. This allows for continuous monitoring of the project and hopefully early catching of bugs! Bugs/mistakes are an unavoidable part of software development because software developers and researchers are generally humans and humans make mistakes! Let’s assume over the course of developing a project, bugs are introduced at a certain rate. Without using CI/CD you may find yourself trying to fix many bugs at once! This will make the bugs harder to isolate and harder to fix and pinpoint. The amount of time it will take to fix 3 bugs at once may be exponentially higher than if you caught these bugs one at a time. Additionally, the longer amount of time that goes on before you catch a bug, it may be more likely it will get accidentally incorporated into your published results – this will be a lot more work for you and others to rectify. However with CI/CD you will likely catch these bugs earlier and have an easier time fixing them before they truly run a muck! A good continuous integration / continuous deployment pipeline will help you identify these bugs early and save time and stress! This is not only true for classic “my script won’t run” bugs but also “silent” bugs – bugs where the analysis still ran to completion but perhaps the results were slightly different. 3.2.1 Continous Integration / Continuous Deployment A workflow that uses CI/CD principles may look like this: The idea is we use version control and build aspects of our software. Before what we’ve built is incorporated into the published version, we will stage it and test it. 
By staging we mean that perhaps we keep it stored on a different branch and have ways that we can play around with the beta version of the analysis or product before our most recent adds are incorporated. The above diagram is in reference to more traditional software products but CI/CD can also be thought of in the context of an scientific data analysis: In the case of a scientific analysis, we may modify or add to the analysis, but we’ll want to test that these changes work – aka we may want to re-run the analysis before we merge it into the main branch or public facing version. In this instance we may think of the final product as being a published manuscript as opposed to deployed website or app. But the same principles here apply. We’ll want to re-run the analysis and build tests that check if the results make sense. 3.2.2 A real world example Let’s bring this into the terms of a very common story for science. Let’s say you are a researcher who submitted a manuscript and a reviewer comes back and asks you to re run the analysis with a minor tweak; perhaps a parameter change. If you developed your analysis without using reproducibility aiding practices and without automation, it is very likely that this seemingly simple task could take a lot of your time and brain power. Because while you don’t think anything on your computer changed since you ran this analysis 6 months ago, your computing environment and the software it uses has been changing the entire time! This kind of simple “this should be easy” situation can easily devolve into a huge rabbit hole – when you thought this analysis was basically wrapped up. But, if you had been using the principles of CI/CD and reproducibility you may have a better chance that your analysis should still run reliably. And if it doesn’t re-run reliably you will have more runs and set up that you’ll be able to pull from to pinpoint where the bug is in your analysis re-run that is keeping it from running. 
By having automation keep tabs on your development, you will be less likely to be blind sided by bugs in a situation where you need to re-run your analysis (or adapt it for a new analysis!) 3.2.3 Other CI/CD services In this course we are focusing on using GitHub Actions for CI/CD. However, at this point we should mention that GitHub Actions is just one of many options for this. Circle CI, Appveyor, and Travis CI are all also perfectly fine options to use. But if you are using GitHub already, GitHub Actions may be the easiest to start out with. However, if at a later point in your automation development journey you find that GitHub Actions may not have a feature you need, we encourage you to explore these other options and use what works best for you. These other CI/CD options definitely have some commonalities with GitHub actions so learning GitHub Actions will still give you a good start in understanding how these services work. "],["github-actions-fundamentals.html", "Chapter 4 GitHub Actions Fundamentals 4.1 GHA structure 4.2 Exercise 1 - Running your first GitHub Action", " Chapter 4 GitHub Actions Fundamentals 4.1 GHA structure All GitHub Actions involve answering three questions: When should a thing run? What should be run? With what environment should the thing be run? These questions and other specifications are set by writing a YAML file. YAML files are human readable markup language files. Basically its a list that is easy for humans to read and write and computers can read them too. This makes it good for writing a GitHub Action. Essentially, we’re going to write a YAML file to make a recipe that GitHub will read to know what/when/with what we are trying to do. The headlines about working with YAML files: Everything is a list (kind of like a JSON file). Indentations = subsets of a list Spacing is VERY specific! – incorrect spacing will definitely result in errors for your GitHub Action run. Let’s take a look at an example YAML. 
Note that the what comes before a : is generally a name and indent indicate subsets of a list. So in the overall list of food we have sublists of vegetables and fruits. # can be used as a comment and will not be treated as code. Additionally, : are often names. So citrus is the name for the item oranges and etc. # A comment here which is ignored food: - vegetables: tomatoes - fruits: citrus: oranges tropical: bananas Two items that every GitHub Action YAML must contain is on: and jobs:. on: tells GitHub when something should be run. For example “whenever a pull request is opened”. jobs: tells GitHub what should be run. For example “run this bash script”. runs-on: tells GitHub with what environment should this be run. For example “windows-latest”. 4.1.1 on: When a thing should be run If you are to automate something, step one is to figure out when do you want the thing to happen. What should trigger your action? For that we use on: in a GitHub Action. There’s lots of possible answers for when something should be run. The triggers can be a lot of different events on GitHub: pull requests, issues, comments, times of day, etc. 4.1.2 jobs: What should be run Perhaps even more important, what is the job that this automated task needs to do? Description Trigger term When you click a button workflow_dispatch: When its a certain time of day schedule: When a pull request is opened or has a new commit pull_request: When a branch is merged push: When something happens with an issue issue: When a different github action runs workflow_call: When someone comments on a pull request pull_request_review_comment: Scenario: You are running an analysis using public data that continually has more samples added - You would like the analysis to rerun when new samples are added - You would like to be informed of when the analysis got rerun and what the results were on Slack That’s totally a thing a GitHub action can do! We will walk through some examples like it! 
And here’s the good news, you don’t have to write things from scratch or know ALL the languages. GitHub marketplace allows you to use really cool actions that other people have created. More on this later. 4.1.3 runs-on: with what: The runs-on: tag specifies with what environment the job is going to be run. What does this mean? Well let’s start by discussing that the term “cloud” computing is a tad misleading. When we send a job to an online service like GitHub Actions, its not a mysterious vague mass. Instead, its being sent to a real computer somewhere and that computer is setting up a computing environment to run your job and sends back the results to you through the GitHub website. What do we mean by a computing environment? As just like when you work on your personal computer, you install, update, and sometimes delete software in order to run different things, the GitHub Actions computers need to do the same in order to run your code. Although some person from Microsoft isn’t setting up a new physical computer and manually installing software, the specs you give underneath runs-on: tell GitHub Actions what kind of set up to use. So for example, there are built in operating systems like windows-latest, mac-latest, and ubuntu-latest. You can see more about the default GitHub runners here. But just like a Windows machine straight out of a box is unlikely to have everything you need to run some code, you may need a more specific computing environment. You can also create custom environments using containerization. 4.1.4 Containerization A “virtual machine” is basically when your computer creates its own fake computer inside of it. It’s acting like a different computer but it doesn’t have any additional physical parts. Containers aren’t virtual machines, but they do a similar thing, which is they spin up a computing environment where you can do things. They are called containers because they are isolated from the rest of your computer. 
Containerization is useful because it allows us to share our computing environments with others. This is useful because it can be a powerful tool for reproducing analyses if we are controlling our computing environments. The software you use, and the versions of the software you use can affect the results from an analysis (Beaulieu-Jones and Greene 2017). Real data and experiments have shown this! Below is a figure from Beaulieu-Jones and Casey S. Greene, 2017 that shows how a microarray data analysis had different results depending on the software versions used (Beaulieu-Jones and Greene 2017). And as time goes on, your computing environment changes; potentially in ways you don’t realize! Most languages and programs allow you to print out the specifications of your computing environment. See below a “session info” print out from R. What this shows is two different computing environments. Side by side we can see how they differ. There’s various containerization software programs, that will allow you to share your computing environments but a very popular one is Docker. We can picture how this makes analyses more reproducible. Docker and other containerization software work by allowing you to take a snapshot of your environment, called an image. This image can be shared and others can use this image to build the container from which they can run the analysis or whatever it is they plan to do. Docker is a whole other world. There’s whole conferences, hackathons, and etc devoted to Docker and other containerization software. It can be a lot to learn. To start, we recommend borrowing other people’s Docker images as much as possible instead of trying to build your own And then install the few packages you need. (more on this in a future chapter) Super important side note: DO NOT put data that needs to be secured like Personally Identifiable Information (PII) and Personal Health Information (PHI) data on your Docker images! Especially when you share them! 
They are not meant for this purpose and this data would be exposed! 4.1.4.1 More resources about Docker Launching a Docker image Modifying a Docker image Docker for data scientists 4.1.5 Summarizing GitHub actions are specified by YAML files in .github/workflows/ folder on a GitHub repository. The specs from this YAML are used to run a job when an on trigger specifies it should be run. The runs-on spec tells the server what kind of environment it should be run with. Containers like those made with Docker can help you make custom computing environments. 4.2 Exercise 1 - Running your first GitHub Action Let’s apply what we’ve learned about GitHub Actions by running one! If you don’t have a standard workflow for how you use GitHub locally, or are unhappy with your current methods for this activity we recommend installing GitHub Desktop First we need to create a copy of the exercise GitHub repository we will use for this course. Go to https://github.com/fhdsl/github-action-workshop and click on the Use this template button. Fill out the form on this page about where you want this repository to be and what description you want it to have. And click Create Repository. Clone this repository to your local computer. In GitHub Desktop you can do this by clicking the Clone Repository button. But from command line you can use this kind of command: git clone https://github.com/<your-username>/github-actions-workshop Create a new branch by clicking the buttons as shown here or using the command line examples below cd github-actions-workshop git checkout -b "first-gha" Create the specific GitHub Actions folders. Recall that in order to run a GitHub action, GitHub will look for YAML files in a specific location. We will need to create these folders to get going. Use your operating system to create a .github folder and then inside that folder, a workflows folder. Don’t forget the s in workflows or the . 
in .github – these folder names have to be exactly written this way for your GitHub Action to be found and recognized by GitHub. You can use this command to do this: mkdir -p .github/workflows Now you will want to move the 00-my-first-action.yml file into the .github/workflows folder. In command line you can do this by using this command: mv activity-1-sample-github-actions/00-my-first-action.yml .github/workflows/00-my-first-action.yml Add and commit these changes to your branch. Then you will want to push your branch to the online GitHub repository. In command line this can be done like this: git add .github/* git commit -m "adding first gha" git push --set-upstream origin first-gha Open a pull request. In GitHub Desktop you can click this button: Or just navigate to your GitHub repository online and open a pull request through the website. Check your pull request to make sure the changes are what you expect. Then merge it! After merging, go to the Actions tab on GitHub You’ll be come very acquainted with this page if you use GitHub actions. On the left shows the workflows that are available or have been run before. We should see our new GitHub Action we just merged from our pull request here called “Basic GitHub Action”. Click on that. Underneath this we should now see a blue banner that allows us to click “Run workflow”. Click Run workflow and then Run workflow again. Because we made our on: trigger workflow_dispatch this means we have to tell the GitHub Action when to run (which means its not really automated in this case). Yay! 4.2.1 Checking results of a GitHub Action Go to the Action tab. You’ll see your newest run of your GitHub Action is logged here. All future GitHub Action runs will have their logs here. Click on the workflow run log so we can look into it. To see more run details we’ll click on the job name which in this case is hello. Click on the dropdown arrows to see even more details on each step. 
4.2.2 Breaking down the YAML We can break down how what we wrote in the YAML lead to what is shown in this run’s log. We named this Action Basic GitHub Action and the log was named that. The only job being ran was named hello so in the log it shows up this way underneath the Jobs header. If we had more than one job, those jobs’ names would show up here too. Each job can contain as many steps as we want. Our one step is named Hello World. This step involved running some code using the run: tag which by default uses bash. The bash code just echoed “hello world”. Congrats! You’ve ran your first GitHub Action! In the next chapter we’ll run something a little more automated and a little more fun to build on what we’ve learned here. References "],["automating-re-running-analyses.html", "Chapter 5 Automating Re-running Analyses 5.1 Exercise 2 - Re-run analysis example 5.2 Diving into the details", " Chapter 5 Automating Re-running Analyses In the beginning of this course we discussed the benefits of using continuous integration/continuous deployment principles for scientific code including analyses. In this chapter we will go through example code that shows how this can be set up. We highly encourage you to take this code and adapt it to your own project’s needs. 5.1 Exercise 2 - Re-run analysis example For this exercise, we are going to continue to use the example repository that we set up in the previous chapter. Create a new branch to work from. As is good practice for adapting a GitHub workflow, we will create a new branch for us to work from. In GitHub Desktop you can click the branch button and follow the same steps we did in the previous exercise. From command line: `git checkout -b "more-ghas"` For this exercise we are going to copy over a second GitHub Action YAML file from the folder. This time, move the 01-re-run-analysis.yml file to your .github/workflows directories you made in the previous chapter. 
From command line: mv activity-1-sample-github-actions/01-re-run-analysis.yml .github/workflows/01-re-run-analysis.yml Now follow the same set of steps we used in the previous chapter to Add, Commit, Push the changes. From command line: git add .github/* git commit -m "adding more ghas" git push --set-upstream origin more-ghas Now create a pull request with the changes you just made. (Refer to the previous chapter if you need reminders on how to do this). After you open your pull request, scroll down to the bottom of the page. If all went as expected, you should see a status message that shows a GitHub Action is running after opening your pull request. Think about it. Without looking at the YAML file… What do you suppose the on: value (the when) might be for these actions? Take a look at the file, .github/workflows/01-re-run-analysis.yml, to see if you are right! On your pull request page on GitHub, click on the Details button next to your workflow run. You can navigate to this same page by going to the Actions tab, then scrolling down to see the most recent workflow run, which should be named Re-run analysis, and clicking on that. 5.2 Diving into the details Let’s break down what is in this GitHub Action YAML file and what this workflow run did. 5.2.1 name and on At the top of the file we have: name: Re-run analysis. This is the name our workflow run shows up under in the Actions tab log, and it helps us differentiate it from other GitHub Action workflows. Below that, there is the on: trigger. This workflow of re-running this analysis will only run when a pull request is opened or pushed to. And further, we’ve specified with branches: that it will only run if the pull request is targeted to branches named main or staging. # Run this workflow when a pull request is opened or pushed to. on: pull_request: branches: [ main, staging ] 5.2.2 jobs In our jobs: we’ve named this job Re run analysis.
Additionally we are running this on an ubuntu-latest operating system. Our first GitHub Action workflow from the previous chapter didn’t need any additional packages or software to run its job, but the analysis script we are running in this job requires things like R, Python, and some specific packages. We could attempt to write a script that installs everything we need. However, that would likely be a lot of work, may not work reliably, and would make changes hard to track. Instead, we are using a custom made Docker image that has R, Python, and other packages we need already installed. This custom made Docker image is pulled from Dockerhub and it exists here. If you wish to make a custom Docker image to use in your analysis, the easiest way to do this is to make a Dockerfile, build a Docker image from this file and then push it to Dockerhub. We have some Dockerfiles for this image and others managed and version controlled here on this GitHub repository. You may note we use GitHub Actions on this repository to help us manage these Docker images. jobs: re-run: name: Re run analysis runs-on: ubuntu-latest # This image has python, R and other things we need to run our mock analysis container: image: jhudsl/ottr_python:main 5.2.2.1 actions/checkout One of the most frequently used GitHub Actions from the GitHub Action Marketplace is actions/checkout. This action will grab all the files from a GitHub repository so you can do things with those files in your workflow. (Recall that when you spin up a GitHub Action Environment it is a blank slate, so we have to put our files there too if we want to use them). steps: # Need to get the files specific to our branch from our pull request - name: Checkout files uses: actions/checkout@v3 with: fetch-depth: 0 By default, it will check out the files from the repository where this action is being run, but we could specify another repository or other branches. fetch-depth: 0 means we will fetch the full commit history instead of only the most recent commit.
5.2.2.2 sh run_analysis.sh Now for the main objective we were building to. We are going to run a script that re-runs our entire analysis. We’ve named this file run_analysis.sh to be clear about what it does. We’re giving this step an id of running (this will become clear in the next paragraph). Additionally the | tells run: to expect multiple lines of a command. We didn’t need this to be a multiple line command, but we thought it would be good to show you this. # We can call our main script then to re-run it to make sure it works - name: Run it id: running run: | sh run_analysis.sh We have three steps in this fake analysis and the files are numbered in the order in which they are run. If you open up the run_analysis.sh file, you will see it’s basically a simple file that calls each step of the workflow in turn. It looks like this: # This is a mock script that shows how you could have your whole analysis run by one script call. ## Usage: To re-run this whole analysis, go to bash and # These specs will make sure that if one script fails this will fail the script set -e ## Run the first step python3 "01-python_test.py" ## Run the second step Rscript "02-r_test.R" ## Run a third step Rscript -e "rmarkdown::render('03-make-a-plot.Rmd')" The set -e is actually critical here. We need to make sure that this script will stop if it encounters an error. The main point of our GitHub Action here is that we want to know if something failed. (We also want to know if the results remained the same, but that will require a bit more engineering than we are showing in this simple example). A very tricky thing about GitHub Actions (and languages called by them) is that GitHub workflows do not always stop when there are errors as we would define them. When designing a new action, we need to carefully evaluate the steps of the job in the logs to make sure what we think happened and completed actually did complete successfully.
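To see why set -e matters, here is a small sketch you could try in a local terminal. It is not part of the course repository; it assumes a POSIX sh is available and uses the built-in false command as a stand-in for an analysis script that fails.

```shell
# Without `set -e`: the failing command is ignored and the script keeps going,
# so the script as a whole reports success.
status_without=0
without=$(sh -c 'false; echo "kept going"') || status_without=$?

# With `set -e`: the script stops at the first failing command
# and reports that command's nonzero exit status.
status_with=0
with=$(sh -c 'set -e; false; echo "kept going"') || status_with=$?

echo "without set -e: output='$without' exit status=$status_without"
echo "with set -e:    output='$with' exit status=$status_with"
```

The first script prints its message and exits with status 0 even though a command failed, which is exactly the kind of silent error a workflow would not catch. The second one stops early with a nonzero status, which is what lets the GitHub Action step fail.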
Returning to our GitHub Action YAML file, we can see that the last step of this job has an if statement. What we are doing here is asking GitHub to evaluate whether the step running (remember the id we set?) had success as its outcome. # We can have this double check that the last step was successfully run - name: Check on re-run outcome if: steps.running.outcome != 'success' run: | echo Re-running status ${{steps.running.outcome}} exit 1 This steps.running.outcome is representative of a whole new world of GitHub Actions Environmental variables that we have not discussed yet but we will now! 5.2.3 Summary "],["github-action-variables.html", "Chapter 6 GitHub Action Variables 6.1 Exercise 3 - Exploring Variables", " Chapter 6 GitHub Action Variables The GitHub Actions environments have variables that are already set by default in the environment but you can also set environment variables yourself. 6.0.1 Types of variables There are two types of variables in GitHub Actions. Default - Ones GitHub already sets for you. User set - Ones you set yourself. To print things out, you can use this kind of notation in bash or other contexts in the yaml file. echo ${{ github.repository }} In this next exercise we’ll explore different ways to use variables. 6.1 Exercise 3 - Exploring Variables For this exercise, we are going to continue to use the example repository that we set up in the previous chapter. Create a new branch to work from. From command line: `git checkout -b "env-var"` For this exercise we are going to copy over another GHA yaml to explore. This time, move the exploring-var-and-secrets.yml file to your .github/workflows directories you made in the previous chapter. From command line: mv activity-1-sample-github-actions/exploring-var-and-secrets.yml .github/workflows/exploring-var-and-secrets.yml Now follow the same set of steps we used in the previous chapter to Add, Commit, Push the changes. 
From command line: git add .github/* git commit -m "exploring gha variables" git push --set-upstream origin env-var Now create a pull request with the changes you just made. (Refer to the previous chapter if you need reminders on how to do this). On your pull request page on GitHub, click on the Details button next to your workflow run. Keep this handy because we will dive into the details of what we just ran. 6.1.1 Default variables You can read the latest documentation about GitHub Action default variables here. But here are some highlights. name: GITHUB_REPOSITORY; example output: username/repository_name; explanation: This prints out what repo this is run from. name: GITHUB_REF; example output: refs/pull/1/merge; explanation: The branch or tag that triggered this workflow. But note that this will be blank if the trigger is not based or related to branches or tags. For example a workflow_dispatch wouldn’t have this. name: GITHUB_ACTOR; example output: cansavvy; explanation: The GitHub handle of the person who caused this workflow to run. Below is an example of the log where we printed out these default GitHub variables. 6.1.2 User set variables 6.1.2.1 env: There are different ways to set variables. The simplest way is to set them within a step using env:. Underneath env: you write the name of the variable on one side of the colon and then the definition on the other side. For example, in our yaml file we had: - name: Hello, but make it personal run: echo "Hello $First_Name." env: First_Name: Candace This setup printed out Hello Candace in the logs as our output. This might be useful, but if we want an environment variable to be stored and retrieved between steps we’ll need to use something different. 6.1.2.2 Setting output variables If we’d like one step to be able to retrieve information from another step we’ll need to send a variable to the GITHUB_OUTPUT.
To do this we can use this sort of set up: Step that sets a variable depending on some output # How to export a variable to a next step - name: Setting output to the environment at large id: step_name run: echo "results=5" >> $GITHUB_OUTPUT Here we are naming the variable results and the notation >> $GITHUB_OUTPUT is always there. In this example, results is only set equal to 5 but you could see how this might be made to be more complicated. Like perhaps the results are a bash command output like: echo "time=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT This would allow us to have a time stamp of when this step was run. Or perhaps we are running a script that outputs a result: results=$(Rscript utils/script.R) 6.1.2.3 Using output variables To use this output variable in a subsequent step we have to use this kind of setup: steps.step_name.outputs.results where step_name is the id: we set for the step that set this variable (see above) and results is the name of the variable we set. And, as is typical, we need the ${{ }} notation. # How to print out the variable we just saved - name: Print out that variable in a later step run: echo ${{ steps.step_name.outputs.results }} This is nifty because now we can use the result of one step to determine whether or not we run a subsequent step. GitHub Action steps can have conditional or if statements. Maybe we only want a step to run if the result is something specific: - name: Conditional step # Here we are only going to do this step if the results from the previous step are bigger than 3 if: ${{ steps.step_name.outputs.results > 3 }} run: echo 'the results are greater than 3!' Or, maybe we want to make sure the whole workflow shuts down if a variable is something in particular like this example below. - name: Shut it down # Here we are only going to do this step if the results from the previous step are less than or equal to 3 if: ${{ steps.step_name.outputs.results <= 3 }} run: | echo 'the results are less than or equal to 3!
-- going to exit!' exit 1 6.1.2.4 Setting and grabbing secrets What if the string or variable we need is not something we can supply in the YAML itself? Perhaps we have credentials or something that cannot be shared publicly but that we need to complete our steps. That’s where GitHub secrets come in handy! Read more about GitHub secrets here. One very common type of GitHub secret you may need to add is a GitHub Personal Access Token (sometimes abbreviated PAT). A personal access token is a string that, when provided, gives access to a user’s GitHub account. Read more about tokens here. For GitHub actions that are doing things that require authorization or particular permissions levels, you will need to provide your GitHub action with your personal access token (PAT) that you store as a GitHub secret. 6.1.3 Activity: Setting GitHub secrets Let’s practice this by setting a GitHub Access token as a secret! 6.1.3.1 Make a Personal Access token You can store any alphanumeric string as your GitHub secret. It may be an API key or authorization keys from some other software program. But for this example, we will use an authorization key for GitHub. Recall that we may sometimes have to give authorization to a GitHub action because we are not actually running it with our user account; the job is sent to GitHub to run on their servers somewhere. First make your own personal access token by going here: https://github.com/settings/tokens You can find this page by going to your own profile, and then to Settings and Developer settings. The GitHub Documentation for how to make PATs is here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens But we’ll walk through it together now. Underneath Tokens (classic) click Generate new token and pick Generate new token (classic). You will likely have to enter your password at this point.
Underneath Note write something that will remind you about where you are using this PAT. Check the repo and workflow boxes. (Depending on what you are trying to do you may have to check other boxes but for a lot of the permissions you’ll need repo will do). Scroll to the bottom of the page and click Generate. Your token will be shown on the next page. Keep this handy because they won’t show it to you again. Be careful not to share this any place publicly because it will give someone authorization to your GitHub account! 6.1.3.2 Creating a GitHub Secret Return to your repository that we were using for these activities. Go to Settings > Secrets and variables > Actions > New Repository Secret. Name your secret something that relates to what it is. In this example, let’s call it GH_PAT. Now copy your token and paste it into the Secret field. 6.1.3.2.1 Referencing a GitHub secret in a GitHub action To retrieve a GitHub secret amidst a GitHub Action workflow run, you use this sort of notation: ${{ secrets.SECRET_NAME }} where SECRET_NAME is exactly the name you used for your GitHub secret. # Here's how we'd reference a secret - name: How do we reference a GITHUB secret? run: ${{ secrets.SECRET_NAME }} In the previous step we named our secret GH_PAT so if we needed to use it in our workflow we would use ${{ secrets.GH_PAT }}. Perhaps at this point you are worried that your logs may accidentally display your GitHub secret if you did something like: run: echo ${{ secrets.GH_PAT }} But you don’t have to worry about that part: in your logs your secrets will show as *** and will not be displayed. 6.1.3.2.2 Activity: Using a GitHub secret On your repository, go to your exploring-var-and-secrets.yml file from your working branch. Click the edit this file button. Scroll to the bottom. Uncomment the last step. It should look like this: # Here's how we'd reference a secret - name: How do we reference a GITHUB secret?
run: ${{ secrets.SECRET_NAME }} Replace SECRET_NAME with what you named your secret (probably GH_PAT). Commit that change to your file. Push that change to your file. Take a look at the log by clicking Details. What you should see is that the workflow runs again, tries to print the GitHub secret out but really just shows a ***. "],["troubleshooting-github-actions.html", "Chapter 7 Troubleshooting GitHub Actions 7.1 Tips 7.2 Activity: Troubleshooting GitHub Actions 7.3 Summary", " Chapter 7 Troubleshooting GitHub Actions Many of your standard programming troubleshooting skills are applicable with GitHub Actions. In this chapter we’ll give you a few more tips for what might be the most common ways that GitHub Actions can break and what those error messages might look like. 7.1 Tips 7.1.1 Look out for silent errors! A well designed GitHub Action will: (1) Successfully fail when you should be alerted to something that isn’t working. (2) Successfully pass when the test is working as you’d like. What makes point 1 tricky is that just because you get a green check mark, it doesn’t mean that all your steps ran successfully or did what you thought they were doing. Especially when you are first developing a GitHub Action, it is a good idea to look through the logs and click on the dropdown arrows for each step to see what was printed out. It’s a great idea to add a test or evaluation that will be more specific to what you need to be done in your GitHub Actions. This is where the variable setting we discussed in the previous chapter can come in handy. Sometimes you might be able to do something as simple as this: - name: Check on re-run outcome if: steps.running.outcome != 'success' run: | echo Re-running status ${{steps.running.outcome}} exit 1 Where running is the id: of the step you want to evaluate. However this will have limited success and evaluations like this should always be made as specific as possible to what your GitHub Action is testing.
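As one illustration of a more specific evaluation, the sketch below checks that an expected output file exists and is not empty. The check_output helper and the file names are hypothetical and are not part of the course repository; the temporary folder is just a stand-in for your workspace.

```shell
# Hypothetical helper: succeed only if the given file exists and is not empty.
check_output() {
  if [ -s "$1" ]; then
    echo "Found non-empty output: $1"
  else
    echo "Missing or empty output: $1"
    return 1
  fi
}

# Demonstration in a temporary folder (a stand-in for the repository workspace).
workdir=$(mktemp -d)
echo "gene,count" > "$workdir/results.csv"   # pretend the analysis wrote this file

check_output "$workdir/results.csv" && present=passed || present=failed
check_output "$workdir/plot.png" && missing=passed || missing=failed
echo "results.csv check: $present; plot.png check: $missing"
```

In a workflow step you could follow a call like this with || exit 1 so that the step fails when an expected file is missing, rather than relying only on the outcome of the previous step.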
In order to design these steps you are going to need to look closely at your logs to see when things are 7.1.2 Look at the logs closely! Whether your GitHub Action fails or not, go to the logs to see how it ran. You can get there by going to the Actions tab and clicking on the workflow you want to check on. You should start by scrolling down on the Actions page to look at the Annotations. This is GitHub Action’s summary of how the workflow ran. However, the summary is often unlikely to give you enough information to troubleshoot a failed action. In order to find out what the error message really is, you may need to dig into the logs deeper than that. Usually when you open up the log it will open up the step that it detects has failed. Read carefully what output happened here versus what you expected to happen. You may want to use the arrow to show what commands were specifically run. Sometimes you may need to look in earlier steps to really pinpoint what has happened. You may want to Google those messages depending on what they are saying. If the message has to do with a script being called you will want to test those scripts you wrote elsewhere to make sure they are working. 7.1.3 Use workflow_dispatch/pull_request triggers for development Regardless of whether you want your final GitHub Action to run on a pull_request or workflow_dispatch trigger, it can be helpful during development to use these. You can have multiple triggers for a GitHub action. The pull_request trigger is helpful for development so that every time you push to your pull request your action will be re-run automatically so you can see if what you tried worked. This is the default method (in my mind) for developing a GitHub action. The only instance you may not want to use this method is if you are using GitHub default variables that are different in pull requests than they will be in your final version of the GitHub Action. In that instance a pull_request may not be a representative test for you.
The workflow_dispatch trigger is useful so you can re-trigger the workflow run whenever you need to test the next thing you tried for troubleshooting purposes. You might consider doing this if testing as a pull request isn’t appropriate. You can also initiate workflow_dispatch workflow runs from any branch – so you don’t have to merge it before you’ve polished it. There are two caveats to this strategy: (1) If you don’t want these triggers long term, make sure you delete them before you merge to main. (2) Recall that if you are using default environment variables in your workflow runs, those change depending on the triggers, so they may not always be representative of the workflow run as it will be run with the final version. 7.1.4 Print out things to test your assumptions You can check your assumptions about how GitHub Actions is running things by printing out pieces of the action. For example, if you are using variables or file paths and suspect they are part of the issue, you can run ls or print out a variable with echo. run: | echo ${GITHUB_ACTION_PATH} ls This may help you figure out whether a variable or file isn’t in the place you think it is. 7.1.5 Use Marketplace actions The great part about using GitHub Actions is that you can use other people’s actions from the marketplace so you don’t have to write everything from scratch! This, however, does mean that you are dependent on the developers and maintainers of these GitHub Actions. Here are three tips for troubleshooting a problematic GitHub Action that is borrowed from the marketplace: (1) Read their docs carefully and make sure you are using it as specified. (2) Try bumping up to a later version if it looks like there’s a bug that may have been addressed. (3) Try not to use Marketplace actions that don’t show evidence of being maintained or don’t have fully fledged documentation!
7.2 Activity: Troubleshooting GitHub Actions What to know about these activities: None of these broken actions will require more than 1 simple line to fix it. So don’t spend too much time writing lots of code to fix these. There are clues in our descriptions here about what you should look out for in fixing these actions. So look out for these clues. Use standard troubleshooting tips to figure this out – Googling and iterative work and attempts are encouraged! Create a new branch to work from. From command line: `git checkout -b "troubleshoot-practice"` For this exercise we are going to copy purposely broken GitHub actions that we will fix. Move all three files from the activity-3-find-the-break folder to the .github/workflows directory you made in the previous chapter. From command line: mv activity-3-find-the-break/* .github/workflows/ Now follow the same set of steps we used in the previous chapters to Add, Commit, Push the changes. From command line: git add .github/* git commit -m "troubleshooting exercise" git push --set-upstream origin troubleshoot-practice Now create a pull request with the changes you just made. (Refer to the previous chapter if you need reminders on how to do this). Scroll to the bottom of this pull request. You’ll notice that only one of the three broken actions has a status here. If you don’t see a GitHub Action you expect to be running, you’ll need to go to the Actions tab to see what’s happening. We’ll dive into this in this next section. 7.2.1 Broken Action 1 - Upload a file Let’s dive into broken action 1 first. Let’s look into the logs. Go to the Actions tab and find the most recent workflow run that indicates it’s from .github/workflows/broken-action-1.yml. You’ll notice that this action and the second one don’t show up normally. They both have startup issues, meaning their problems are so fundamental that GitHub can’t even process them to the point where they can begin to run.
Generally startup issues have to do with one of the following: an essential specification is missing; there’s a syntax issue; there’s a spacing issue. Click on this issue’s log and scroll down to the bottom where it says “Annotation”. In this first case we have an error: a step cannot have both the `uses` and `run` keys What do you suppose this error means? When thinking about what you believe this error means, take a look at the parts of the yaml file that have uses and run keys. To recap, uses is a key we use when we are borrowing an action from the marketplace like the following: - name: Checkout files uses: actions/checkout@v3 And the run key we generally use for calling some bash or other language’s commands, like this: - name: Print out a thing run: echo Let's print a thing out! Both run and uses are calling commands. So given this information why do you suppose we are getting the error: a step cannot have both the `uses` and `run` keys 7.2.1.1 Fixing Action 1 Based on what you think is causing this error, attempt to make a change to the broken-action-1.yml file. Then add and commit that change and push it to GitHub. Return to your logs to see the most recent run of the action from .github/workflows/broken-action-1.yml. Look at the logs to see if the error is different or is fixed. If it ran successfully you’ll see its actual title show up in the logs. But regardless of whether it fails or succeeds you should check the logs. Repeat these steps until you have fixed action 1. 7.2.2 Broken Action 2 - Create an issue Let’s look into Action 2’s logs to try to fix it. Go to the Actions tab and find the most recent workflow run that indicates it’s from .github/workflows/broken-action-2.yml. This action, like the first one, has startup issues so it will not have its status shown on the pull request. Click on this issue’s log and scroll down to the bottom where it says “Annotation”.
In this case we have an error: You have an error in your yaml syntax on line 11 What do you suppose this error means? What’s nice about this error is that it does tell us a specific line to look at. Keep in mind though that sometimes when a GitHub action tells us a line this may be approximate. We may need to look slightly before or slightly after the line it calls out to know what to fix. 7.2.2.1 Fixing Action 2 Open up the broken-action-2.yml file. Take a look at the code around line 11. What do you notice that is different about these lines as compared to other actions we’ve looked at? Formulate a hypothesis on what you think is the problem and change that in broken-action-2.yml. Then add and commit that change and push it to GitHub. Return to your logs to see the most recent run of the action from .github/workflows/broken-action-2.yml. If it ran successfully you’ll see its actual title show up in the logs. But regardless of whether it fails or succeeds you should check the logs. Look at the logs to see if the error is different or is fixed. Repeat these steps until you have fixed action 2! 7.2.3 Broken Action 3 - Run script Finally, let’s look into Broken Action 3 - Run script. Go to the logs and look for a recent workflow run of that title. In this case, the Annotations might tell us Process completed with exit code 127. If we look online we can see that this means that either a script doesn’t exist or it can’t run. This is moderately helpful but doesn’t really help us identify the problem. So we have to dig into the log some more. When we click on the log it will likely open up the step that this workflow failed on. In this case we have an error: /__w/_temp/36dfc03e-56ed-43fa-9019-85d8b151f42a.sh: 2: python3: not found Running python here The /__w/_temp/36dfc03e-56ed-43fa-9019-85d8b151f42a.sh bit just tells us information about where this was being run in the temporary workspace that GitHub was using to run our workflow.
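The exit code 127 mentioned above can be reproduced locally. The sketch below is not from the course repository: it runs a made-up command name to trigger the same exit status, then uses command -v as one way to check whether python3 is available in an environment.

```shell
# Run a command name that should not exist anywhere (the name is made up).
status=0
sh -c 'some_command_that_does_not_exist_xyz' 2>/dev/null || status=$?
echo "exit status: $status"

# Check whether python3 is available before relying on it.
if command -v python3 >/dev/null 2>&1; then
  echo "python3 found at: $(command -v python3)"
else
  echo "python3 is not installed in this environment"
fi
```

Shells report status 127 when a command cannot be found. Adding a check like the second half as a temporary step in a workflow is one way to confirm what software the job's environment actually contains.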
And if we look at our yaml file we can see the message: Running python here is just something we had echoed. What we’re going to want to zero in on here is the python3: not found. 7.2.3.1 Fixing Action 3 Open up the broken-action-3.yml file. It looks like Python is not able to be found. Look at the yaml file and try to figure out why that might be. Formulate a hypothesis on what you think is the problem and change that in broken-action-3.yml. Then add and commit that change and push it to GitHub. Return to your logs to see the most recent run of the action from .github/workflows/broken-action-3.yml. Look at the logs to see if the error is different or is fixed. Repeat these steps until you have fixed action 3! For a further hint about fixing this problem: you’re going to want to look into what software packages the Docker image referenced in the image: key has. Does the rocker/r-base image have Python? If not, you may need to look for a Docker image to use that has Python. We’ve used one in a previous example that you could borrow. 7.3 Summary In this activity, we practiced troubleshooting GitHub Actions. We discussed some of the most common ways that GitHub Actions can be broken. Here’s a summary of troubleshooting tips covered in this chapter. "],["applying-github-actions-examples.html", "Chapter 8 Applying GitHub Actions Examples 8.1 Explore the actions", " Chapter 8 Applying GitHub Actions Examples A great way to learn GitHub Actions is to borrow a yaml file someone else has written and incorporate it into your own project. In this chapter we will introduce you to two GitHub Actions and encourage you to adapt one or both of them to one of your own projects. We encourage you to follow similar steps and tips for other actions you find on the internet that you can use for your project.
The two action examples we have in this chapter both work on pull_request triggers and do the following: (1) Spell check markdown and R Markdown files and save the spelling errors in a file uploaded to GitHub. (2) Style R code and commit it back to a branch. This option will take more work to adapt if you do not use R but is totally doable and we’ll walk you through some guidance on how to adapt it. 8.1 Explore the actions To get a sense of how these actions work, you guessed it, we are once again going to create a new branch and open a pull request. From command line: `git checkout -b "example-ghas"` For this exercise we are going to copy over two more GitHub Action YAML files from the folder. This time, move the spell-check.yml and style-code.yml files to your .github/workflows directory. From command line: mv activity-4-sample-github-actions/* .github/workflows/ Now follow the same set of steps we used in the previous chapter to Add, Commit, Push the changes. From command line: git add .github/* git commit -m "adding even more ghas" git push --set-upstream origin example-ghas Now create a pull request with the changes you just made. On your pull request page on GitHub, click on the Details button next to each of these workflow runs. Before moving on to the next section, take a look at the logs and yaml files and practice getting an idea of what these GitHub Actions are doing. 8.1.1 Diving deeper In this section, we’ll give you the basic recipe of these GitHub Actions. Hopefully by having a basic idea of what is in these actions you’ll be able to adapt them for your own purposes. 8.1.1.1 Spell Check Overview We’ll run through the yaml file and explain the basic set up here. We also have included links about the resources used at each step. We encourage you to poke around with these resources and with this yaml file to really learn about what these actions are doing.
spell-check: runs-on: ubuntu-latest # We will run this on a Docker image so it has most of the things we need container: image: rocker/tidyverse:4.0.2 We’re using a Docker image that has some R packages we will use so we don’t have to install everything individually: rocker/tidyverse:4.0.2 # Need to get the files specific to our branch from our pull request - uses: actions/checkout@v3 with: ref: ${{ github.event.pull_request.head.ref }} Checking out the files we need using actions/checkout@v3 # Our docker image doesn't have this one package though so we'll install it - name: Install packages run: Rscript -e "install.packages('spelling')" Install a spelling package we need that wasn’t on the Docker image already. This is a reasonable strategy if we only need one or two packages that don’t take long to install. - name: Run spell check id: spell_check_run run: | sp_chk_results=$(Rscript "utils/spell-check.R") # This is where we are going to store output from this step to the environment so we can retrieve it in a later step echo "sp_chk_results=$sp_chk_results" >> $GITHUB_OUTPUT cat spell_check_results.tsv Run custom spell check script – this is where you’d really have to personalize this. We’re calling this custom R script that looks for the R Markdown and markdown files and spell checks them. Note that this means this script must be available to this GitHub Action if you are to use it. You’ll either need to download it or add it to whatever repo you add this to. We also have this print out the results in the log and save these results to the GITHUB_OUTPUT variable as discussed in the previous chapter. This allows us to retrieve the number of misspellings identified in a future step. 
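To make the GITHUB_OUTPUT mechanism more concrete, here is a sketch you could run locally. It rests on a few assumptions: a temporary file stands in for the file GitHub provides through the GITHUB_OUTPUT variable, echo 4 stands in for the spell check script, and reading the value back with grep only imitates what a later step gets through steps.spell_check_run.outputs.sp_chk_results.

```shell
# On GitHub the runner sets GITHUB_OUTPUT for us; a temporary file stands in for it here.
GITHUB_OUTPUT=$(mktemp)

# Stand-in for `Rscript "utils/spell-check.R"`: pretend the script printed 4.
sp_chk_results=$(echo 4)

# Same line as in the workflow step: append name=value to the output file.
echo "sp_chk_results=$sp_chk_results" >> "$GITHUB_OUTPUT"

# Read the value back and compare it to the threshold of 3 used in the workflow.
saved=$(grep '^sp_chk_results=' "$GITHUB_OUTPUT" | cut -d '=' -f 2)
if [ "$saved" -gt 3 ]; then
  verdict="too many spelling errors"
else
  verdict="ok"
fi
echo "sp_chk_results=$saved -> $verdict"
```

The takeaway is that an output is just a name=value line appended to a file, so anything your script prints can be passed along to later steps this way.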
# We want to retrieve this file after this runs so we can see what spell check errors were detected - name: Archive spelling errors uses: actions/upload-artifact@v3 # These arguments underneath `with` are generally action specific so we have to check the documentation: https://github.com/marketplace/actions/upload-a-build-artifact with: name: spell-check-results path: spell_check_results.tsv Archive spelling errors using actions/upload-artifact@v3. We can see archived files by going to Summary on the left side of the log page and scrolling down to the Artifacts section. The file we uploaded will tell us the spell check errors our script detected. # If there are too many spelling errors, this will stop the workflow - name: Check spell check results - fail if too many errors # Here we are only going to throw an error if there are more than 3 spell check errors detected if: ${{ steps.spell_check_run.outputs.sp_chk_results > 3 }} run: exit 1 We’ve discussed conditional statements and GitHub variables in previous chapters. Here we are using a conditional statement that will fail this GitHub Action if there are too many spelling errors (which here we’ve said is more than 3). 8.1.1.2 Style Code Overview Just as we did with the spell check action, we’ll run through the yaml file and explain the basic set up here. We also have included links about the resources used at each step. We encourage you to poke around with these resources and with this yaml file to really learn about what these actions are doing. style-code: runs-on: ubuntu-latest # This image has R and basic R packages container: image: rocker/tidyverse:4.0.2 We’re using a Docker image that has some R packages we will use so we don’t have to install everything individually: rocker/tidyverse:4.0.2 # Need to get the files specific to our branch from our pull request - name: Checkout files uses: actions/checkout@v3 with: ref: ${{ github.event.pull_request.head.ref }} We need the files in this repo.
So we are checking out the files we need using actions/checkout@v3. # Our docker image doesn't have this one package though so we'll install it - name: Install packages run: Rscript -e "install.packages('styler')" Install a styler package we need that wasn’t on the Docker image already. This is a reasonable strategy if we only need one or two packages that don’t take long to install. # Here's the main thing we are running: styling R file code - name: Run styler on Rmd and R files run: Rscript -e "styler::style_file(list.files(pattern = 'Rmd$|R$', recursive = TRUE, full.names = TRUE));warnings()" Run a code styling command – this is where you’d need to customize this step. For example, if you wish to style Python code, you may look into this pycodestyle GitHub Action or perhaps other GitHub marketplace actions for your particular languages and needs. # We will automatically commit back our styled R files - name: Commit styled files run: | # Some config set up to establish creds git config --global --add safe.directory $GITHUB_WORKSPACE git config --global user.email "itcrtrainingnetwork@gmail.com" git config --global user.name "jhudsl-robot" # Now commit the styled files git add \\*.Rmd git add \\*.R git commit -m 'Style R files' || echo "No changes to commit" git push origin || echo "No changes to commit" In this step we are immediately merging any changed files that were styled back to the original branch that this pull request was on. The || allow these steps an alternative if there are no changes to commit. Without the || the action would break if the styling did not result in changes. What you’ll need to change here is the git config steps to use your own credentials. (aka git config --global user.email \"itcrtrainingnetwork@gmail.com\" should be your email). 8.1.2 Tips for adapting these to your own repository Here’s how you can adapt these to your own repository. First, you’ll want to add them to your own repository with a pull request. 
These actions will both be triggered by a pull request so you won’t need to edit them at all to start developing them to adapt them to your project needs. Note that for the spell check action, if you are deciding to use the customized script we included, you will need to copy that to the same file path as called in this action. Besides the points of customization you may need to work on that we discussed; you may also need to edit the Docker images used depending on what steps and uses you need. From here, its basically good luck! Take it one troubleshooting tactic at a time, Google your problems and look for GitHub Action marketplace actions that fit your needs. Best of luck! "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor(s) Candace Savonen Content Author(s) Candace Savonen, and Jake Crawford - Scientific software development best practices Technical Template Publishing Engineers Candace Savonen, Carrie Wright, Ava Hoffman Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Ava Hoffman, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Art and Design Illustrator(s) Candace Savonen, and Jake Crawford - Scientific software development best practices Figure Artist(s) Candace Savonen, and Jake Crawford - Scientific software development best practices Funding Funder(s) NCI UE5CA254170 Funding Staff Shasta Nicholson, Maleah O’Connor, Sandy Ombrek   ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2024-02-29 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date 
lib source ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2023-03-28 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.0.2) ## cachem 1.0.7 2023-02-24 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.20 2023-01-17 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.0.2) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2023-03-28 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## ottrpal 1.0.1 2023-03-28 [1] Github (jhudsl/ottrpal@151e412) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.0 2023-03-14 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2023-03-28 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.0.2) ## sass 0.4.5 2023-01-24 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 
1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2023-03-28 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.1 2023-03-22 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2023-03-28 [1] Github (yihui/xfun@74c2a66) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["references.html", "Chapter 9 References", " Chapter 9 References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] +[["index.html", "GitHub Automation for Scientists About this Course", " GitHub Automation for Scientists March, 2024 About this Course This course is part of a series of courses for the Informatics Technology for Cancer Research (ITCR) called the Informatics Technology for Cancer Research Education Resource. This material was created by the ITCR Training Network (ITN) which is a collaborative effort of researchers around the United States to support cancer informatics and data science training through resources, technology, and events. This initiative is funded by the following grant: National Cancer Institute (NCI) UE5 CA254170. Our courses feature tools developed by ITCR Investigators and make it easier for principal investigators, scientists, and analysts to integrate cancer informatics into their workflows. Please see our website at www.itcrtraining.org for more information. 
"],["introduction.html", "Chapter 1 Introduction 1.1 Target Audience 1.2 Topics covered 1.3 Motivation 1.4 Curriculum 1.5 How to use the course", " Chapter 1 Introduction 1.1 Target Audience The course is intended for students in the biomedical sciences and researchers who use informatics tools in their research This course is written for individuals who: Are comfortable with GitHub and know how to make a pull request Wish to save time and enhance their scientific projects using automation Have perhaps tried to learn about GitHub Actions before but felt overwhelmed about how to start 1.2 Topics covered This course covers how to use GitHub actions for scientific software development. We encourage the recognition that scientific software can take many forms that can all benefit from the concepts of continuous integration and continuous deployment. This course builds on concepts introduced in the Reproducibility and Advanced Reproducibility courses from the ITCR Training Network. If you are unfamiliar with GitHub and/or do not have an account, we’d suggest you start with those courses by using the links or QR codes below. 1.3 Motivation Cancer datasets are plentiful, complicated, and hold untold amounts of information regarding cancer biology. Cancer researchers are working to apply their expertise to the analysis of these vast amounts of data but training opportunities to properly equip them in these efforts can be sparse. This includes training in reproducible data analysis methods. Data analyses are generally not reproducible without direct contact with the original researchers and a substantial amount of time and effort (Beaulieu-Jones and Greene 2017). Reproducibility in cancer informatics (as with other fields) is still not monitored or incentivized despite that it is fundamental to the scientific method. Despite the lack of incentive, many researchers strive for reproducibility in their own work but often lack the skills or training to do so effectively. 
Equipping researchers with the skills to create reproducible data analyses increases the efficiency of everyone involved. One tool among many for creating reproducible analyses is automation. Many individuals performing analyses on cancer data may not have formal training in software development and may be unfamiliar with the ideas of continuous integration and continuous deployment. By recognizing that biological data analysis code is a form of software development, we can try to adapt good development practices in scientific analyses and software contexts. Scientific software projects may include (but aren’t limited to): Software that is built as a tool to be utilized by others to analyze biologically derived data. Code that is built primarily for analyzing one project’s data. Code that is built as a workflow for a series of steps and analyses that might be reused among collaborators or within a lab. Any scripts and code that are built to handle data in a research setting. Any scripts and code a researcher might interact with. 1.4 Curriculum The course includes hands-on exercises for how to understand, build, and troubleshoot GitHub Actions as a continuous integration/continuous deployment tool for scientific software projects. Goal of this course: Equip learners with the basic skills and confidence to utilize the concepts of continuous integration in the context of scientific software. What is not the goal This course is not meant to teach learners how to create sophisticated GitHub Actions, but instead to introduce learners to the fundamentals of continuous integration and continuous deployment. This course focuses on GitHub Actions and will not cover any other (perfectly fine) tools for CI/CD. 1.5 How to use the course Ideally you should follow along with the chapters and perform the activities as they are described. These activities involve using GitHub and GitHub Actions. You will need a GitHub account and basic familiarity with GitHub.
References "],["scientific-software-development-best-practices.html", "Chapter 2 Scientific software development best practices 2.1 Learning Objectives 2.2 Science and software as iterative processes 2.3 Software complexity as a spectrum 2.4 Examples 2.5 Automation for scientific software", " Chapter 2 Scientific software development best practices 2.1 Learning Objectives 2.2 Science and software as iterative processes Scientific papers are often arranged as a list of methods and results, building on themselves more or less sequentially. Each figure follows from the previous figure or text description, to describe the data that support a hypothesis or illustrate a conclusion in a linear, “story”-like order. However, the modern process of doing science, itself, is rarely linear. It is not realistic to do an experiment, and write a manuscript, and publish the paper, in that order and with no other complications – usually, there is some amount of iteration involved on one or more of these steps: You might do an experiment, then summarize it, then run more experiments based on the results to confirm/test/extend your findings You might do an experiment, write a manuscript, then revise the manuscript based on feedback from other scientists You might submit a manuscript, then a reviewer may request revisions or additional experiments, which will require you to go back and revisit your experimental setup and conclusions As scientists, we don’t generally expect science to be a static, “write once and forget” process. The same idea applies to developing research software! Rarely, you might be able to write a script or program for a scientific study and use it once, for a single well-defined purpose. But more often, you’ll write a script (or join several of them together in a more complex pipeline) and reuse it, possibly with changes or extensions as the project progresses. 
In this course, through the lens of automation, we hope to familiarize you with some of the skills necessary to think about research software in an iterative way, from the beginning of a research project. Although software development is not generally rewarded directly in academia, it turns out that writing good software does have less obvious rewards, even within the traditional academic structure. For example, software that is easy to install tends to be cited more often (Mangul et al. 2019), and software that is more consistently maintained tends to be more accurate (Gardner et al. 2022). 2.3 Software complexity as a spectrum Not all software is complex, and not all software requires complex infrastructure (or automation, for that matter)! It can be useful to think about the complexity of software engineering infrastructure necessary for a project proportionally to the complexity of the software itself: Simple software (math, data transformations, procedural/rule-based scripts) requires simpler infrastructure. Complex software (e.g. “pipelines” composed of many commands/software packages chained together, “libraries” that are intended to be reused in many different applications) requires more complex infrastructure, to check assumptions and test reproducibility at each step. We will take a closer look at two concrete examples, one on each end of the software complexity spectrum, in the next section. 2.4 Examples 2.4.1 t-test Imagine that you have two sampling distributions (lists/arrays of numbers) and you want to test whether the means of the distributions are statistically equivalent or not. This is the setup for a t-test. t-tests are implemented in standard functions in base R and Python scipy library, as well as most other commonly used programming languages. In both R and Python, a t-test is a very well-defined, specific function that takes two lists of numbers and returns the t-statistic and p-value. 
Since this is a part of a standard, widely used library in each language, it is already tested as part of those libraries. In your own software, you might need to do some verification of your input (for instance, what happens if you pass an empty list of numbers?) but probably not too much, since you can be fairly confident that the t-test function does what it is documented to do in the programming language you choose to use. 2.4.2 Sequencing analysis Imagine that you have a list of reads from a DNA sequencing machine, and you want to use these data to answer a biological question, or to make a plot/visualization to communicate a biological insight. This is a much less well-defined problem than our previous example, with many more independently operating components, and many more subjective decisions that a researcher must make along the way. Complex data analyses mean complex decisions! This often means that decisions are not so cut and dried and should rely on the scientific context of the data. In other words, analyses are often tailored to reflect the biology (or other science) and perhaps the experimental goals. Most sequencing analyses require multiple steps (i.e. different programs or scripts), and generate multiple intermediate files (e.g. read counts, normalized counts, quality information) that can be checked to verify that the pipeline is proceeding as expected. Sequencing analyses can also take hours or days to run, as compared to the t-test example which runs effectively instantaneously. This means that: The set of steps that need to take place is more complex than our previous example, and each step in the analysis likely builds from previous steps. Finding errors early in the process can save a lot of time and effort in later steps. A longer or more complex set of steps often means there are more ambiguous/“gray area” decisions that need to be made along the way.
This usually means more iterations or experiments, to explore what works and what doesn’t. Introducing reproducible software practices from the ground up will help to make this exploratory process easier and clearer. 2.5 Automation for scientific software Good software practices do not necessarily have to rely on automation. However, complex projects can be unwieldy to check and revise in the absence of some sort of automated process to kick them off automatically, without too much human intervention. Steps that are involved might include: Rerunning the software itself (often on new or modified input data) Software testing Code style linting Rebuilding figures or processed datasets And many more! Each of these steps could be individually run by hand. Alternatively, they could be combined in a central script that runs all the steps in order or in parallel, which can also be triggered manually. Such a central script can itself be considered a form of automation. Automation like that of GitHub Actions, in contrast, can provide a “single point of truth”: a single central script to run these steps, and a single set of (automated) criteria for when to run them. This eliminates the need for you to remember to run tests, to clean up your code, to rebuild figures, or to kick off similar standard processes or commands on your own. Later in the course, we will talk more specifically about what exactly automation via continuous integration looks like, and go into more depth as to its uses and benefits. References "],["why-automation.html", "Chapter 3 Why Automation 3.1 Automation as an aid for reproducibility 3.2 Continuous integration / Continuous deployment", " Chapter 3 Why Automation 3.1 Automation as an aid for reproducibility All of science is built on results being reliable and continually working toward identifying more true/less wrong explanations about the world. 
The process of science starts with repeatability – can the same researcher with the same data get the same results? Undoubtedly in the early stages of an analysis, sometimes the results and output can be in flux. But as the analysis gets further polished and decisions are made, it should be that the same results can be obtained no matter how many times an analysis is run or re-run by the same researcher. This brings us to the critical but historically overlooked part of the pyramid known as reproducibility. Reproducibility is what happens when another researcher can take the same data as researcher #1 and get the same results. This is more difficult than it sounds at face value because data analysis requires so many decisions and variables. The order and ease with which something is re-run and the computing environment used to run the analysis are two such factors. Keep in mind that consistent results (like those seen with reproducible work) are not automatically true but inconsistent results (like those seen with irreproducible work) cannot be true. In other words, correctness is not the same as reproducibility but reproducibility is a necessary aspect of correctness. Reproducibility is the overlooked but critical step that allows replicability to happen. Replicability is when new data is collected that extends the findings of the first study. With this new data, hopefully the same type of analysis can be done that helps the field learn even more about the concepts that were learned in the first study. 3.1.1 Why reproducibility is so important. Reproducibility is not only important because all of science is built upon it but it also saves everyone time! We can often underestimate the extent to which our work, code, and data are being used and reused by others in the scientific community.
The extent to which our work is reproducible, then, not only affects us and our immediate collaborators but could aid or hinder other researchers’ work on a much larger scale. In other words, if 10 researchers reuse your work and all 10 of them spend 100 hours trying to get it to work without success, that’s a lot of time to waste (1,000 hours)! But conversely, if your work was made with reproducibility-aiding tools and skillsets (like the automation we are discussing in this course) then you could save other researchers loads of time! Let’s say instead 9 out of 10 of the researchers who try to reproduce your work (as opposed to running their own analysis from scratch) are able to do so in that time allotment; that saves them an insane amount of time and stress! 3.1.2 Automation as a reproducibility tool Automation is just one of many tools and skillsets that can aid the reproducibility of your work! Returning to those 10 researchers: instead of having those 10 people manually try to reproduce our work every time we change it, what if we had robots do that work? That would not only help us re-run our results more quickly (because researchers are often busy), but robots are also much better at repetitive work. In other words, your human collaborator is great at many things but even your most reliable collaborator will not be as punctual as a robot who is programmed to do the job. 3.2 Continuous integration / Continuous deployment Before we discuss the concept of Continuous integration / Continuous deployment (often abbreviated CI/CD), let’s use an analogy. What we are getting at here is that it’s generally a good idea to check work along the way instead of waiting until something is completely finished to test it. Software is no exception to this idea. Often if we send a collaborator an enormous amount of code to review, they are likely to feel overwhelmed and may not be able to give useful feedback.
But if you send a manageable, small chunk of code to review, they are likely to give more feedback. Continuous Integration / Continuous Deployment, then, is a manner of working in which changes are checked as they are being integrated and before they are deployed. This allows for continuous monitoring of the project and hopefully early catching of bugs! Bugs/mistakes are an unavoidable part of software development because software developers and researchers are generally humans and humans make mistakes! Let’s assume over the course of developing a project, bugs are introduced at a certain rate. Without using CI/CD you may find yourself trying to fix many bugs at once! This will make the bugs harder to isolate and harder to fix. The amount of time it will take to fix 3 bugs at once may be much higher than if you caught these bugs one at a time. Additionally, the longer a bug goes uncaught, the more likely it is to be accidentally incorporated into your published results – this will be a lot more work for you and others to rectify. However, with CI/CD you will likely catch these bugs earlier and have an easier time fixing them before they truly run amok! A good continuous integration / continuous deployment pipeline will help you identify these bugs early and save time and stress! This is not only true for classic “my script won’t run” bugs but also “silent” bugs – bugs where the analysis still ran to completion but perhaps the results were slightly different. 3.2.1 Continuous Integration / Continuous Deployment A workflow that uses CI/CD principles may look like this: The idea is we use version control and build aspects of our software. Before what we’ve built is incorporated into the published version, we will stage it and test it.
By staging we mean that perhaps we keep it stored on a different branch and have ways that we can play around with the beta version of the analysis or product before our most recent additions are incorporated. The above diagram is in reference to more traditional software products but CI/CD can also be thought of in the context of a scientific data analysis: In the case of a scientific analysis, we may modify or add to the analysis, but we’ll want to test that these changes work – aka we may want to re-run the analysis before we merge it into the main branch or public facing version. In this instance we may think of the final product as being a published manuscript as opposed to a deployed website or app. But the same principles apply here. We’ll want to re-run the analysis and build tests that check if the results make sense. 3.2.2 A real world example Let’s bring this into the terms of a very common story for science. Let’s say you are a researcher who submitted a manuscript and a reviewer comes back and asks you to re-run the analysis with a minor tweak; perhaps a parameter change. If you developed your analysis without using reproducibility-aiding practices and without automation, it is very likely that this seemingly simple task could take a lot of your time and brain power. Because while you don’t think anything on your computer changed since you ran this analysis 6 months ago, your computing environment and the software it uses have been changing the entire time! This kind of simple “this should be easy” situation can easily devolve into a huge rabbit hole – when you thought this analysis was basically wrapped up. But, if you had been using the principles of CI/CD and reproducibility you may have a better chance that your analysis will still run reliably. And if it doesn’t re-run reliably, you will have a history of runs and setup to draw on to pinpoint the bug that is keeping it from running.
By having automation keep tabs on your development, you will be less likely to be blind sided by bugs in a situation where you need to re-run your analysis (or adapt it for a new analysis!) 3.2.3 Other CI/CD services In this course we are focusing on using GitHub Actions for CI/CD. However, at this point we should mention that GitHub Actions is just one of many options for this. Circle CI, Appveyor, and Travis CI are all also perfectly fine options to use. But if you are using GitHub already, GitHub Actions may be the easiest to start out with. However, if at a later point in your automation development journey you find that GitHub Actions may not have a feature you need, we encourage you to explore these other options and use what works best for you. These other CI/CD options definitely have some commonalities with GitHub actions so learning GitHub Actions will still give you a good start in understanding how these services work. "],["github-actions-fundamentals.html", "Chapter 4 GitHub Actions Fundamentals 4.1 GHA structure 4.2 Exercise 1 - Running your first GitHub Action", " Chapter 4 GitHub Actions Fundamentals 4.1 GHA structure All GitHub Actions involve answering three questions: When should a thing run? What should be run? With what environment should the thing be run? These questions and other specifications are set by writing a YAML file. YAML files are human readable markup language files. Basically it’s a list that is easy for humans to read and write and computers can read them too. This makes it good for writing a GitHub Action. Essentially, we’re going to write a YAML file to make a recipe that GitHub will read to know what/when/with what we are trying to do. The headlines about working with YAML files: Everything is a list (kind of like a JSON file). Indentations = subsets of a list Spacing is VERY specific! – incorrect spacing will definitely result in errors for your GitHub Action run. Let’s take a look at an example YAML. 
Note that what comes before a : is generally a name, and indentations indicate subsets of a list. So in the overall list of food we have sublists of vegetables and fruits. Likewise, citrus is the name for the item oranges, and so on. A # starts a comment, which will not be treated as code. # A comment here which is ignored food: - vegetables: tomatoes - fruits: citrus: oranges tropical: bananas Two items that every GitHub Action YAML must contain are on: and jobs:. on: tells GitHub when something should be run. For example, “whenever a pull request is opened”. jobs: tells GitHub what should be run. For example, “run this bash script”. Within a job, runs-on: tells GitHub with what environment it should be run. For example, “windows-latest”. 4.1.1 on: When a thing should be run If you are to automate something, step one is to figure out when you want the thing to happen. What should trigger your action? For that we use on: in a GitHub Action. There are lots of possible answers for when something should be run. The triggers can be a lot of different events on GitHub: pull requests, issues, comments, times of day, etc. 4.1.2 jobs: What should be run Perhaps even more important, what is the job that this automated task needs to do? Description | Trigger term: When you click a button | workflow_dispatch: ; When it’s a certain time of day | schedule: ; When a pull request is opened or has a new commit | pull_request: ; When commits are pushed to a branch (for example, when a branch is merged) | push: ; When something happens with an issue | issues: ; When a different workflow calls this one | workflow_call: ; When someone comments on a pull request | pull_request_review_comment: Scenario: You are running an analysis using public data that continually has more samples added - You would like the analysis to rerun when new samples are added - You would like to be informed on Slack of when the analysis got rerun and what the results were That’s totally a thing a GitHub action can do! We will walk through some examples like it!
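To make the three questions (when, what, with what) concrete, here is a minimal sketch of how on:, jobs:, and runs-on: might fit together in one workflow file. This is only an illustration under stated assumptions: the file name, the job name, the cron schedule, and the run_analysis.sh script are all hypothetical placeholders, and the Slack notification from the scenario is left out.

```yaml
# Hypothetical file: .github/workflows/rerun-analysis.yml
name: Rerun analysis

# WHEN: on a schedule, or when someone clicks the "Run workflow" button
on:
  schedule:
    - cron: "0 6 * * 1"   # every Monday at 06:00 UTC (placeholder schedule)
  workflow_dispatch:

# WHAT: a single job made up of two steps
jobs:
  rerun-analysis:
    # WITH WHAT: a GitHub-hosted Ubuntu runner
    runs-on: ubuntu-latest
    steps:
      - name: Checkout files
        uses: actions/checkout@v3
      - name: Run the analysis
        # run_analysis.sh is a placeholder for your own script
        run: bash run_analysis.sh
```

Note how the indentation carries the structure: runs-on: and steps: belong to the job, and the job belongs to jobs:.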
And here’s the good news: you don’t have to write things from scratch or know ALL the languages. The GitHub Marketplace allows you to use really cool actions that other people have created. More on this later. 4.1.3 runs-on: with what: The runs-on: tag specifies with what environment the job is going to be run. What does this mean? Well, let’s start by noting that the term “cloud” computing is a tad misleading. When we send a job to an online service like GitHub Actions, it’s not going to a mysterious vague mass. Instead, it’s being sent to a real computer somewhere, and that computer sets up a computing environment to run your job and sends the results back to you through the GitHub website. What do we mean by a computing environment? Just as you install, update, and sometimes delete software on your personal computer in order to run different things, the GitHub Actions computers need to do the same in order to run your code. Although some person from Microsoft isn’t setting up a new physical computer and manually installing software, the specs you give underneath runs-on: tell GitHub Actions what kind of set up to use. So for example, there are built in operating systems like windows-latest, macos-latest, and ubuntu-latest. You can see more about the default GitHub runners here. Similarly to how a new Windows computer may not come equipped with all the software you need to execute certain code, you might require a more tailored computing environment. You can also create custom environments using containerization. 4.1.4 Containerization A “virtual machine” is basically when your computer creates its own fake computer inside of it. It’s acting like a different computer but it doesn’t have any additional physical parts. Containers, while not virtual machines, serve a similar purpose by creating an isolated computing environment where tasks can be performed. They are named “containers” due to their separation from the rest of your computer.
Containerization is useful because it allows us to share our computing environments with others. This is useful because it can be a powerful tool for reproducing analyses if we are controlling our computing environments. The software you use, and the versions of the software you use can affect the results from an analysis (Beaulieu-Jones and Greene 2017). Real data and experiments have shown this! Below is a figure from Beaulieu-Jones and Casey S. Greene, 2017 that shows how a microarray data analysis had different results depending on the software versions used (Beaulieu-Jones and Greene 2017). And as time goes on, your computing environment changes; potentially in ways you don’t realize! Most languages and programs allow you to print out the specifications of your computing environment. See below a “session info” print out from R. What this shows is two different computing environments. Side by side we can see how they differ. There’s various containerization software programs, that will allow you to share your computing environments but a very popular one is Docker. We can picture how this makes analyses more reproducible. Docker and other containerization software work by allowing you to take a snapshot of your environment, called an image. This image can be shared and others can use this image to build the container from which they can run the analysis or whatever it is they plan to do. Docker is a whole other world. There’s whole conferences, hackathons, and etc devoted to Docker and other containerization software. It can be a lot to learn. To start, we recommend borrowing other people’s Docker images as much as possible instead of trying to build your own And then install the few packages you need. (more on this in a future chapter) Super important side note: DO NOT put data that needs to be secured like Personally Identifiable Information (PII) and Personal Health Information (PHI) data in your Docker images! Especially when you share them! 
They are not meant for this purpose and this data would be exposed! 4.1.4.1 More resources about Docker Launching a Docker image Modifying a Docker image Docker for data scientists 4.1.5 Summarizing GitHub actions are specified by YAML files in .github/workflows/ folder on a GitHub repository. The specs from this YAML are used to run a job when an on trigger specifies it should be run. The runs-on spec tells the server what kind of environment it should be run with. Containers like those made with Docker can help you make custom computing environments. 4.2 Exercise 1 - Running your first GitHub Action Let’s apply what we’ve learned about GitHub Actions by running one! If you don’t have a standard workflow for how you use GitHub locally, or are unhappy with your current methods for this activity we recommend installing GitHub Desktop First we need to create a copy of the exercise GitHub repository we will use for this course. Go to https://github.com/fhdsl/github-action-workshop and click on the Use this template button. Fill out the form on this page about where you want this repository to be and what description you want it to have. And click Create Repository. Clone this repository to your local computer. In GitHub Desktop you can do this by clicking the Clone Repository button. But from command line you can use this kind of command: git clone https://github.com/<your-username>/github-actions-workshop Create a new branch by clicking the buttons as shown here or using the command line examples below cd github-actions-workshop git checkout -b "first-gha" Create the specific GitHub Actions folders. Recall that in order to run a GitHub action, GitHub will look for YAML files in a specific location. We will need to create these folders to get going. Use your operating system to create a .github folder and then inside that folder, a workflows folder. Don’t forget the s in workflows or the . 
in .github – these folder names have to be written exactly this way for your GitHub Action to be found and recognized by GitHub. You can use this command to do this: mkdir -p .github/workflows Now you will want to move the 00-my-first-action.yml file into the .github/workflows folder. In command line you can do this by using this command: mv activity-1-sample-github-actions/00-my-first-action.yml .github/workflows/00-my-first-action.yml Add and commit these changes to your branch. Then you will want to push your branch to the online GitHub repository. In command line this can be done like this: git add .github/* git commit -m "adding first gha" git push --set-upstream origin first-gha Open a pull request. In GitHub Desktop you can click this button: Or just navigate to your GitHub repository online and open a pull request through the website. Check your pull request to make sure the changes are what you expect. Then merge it! After merging, go to the Actions tab on GitHub. You’ll become very acquainted with this page if you use GitHub Actions. The left side shows the workflows that are available or have been run before. Here we should see the new GitHub Action we just merged from our pull request, called “Basic GitHub Action”. Click on that. Underneath this we should now see a blue banner that allows us to click “Run workflow”. Click Run workflow and then Run workflow again. Because we made our on: trigger workflow_dispatch, we have to tell the GitHub Action when to run (which means it’s not really automated in this case). Yay! 4.2.1 Checking results of a GitHub Action Go to the Actions tab. You’ll see that the newest run of your GitHub Action is logged here. All future GitHub Action runs will have their logs here. Click on the workflow run log so we can look into it. To see more run details we’ll click on the job name, which in this case is hello. Click on the dropdown arrows to see even more details on each step.
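For orientation, a workflow file consistent with the run described above might look roughly like this. It is a sketch reconstructed from the names that appear in the log (the workflow name, the hello job, and a step that echoes text); the actual contents of 00-my-first-action.yml may differ, and the runs-on value in particular is an assumption.

```yaml
# Sketch only: reconstructed from what the log shows, not copied from the exercise file.
name: Basic GitHub Action

# Run only when someone clicks the "Run workflow" button
on:
  workflow_dispatch:

jobs:
  hello:
    # Assumed runner; the exercise file may specify a different one
    runs-on: ubuntu-latest
    steps:
      - name: Hello World
        run: echo "hello world"
```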
4.2.2 Breaking down the YAML We can break down how what we wrote in the YAML led to what is shown in this run’s log. We named this Action Basic GitHub Action and the log was named that. The only job being run was named hello so in the log it shows up this way underneath the Jobs header. If we had more than one job, those jobs’ names would show up here too. Each job can contain as many steps as we want. Our one step is named Hello World. This step involved running some code using the run: tag which by default uses bash. The bash code just echoed “hello world”. Congrats! You’ve run your first GitHub Action! In the next chapter we’ll run something a little more automated and a little more fun to build on what we’ve learned here. References "],["automating-re-running-analyses.html", "Chapter 5 Automating Re-running Analyses 5.1 Exercise 2 - Re-run analysis example 5.2 Diving into the details", " Chapter 5 Automating Re-running Analyses In the beginning of this course we discussed the benefits of using continuous integration/continuous deployment principles for scientific code including analyses. In this chapter we will go through example code that shows how this can be set up. We highly encourage you to take this code and adapt it to your own project’s needs. 5.1 Exercise 2 - Re-run analysis example For this exercise, we are going to continue to use the example repository that we set up in the previous chapter. Create a new branch to work from. As is good practice for adapting a GitHub workflow, we will create a new branch to work from. In GitHub Desktop you can click the branch button and follow the same steps we did in the previous exercise. From command line: `git checkout -b "more-ghas"` For this exercise we are going to copy over a second GitHub Action YAML file from the folder. This time, move the 01-re-run-analysis.yml file to the .github/workflows directory you made in the previous chapter.
From command line: mv activity-1-sample-github-actions/01-re-run-analysis.yml .github/workflows/01-re-run-analysis.yml Now follow the same set of steps we used in the previous chapter to Add, Commit, Push the changes. From command line: git add .github/* git commit -m "adding more ghas" git push --set-upstream origin more-ghas Now create a pull request with the changes you just made. (Refer to the previous chapter if you need reminders on how to do this). After you open your pull request, scroll down to the bottom of the page. If all went as expected, you should see a status message that shows a GitHub Action is running after opening your pull request. Think about it. Without looking at the YAML file… What do you suppose the on: value (the when) might be for these actions? Take a look at the file, .github/workflows/01-re-run-analysis.yml, to see if you are right! On your pull request page on GitHub, click on the Details button next to your workflow run. You can navigate to this same page by going to the Actions tab, then scrolling down to find the most recent workflow run, which should be named Re-run analysis, and clicking on that. 5.2 Diving into the details Let’s break down what is in this GitHub Action YAML file and what this workflow run did. 5.2.1 name and on At the top of the file we have: name: Re-run analysis. This is the name our workflow run shows up under in the Actions tab log, and it helps us differentiate it from other GitHub Action workflows. Below that, there is the on: trigger. This workflow of re-running this analysis will only run when a pull request is opened or pushed to. And further, we’ve specified with branches: that it will only run if the pull request is targeted to branches named main or staging. # Run this workflow when a pull request is opened or pushed to. on: pull_request: branches: [ main, staging ] 5.2.2 jobs In our jobs: we’ve named this job Re run analysis.
Additionally we are running this on an ubuntu-latest operating system. Our first GitHub Action workflow from the previous chapter didn’t need any additional packages or software to run its job, but the analysis script we are running in this job requires things like R, Python, and some specific packages. We could attempt to write a script that installs everything we need. However, that would likely be a lot of work, may not work reliably, and would make changes hard to track. Instead, we are using a custom made Docker image that has R, Python, and other packages we need already installed. This custom made Docker image is pulled from Dockerhub. If you wish to make a custom Docker image to use in your analysis, the easiest way to do this is to make a Dockerfile, build a Docker image from this file, and then push it to Dockerhub. We have some Dockerfiles for this image and others managed and version controlled on this GitHub repository. You may note we use GitHub Actions on this repository to help us manage these Docker images. jobs: re-run: name: Re run analysis runs-on: ubuntu-latest # This image has python, R and other things we need to run our mock analysis container: image: jhudsl/ottr_python:main 5.2.2.1 actions/checkout One of the most frequently used GitHub Actions from the GitHub Action Marketplace is actions/checkout. This action will grab all the files from a GitHub repository so you can do things with those files in your workflow. (Recall that when you spin up a GitHub Action Environment it is a blank slate, so we have to put our files there too if we want to use them). steps: # Need to get the files specific to our branch from our pull request - name: Checkout files uses: actions/checkout@v3 with: fetch-depth: 0 By default, it will check out the files from the repository where this action is being run, but we could specify another repository and other branches. fetch-depth: 0 means we will fetch the full commit history for all branches and tags, rather than only the most recent commit.
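To illustrate the point about other repositories and branches, here is a minimal sketch of a checkout step that uses the repository and ref inputs of actions/checkout. The repository and branch names are hypothetical placeholders, not ones used in this course.

```yaml
steps:
  - name: Checkout files from a different repository and branch
    uses: actions/checkout@v3
    with:
      # Hypothetical placeholders: swap in the owner/repository and branch you need
      repository: some-owner/some-repo
      ref: some-branch
      # 0 requests the full commit history; the default of 1 fetches only the latest commit
      fetch-depth: 0
```

A private repository would most likely also need a token with access to it, which is the kind of credential that secrets are meant for.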
5.2.2.2 sh run_analysis.sh Now the main objective we were building to. We are going to run a script that re-runs our entire analysis. We’ve named this file run_analysis.sh to be clear about what it does. We’re giving this step an id of running (this will become clear in the next paragraph). Additionally the | tells run: to expect multiple lines of a command. We didn’t need this to be a multiple line command, but we thought it would be good to show you this. # We can call our main script then to re-run it to make sure it works - name: Run it id: running run: | sh run_analysis.sh We have three steps in this fake analysis and the files are numbered in which order they are run. If you open up the run_analysis.sh file, you will see it is basically a simple workflow step calling file. It looks like this: # This is a mock script that shows how you could have your whole analysis ran by one script call. ## Usage: To re-run this whole analysis, go to bash and # These specs will make sure that if one script fails this will fail the script set -e ## Run the first step python3 "01-python_test.py" ## Run the second step Rscript "02-r_test.R" ## Run a third step Rscript -e "rmarkdown::render('03-make-a-plot.Rmd')" The set -e is actually critical here. We need to make sure that this script will stop if it encounters an error. That is the main point of our GitHub Action here, is we want to know if something failed. (We also want to know if the results remained the same, but that will require a bit more engineering than we are showing in this simple example). A very tricky thing about GitHub Actions (and languages called by them) is that GitHub workflows do not always stop when there are errors as we would define them. When designing a new action, we need to carefully evaluate the steps of the job in the logs to make sure what we think happened and completed actually did complete successfully. 
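One alternative design, sketched here as an assumption and not what the example repository does, is to give each script its own step in place of a single wrapper script. A run: step fails when its command exits with a non-zero status, so a failure is reported against the specific step in the log.

```yaml
steps:
  # Each script gets its own step, so the log shows exactly which one failed
  - name: Run the first step
    run: python3 "01-python_test.py"
  - name: Run the second step
    run: Rscript "02-r_test.R"
  - name: Run a third step
    run: Rscript -e "rmarkdown::render('03-make-a-plot.Rmd')"
```

The trade-off is that the analysis can no longer be re-run locally with one script call, which is the convenience that run_analysis.sh provides.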
Returning to our GitHub Action YAML file, we can see that the last step of this job has an if statement. What we are doing here is asking GitHub to evaluate whether the step running (remember the id we set?) had success as its outcome. # We can have this double check that the last step was successfully run - name: Check on re-run outcome if: steps.running.outcome != 'success' run: | echo Re-running status ${{steps.running.outcome}} exit 1 This steps.running.outcome is representative of a whole new world of GitHub Actions Environmental variables that we have not discussed yet but we will now! 5.2.3 Summary "],["github-action-variables.html", "Chapter 6 GitHub Action Variables 6.1 Exercise 3 - Exploring Variables", " Chapter 6 GitHub Action Variables The GitHub Actions environments have variables that are already set by default in the environment but you can also set environment variables yourself. 6.0.1 Types of variables There are two types of variables in GitHub Actions. Default - Ones GitHub already sets for you. User set - Ones you set yourself. To print things out, you can use this kind of notation in bash or other contexts in the yaml file. echo ${{ github.repository }} In this next exercise we’ll explore different ways to use variables. 6.1 Exercise 3 - Exploring Variables For this exercise, we are going to continue to use the example repository that we set up in the previous chapter. Create a new branch to work from. From command line: `git checkout -b "env-var"` For this exercise we are going to copy over another GHA yaml to explore. This time, move the exploring-var-and-secrets.yml file to your .github/workflows directories you made in the previous chapter. From command line: mv activity-1-sample-github-actions/exploring-var-and-secrets.yml .github/workflows/exploring-var-and-secrets.yml Now follow the same set of steps we used in the previous chapter to Add, Commit, Push the changes. 
From command line: git add .github/* git commit -m "exploring gha variables" git push --set-upstream origin env-var Now create a pull request with the changes you just made. (Refer to the previous chapter if you need reminders on how to do this). On your pull request page on GitHub, click on the Details button next to your workflow run. Keep this handy because we will dive into the details of what we just ran. 6.1.1 Default variables You can read the latest documentation about GitHub Action default variables here. But here’s some highlights. name example output explanation GITHUB_REPOSITORY username/repository_name This prints out what repo this is run from GITHUB_REF refs/pull/1/merge The branch or tag that triggered this workflow. But note that this will be blank if the trigger is not based or related to branches or tags. For example a workflow_dispatch wouldn’t have this GITHUB_ACTOR cansavvy The GitHub handle of the person who caused this workflow to run Below shows an example of the log of the where we printed out these default GitHub variables. 6.1.2 User set variables 6.1.2.1 env: There are different ways to set variables. The simplest way to set variables is within a step you can set them using env:. Underneath env: you write the name of the variable on one side of the colon and then the definition on the other side. For example, in our yaml file we had: - name: Hello, but make it personal run: echo "Hello $First_Name." env: First_Name: Candace This set up printed out Hello Candace in the logs as our output. This might be useful, but if we want an environmental variable to be stored and retrieved between steps we’ll need to use something different. 6.1.2.2 Setting output variables If we’d like one step to be able to retrieve information from another step we’ll need to send a variable to the GITHUB_OUTPUT. 
To do this we can use this sort of set up: Step that sets a variable depending on some output: # How to export a variable to a next step - name: Setting output to the environment at large id: step_name run: echo "results=5" >> $GITHUB_OUTPUT Here we are naming the variable results and the notation >> $GITHUB_OUTPUT is always there. In this example, results is only set equal to 5 but you could see how this might be made more complicated. Perhaps the result is the output of a bash command, like: echo "time=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT This would allow us to have a time stamp of when this step was run. Or perhaps we are running a script that outputs a result: results=$(Rscript utils/script.R) 6.1.2.3 Using output variables To use this output variable in a subsequent step we have to use this kind of setup: steps.step_name.outputs.results where step_name is the id: we set for the step that set this variable (see above) and results is the name of the variable we set. And, as is typical, we need the ${{ }} notation. # How to print out the variable we just saved - name: Print out that variable in a later step run: echo ${{ steps.step_name.outputs.results }} This is nifty because now we can use the result of one step to determine whether or not we run a subsequent step. GitHub Action steps can have conditional or if statements. Maybe we only want a step to run if the result is something specific: - name: Conditional step # Here we are only going to do this step if the results from the previous step are bigger than 3 if: ${{ steps.step_name.outputs.results > 3 }} run: echo 'the results are greater than 3!' Or, maybe we want to make sure the whole workflow shuts down if a variable is something in particular like this example below. - name: Shut it down # Here we are only going to do this step if the results from the previous step are less than or equal to 3 if: ${{ steps.step_name.outputs.results <= 3 }} run: | echo 'the results are less than or equal to 3!
-- going to exit!' exit 1 6.1.2.4 Setting and grabbing secrets What if the string or variable we need is not something we can supply in the YAML itself? Perhaps we have credentials or something else that cannot be shared publicly but that we need to complete our steps. That’s where GitHub secrets come in handy! Read more about GitHub secrets here. One very common type of GitHub secret you may need to add is a GitHub Personal Access Token (sometimes abbreviated PAT). A personal access token is a string that, when provided, gives access to a user’s GitHub account. Read more about tokens here. For GitHub Actions that do things requiring authorization or particular permission levels, you will need to provide your GitHub Action with a personal access token (PAT) that you store as a GitHub secret. 6.1.3 Activity: Setting GitHub secrets Let’s practice this by setting a GitHub Access token as a secret! 6.1.3.1 Make a Personal Access token You can store any alphanumeric string as your GitHub secret. It may be an API key or authorization keys from some other software program. But for this example, we will use an authorization key for GitHub. Recall that we may sometimes have to give authorization to a GitHub Action because we are not actually running it with our user account; the job is being sent to GitHub for them to run on their servers somewhere. First make your own personal access token by going here: https://github.com/settings/tokens You can find this page by going to your own profile, and then to Settings and Developer settings. The GitHub Documentation for how to make PATs is here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens But we’ll walk through it together now. Underneath Tokens (classic) click Generate new token and pick Generate new token (classic). You will likely have to enter your password at this point.
Underneath Note write something that will remind you about where you are using this PAT. Check the repo workflow. (Depending on what you are trying to do you may have to check other boxes but for a lot of the permissions you’ll need repo will do). Scroll to the bottom of the page and click Generate. Your token will be shown on the next page. Keep it handy because they won’t show it to you again. Be careful not to share this any place publicly because it will give someone authorization to your GitHub account! 6.1.3.2 Creating a GitHub Secret Return to the repository that we were using for these activities. Go to Settings > Secrets and variables > Actions > New Repository Secret. Give your secret a name that relates to what it is. In this example, let’s call it GH_PAT. Now copy and paste your token into the secret section. 6.1.3.2.1 Referencing a GitHub secret in a GitHub action To retrieve a GitHub secret during a GitHub Action workflow run, you use this sort of notation: ${{ secrets.SECRET_NAME }} where SECRET_NAME is exactly the name you used for your GitHub secret. # Here's how we'd reference a secret - name: How do we reference a GITHUB secret? run: echo ${{ secrets.SECRET_NAME }} In the previous step we named our secret GH_PAT so if we needed to use it in our workflow we would use ${{ secrets.GH_PAT }}. Perhaps at this point you are worried that your logs may accidentally display your GitHub secret if you did something like: run: echo ${{ secrets.GH_PAT }} But you don’t have to worry about that part: in your logs your secrets will show as *** and will not be displayed. 6.1.3.2.2 Activity: Using a GitHub secret On your repository, go to your 01-exploring-var-and-secrets.yml file from your working branch. Click the edit this file button. Scroll to the bottom. Uncomment the last step. It should look like this: # Here's how we'd reference a secret - name: How do we reference a GITHUB secret?
run: echo ${{ secrets.SECRET_NAME }} Replace SECRET_NAME with what you named your secret (probably GH_PAT). Commit that change to your file. Push that change to your branch. Take a look at the log by clicking Details. What you should see is that the workflow runs again, tries to print the GitHub secret out but really just shows a ***. "],["troubleshooting-github-actions.html", "Chapter 7 Troubleshooting GitHub Actions 7.1 Tips 7.2 Activity: Troubleshooting GitHub Actions 7.3 Summary", " Chapter 7 Troubleshooting GitHub Actions Many of your standard programming troubleshooting skills are applicable with GitHub Actions. In this chapter we’ll give you a few more tips for what might be the most common ways that GitHub Actions can break and what those error messages might look like. 7.1 Tips 7.1.1 Look out for silent errors! A well designed GitHub Action will: Successfully fail when you should be alerted to something that isn’t working Successfully pass when the test is working as you’d like. What makes point 1 tricky is that a green check mark doesn’t mean that all your steps ran successfully or did what you thought they were doing. Especially when you are first developing a GitHub Action, it is a good idea to look through the logs and click on the dropdown arrows for each step to see what was printed out. It’s a great idea to add a test or evaluation that is more specific to what you need your GitHub Action to do. This is where the variable setting we discussed in the previous chapter can come in handy. Sometimes you might be able to do something as simple as this: - name: Check on re-run outcome if: steps.running.outcome != 'success' run: | echo Re-running status ${{steps.running.outcome}} exit 1 Where running is the id: of the step you want to evaluate. However this will have limited success and evaluations like this should always be made as specific as possible to what your GitHub Action is testing.
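As one sketch of what a more specific evaluation could look like, the step below fails the workflow if an expected output file is missing or empty. The file name results/summary.csv is a hypothetical placeholder; a real check would use whatever your own analysis is supposed to produce.

```yaml
- name: Check that the analysis produced its output
  run: |
    # results/summary.csv is a hypothetical output file used for illustration
    if [ ! -s results/summary.csv ]; then
      echo "Expected output file is missing or empty"
      exit 1
    fi
```

A check like this catches the case where every command exited without an error but the result you care about was never actually written.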
In order to design these steps you are going to need to look closely at your logs to see when things are 7.1.2 Look at the logs closely! Whether your GitHub Action fails or not, go to the logs to see how they ran. You can get there by going to Actions tab and clicking on the workflow you want to check on. You should start by scrolling down on the Actions page to look at the Annotations. This is GitHub Action’s summary of how the workflow ran. However, the summary is often unlikely to give you enough information to troubleshoot a failed action. In order to find out what the error message really is, you may need to dig into the logs deeper than that. Usually when you open up the log it will open up the step that it detects has failed. Read carefully what output happened here versus what you expected to happen. You may want to use the arrow to show what commands were specifically run. Sometimes you may need to look in earlier steps to really pinpoint what has happened. You may want to Google those messages depending on what they are saying. If the message has to do with a script being called you will want to test those scripts you wrote elsewhere to make sure they are working. 7.1.3 Use workflow_dispatch/pull_request triggers for development Regardless of whether you want your final GitHub Action to run on a pull_request or workflow_dispatch triggers, it can be helpful during development to use these. You can have multiple triggers for a GitHub actions. The pull_request trigger is helpful for development so that every time you push to your pull request your action will be re-run automatically so you can see if what you tried worked. This is the default method (in my mind) for developing a GitHub action. The only instance you may not want to use this method is if you are using GitHub default variables that are different in pull requests than they will be in your final version of the GitHub Action. 
In that instance a pull_request may not be a representative test for you. The workflow_dispatch trigger is useful so you can re-trigger the workflow run whenever you need to test the next thing you tried for troubleshooting purposes. You might consider doing this if testing as a pull request isn’t appropriate. Once a version of the workflow file with this trigger exists on your default branch, you can also initiate workflow_dispatch workflow runs from any branch, so you don’t have to merge each change before you’ve polished it. There are two caveats to this strategy: If you don’t want these triggers long term, make sure you delete them before you merge to main. Recall that default environment variables change depending on the trigger, so if you use them in your workflow runs, they may not always be representative of how the final version of the workflow will run. 7.1.4 Print out things to test your assumptions You can check your assumptions about how GitHub Actions is running things by printing out pieces of the action. For example, if you are using variables or file paths and suspect they are part of the issue, you can run ls or print out a variable with echo. run: | echo ${GITHUB_ACTION_PATH} ls This may help you figure out whether a variable or file isn’t in the place you think it is. 7.1.5 Use Marketplace Actions The great part about using GitHub Actions is that you can use other people’s actions from the marketplace so you don’t have to write everything from scratch! This does mean, however, that you are dependent on the developers and maintainers of these GitHub Actions. There are three tips for troubleshooting a problematic GitHub Action that is borrowed from the marketplace: Read their docs carefully and make sure you are using it as specified. Try bumping up to a later version if it looks like there’s a bug that may have been addressed. Try not to use Marketplace Actions that don’t show evidence of being maintained or don’t have fully fledged documentation!
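Pulling a few of these tips together, a development version of a workflow might look something like the sketch below. It combines the two development triggers with a step that prints out some default environment variables and the files that are present. The job name and the target branch are assumptions for illustration.

```yaml
# Development sketch: both triggers are listed so the workflow can be re-run easily
on:
  pull_request:
    branches: [ main ]
  workflow_dispatch:

jobs:
  debug:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout files
        uses: actions/checkout@v3
      - name: Print out things to test assumptions
        run: |
          # Default variables set by GitHub; their values depend on the trigger
          echo "Triggered by: $GITHUB_EVENT_NAME"
          echo "Ref: $GITHUB_REF"
          # Where are we, and which files are actually here?
          pwd
          ls
```

Once the workflow behaves as expected, the debugging step and any triggers you don’t want long term can be removed before merging.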
7.2 Activity: Troubleshooting GitHub Actions What to know about these activities: None of these broken actions will require more than 1 simple line to fix it. So don’t spend too much time writing lots of code to fix these. There are clues in our descriptions here about what you should look out for in fixing these actions. So look out for these clues. Use standard troubleshooting tips to figure this out – Googling and iterative attempts are encouraged! Create a new branch to work from. From command line: `git checkout -b "troubleshoot-practice"` For this exercise we are going to copy over purposely broken GitHub actions that we will fix. Move all three files from the activity-3-find-the-break folder to the .github/workflows directory you made in the previous chapter. From command line: mv activity-3-find-the-break/* .github/workflows/ Now follow the same set of steps we used in the previous chapters to Add, Commit, Push the changes. From command line: git add .github/* git commit -m "troubleshooting exercise" git push --set-upstream origin troubleshoot-practice Now create a pull request with the changes you just made. (Refer to the previous chapter if you need reminders on how to do this). Scroll to the bottom of this pull request. You’ll notice that only one of the three broken actions has a status here. If you don’t see a GitHub Action you expect to be running, you’ll need to go to the Actions tab to see what’s happening. We’ll dive into this in this next section. 7.2.1 Broken Action 1 - Upload a file Let’s dive into broken action 1 first. Let’s look into the logs. Go to the Actions tab and find the most recent workflow run that indicates it is from .github/workflows/broken-action-1.yml. You’ll notice that this action and the second one don’t show up normally. They both have startup issues, meaning their problems are so fundamental that GitHub can’t even process them to the point where they can begin to run.
Generally, startup issues have to do with one of the following: An essential specification is missing. There’s a syntax issue. There’s a spacing issue. Click on this issue’s log and scroll down to the bottom where it says “Annotation”. In this first case we have an error: a step cannot have both the `uses` and `run` keys What do you suppose this error means? When thinking about what you believe this error means, take a look at the parts of the yaml file that have uses and run keys. To recap, uses is a key we use when we are borrowing an action from the marketplace like the following: - name: Checkout files uses: actions/checkout@v3 And the run key we generally use for calling some bash or other language’s commands, like this: - name: Print out a thing run: echo Let's print a thing out! Both run and uses are calling commands. So given this information why do you suppose we are getting the error: a step cannot have both the `uses` and `run` keys 7.2.1.1 Fixing Action 1 Based on what you think is causing this error, attempt to make a change to the broken-action-1.yml file. Then add and commit that change and push it to GitHub. Return to your logs to see the most recent run of the action from .github/workflows/broken-action-1.yml. Look at the logs to see if the error is different or is fixed. If it ran successfully you’ll see its actual title show up in the logs. But regardless of whether it fails or succeeds you should check the logs. Repeat these steps until you have fixed action 1. 7.2.2 Broken Action 2 - Create an issue Let’s look into Action 2’s logs to try to fix it. Go to the Actions tab and find the most recent workflow run that indicates it’s from .github/workflows/broken-action-2.yml. This action, like the first one, has startup issues so it will not have its status shown on the pull request. Click on this issue’s log and scroll down to the bottom where it says “Annotation”. 
In this case we have an error: You have an error in your yaml syntax on line 11 What do you suppose this error means? What’s nice about this error is that it does tell us a specific line to look at. Keep in mind, though, that when GitHub Actions tells us a line, it may be approximate. We may need to look slightly before or slightly after the line it calls out to know what to fix. 7.2.2.1 Fixing Action 2 Open up the broken-action-2.yml file. Take a look at the code around line 11. What do you notice that is different about these lines as compared to other actions we’ve looked at? Formulate a hypothesis on what you think is the problem and change that in broken-action-2.yml. Then add and commit that change and push it to GitHub. Return to your logs to see the most recent run of the action from .github/workflows/broken-action-2.yml. If it ran successfully you’ll see its actual title show up in the logs. But regardless of whether it fails or succeeds you should check the logs. Look at the logs to see if the error is different or is fixed. Repeat these steps until you have fixed action 2! 7.2.3 Broken Action 3 - Run script Finally, let’s look into Broken Action 3 - Run script. Go to the logs and look for a recent workflow run of that title. In this case, the Annotations might tell us Process completed with exit code 127. If we look online we can see that this means a command could not be found. This is moderately helpful but doesn’t really help us identify the problem. So we have to dig into the log some more. When we click on the log it will likely open up the step that this workflow failed on. In this case we have an error: /__w/_temp/36dfc03e-56ed-43fa-9019-85d8b151f42a.sh: 2: python3: not found Running python here The /__w/_temp/36dfc03e-56ed-43fa-9019-85d8b151f42a.sh bit just tells us where this was being run in the temporary workspace that GitHub was using to run our workflow. 
And if we look at our yaml file we can see the message: Running python here is just something we had echoed. What we’re going to want to zero in on here is the python3: not found. 7.2.3.1 Fixing Action 3 Open up the broken-action-3.yml file. It looks like Python is not able to be found. Look at the yaml file and try to figure out why that might be. Formulate a hypothesis on what you think is the problem and change that in broken-action-3.yml. Then add and commit that change and push it to GitHub. Return to your logs to see the most recent run of the action from .github/workflows/broken-action-3.yml. Look at the logs to see if the error is different or is fixed. Repeat these steps until you have fixed action 3! For a further hint about fixing this problem look here You’re going to want to look into what software packages the docker image referenced in the image: key has. Does rocker/r-base image have Python? If not, you may need to look for a Docker image to use that has python. We’ve used one in a previous example that you could borrow. 7.3 Summary In this activity, we practiced troubleshooting GitHub Actions. We discussed some of the most common ways that GitHub Actions can be broken. Here’s a summary of troubleshooting tips covered in this chapter. "],["applying-github-actions-examples.html", "Chapter 8 Applying GitHub Actions Examples 8.1 Explore the actions", " Chapter 8 Applying GitHub Actions Examples A great way to learn GitHub Actions is to borrow a yaml file someone else has written and incorporate it into your own project. In this chapter we will introduce you to two GitHub Actions and encourage you to adapt one or both of them to one of your own projects. We encourage you to follow similar steps and tips for other actions you find on the internet that you can use for your project. 
The two action examples we have in this chapter both work on pull_request triggers and do the following: Spell checks markdown and R Markdown files and saves the spelling errors in a file uploaded to GitHub . Style R code and commit it back to a branch. This option will take more work to adapt if you do not use R but is totally doable and we’ll walk you through some guidance on how to adapt it. 8.1 Explore the actions To get a sense of how these actions work, you guessed it, we are once again going to create a new branch and open a pull request. From command line: `git checkout -b "example-ghas"` For this exercise we are going to copy over a second GitHub Action YAML file from the folder. This time, move the spell-check.yml and style-code.yml files to your .github/workflows directories. From command line: mv activity-4-sample-github-actions/* .github/workflows/ Now follow the same set of steps we used in the previous chapter to Add, Commit, Push the changes. From command line: git add .github/* git commit -m "adding even more ghas" git push --set-upstream origin example-ghas Now create a pull request with the changes you just made. On your pull request page on GitHub, click on the Details button next to each of these workflow runs. Before moving on to the next section, take a look at the logs and yaml files and practice getting an idea of what these GitHub Actions are doing. 8.1.1 Diving deeper In this section, we’ll give you the basic recipe of these GitHub Actions. Hopefully by having a basic idea of what is in these actions you’ll be able to adapt them for your own purposes. 8.1.1.1 Spell Check Overview We’ll run through the yaml file and explain the basic set up here. We also have included links about the resources used at each step. We encourage you to poke around with these resources and with this yaml file to really learn about what these actions are doing. 
spell-check: runs-on: ubuntu-latest # We will run this on a Docker image so it has most of the things we need container: image: rocker/tidyverse:4.0.2 We’re using a Docker image that has some R packages we will use so we don’t have to install everything individually: rocker/tidyverse:4.0.2 # Need to get the files specific to our branch from our pull request - uses: actions/checkout@v3 with: ref: ${{ github.event.pull_request.head.ref }} Checking out the files we need using actions/checkout@v3 # Our docker image doesn't have this one package though so we'll install it - name: Install packages run: Rscript -e "install.packages('spelling')" Install a spelling package we need that wasn’t on the Docker image already. This is a reasonable strategy if we only need one or two packages that don’t take long to install. - name: Run spell check id: spell_check_run run: | sp_chk_results=$(Rscript "utils/spell-check.R") # This is where we are going to store output from this step to the environment so we can retrieve it in a later step echo "sp_chk_results=$sp_chk_results" >> $GITHUB_OUTPUT cat spell_check_results.tsv Run custom spell check script – this is where you’d really have to personalize this. We’re calling this custom R script that looks for the R Markdown and markdown files and spell checks them. Note that this means this script must be available to this GitHub Action if you are to use it. You’ll either need to download it or add it to whatever repo you add this to. We also have this print out the results in the log and save these results to the GITHUB_OUTPUT variable as discussed in the previous chapter. This allows us to retrieve the number of misspellings identified in a future step. 
# We want to retrieve this file after this runs so we can see what spell check errors were detected - name: Archive spelling errors uses: actions/upload-artifact@v3 # These arguments underneath `with` are generally action specific so we have to check the documentation: https://github.com/marketplace/actions/upload-a-build-artifact with: name: spell-check-results path: spell_check_results.tsv Archive spelling errors using actions/upload-artifact@v3. For Archived files, we can see them by going to Summary on the left side of the log page and scrolling down to the Artifacts section. This file we uploaded will tell us the spell check errors our script detected. # If there are too many spelling errors, this will stop the workflow - name: Check spell check results - fail if too many errors # Here we are only going to throw an error if there are more than 3 spell check errors detected if: ${{ steps.spell_check_run.outputs.sp_chk_results > 3 }} run: exit 1 We’ve discussed conditional statements and GitHub variables in previous chapters. Here we are using a conditional statement that will fail this GitHub Action if there are too many spelling errors (which here we’ve said is more than 3). 8.1.1.2 Style Code Overview Just as we did with the spell check action, we’ll run through the yaml file and explain the basic set up here. We also have included links about the resources used at each step. We encourage you to poke around with these resources and with this yaml file to really learn about what these actions are doing. style-code: runs-on: ubuntu-latest # This image has R and basic R packages container: image: rocker/tidyverse:4.0.2 We’re using a Docker image that has some R packages we will use so we don’t have to install everything individually: rocker/tidyverse:4.0.2 # Need to get the files specific to our branch from our pull request - name: Checkout files uses: actions/checkout@v3 with: ref: ${{ github.event.pull_request.head.ref }} We need the files in this repo. 
So we are checking out the files we need using actions/checkout@v3. # Our docker image doesn't have this one package though so we'll install it - name: Install packages run: Rscript -e "install.packages('styler')" Install a styler package we need that wasn’t on the Docker image already. This is a reasonable strategy if we only need one or two packages that don’t take long to install. # Here's the main thing we are running: styling R file code - name: Run styler on Rmd and R files run: Rscript -e "styler::style_file(list.files(pattern = 'Rmd$|R$', recursive = TRUE, full.names = TRUE));warnings()" Run a code styling command – this is where you’d need to customize this step. For example, if you wish to style Python code, you may look into this pycodestyle GitHub Action or perhaps other GitHub marketplace actions for your particular languages and needs. # We will automatically commit back our styled R files - name: Commit styled files run: | # Some config set up to establish creds git config --global --add safe.directory $GITHUB_WORKSPACE git config --global user.email "itcrtrainingnetwork@gmail.com" git config --global user.name "jhudsl-robot" # Now commit the styled files git add \\*.Rmd git add \\*.R git commit -m 'Style R files' || echo "No changes to commit" git push origin || echo "No changes to commit" In this step we are immediately committing and pushing any files changed by the styling back to the original branch that this pull request was on. The || gives these steps an alternative if there are no changes to commit. Without the || the action would break if the styling did not result in changes. What you’ll need to change here is the git config steps to use your own credentials. (aka git config --global user.email \"itcrtrainingnetwork@gmail.com\" should be your email). 8.1.2 Tips for adapting these to your own repository Here’s how you can adapt these to your own repository. First, you’ll want to add them to your own repository with a pull request. 
These actions will both be triggered by a pull request so you won’t need to edit them at all to start developing them to adapt them to your project needs. Note that for the spell check action, if you are deciding to use the customized script we included, you will need to copy that to the same file path as called in this action. Besides the points of customization you may need to work on that we discussed; you may also need to edit the Docker images used depending on what steps and uses you need. From here, its basically good luck! Take it one troubleshooting tactic at a time, Google your problems and look for GitHub Action marketplace actions that fit your needs. Best of luck! "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor(s) Candace Savonen Content Author(s) Candace Savonen, and Jake Crawford - Scientific software development best practices Technical Template Publishing Engineers Candace Savonen, Carrie Wright, Ava Hoffman Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Ava Hoffman, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Art and Design Illustrator(s) Candace Savonen, and Jake Crawford - Scientific software development best practices Figure Artist(s) Candace Savonen, and Jake Crawford - Scientific software development best practices Funding Funder(s) NCI UE5CA254170 Funding Staff Shasta Nicholson, Maleah O’Connor, [Sandy Ombrek]   ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2024-03-15 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version 
date lib source ## askpass 1.1 2019-01-13 [1] RSPM (R 4.0.3) ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2024-03-13 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.6.1 2023-11-28 [1] CRAN (R 4.0.2) ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.2 2023-12-11 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) ## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.23 2023-11-01 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2024-03-13 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## openssl 1.4.3 2020-09-18 [1] RSPM (R 4.0.3) ## ottrpal 1.2.1 2024-03-13 [1] Github (jhudsl/ottrpal@48e8c44) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.3 2024-01-10 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2024-03-13 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.0.2) ## sass 
0.4.8 2023-12-06 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2024-03-13 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2024-03-13 [1] Github (yihui/xfun@74c2a66) ## xml2 1.3.2 2020-04-23 [1] RSPM (R 4.0.3) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["references.html", "Chapter 9 References", " Chapter 9 References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] diff --git a/docs/no_toc/troubleshooting-github-actions.html b/docs/no_toc/troubleshooting-github-actions.html index cb02676..f16161f 100644 --- a/docs/no_toc/troubleshooting-github-actions.html +++ b/docs/no_toc/troubleshooting-github-actions.html @@ -131,9 +131,9 @@

    • 3.1.1 Why reproducibility is so important.
    • 3.1.2 Automation as a reproducibility tool
  • -
  • 3.2 Continuous integration / Continous deployment +
  • 3.2 Continuous integration / Continuous deployment
  • @@ -182,7 +182,7 @@
  • 7.1.2 Look at the logs closely!
  • 7.1.3 Use workflow_dispatch/pull_request triggers for development
  • 7.1.4 Print out things to test your assumptions
  • -
  • 7.1.5 Use Marketplace actions
  • +
  • 7.1.5 Use Marketplace Actions
  • 7.2 Activity: Troubleshooting GitHub Actions
      @@ -240,7 +240,7 @@

      Chapter 7 Troubleshooting GitHub Actions

      -

      Many of your standard programming troubleshooting skills are applicable with GitHub actions. In this chapter we’ll five you a few more tips for what might be the most common ways that GitHub Actions can break and what those error messages might look like.

      +

      Many of your standard programming troubleshooting skills are applicable with GitHub Actions. In this chapter we’ll give you a few more tips for what might be the most common ways that GitHub Actions can break and what those error messages might look like.

      7.1 Tips

      @@ -266,7 +266,7 @@

      7.1.1 Look out for silent errors!

      7.1.2 Look at the logs closely!

      -

      Whether you GitHub Action fails or not, go to the logs to see how they ran. You can get there by going to Actions tab and clicking on the workflow you want to check on.

      +

      Whether your GitHub Action fails or not, go to the logs to see how it ran. You can get there by going to the Actions tab and clicking on the workflow you want to check on.

      You should start by scrolling down on the Actions page to look at the Annotations. This is GitHub Action’s summary of how the workflow ran. However, the summary is often unlikely to give you enough information to troubleshoot a failed action.

      In order to find out what the error message really is, you may need to dig into the logs deeper than that. Usually when you open up the log it will open up the step that it detects has failed.

      Read carefully what output happened here versus what you expected to happen. You may want to use the arrow to show what commands were specifically run. Sometimes you may need to look in earlier steps to really pinpoint what has happened.

      @@ -277,7 +277,7 @@

      7.1.3 Use workflow_dispatch/pull_

      Regardless of whether you want your final GitHub Action to run on a pull_request or workflow_dispatch trigger, it can be helpful during development to use these. You can have multiple triggers for a GitHub Action.

      The pull_request trigger is helpful for development so that every time you push to your pull request your action will be re-run automatically so you can see if what you tried worked. This is the default method (in my mind) for developing a GitHub action. The only instance you may not want to use this method is if you are using GitHub default variables that are different in pull requests than they will be in your final version of the GitHub Action. In that instance a pull_request may not be a representative test for you.

      The workflow_dispatch trigger is useful so you can re-trigger the workflow run whenever you need to test the next thing you tried for troubleshooting purposes. You might consider doing this if testing as a pull request isn’t appropriate. You can also initiate workflow_dispatch workflow runs from any branch – so you don’t have to merge it before you’ve polished it.
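
As a rough sketch of the development setup described above, a workflow can list both triggers under `on:`. The workflow name, job name, and echo step are placeholders chosen for illustration and are not taken from the course files.

```yaml
# Hypothetical development workflow; all names here are placeholders
name: Development example

on:
  # Re-runs each time you push to an open pull request
  pull_request:
  # Allows manual runs started from the Actions tab
  workflow_dispatch:

jobs:
  try-it:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout files
        uses: actions/checkout@v3
      # GITHUB_EVENT_NAME is a default variable holding the trigger name
      - name: Show which trigger started this run
        run: echo "Triggered by $GITHUB_EVENT_NAME"
```

Printing the event name is one way to see for yourself how default variables differ between the two triggers.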

      -

      There’s two caveats to this strategy:

      +

      There are two caveats to this strategy:

      1. If you don’t want these triggers long term, make sure you delete them before you merge to main.
      2. Recall that if you are using default environment variables in your workflow runs, those change depending on the trigger, so they may not always be representative of how the workflow will run in its final version.
      3. @@ -292,14 +292,14 @@

        7.1.4 Print out things to test yo

        This may help you figure out whether a variable or file isn’t in the place you think it is.
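
A slightly fuller version of this idea might look like the step below. It is only a sketch: the step name is made up, and which variables are worth printing depends on what your workflow actually uses.

```yaml
# Hypothetical debugging step; remove it once the problem is found
- name: Print things to check assumptions
  run: |
    # Default variables that GitHub sets for every run
    echo "Workspace: $GITHUB_WORKSPACE"
    echo "Event: $GITHUB_EVENT_NAME"
    echo "Ref: $GITHUB_REF"
    # Where are we, and which files are actually here?
    pwd
    ls -la
```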

      -

      7.1.5 Use Marketplace actions

      +

      7.1.5 Use Marketplace Actions

      The great part about using GitHub Actions is that you can use other people’s actions from the marketplace so you don’t have to write everything from scratch!

      This, however, does mean that you are dependent on the developers and maintainers of these GitHub Actions.

      There are three tips for troubleshooting a problematic GitHub Action that is borrowed from the marketplace:

      1. Read their docs carefully and make sure you are using it as specified.
      2. Try bumping up to a later version if it looks like there’s a bug that may have been addressed.
      3. -
      4. Try not to use Marketplace actions that don’t show evidence of being maintained or don’t have fully fledged documentation!
      5. +
      6. Try not to use Marketplace Actions that don’t show evidence of being maintained or don’t have fully fledged documentation!
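
To illustrate the second tip, bumping a version usually means changing the tag after the `@` in the `uses` key. The sketch below assumes that a newer major version of the action exists and addresses the bug you hit; check the action's release notes before switching.

```yaml
steps:
  - name: Checkout files
    # Previously: uses: actions/checkout@v3
    # Assumption: the newer release fixes the problem you ran into
    uses: actions/checkout@v4
```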
      diff --git a/docs/no_toc/why-automation.html b/docs/no_toc/why-automation.html index 24cc84c..a5a34a6 100644 --- a/docs/no_toc/why-automation.html +++ b/docs/no_toc/why-automation.html @@ -131,9 +131,9 @@
    • 3.1.1 Why reproducibility is so important.
    • 3.1.2 Automation as a reproducibility tool
  • -
  • 3.2 Continuous integration / Continous deployment +
  • 3.2 Continuous integration / Continuous deployment
  • @@ -182,7 +182,7 @@
  • 7.1.2 Look at the logs closely!
  • 7.1.3 Use workflow_dispatch/pull_request triggers for development
  • 7.1.4 Print out things to test your assumptions
  • -
  • 7.1.5 Use Marketplace actions
  • +
  • 7.1.5 Use Marketplace Actions
  • 7.2 Activity: Troubleshooting GitHub Actions
      @@ -263,9 +263,9 @@

      3.1.2 Automation as a reproducibi

      -
      -

      3.2 Continuous integration / Continous deployment

      -

      Before we discuss the concept of Continous integration / Continuous deployment (often abbreviated CI/CD), let’s use an analogy.

      +
      +

      3.2 Continuous integration / Continuous deployment

      +

      Before we discuss the concept of Continuous integration / Continuous deployment (often abbreviated CI/CD), let’s use an analogy.

      The point we are getting at here is that it’s generally a good idea to check work along the way instead of waiting until something is completely finished to test it.

      Software is no exception to this idea. Often, if we send a collaborator an enormous amount of code to review, they are likely to feel overwhelmed and may not be able to give useful feedback.

      @@ -277,12 +277,12 @@

      3.2 Continuous integration / Cont

      Let’s assume over the course of developing a project, bugs are introduced at a certain rate.

      Without using CI/CD you may find yourself trying to fix many bugs at once! This will make the bugs harder to isolate, pinpoint, and fix. The time it takes to fix 3 bugs at once may be much higher than if you caught these bugs one at a time. Additionally, the longer a bug goes uncaught, the more likely it is to be accidentally incorporated into your published results – which will be a lot more work for you and others to rectify.

      -

      However with CI/CD you will likely catch these bugs earlier and have an easier time fixing them before they truly run a muck! A good continuous integration / continuous deployment pipeline will help you identify these bugs early and save time and stress!

      +

      However, with CI/CD you will likely catch these bugs earlier and have an easier time fixing them before they truly run amok! A good continuous integration / continuous deployment pipeline will help you identify these bugs early and save time and stress!

      This is not only true for classic “my script won’t run” bugs but also “silent” bugs – bugs where the analysis still ran to completion but perhaps the results were slightly different.

      -
      -

      3.2.1 Continous Integration / Continuous Deployment

      +
      +

      3.2.1 Continuous Integration / Continuous Deployment

      A workflow that uses CI/CD principles may look like this:

      The idea is that we use version control and build aspects of our software. Before what we’ve built is incorporated into the published version, we will stage it and test it. By staging we mean that perhaps we keep it stored on a different branch and have ways that we can play around with the beta version of the analysis or product before our most recent additions are incorporated.
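
In GitHub Actions terms, the stage-and-test idea could be sketched as a workflow that runs tests on every pull request before anything reaches the main branch. The workflow name, job name, and the `tests/run_tests.sh` script are hypothetical placeholders; substitute your own test command.

```yaml
# Hypothetical CI workflow; names and the test script are placeholders
name: Test before merge

on:
  pull_request:
    # Only pull requests that target main trigger this workflow
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout files
        uses: actions/checkout@v3
      # Replace with whatever command runs your project's tests
      - name: Run tests
        run: bash tests/run_tests.sh
```

With a setup like this, changes stay on their own branch while the tests run, and a failing test shows up on the pull request before the changes are merged.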