Commit

…_Scientists into main
jhudsl-robot committed Mar 15, 2024
2 parents ca0bfa6 + 42abefd commit 40d0539
Showing 23 changed files with 139 additions and 130 deletions.
6 changes: 3 additions & 3 deletions docs/no_toc/01-intro.md
@@ -12,8 +12,8 @@ The course is intended for students in the biomedical sciences and researchers w
_This course is written for individuals who:_

- Are comfortable with GitHub and know how to make a pull request
-- Wish to save time and enhance their scientific projects using automation
-- Perhaps previously tried to learn about GitHub Actions but felt overwhelmed on how to get started
+- Wish to save time and enhance their scientific projects using automation
+- Have perhaps tried to learn about GitHub Actions before but felt overwhelmed about how to start


<img src="resources/images/01-intro_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g1173f7473f7_0_0.png" width="100%" />
@@ -56,4 +56,4 @@ This course is meant to teach learners how to create sophisticated GitHub Action

## How to use the course

-Ideally you should follow along with the chapters and perform they activities as they are described. These activities involve using GitHub and GitHub actions. You will need a GitHub account and basic familiarity with GitHub.
+Ideally you should follow along with the chapters and perform the activities as they are described. These activities involve using GitHub and GitHub actions. You will need a GitHub account and basic familiarity with GitHub.
5 changes: 2 additions & 3 deletions docs/no_toc/02-programming-practices.md
@@ -50,7 +50,7 @@ We will take a closer look at two concrete examples, one on each end of the soft

Imagine that you have two sampling distributions (lists/arrays of numbers) and you want to test whether the means of the distributions are statistically equivalent or not.
This is the setup for a _t_-test.
-_t_-tests are implemented in standard functions in R ([`base` library](https://rdrr.io/r/base/base-package.html)) and Python ([`scipy` library](https://scipy.org/)), as well as most other commonly used programming languages.
+_t_-tests are implemented in standard functions in [base R](https://rdrr.io/r/base/base-package.html) and [Python `scipy` library](https://scipy.org/), as well as most other commonly used programming languages.

<img src="02-programming-practices_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g287bcb243d2_0_122.png" width="100%" style="display: block; margin: auto;" />

@@ -62,7 +62,7 @@ In your own software, you might need to do some verification of your input (for

Imagine that you have a list of reads from a [DNA sequencing](https://en.wikipedia.org/wiki/DNA_sequencing#High-throughput_methods) machine, and you want to use these data to answer a biological question, or to make a plot/visualization to communicate a biological insight.
This is a much less well-defined problem than our previous example, with many more independently operating components, and many more subjective decisions that a researcher must make along the way.
-Complex data analyses means complex decisions! This often means that decisions made are not so cut and dry and should rely on the scientific context of the data. In other words, analyses often are tailored to reflect the biology (or other science) and or perhaps the experimental goals.
+Complex data analyses means complex decisions! This often means that decisions made are not so cut and dry and should rely on the scientific context of the data. In other words, analyses often are tailored to reflect the biology (or other science) and or perhaps the experimental goals.

<img src="02-programming-practices_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g28fb47c580d_0_152.png" width="100%" style="display: block; margin: auto;" />

@@ -96,4 +96,3 @@ This eliminates the need for you to remember to run tests, to clean up your code
<img src="02-programming-practices_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g28fb47c580d_0_55.png" width="100%" style="display: block; margin: auto;" />

Later in the course, we will talk more specifically about what exactly automation via continuous integration looks like, and go into more depth as to its uses and benefits.

8 changes: 4 additions & 4 deletions docs/no_toc/03-why-automation.md
@@ -41,9 +41,9 @@ Returning to those 10 researchers, if instead of having those 10 people manually
<img src="03-why-automation_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g27a5d68b667_0_1764.png" width="100%" />


-## Continuous integration / Continous deployment
+## Continuous integration / Continuous deployment

-Before we discuss the concept of Continous integration / Continuous deployment (often abbreviated CI/CD), let's use an analogy.
+Before we discuss the concept of Continuous integration / Continuous deployment (often abbreviated CI/CD), let's use an analogy.


<img src="03-why-automation_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g27a8d088b93_0_11.png" width="100%" />
@@ -68,15 +68,15 @@ Let's assume over the course of developing a project, bugs are introduced at a c

Without using CI/CD you may find yourself trying to fix many bugs at once! This will make the bugs harder to isolate and harder to fix and pinpoint. The amount of time it will take to fix 3 bugs at once may be exponentially higher than if you caught these bugs one at a time. Additionally, the longer amount of time that goes on before you catch a bug, it may be more likely it will get accidentally incorporated into your published results -- this will be a lot more work for you and others to rectify.

-However with CI/CD you will likely catch these bugs earlier and have an easier time fixing them before they truly run a muck! A good continuous integration / continuous deployment pipeline will help you identify these bugs early and save time and stress!
+However with CI/CD you will likely catch these bugs earlier and have an easier time fixing them before they truly run amock! A good continuous integration / continuous deployment pipeline will help you identify these bugs early and save time and stress!

<img src="03-why-automation_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g286f0c8db1a_0_68.png" width="100%" />

This is not only true for classic "my script won't run" bugs but also "silent" bugs -- bugs where the analysis still ran to completion but perhaps the results were slightly different.

<img src="03-why-automation_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g286f0c8db1a_0_33.png" width="100%" />

-### Continous Integration / Continuous Deployment
+### Continuous Integration / Continuous Deployment

A workflow that uses CI/CD principles may look like this:

10 changes: 5 additions & 5 deletions docs/no_toc/04-gha-basics.md
@@ -16,7 +16,7 @@ All GitHub Actions involve answering three questions:
2. What should be run?
3. With what environment should the thing be run?

-These questions and other specifications are set by writing a [YAML file](https://www.redhat.com/en/topics/automation/what-is-yaml). YAML files are human readable markup language files. Basically its a list that is easy for humans to read and write and computers can read them too. This makes it good for writing a GitHub Action. Essentially, we're going to write a YAML file to make a recipe that GitHub will read to know what/when/with what we are trying to do.
+These questions and other specifications are set by writing a [YAML file](https://www.redhat.com/en/topics/automation/what-is-yaml). YAML files are human readable markup language files. Basically it's a list that is easy for humans to read and write and computers can read them too. This makes it good for writing a GitHub Action. Essentially, we're going to write a YAML file to make a recipe that GitHub will read to know what/when/with what we are trying to do.

<img src="04-gha-basics_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g280d2b56f79_0_2781.png" width="100%" style="display: block; margin: auto;" />

@@ -27,7 +27,7 @@ The headlines about working with YAML files:
1. Spacing is VERY specific! -- incorrect spacing will definitely result in errors for your GitHub Action run.

Let's take a look at an example YAML.
-Note that the what comes before a `:` is generally a name and indent indicate subsets of a list. So in the overall list of `food` we have sublists of `vegetables` and `fruits`. `#` can be used as a comment and will not be treated as code.
+Note that what comes before a `:` is generally a name and indentations indicate subsets of a list. So in the overall list of `food` we have sublists of `vegetables` and `fruits`. `#` can be used as a comment and will not be treated as code.

Additionally, `:` are often names. So `citrus` is the name for the item `oranges` and etc.
```
@@ -107,15 +107,15 @@ So for example, there are built in operating systems like `windows-latest`, `mac

<img src="04-gha-basics_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g280d2b56f79_0_2843.png" width="100%" style="display: block; margin: auto;" />

-But just like a Windows machine straight out of a box is unlikely to have everything you need to run some code, you may need a more specific computing environment. You can also create custom environments using `containerization`.
+Similarly to how a new Windows computer may not come equipped with all the software you need to execute certain code, you might require a more tailored computing environment. You can also create custom environments using `containerization`.

### Containerization

A "virtual machine" is basically when your computer creates its own fake computer inside of it. It's acting like a different computer but it doesn't have any additional physical parts.

<img src="04-gha-basics_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g285ab2029e8_0_2.png" width="100%" style="display: block; margin: auto;" />

-Containers aren't virtual machines, but they do a similar thing, which is they spin up a computing environment where you can do things. They are called containers because they are isolated from the rest of your computer.
+Containers, while not virtual machines, serve a similar purpose by creating an isolated computing environment where tasks can be performed. They are named "containers" due to their separation from the rest of your computer.

Containerization is useful because it allows us to share our computing environments with others. This is useful because it can be a powerful tool for reproducing analyses if we are controlling our computing environments.

@@ -151,7 +151,7 @@ Docker is a whole other world. There's whole conferences, hackathons, and etc de

<div class = "warning">

-Super important side note: DO NOT put data that needs to be secured like Personally Identifiable Information (PII) and Personal Health Information (PHI) data on your Docker images! Especially when you share them! They are not meant for this purpose and this data would be exposed!
+Super important side note: DO NOT put data that needs to be secured like Personally Identifiable Information (PII) and Personal Health Information (PHI) data in your Docker images! Especially when you share them! They are not meant for this purpose and this data would be exposed!

</div>

13 changes: 7 additions & 6 deletions docs/no_toc/05-gha-run-analysis.md
@@ -79,11 +79,12 @@ on:

In our `jobs:` we've named this job `R run analysis`.

-Additionally we are running this on a `ubuntu-latest` operating system, but as opposed to our first GitHub Action workflow from the previous chapter, where we didn't need any additional packages or software to run our job, this job, the analysis script we are running, requires things like R, python, and some specific packages.
+Additionally we are running this on a `ubuntu-latest` operating system, but as opposed to our first GitHub Action workflow from the previous chapter, where we didn't need any additional packages or software to run our job, this job, the analysis script we are running, requires things like R, Python, and some specific packages.

-We could, attempt to write a script that installs everything we need. However, that would likely be a lot of work, may not work reliably, and would be hard to track changes. Instead, we are using a custom made docker image that has R, python, and other packages we need already installed.
+We could attempt to write a script that installs everything we need. However, that would likely be a lot of work, may not work reliably, and would be hard to track changes. Instead, we are using a custom made Docker image that has R, python, and other packages we need already installed.

+This custom made Docker image is pulled from [Dockerhub](https://hub.docker.com/r/jhudsl/ottr_python). If you wish to make a custom Docker image to use in your analysis, easiest way to do this is to make a Dockerfile, build a Docker image from this file and then push it to Dockerhub. We have some Dockerfiles for this image and others [managed and version controlled on this GitHub repository](https://github.com/jhudsl/ottr_docker). You may note we use GitHub Actions on this repository to help us manage these Docker images.
+
-This custom made docker image is pulled from [Dockerhub and it exists here](https://hub.docker.com/r/jhudsl/ottr_python). If you wish to make a custom Docker image to use in your analysis, easiest way to do this is to make a Dockerfile, build a Docker image from this file and then push it to Dockerhub. We have some Dockerfiles for this image and others [managed and version controlled here on this GitHub repository](https://github.com/jhudsl/ottr_docker). You may note we use GitHub Actions on this repository to help us manage these Docker images.
```
jobs:
re-run:
@@ -96,7 +97,7 @@ jobs:

#### actions/checkout

-One of the most frequently use GitHub Actions [from the GitHub Action Marketplace is `actions/checkout`](https://github.com/actions/checkout). This action will grab all the files from a GitHub repository so you can do things with those files in your workflow. (Recall that when you spin up a GitHub Action Environment it is a blank slate, so we have to put our files there too if we want to use them).
+One of the most frequently use GitHub Actions from the GitHub Action Marketplace is [`actions/checkout`](https://github.com/actions/checkout). This action will grab all the files from a GitHub repository so you can do things with those files in your workflow. (Recall that when you spin up a GitHub Action Environment it is a blank slate, so we have to put our files there too if we want to use them).

```
steps:
@@ -109,7 +110,7 @@ steps:

By default, it will checkout the files from the repository where this action is being run, but we could specify other repository and other branches.

-`fetch-depth: 0` means we will grab all the file.
+`fetch-depth: 0` means we will grab all the files.


#### sh run_analysis.sh
@@ -124,7 +125,7 @@ Additionally the `|` tells `run:` to expect multiple lines of a command. We didn
run: |
sh run_analysis.sh
```
-We have three steps in this fake analysis and the files are numbered in which order they are run. If you open up the [run_analysis.sh](https://github.com/fhdsl/github-actions-workshop/blob/main/run_analysis.sh) file, you will see its basically simple workflow step calling file.
+We have three steps in this fake analysis and the files are numbered in which order they are run. If you open up the [run_analysis.sh](https://github.com/fhdsl/github-actions-workshop/blob/main/run_analysis.sh) file, you will see it is basically a simple workflow step calling file.

It looks like this:
```
3 changes: 2 additions & 1 deletion docs/no_toc/06-gha-variables.md
@@ -96,7 +96,8 @@ To do this we can use this sort of set up:

<img src="06-gha-variables_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g290614d43ec_0_2.png" width="100%" style="display: block; margin: auto;" />

-Step that sets a variable depending on some output
+Step that sets a variable depending on some output:
+
```
# How to export a variable to a next step
- name: Setting output to the environment at large
10 changes: 5 additions & 5 deletions docs/no_toc/07-gha-troubleshooting.md
@@ -4,7 +4,7 @@

<img src="07-gha-troubleshooting_files/figure-html//1x0Cnk2Wcsg8HYkmXnXo_0PxmYCxAwzVrUQzb8DUDvTA_g290614d43ec_0_62.png" width="100%" style="display: block; margin: auto;" />

-Many of your standard programming troubleshooting skills are applicable with GitHub actions. In this chapter we'll five you a few more tips for what might be the most common ways that GitHub Actions can break and what those error messages might look like.
+Many of your standard programming troubleshooting skills are applicable with GitHub Actions. In this chapter we'll give you a few more tips for what might be the most common ways that GitHub Actions can break and what those error messages might look like.

## Tips

@@ -42,7 +42,7 @@ In order to design these steps you are going to need to look closely at your log

### Look at the logs closely!

-Whether you GitHub Action fails or not, go to the logs to see how they ran. You can get there by going to Actions tab and clicking on the workflow you want to check on.
+Whether your GitHub Action fails or not, go to the logs to see how they ran. You can get there by going to Actions tab and clicking on the workflow you want to check on.

You should start by scrolling down on the Actions page to look at the `Annotations`. This is GitHub Action's summary of how the workflow ran. However, the summary is often unlikely to give you enough information to troubleshoot a failed action.

@@ -60,7 +60,7 @@ The `pull_request` trigger is helpful for development so that every time you pus

The `workflow_dispatch` trigger is useful so you can re-trigger the workflow run whenever you need to test the next thing you tried for troubleshooting purposes. You might consider doing this if your testing as a pull request isn't appropriate. You can also initiate `workflow_dispatch` workflow runs from any branch -- so you don't have to merge it before you've polished it.

-There's two caveats to this strategy:
+There are two caveats to this strategy:

1. If you don't want these triggers long term, make sure you delete them before you merge to main.
2. Recall that if you are using default environment variables in your workflow runs that those change depending on the triggers, so those may not always be representative of the workflow run as you it will be run with the final version.
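
As a rough illustration of the two triggers discussed in this hunk, a workflow file might combine them as sketched below. The workflow name, the job name `debug`, and the `main` branch filter are assumptions chosen for illustration and are not taken from the course repository.

```yaml
# Hypothetical troubleshooting workflow; all names here are illustrative.
name: Troubleshooting run

on:
  # Runs whenever a pull request targeting main is opened or updated
  pull_request:
    branches: [ main ]
  # Allows the workflow to be started manually
  workflow_dispatch:

jobs:
  debug:
    runs-on: ubuntu-latest
    steps:
      # GITHUB_EVENT_NAME is a default environment variable whose value
      # depends on which trigger started the run
      - name: Report which trigger started this run
        run: echo "Triggered by $GITHUB_EVENT_NAME"
```

Printing the event name is one way to see the second caveat in practice: the same workflow reports a different value depending on how it was started.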
@@ -77,7 +77,7 @@ run: |

This may help you figure out whether a variable or file isn't in the place you think it is.

-### Use Marketplace actions
+### Use Marketplace Actions

The great part about using GitHub Actions is that you can use other people's actions from the marketplace so you don't have to write everything from scratch!

@@ -87,7 +87,7 @@ There are three tips for troubleshooting a problematic GitHub Action that is bor

1. Read their docs carefully and make sure you are using it as specified.
2. Try bumping up to a later version if it looks like there's a bug that may have been addressed.
-3. Try not to use Marketplace actions that don't show evidence of being maintained or don't have fully fledged documentation!
+3. Try not to use Marketplace Actions that don't show evidence of being maintained or don't have fully fledged documentation!
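
The second tip in the list, bumping to a later version, might look like the sketch below. It uses `actions/checkout` only as a familiar example, and the version tags are illustrative; the action's release notes are the place to confirm which version addresses a given bug.

```yaml
steps:
  # Before (illustrative): an older major version of the action
  # - name: Checkout files
  #   uses: actions/checkout@v3

  # After (illustrative): the same step pinned to a later major version
  - name: Checkout files
    uses: actions/checkout@v4
```

Changing only the version tag keeps the rest of the step the same, so any change in behavior can be attributed to the new version.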

## Activity: Troubleshooting GitHub Actions
