tidytof: A user-friendly framework for interactive and highly reproducible cytometry data analysis
@@ -127,17 +127,17 @@ Installation
-if(!require(devtools)) install.packages("devtools")
+if (!require(devtools)) install.packages("devtools")
devtools::install_github("keyes-timothy/tidytof")
Once tidytof is installed, you can attach it to your current R session using the following code:
In addition, we can install and load the other packages we need for this vignette:
Reading data with tof_read_data
Using one of these directories (or any other directory containing cytometry data on your local machine), we can use tof_read_data to read cytometry data from raw files. Acceptable formats include .fcs files and .csv files. Importantly, tof_read_data reads either a single .fcs/.csv file or multiple .fcs/.csv files, depending on whether its first argument (path) leads to a single file or to a directory of files.
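The single-file versus directory behavior described above can be sketched in base R. This is only an illustration under stated assumptions: `list_cytometry_files` is a hypothetical helper written for this example, not a tidytof function, and tidytof's actual path handling may differ.

```r
# Hypothetical helper (not part of tidytof): resolve `path` to the set of
# cytometry files it refers to, whether it is one file or a directory.
list_cytometry_files <- function(path) {
  if (dir.exists(path)) {
    # a directory: collect every .fcs/.csv file inside it
    list.files(
      path,
      pattern = "\\.(fcs|csv)$",
      full.names = TRUE,
      ignore.case = TRUE
    )
  } else if (file.exists(path)) {
    # a single file: return it unchanged
    path
  } else {
    stop("`path` does not exist: ", path)
  }
}

# build a small temporary directory holding two toy .csv files
demo_dir <- tempfile("toy_cytometry_")
dir.create(demo_dir)
write.csv(
  data.frame(cd45 = c(448, 512)),
  file.path(demo_dir, "sample_a.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(cd45 = c(301, 275)),
  file.path(demo_dir, "sample_b.csv"),
  row.names = FALSE
)

# the same helper handles a directory and a single file
from_dir <- list_cytometry_files(demo_dir)
from_file <- list_cytometry_files(file.path(demo_dir, "sample_a.csv"))

length(from_dir)
length(from_file)
```

Passing the directory yields both toy files, whereas passing one file's path yields only that file.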
Here, we can use tof_read_data to read in all of the .fcs files in the “phenograph” example dataset bundled into tidytof and store them in the phenograph variable.
-phenograph <-
- tidytof_example_data("phenograph") |>
- tof_read_data()
+phenograph <-
+ tidytof_example_data("phenograph") |>
+ tof_read_data()
-phenograph |>
- class()
+phenograph |>
+ class()
#> [1] "tof_tbl" "tbl_df" "tbl" "data.frame"
Regardless of its input format, tidytof reads data into an extended tibble called a tof_tbl (pronounced “tof tibble”), an S3 class identical to tbl_df, but with one additional attribute (“panel”). tidytof stores this additional attribute in tof_tbls because, in addition to analyzing cytometry data from individual experiments, cytometry users often want to compare panels between experiments to find common markers or to compare which metals are associated with particular markers across panels.
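To make the idea of "an S3 class with one additional attribute" concrete, here is a minimal base R sketch. It assumes nothing about tidytof's internals: `new_toy_tof_tbl` and the class name `toy_tof_tbl` are hypothetical names invented for this example, and a plain data.frame stands in for a tibble.

```r
# Hypothetical constructor (not tidytof's): wrap a data frame in a subclass
# that carries an extra "panel" attribute.
new_toy_tof_tbl <- function(x, panel) {
  stopifnot(is.data.frame(x), is.data.frame(panel))
  structure(
    x,
    panel = panel,
    class = c("toy_tof_tbl", class(x))
  )
}

# toy single-cell measurements (one row per cell)
cells <- data.frame(
  cd45 = c(448, 512),
  cd34 = c(2.69, 0.40)
)

# toy panel: which metal is paired with which antigen
panel <- data.frame(
  metals = c("Sm154", "Nd148"),
  antigens = c("CD45", "CD34")
)

toy <- new_toy_tof_tbl(cells, panel)

# the subclass comes first, followed by the parent class
class(toy)

# the panel travels with the data as an attribute
attr(toy, "panel")
```

Because the parent class is still present in the class vector, methods written for the parent class continue to dispatch on the extended object.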
A few notes about tof_tbls:
@@ -181,18 +181,18 @@ Reading data with tof_read_data
Because tof_tbls inherit from the tbl_df class, all methods available to tibbles are also available to tof_tbls. For example, dplyr’s mutate method can be applied to the tof_tbl named phenograph above to convert the columns encoding each cell's PhenoGraph cluster ID and stimulation condition into character vectors (instead of their original numeric codes in the uncleaned dataset).
-phenograph <-
- phenograph |>
- # mutate the input tof_tbl
- mutate(
- PhenoGraph = as.character(PhenoGraph),
- Condition = as.character(Condition)
- )
+phenograph <-
+ phenograph |>
+ # mutate the input tof_tbl
+ mutate(
+ PhenoGraph = as.character(PhenoGraph),
+ Condition = as.character(Condition)
+ )
-phenograph |>
- # use dplyr's select method to show that the columns have been changed
- select(where(is.character)) |>
- head()
+phenograph |>
+ # use dplyr's select method to show that the columns have been changed
+ select(where(is.character)) |>
+ head()
#> # A tibble: 6 × 3
#> file_name PhenoGraph Condition
#> <chr> <chr> <chr>
@@ -204,14 +204,14 @@ Reading data with tof_read_data
#> 6 H1_PhenoGraph_cluster1.fcs 12 12
The tof_tbl class is preserved even after these transformations.
Finally, to retrieve panel information from a tof_tbl, use tof_get_panel:
-phenograph |>
- tof_get_panel() |>
- head()
+phenograph |>
+ tof_get_panel() |>
+ head()
#> # A tibble: 6 × 2
#> metals antigens
#> <chr> <chr>
@@ -232,9 +232,9 @@ Pre-processing with tof_prepro
As an example, we can preprocess our phenograph tof_tbl above and see how the first few measurements change before and after.
# before preprocessing
-phenograph |>
- select(`CD45|Sm154`, `CD34|Nd148`, `CD38|Er167`) |>
- head()
+phenograph |>
+ select(`CD45|Sm154`, `CD34|Nd148`, `CD38|Er167`) |>
+ head()
#> # A tibble: 6 × 3
#> `CD45|Sm154` `CD34|Nd148` `CD38|Er167`
#> <dbl> <dbl> <dbl>
@@ -246,14 +246,14 @@ Pre-processing with tof_prepro
#> 6 448. 2.69 11.1
# perform preprocessing
-phenograph <-
- phenograph |>
- tof_preprocess()
+phenograph <-
+ phenograph |>
+ tof_preprocess()
# inspect new values
-phenograph |>
- select(`CD45|Sm154`, `CD34|Nd148`, `CD38|Er167`) |>
- head()
+phenograph |>
+ select(`CD45|Sm154`, `CD34|Nd148`, `CD38|Er167`) |>
+ head()
#> # A tibble: 6 × 3
#> `CD45|Sm154` `CD34|Nd148` `CD38|Er167`
#> <dbl> <dbl> <dbl>
@@ -275,8 +275,8 @@ Downsampling with tof_downsample
data(phenograph_data)
-phenograph_data |>
- count(phenograph_cluster)
+phenograph_data |>
+ count(phenograph_cluster)
#> # A tibble: 3 × 2
#> phenograph_cluster n
#> <chr> <int>
@@ -285,15 +285,15 @@ Downsampling with tof_downsample
#> 3 cluster3 1000
To randomly sample 200 cells per cluster, we can use tof_downsample with the “constant” method:
-phenograph_data |>
- # downsample
- tof_downsample(
- method = "constant",
- group_cols = phenograph_cluster,
- num_cells = 200
- ) |>
- # count the number of downsampled cells in each cluster
- count(phenograph_cluster)
+phenograph_data |>
+ # downsample
+ tof_downsample(
+ method = "constant",
+ group_cols = phenograph_cluster,
+ num_cells = 200
+ ) |>
+ # count the number of downsampled cells in each cluster
+ count(phenograph_cluster)
#> # A tibble: 3 × 2
#> phenograph_cluster n
#> <chr> <int>
@@ -302,15 +302,15 @@ Downsampling with tof_downsample
#> 3 cluster3 200
Alternatively, if we wanted to sample 50% of the cells in each cluster, we could use the “prop” method:
-phenograph_data |>
- # downsample
- tof_downsample(
- method = "prop",
- group_cols = phenograph_cluster,
- prop_cells = 0.5
- ) |>
- # count the number of downsampled cells in each cluster
- count(phenograph_cluster)
+phenograph_data |>
+ # downsample
+ tof_downsample(
+ method = "prop",
+ group_cols = phenograph_cluster,
+ prop_cells = 0.5
+ ) |>
+ # count the number of downsampled cells in each cluster
+ count(phenograph_cluster)
#> # A tibble: 3 × 2
#> phenograph_cluster n
#> <chr> <int>
@@ -319,30 +319,30 @@ Downsampling with tof_downsample
#> 3 cluster3 500
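Conceptually, the “constant” and “prop” methods both amount to random sampling within each group. The base R sketch below illustrates that idea on toy data. It is not tidytof's implementation: `sample_per_group` is a hypothetical helper, and the toy data frame only mimics three clusters of 1000 cells each.

```r
set.seed(2024)

# toy data: 3 clusters with 1000 cells each
toy_cells <- data.frame(
  cluster = rep(c("cluster1", "cluster2", "cluster3"), each = 1000),
  cd34 = rnorm(3000)
)

# Hypothetical helper (not tidytof's): sample either a fixed number `n`
# or a fixed proportion `prop` of rows within each level of `group`.
sample_per_group <- function(df, group, n = NULL, prop = NULL) {
  # exactly one of `n` and `prop` must be supplied
  stopifnot(xor(is.null(n), is.null(prop)))
  idx_by_group <- split(seq_len(nrow(df)), df[[group]])
  keep <- lapply(idx_by_group, function(idx) {
    size <- if (!is.null(n)) {
      min(n, length(idx))
    } else {
      ceiling(prop * length(idx))
    }
    # index into idx rather than calling sample(idx), which misbehaves
    # when a group contains a single row
    idx[sample.int(length(idx), size)]
  })
  df[sort(unlist(keep, use.names = FALSE)), , drop = FALSE]
}

constant_sample <- sample_per_group(toy_cells, "cluster", n = 200)
prop_sample <- sample_per_group(toy_cells, "cluster", prop = 0.5)

# cells kept per cluster under each scheme
table(constant_sample$cluster)
table(prop_sample$cluster)
```

With equally sized groups the two schemes differ only in how many cells they keep. With unbalanced groups, a constant sample equalizes group sizes, while a proportional sample preserves their relative sizes.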
Finally, you might want to downsample not to a fixed number or proportion of cells, but to a fixed density in phenotypic space. For example, the following scatterplot shows that some regions of phenotypic space in phenograph_data contain more cells than others along the cd34/cd38 axes:
-phenograph_data |>
- # preprocess all numeric columns in the dataset
- tof_preprocess(undo_noise = FALSE) |>
- # make a scatterplot
- ggplot(aes(x = cd34, y = cd38)) +
- geom_point(alpha = 0.5) +
- scale_x_continuous(limits = c(NA, 1.5)) +
- scale_y_continuous(limits = c(NA, 4)) +
- theme_bw()
+phenograph_data |>
+ # preprocess all numeric columns in the dataset
+ tof_preprocess(undo_noise = FALSE) |>
+ # make a scatterplot
+ ggplot(aes(x = cd34, y = cd38)) +
+ geom_point(alpha = 0.5) +
+ scale_x_continuous(limits = c(NA, 1.5)) +
+ scale_y_continuous(limits = c(NA, 4)) +
+ theme_bw()
To reduce the number of cells in our dataset until the local density around each cell is relatively constant, we can use the “density” method of tof_downsample:
-phenograph_data |>
- tof_preprocess(undo_noise = FALSE) |>
- tof_downsample(
- density_cols = c(cd34, cd38),
- target_prop_cells = 0.25,
- method = "density",
- ) |>
- ggplot(aes(x = cd34, y = cd38)) +
- geom_point(alpha = 0.5) +
- scale_x_continuous(limits = c(NA, 1.5)) +
- scale_y_continuous(limits = c(NA, 4)) +
- theme_bw()
+phenograph_data |>
+ tof_preprocess(undo_noise = FALSE) |>
+ tof_downsample(
+ density_cols = c(cd34, cd38),
+ target_prop_cells = 0.25,
+ method = "density",
+ ) |>
+ ggplot(aes(x = cd34, y = cd38)) +
+ geom_point(alpha = 0.5) +
+ scale_x_continuous(limits = c(NA, 1.5)) +
+ scale_y_continuous(limits = c(NA, 4)) +
+ theme_bw()
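One simple way to think about density-dependent downsampling is to estimate a local density for each cell and then keep cells with a probability inversely proportional to that density. The base R sketch below does this with a fixed-radius neighbor count on toy data. It is an illustration only: the radius of 0.3, the toy clusters, and the sampling rule are all assumptions made for this example, and tidytof's density estimator and sampling rule may differ.

```r
set.seed(1)

# toy data: a dense population (900 cells) and a sparse one (100 cells)
toy <- data.frame(
  cd34 = c(rnorm(900, mean = 0, sd = 0.2), rnorm(100, mean = 3, sd = 0.2)),
  cd38 = c(rnorm(900, mean = 0, sd = 0.2), rnorm(100, mean = 3, sd = 0.2))
)

# local density: number of cells (including the cell itself) that lie
# within an assumed radius of 0.3 in cd34/cd38 space
pairwise <- as.matrix(dist(toy))
local_density <- rowSums(pairwise < 0.3)

# keep each cell with probability inversely proportional to its density,
# scaled so that roughly `target_prop` of cells survive (capped at 1)
target_prop <- 0.25
weights <- 1 / local_density
keep_prob <- pmin(1, weights * target_prop * nrow(toy) / sum(weights))
kept <- toy[runif(nrow(toy)) < keep_prob, , drop = FALSE]

# share of cells belonging to the sparse population, before and after
mean(toy$cd34 > 1.5)
mean(kept$cd34 > 1.5)
```

Cells in crowded regions are thinned aggressively while cells in sparse regions are mostly retained, so the sparse population makes up a larger share of the downsampled data than of the original.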
For more details, check out the documentation for the 3 underlying members of the tof_downsample_* function family (which are wrapped by tof_downsample):
@@ -357,16 +357,16 @@ Writing data with tof_write_data
Finally, users may wish to store single-cell data as .fcs or .csv files after transformation, concatenation, filtering, or other data processing steps such as dimensionality reduction and/or clustering (see below). To write single-cell data from a tof_tbl into .fcs or .csv files, use tof_write_data.
-# when copying and pasting this code, feel free to change this path
+# when copying and pasting this code, feel free to change this path
# to wherever you'd like to save your output files
my_path <- file.path("~", "Desktop", "tidytof_vignette_files")
-phenograph_data |>
- tof_write_data(
- group_cols = phenograph_cluster,
- out_path = my_path,
- format = "fcs"
- )
+phenograph_data |>
+ tof_write_data(
+ group_cols = phenograph_cluster,
+ out_path = my_path,
+ format = "fcs"
+ )
tof_write_data’s trickiest argument is group_cols, which specifies the columns in tof_tibble used to group cells (i.e. the rows of tof_tibble) into separate .fcs or .csv files. Simply put, this argument allows tof_write_data to create a single .fcs or .csv file for each unique combination of values in the columns specified by the user. In the example above, cells are grouped into 3 output .fcs files, one for each of the 3 clusters encoded by the phenograph_cluster column in phenograph_data. These files should have the following names (derived from the values in the phenograph_cluster column):
- cluster1.fcs
@@ -375,15 +375,15 @@ Writing data with tof_write_data
However, suppose we wanted to write multiple files for each cluster by breaking cells into two groups: those that express high levels of pstat5 and those that express low levels of pstat5. We can use dplyr::mutate to create a new column in phenograph_data that breaks cells into high- and low-pstat5 expression groups, then add this column to our group_cols specification:
-phenograph_data |>
- # create a variable representing if a cell is above or below the median
- # expression level of pstat5
- mutate(expression_group = if_else(pstat5 > median(pstat5), "high", "low")) |>
- tof_write_data(
- group_cols = c(phenograph_cluster, expression_group),
- out_path = my_path,
- format = "fcs"
- )
+phenograph_data |>
+ # create a variable representing if a cell is above or below the median
+ # expression level of pstat5
+ mutate(expression_group = if_else(pstat5 > median(pstat5), "high", "low")) |>
+ tof_write_data(
+ group_cols = c(phenograph_cluster, expression_group),
+ out_path = my_path,
+ format = "fcs"
+ )
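The "one file per unique combination of grouping values" idea can be sketched in base R with toy data and .csv output. Everything here is an assumption made for illustration: the toy values, the temporary output directory, and in particular the underscore-joined file names, which may not match the names tof_write_data actually produces.

```r
# toy data: 3 clusters, 4 cells each, with a toy pstat5 measurement
toy_cells <- data.frame(
  phenograph_cluster = rep(c("cluster1", "cluster2", "cluster3"), each = 4),
  pstat5 = rep(c(0.1, 0.2, 0.8, 0.9), times = 3)
)

# split cells at the overall median of pstat5, as in the example above
toy_cells$expression_group <- ifelse(
  toy_cells$pstat5 > median(toy_cells$pstat5),
  "high",
  "low"
)

# one key per unique combination of the two grouping columns
# (joining with "_" is an assumed naming scheme)
group_key <- paste(
  toy_cells$phenograph_cluster,
  toy_cells$expression_group,
  sep = "_"
)
groups <- split(toy_cells, group_key)

# write one .csv file per group into a fresh temporary directory
out_dir <- tempfile("toy_output_")
dir.create(out_dir)
for (group_name in names(groups)) {
  write.csv(
    groups[[group_name]],
    file.path(out_dir, paste0(group_name, ".csv")),
    row.names = FALSE
  )
}

written <- sort(list.files(out_dir))
written
```

Three clusters crossed with two expression groups give six files in this toy case. A combination that contains no cells simply produces no file.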