---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# tidyusoc
<!-- badges: start -->
[![R-CMD-check](https://github.com/wklimowicz/tidyusoc/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/wklimowicz/tidyusoc/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
## Installation
You can install the development version of tidyusoc from [GitHub](https://github.com/) with:
```{r eval = FALSE}
# install.packages("devtools")
devtools::install_github("wklimowicz/tidyusoc")
```
## Setup
After downloading the Understanding Society data from the UK Data Service website, run:
```{r example, eval = FALSE}
library(tidyusoc)
# Convert to Rds to save space and time
usoc_convert(usoc_directory = "spss/spss25",
             new_directory = "rds",
             filter_files = "indresp")

# Compile the INDRESP files in the rds folder
usoc <- usoc_compile(directory = "rds")
```
A small set of core variables is included by default:
```{r echo = FALSE}
tibble::tribble(
  ~Variable, ~Description,
  "pidp", "Personal ID",
  "age", "Age",
  "race", "Race/Ethnicity",
  "sex", "Sex",
  "country", "Country",
  "gor_dv", "Government Office Region",
  "qfhigh_dv", "Highest Qualification (Detailed, UKHLS Only)",
  "hiqual_dv", "Highest Qualification",
  "jbstat", "Job Status",
  "jbsoc00_cc", "Occupation",
  "fimnlabgrs_dv", "Labour Income"
) |> kableExtra::kbl(format = "pipe")
```
To add more variables, pass a function to the `extra_mappings` argument of `usoc_compile`. For variables whose names change over time, use `pick_var`. For example, life satisfaction is:

* `lfsato` in waves 6-10 and 12-18 of the BHPS.
* `sclfsato` in waves 1-11 of the UKHLS.

The BMI variable (`bmi_dv`) keeps the same name over time, so it's passed as a string.
```{r eval = FALSE}
custom_mappings <- function(cols) {
  life_sat <- pick_var(c("sclfsato", "lfsato"), cols)

  custom_variables <- tibble::tribble(
    ~usoc_name, ~new_name, ~type,
    life_sat, "life_satisfaction", "factor",
    "bmi_dv", "bmi_dv", "numeric"
  )

  return(custom_variables)
}
usoc <- usoc_compile("rds", extra_mappings = custom_mappings)
```
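
To make the idea behind `pick_var` concrete, here is a minimal sketch of that kind of lookup. It is not tidyusoc's implementation: `pick_first_present` is a hypothetical helper written for illustration, and the column vectors are made up. It assumes the selection rule is "use the first candidate name that exists in the file's columns".

```r
# Hypothetical helper, NOT tidyusoc's pick_var(): return the first candidate
# name that appears among the columns of the file being processed.
pick_first_present <- function(candidates, cols) {
  present <- candidates[candidates %in% cols]
  if (length(present) == 0) {
    # None of the candidates exist in this file
    return(NA_character_)
  }
  present[[1]]
}

# Made-up column names standing in for a BHPS-era and a UKHLS-era file
bhps_cols <- c("pidp", "age", "lfsato")
ukhls_cols <- c("pidp", "age", "sclfsato")

pick_first_present(c("sclfsato", "lfsato"), bhps_cols)
#> [1] "lfsato"
pick_first_present(c("sclfsato", "lfsato"), ukhls_cols)
#> [1] "sclfsato"
```

Under this assumption, one mapping row can cover both surveys, because the name is resolved separately for each file.
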
Good places to start looking for variables are the [Understanding Society Variable Search](https://www.understandingsociety.ac.uk/documentation/mainstage/dataset-documentation) and the [Documentation](https://www.understandingsociety.ac.uk/documentation).
## Compiling different surveys
By default, `usoc_compile` compiles the `indresp` files; you can change this with the `file` argument. For example, to compile the `youth` response files:
```{r eval = FALSE}
usoc_convert(usoc_directory = "spss/spss25",
             new_directory = "rds",
             filter_files = "youth")

usoc <- usoc_compile(directory = "rds", file = "youth")
```
## Using the data across multiple projects
If you set an environment variable called `DATA_DIRECTORY` that points at a folder and pass `save_to_folder = TRUE`, `usoc_compile` saves an fst file of the final dataset, so it can be loaded from anywhere with `usoc_load`.
```{r eval = FALSE}
library(tidyusoc)
usoc_compile(directory = "rds", save_to_folder = TRUE)
usoc <- usoc_load()
```
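
One way to set `DATA_DIRECTORY` from within R is sketched below, using base R's `Sys.setenv` and `Sys.getenv`. The path is a placeholder, and the sketch assumes tidyusoc reads the variable from the R process environment when its functions are called.

```r
# Set DATA_DIRECTORY for the current R session only.
# The path is a placeholder; point it at a folder that exists on your machine.
Sys.setenv(DATA_DIRECTORY = file.path(tempdir(), "usoc_data"))

# Check what R sees; an empty string means the variable is not set.
Sys.getenv("DATA_DIRECTORY")

# To keep the setting across sessions, one option is to add a line such as
#   DATA_DIRECTORY=/path/to/shared/data
# to your user-level .Renviron file and restart R.
```

Setting the variable in `.Renviron` rather than in a script keeps machine-specific paths out of project code.
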
To read any of the other compiled files, change the `file` argument:
```{r eval = FALSE}
usoc <- usoc_load(file = "youth")
```