03. Data Infrastructure
Galileu Kim edited this page Jul 17, 2023 · 4 revisions
The data pipeline is currently contained in the `code` folder within the repo. The pipeline uses the [bookdown](https://bookdown.org/yihui/bookdown/html.html) package both to generate documentation on the pipeline and to execute it. The main file is therefore `index.Rmd`, which executes the numbered `.Rmd` files, each corresponding to a particular step in our data ETL process.
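As a rough illustration of how such a bookdown-driven pipeline can be run, the sketch below lists the numbered step files and renders the book from `index.Rmd`. The helper names `list_pipeline_steps` and `run_pipeline` are hypothetical and not part of the repo. The sketch also assumes the step files start with a number, and that bookdown's default behaviour applies (it merges the `.Rmd` files in filename order, with `index.Rmd` first).

```r
# Hypothetical helper: list the numbered step files in the order they run.
# Assumes step files are named with a leading number, e.g. "01_<step>.Rmd".
list_pipeline_steps <- function(dir = ".") {
  files <- list.files(dir, pattern = "^[0-9]+.*\\.Rmd$")
  sort(files)
}

# Hypothetical helper: render the book, which also executes every step.
run_pipeline <- function(dir = ".") {
  if (!requireNamespace("bookdown", quietly = TRUE)) {
    stop("The bookdown package is required to run the pipeline.")
  }
  # Render from inside the pipeline directory, then restore the old one.
  old_wd <- setwd(dir)
  on.exit(setwd(old_wd), add = TRUE)
  bookdown::render_book("index.Rmd")
}
```

Calling `list_pipeline_steps()` before `run_pipeline()` gives a quick check that the steps will run in the intended order.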
The data required for initiating the data ETL is located in the `data/raw` folder. Per World Bank guidelines, this data is not stored on GitHub, but is accessible to collaborators upon request.
The `data/raw` folder contains the following files:
- `db_variables.xlsx`: master file with all indicators, their definitions, and families.
- `merged_for_residuals-v2.rds`: original files, imported from a `dta` file processed prior to the ETL.
- `CBIData_Romelli2022.dta`: contains CBI data.
- `20211118_new_additions_notGov360.dta`: GTMI data and other non-Gov360 data.
- `20211118_new_additions_notGov360_PMR.dta`: ?
- `group_list.csv`: list of countries produced by the team.
- `CLASS.xlsx`: income group classification of countries, produced by the World Bank.
- `WB_countries_Admin0_lowres.geojson`: administrative boundaries for countries, produced by the World Bank.
- `WB_disputed_areas_Admin0_10m.geojson`: disputed areas, produced by the World Bank.
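Because the raw data is not stored on GitHub, it can help to confirm that every file is in place before starting the ETL. The following is a minimal sketch in base R. The file names are copied from the list above, while the function name `missing_raw_files` and the default `data/raw` path relative to the working directory are assumptions made for illustration.

```r
# File names taken from the list of raw inputs above.
required_raw_files <- c(
  "db_variables.xlsx",
  "merged_for_residuals-v2.rds",
  "CBIData_Romelli2022.dta",
  "20211118_new_additions_notGov360.dta",
  "20211118_new_additions_notGov360_PMR.dta",
  "group_list.csv",
  "CLASS.xlsx",
  "WB_countries_Admin0_lowres.geojson",
  "WB_disputed_areas_Admin0_10m.geojson"
)

# Hypothetical helper: return the required files that are absent from raw_dir.
missing_raw_files <- function(raw_dir = file.path("data", "raw")) {
  present <- file.exists(file.path(raw_dir, required_raw_files))
  required_raw_files[!present]
}

# Example usage: report anything that still needs to be requested.
missing <- missing_raw_files()
if (length(missing) > 0) {
  message("Missing raw files: ", paste(missing, collapse = ", "))
}
```

An empty result means all nine inputs are available. Otherwise, the message names the files to request from the team.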