thedamnedrhino/airflow-project

Tolldata processing

An Airflow pipeline that processes tolldata, intended for Airflow running on Docker.

Airflow config

The data/ folder is mounted from the host machine in docker-compose.yml.

Pipeline Steps

1. Download tolldata. Output: tolldata.tgz.

-- via Airflow BashOperator.

2. Extract tolldata files. Output: tolldata_unzipped.

-- via Airflow BashOperator.

3. Preprocess vehicle-types.csv. Output: csv.csv.

-- pandas and Python @task decorator.

4. Preprocess tolldata-data.tsv. Output: tsv.csv.

-- pandas and Python @task decorator.

5. Preprocess payment-data.txt. Output: fwf.csv.

-- pandas and Python @task decorator.

6. Merge the results. Output: merged.csv.

-- pandas and Python @task decorator.

7. Transform the result. Output: result.csv.

-- pandas and Python @task decorator.

8. Test the results.

-- pandas, Python @task decorator, and assert statements.
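
Steps 3 to 8 can be sketched with plain pandas functions, as below. This is a minimal illustration, not the repository's actual code: the column names (vehicle_id, vehicle_type, tollplaza_id, payment_code), the sample rows, the fixed-width column positions, and the uppercase transform are all assumptions, since the real file layouts are not described here. The sketch reads from in-memory strings instead of the files under data/.

```python
import io

import pandas as pd

# Hypothetical sample inputs standing in for vehicle-types.csv,
# tolldata-data.tsv and payment-data.txt. Real columns may differ.
CSV_TEXT = "vehicle_id,vehicle_type\n1,car\n2,truck\n"
TSV_TEXT = "vehicle_id\ttollplaza_id\n1\t4856\n2\t4154\n"
FWF_TEXT = "1   PTE\n2   PTP\n"


def preprocess_csv(text: str) -> pd.DataFrame:
    """Step 3: parse the comma-separated file."""
    return pd.read_csv(io.StringIO(text))


def preprocess_tsv(text: str) -> pd.DataFrame:
    """Step 4: parse the tab-separated file."""
    return pd.read_csv(io.StringIO(text), sep="\t")


def preprocess_fwf(text: str) -> pd.DataFrame:
    """Step 5: parse the fixed-width file (column positions are assumed)."""
    return pd.read_fwf(
        io.StringIO(text),
        colspecs=[(0, 4), (4, 7)],
        names=["vehicle_id", "payment_code"],
        header=None,
    )


def merge_results(csv_df, tsv_df, fwf_df) -> pd.DataFrame:
    """Step 6: join the three frames on the assumed shared key."""
    return csv_df.merge(tsv_df, on="vehicle_id").merge(fwf_df, on="vehicle_id")


def transform(merged: pd.DataFrame) -> pd.DataFrame:
    """Step 7: an example transform (uppercase the vehicle type)."""
    result = merged.copy()
    result["vehicle_type"] = result["vehicle_type"].str.upper()
    return result


def check_results(result: pd.DataFrame, expected_rows: int) -> None:
    """Step 8: assert-based sanity checks on the final frame."""
    assert len(result) == expected_rows
    assert not result.isnull().values.any()


def run_pipeline() -> pd.DataFrame:
    """Run steps 3 to 8 in order on the sample inputs."""
    merged = merge_results(
        preprocess_csv(CSV_TEXT),
        preprocess_tsv(TSV_TEXT),
        preprocess_fwf(FWF_TEXT),
    )
    result = transform(merged)
    check_results(result, expected_rows=2)
    return result
```

In an Airflow DAG, each of these functions would typically be wrapped with the @task decorator and would read and write the CSV files listed in the steps above, rather than passing DataFrames in memory.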
