Implemented Glue ETL spreadsheet Google LIMS processing #19

Merged
merged 1 commit into from
Jan 11, 2025


victorskl
Member

  • Story: Let Glue the Google LIMS!
    Following the warehouse framework methodology that is being built, let's
    glue the Google LIMS sheet as the second spreadsheet import use case.
    The target is a table in the tsa schema (staging data area) of the OrcaVault database.
    Since this is for data warehouse purposes, the ETL approach retains all
    factual information without reshaping much or dropping anything; it only does
    column renaming and harmonisation, plus light-weight data clean-up tasks.
  • Think of it as the Google LIMS spreadsheet now becoming a table in the database;
    all columns and values are retained as-is.
  • Change history tracking and records archival will be implemented further
    by the downstream warehouse layers in the psa and vault schemas.
  • Technical steps are now mainly inherited from the framework implemented in PRs
    #13 (Implemented Glue ETL spreadsheet processing) and #14 (Implemented Glue ETL
    job script deployment using terraform). Hence, this Glue data import job becomes
    a pretty straightforward task with cookiecutter template code; we only need to
    focus on the transformation.
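
For illustration, the "column renaming and harmonisation, light-weight clean-up" idea above can be sketched roughly as below. This is a minimal, standard-library-only sketch and not the actual job script in this PR: the function names `harmonise_column_name` and `clean_rows` and the sample headers are hypothetical, and turning empty cells into NULL is an assumption about what the clean-up might include.

```python
import re


def harmonise_column_name(name: str) -> str:
    """Turn a spreadsheet header into a snake_case column name (hypothetical rule)."""
    # Insert an underscore at lower/digit -> upper boundaries, e.g. "LibraryID" -> "Library_ID"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse any run of non-alphanumeric characters into a single underscore
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def clean_rows(rows: list[dict]) -> list[dict]:
    """Rename every column and tidy string values; no column or row is dropped."""
    cleaned = []
    for row in rows:
        new_row = {}
        for header, value in row.items():
            if isinstance(value, str):
                value = value.strip()
                # Assumption: an empty cell is stored as NULL in the staging table
                if value == "":
                    value = None
            new_row[harmonise_column_name(header)] = value
        cleaned.append(new_row)
    return cleaned


# Hypothetical sheet rows; real Google LIMS headers may differ
sheet_rows = [
    {"LibraryID": " L0000001 ", "Sample Name": "example", "Run": ""},
]
staged_rows = clean_rows(sheet_rows)
```

Every input column survives under a harmonised name, which matches the intent of keeping the sheet as-is and leaving change tracking to the downstream psa and vault layers.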

@victorskl victorskl self-assigned this Jan 11, 2025
@victorskl victorskl added documentation Improvements or additions to documentation enhancement New feature or request labels Jan 11, 2025
@victorskl victorskl added this pull request to the merge queue Jan 11, 2025
Merged via the queue into main with commit cabd274 Jan 11, 2025
4 checks passed
@victorskl victorskl deleted the implement-glue-google-lims branch January 11, 2025 23:58
victorskl added a commit that referenced this pull request Jan 12, 2025
* Story: Let Glue the Google LIMS! (continued)
  Now that we have `tsa.spreadsheet_library_tracking_metadata` in the staging data area,
  loaded by the Glue ETL job in #19, we can source this table with dbt to feed
  the downstream warehouse layers in the psa and vault schemas.
* Technical steps are now mainly inherited from the framework implemented in PR #15.
  Hence, this step becomes a pretty straightforward task with template code.
victorskl added a commit that referenced this pull request Jan 12, 2025
* Story: Let Glue the Google LIMS! (continued)
  Now that we have `tsa.spreadsheet_google_lims` in the staging data area,
  loaded by the Glue ETL job in #19, we can source this table with dbt to feed
  the downstream warehouse layers in the psa and vault schemas.
* Technical steps are now mainly inherited from the framework implemented in PR #15.
  Hence, this step becomes a pretty straightforward task with template code.