Towards v0.1.0 #62

Merged: 44 commits merged on Nov 22, 2024

@eroell (Collaborator) commented Nov 2, 2024

Towards a Prototype 0.1.0
This PR bundles several developments that move forward in parallel.
It should be a major step towards (though not yet the completion of) a prototype with limited but partially stable functionality for further testing.

  • Supports a single backend: duckdb.
  • Fixes #28 (Add tests for functions)
    • Add/fix tests for mimic_iv_omop, gibleed_omop, synthea27nj_omop
  • Fixes #60 (OMOP extraction of measurement, observation, specimen)
    • Only supported if all units for a feature are the same; otherwise raises an error.
    • Allows choosing which value columns "to keep", or just "encoding" each record with a 1 if present.
  • Fixes #61 (OMOP extraction of era and other interval-style tables)
    • Allows keeping either a) only the event start date, b) only the event end date, or c) the full event interval; in case c), the event populates every interval of the interval table that intersects the event interval.
    • Allows choosing which value columns "to keep", or just "encoding" each event with a 1 if present.
    • drug_exposure
    • condition_occurrence
    • procedure_occurrence
    • device_exposure
    • drug_era
    • dose_era
    • condition_era
    • episode
  • Uses the download function in ehrdata.dt.dataloader for the OMOP demo datasets; it is slightly adapted/fixed from ehrapy.data._dataloader.py and has been available since #64 (Feature/physionet2012 dataset).
  • Fixes #65 (Consistent logging): uses logging instead of print and rich.print.
  • Fixes #66 (Case insensitive handling of column names): handles column names that differ in capitalization across tables by converting all column names to lowercase internally.
  • Fixes #67 (Create option to extract "presence-indicating" variable): can extract either the value of a column in a target data table, or a binary presence-indicating variable: for person P, variable V and timestep T, edata.r[P,V,T] = 1 if P had V recorded within T, else 0. Implemented via data_field_to_keep=[<optionally more data fields>, "is_present"].
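The unit rule for measurement-like tables (#60) can be illustrated with a minimal plain-Python sketch. The helper `check_single_unit` and its input format (pairs of feature ID and unit string) are hypothetical and not part of ehrdata's API; the feature IDs in the usage line are made up.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Optional


def check_single_unit(rows: Iterable[tuple[Hashable, Optional[str]]]) -> dict:
    """Hypothetical helper, not ehrdata's API.

    Takes (feature_id, unit) pairs and returns a mapping feature_id -> unit.
    Raises ValueError if any feature appears with more than one distinct unit.
    """
    units = defaultdict(set)
    for feature_id, unit in rows:
        units[feature_id].add(unit)

    # Collect every feature recorded with more than one unit (None counts as a unit value).
    mixed = {f: sorted(map(str, u)) for f, u in units.items() if len(u) > 1}
    if mixed:
        raise ValueError(f"Features with mixed units: {mixed}")

    # Every remaining set has exactly one element.
    return {f: next(iter(u)) for f, u in units.items()}


# Usage with made-up feature IDs:
print(check_single_unit([(1001, "mmHg"), (1001, "mmHg"), (1002, "bpm")]))
# {1001: 'mmHg', 1002: 'bpm'}
```

Raising instead of silently converting keeps the extracted values comparable within a feature; unit conversion would be a separate, explicit step.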
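The three ways of placing an interval-style event on the time grid (#61) can be sketched as follows, assuming a regular grid of `num_intervals` half-open intervals of length `interval_length` starting at `grid_start`, with times as plain numbers (e.g. days). The function name and the mode strings `"start"`, `"end"` and `"interval"` are hypothetical and are not ehrdata's parameter names.

```python
def intervals_to_populate(event_start, event_end, grid_start, interval_length,
                          num_intervals, mode="interval"):
    """Hypothetical helper: indices of grid intervals populated by one event.

    Grid interval i covers [grid_start + i * interval_length,
    grid_start + (i + 1) * interval_length); the event covers [event_start, event_end].
    """

    def index_of(t):
        # Index of the grid interval containing time t, or None if t is off the grid.
        i = (t - grid_start) // interval_length
        return int(i) if 0 <= i < num_intervals else None

    if mode == "start":
        i = index_of(event_start)
        return [] if i is None else [i]
    if mode == "end":
        i = index_of(event_end)
        return [] if i is None else [i]
    if mode == "interval":
        hits = []
        for i in range(num_intervals):
            lo = grid_start + i * interval_length
            hi = lo + interval_length
            # Half-open grid interval [lo, hi) intersects the closed event interval.
            if event_start < hi and event_end >= lo:
                hits.append(i)
        return hits
    raise ValueError(f"unknown mode: {mode!r}")


# Five 10-day intervals, one event spanning days 12 to 31.
for mode in ("start", "end", "interval"):
    print(mode, intervals_to_populate(12, 31, 0, 10, 5, mode))
# start [1]
# end [3]
# interval [1, 2, 3]
```

In the "interval" mode the event is written into every interval it touches, which is the "stretch" behaviour described above; events lying entirely outside the grid populate nothing.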
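Case-insensitive column handling (#66) amounts to normalizing names once when a table is read. A minimal sketch, assuming a table represented as a dict from column name to values; the helper name is hypothetical, and raising on names that collide after lowercasing (e.g. `Person_ID` and `person_id` in the same table) is a design choice of this sketch, not documented ehrdata behaviour.

```python
def lowercase_columns(table: dict) -> dict:
    """Hypothetical helper: return a copy of `table` with all column names lowercased."""
    lowered = {}
    for name, values in table.items():
        key = name.lower()
        # Assumption of this sketch: two columns differing only in case are an error.
        if key in lowered:
            raise ValueError(f"Column names collide after lowercasing: {key!r}")
        lowered[key] = values
    return lowered


table = {"PERSON_ID": [1, 2], "Drug_Source_Value": ["a", "b"]}
print(list(lowercase_columns(table)))  # ['person_id', 'drug_source_value']
```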
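The presence-indicating variable from #67 follows directly from its definition: r[P, V, T] = 1 if person P had variable V recorded within timestep T, else 0. The sketch below builds such an array as nested Python lists from (person, variable, time) records. The record format, the helper name `presence_array` and the regular timestep grid starting at time 0 are assumptions; in ehrdata the option is requested via data_field_to_keep=[..., "is_present"], not through this function.

```python
def presence_array(records, n_persons, n_vars, n_timesteps, interval_length):
    """Hypothetical helper: nested lists r[p][v][t] with values in {0, 1}.

    records: iterable of (person_index, variable_index, time), with time measured
    from the start of each person's observation window.
    """
    r = [[[0] * n_timesteps for _ in range(n_vars)] for _ in range(n_persons)]
    for p, v, time in records:
        t = int(time // interval_length)
        # Records outside the grid are ignored; repeated records in one timestep still give 1.
        if 0 <= t < n_timesteps:
            r[p][v][t] = 1
    return r


records = [(0, 0, 3), (0, 0, 4), (0, 1, 15), (1, 0, 25)]
r = presence_array(records, n_persons=2, n_vars=2, n_timesteps=3, interval_length=10)
print(r[0])  # [[1, 0, 0], [0, 1, 0]]
```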

@eroell changed the title from "first change in notebook" to "v0.1.0" on Nov 4, 2024
@eroell mentioned this pull request on Nov 4, 2024
@eroell changed the title from "v0.1.0" to "Towards v0.1.0" on Nov 17, 2024
@eroell (Collaborator, Author) commented Nov 18, 2024

The test failing with the duckdb 1.1.4 release candidates (currently 1.1.4.dev1919) on the drug_exposure table of mimic-iv-demo-data-in-the-omop-common-data-model-0.9 is caused by a % character in its drug_source_value column, e.g. row 14299 with the value Syringe (0.9% Sodium Chloride) 1 Syringe.

It can be fixed by passing the escapechar="%" argument to duckdb.read_csv.

Consider raising this with duckdb, as reading the file works with the latest stable release, 1.1.3.

@eroell (Collaborator, Author) commented Nov 18, 2024

Opened an issue on duckdb: duckdb/duckdb#14874

@eroell marked this pull request as ready for review on November 22, 2024, 15:14
@eroell merged commit 5be55fb into main on Nov 22, 2024
6 checks passed
@eroell deleted the dev/prototype-0.1.0 branch on November 22, 2024, 16:20