-
Notifications
You must be signed in to change notification settings - Fork 15
STARR OMOP Visit Overlap
Filtering events by event.omop_table == "visit_occurrence"
results in visits that have higher overlap than using event.omop_table == "visit_detail"
:
Within starr_omop_cdm5_deid_2022_08_10...
- There are a total of 1,519,789 unique entries in the
visit_occurrence
table with concept code = 'Visit/IP'.- Of these, a total of 1,188,178 unique entries have
start_DATETIME == end_DATETIME
(i.e. are a single point in time).
- Of these, a total of 1,188,178 unique entries have
- There are a total of 306,382 unique entries (982,538 total including duplicates) in the
visit_occurrence
table that have a matching entry in thevisit_detail
table with concept code = 'Visit/IP' (so we lose 881,796 visits, but most of these are single point in time events - so we're actually only losing 25,685 non-single-point-in-time events when doing this INNER JOIN).- Of these, a total of 456 unique entries (456 total including duplicates) have
start_DATETIME == end_DATETIME
(i.e. are a single point in time).
- Of these, a total of 456 unique entries (456 total including duplicates) have
- There are a total of 1,213,407 entries in
visit_occurrence
that lack a corresponding entry invisit_detail
- Of these unmatched entries, 1,187,722 of them are single point events
- There are a total of 9,702,293 distinct entries in
visit_detail
(across all visit types, including outpatient)
Mapping There is a one-to-many relationship between visit_occurrence
and visit_detail
.
NOTE: There can be many visit_detail
entries for a single visit_occurrence
, peaking at around 80 for a single visit_occurrence
. Thus, we need to merge together visit_detail
s that share the same visit_occurrence_id
Query to generate the above plot:
WITH vo AS (
SELECT DISTINCT(visit_occurrence_id) AS visit_id FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_occurrence` where visit_concept_id = 9201
)
SELECT COUNT(vd.visit_occurrence_id) AS counter, vd.visit_occurrence_id FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_detail` vd
INNER JOIN vo
ON vd.visit_occurrence_id = vo.visit_id
GROUP BY vd.visit_occurrence_id
On the other hand, there is only ever a maximum of one visit_occurrence
per visit_detail
.
Query to verify this fact:
WITH vd AS (
SELECT DISTINCT(visit_occurrence_id) AS visit_id FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_detail`
)
SELECT COUNT(vo.visit_occurrence_id) AS counter, vo.visit_occurrence_id FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_occurrence` vo
INNER JOIN vd
ON vo.visit_occurrence_id = vd.visit_id
where vo.visit_concept_id = 9201
GROUP BY vo.visit_occurrence_id
HAVING counter > 1
It seems like joining together the visit_detail
entries for a given visit_occurrence
yields the same (start, end) times as the original visit_occurrence
entry.
This can be seen by running the following SQL query:
SELECT ANY_VALUE(vo.visit_concept_id), ANY_VALUE(vo.visit_start_DATETIME) as vo_start, ANY_VALUE(vo.visit_end_DATETIME) as vo_end, MIN(vd.visit_detail_start_DATETIME) as vd_start, MAX(vd.visit_detail_end_DATETIME) as vd_end
FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_occurrence` vo
LEFT JOIN `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_detail` vd
ON vo.visit_occurrence_id = vd.visit_occurrence_id
where vo.visit_concept_id = 9201 and vd.visit_occurrence_id is not NULL
GROUP BY vo.visit_occurrence_id
having vo_start = vd_start and vo_end = vd_end
reveals that 294,508 'Visit/IP' entries in the visit_occurrence
table have completely accurate (start,end) times -- in other words, joining together all the visit_detail
entries that map to this visit_occurrence
have a min start time and max end time equal to the visit_occurrence
's start and end time.
And even in places where visit_occurrence
and visit_detail
disagree, it looks like visit_occurrence
is more accurate, as the following SQL query reveals by looking through some of the results:
SELECT ANY_VALUE(vo.visit_concept_id), ANY_VALUE(vo.visit_start_DATETIME) as vo_start, ANY_VALUE(vo.visit_end_DATETIME) as vo_end, MIN(vd.visit_detail_start_DATETIME) as vd_start, MAX(vd.visit_detail_end_DATETIME) as vd_end
FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_occurrence` vo
LEFT JOIN `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_detail` vd
ON vo.visit_occurrence_id = vd.visit_occurrence_id
where vo.visit_concept_id = 9201 and vd.visit_occurrence_id is not NULL
GROUP BY vo.visit_occurrence_id
having vo_start = vd_start and vo_end = vd_end
Results, which show visit_detail
gets pushed to 23:59:59 or gets pushed to becoming a single-point-in-time event:
Conclusion: It seems like simply filtering by visit_occurrence = 'Visit/IP'
and discarding single-point-in-time events is the best way to go. We can ignore the visit_detail
table.
The alternative matching visit_detail
events with their corresponding visit_occurrence
event by visit_occurrence_id
is a complicated way for combining the accurate (start, end) time info in the visit_detail
table with the accurate inpatient concept labeling in the visit_occurrence
table, as this may require merging multiple visit_detail
events together.