Skip to content

STARR OMOP Visit Overlap

Michael Wornow edited this page Jan 26, 2023 · 10 revisions

Filtering events by event.omop_table == "visit_occurrence" results in visits that have higher overlap than using event.omop_table == "visit_detail":

image

image

Within starr_omop_cdm5_deid_2022_08_10...

  • There are a total of 1,519,789 unique entries in the visit_occurrence table with concept code = 'Visit/IP'.
    • Of these, a total of 1,188,178 unique entries have start_DATETIME == end_DATETIME (i.e. are a single point in time).
  • There are a total of 306,382 unique entries (982,538 total including duplicates) in the visit_occurrence table that have a matching entry in the visit_detail table with concept code = 'Visit/IP' (so we lose 881,796 visits, but most of these are single point in time events - so we're actually only losing 25,685 non-single-point-in-time events when doing this INNER JOIN).
    • Of these, a total of 456 unique entries (456 total including duplicates) have start_DATETIME == end_DATETIME (i.e. are a single point in time).
  • There are a total of 1,213,407 entries in visit_occurrence that lack a corresponding entry in visit_detail
    • Of these unmatched entries, 1,187,722 of them are single point events
  • There are a total of 9,702,293 distinct entries in visit_detail (across all visit types, including outpatient)

Mapping There is a one-to-many relationship between visit_occurrence and visit_detail.

NOTE: There can be many visit_detail entries for a single visit_occurrence, peaking at around 80 for a single visit_occurrence. Thus, we need to merge together visit_details that share the same visit_occurrence_id

Screen Shot 2023-01-26 at 3 10 21 AM

Query to generate the above plot:

WITH vo AS (
  SELECT DISTINCT(visit_occurrence_id) AS visit_id FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_occurrence` where visit_concept_id = 9201
)
SELECT COUNT(vd.visit_occurrence_id) AS counter, vd.visit_occurrence_id FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_detail` vd
INNER JOIN vo
ON vd.visit_occurrence_id = vo.visit_id 
GROUP BY vd.visit_occurrence_id

On the other hand, there is only ever a maximum of one visit_occurrence per visit_detail.

Query to verify this fact:

WITH vd AS (
  SELECT DISTINCT(visit_occurrence_id) AS visit_id FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_detail`
)
SELECT COUNT(vo.visit_occurrence_id) AS counter, vo.visit_occurrence_id FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_occurrence` vo
INNER JOIN vd
ON vo.visit_occurrence_id = vd.visit_id 
where vo.visit_concept_id = 9201
GROUP BY vo.visit_occurrence_id
HAVING counter > 1

It seems like joining together the visit_detail entries for a given visit_occurrence yields the same (start, end) times as the original visit_occurrence entry.

This can be seen by running the following SQL query:

SELECT ANY_VALUE(vo.visit_concept_id), ANY_VALUE(vo.visit_start_DATETIME) as vo_start, ANY_VALUE(vo.visit_end_DATETIME) as vo_end, MIN(vd.visit_detail_start_DATETIME) as vd_start, MAX(vd.visit_detail_end_DATETIME) as vd_end 
FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_occurrence` vo 
LEFT JOIN `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_detail` vd
ON vo.visit_occurrence_id = vd.visit_occurrence_id
where vo.visit_concept_id = 9201 and vd.visit_occurrence_id is not NULL
GROUP BY vo.visit_occurrence_id
having vo_start = vd_start and vo_end = vd_end

reveals that 294,508 'Visit/IP' entries in the visit_occurrence table have completely accurate (start,end) times -- in other words, joining together all the visit_detail entries that map to this visit_occurrence have a min start time and max end time equal to the visit_occurrence's start and end time.

And even in places where visit_occurrence and visit_detail disagree, it looks like visit_occurrence is more accurate, as the following SQL query reveals by looking through some of the results:

SELECT ANY_VALUE(vo.visit_concept_id), ANY_VALUE(vo.visit_start_DATETIME) as vo_start, ANY_VALUE(vo.visit_end_DATETIME) as vo_end, MIN(vd.visit_detail_start_DATETIME) as vd_start, MAX(vd.visit_detail_end_DATETIME) as vd_end 
FROM `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_occurrence` vo 
LEFT JOIN `som-rit-phi-starr-prod.starr_omop_cdm5_deid_2022_08_10.visit_detail` vd
ON vo.visit_occurrence_id = vd.visit_occurrence_id
where vo.visit_concept_id = 9201 and vd.visit_occurrence_id is not NULL
GROUP BY vo.visit_occurrence_id
having vo_start = vd_start and vo_end = vd_end

Results, which show visit_detail gets pushed to 23:59:59 or gets pushed to becoming a single-point-in-time event:

Screen Shot 2023-01-26 at 11 41 20 AM

Conclusion: It seems like simply filtering by visit_occurrence = 'Visit/IP' and discarding single-point-in-time events is the best way to go. We can ignore the visit_detail table.

The alternative matching visit_detail events with their corresponding visit_occurrence event by visit_occurrence_id is a complicated way for combining the accurate (start, end) time info in the visit_detail table with the accurate inpatient concept labeling in the visit_occurrence table, as this may require merging multiple visit_detail events together.