Documentation still in development
Table Description
This table serves as the central identity management for all Persons in the database. It contains records that uniquely identify each person or patient, and some demographic information.
User Guide
All records in this table are independent Persons.
ETL Conventions
All Persons in a database need one record in this table, unless they fail data quality requirements specified in the ETL. Persons with no Events should have a record nonetheless. If more than one data source contributes Events to the database, Persons must be reconciled across the sources to create one single record per Person. The content of the BIRTH_DATETIME must be equivalent to the content of BIRTH_DAY, BIRTH_MONTH and BIRTH_YEAR.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
person_id | It is assumed that every person with a different unique identifier is in fact a different person and should be treated independently. | Any person linkage that needs to occur to uniquely identify Persons ought to be done prior to writing this table. This identifier can be the original id from the source data provided it is an integer, otherwise it can be an autogenerated number. | integer | Yes | Yes | No | |||
gender_concept_id | This field is meant to capture the biological sex at birth of the Person. This field should not be used to study gender identity issues. | Use the gender or sex value present in the data under the assumption that it is the biological sex at birth. If the source data captures gender identity it should be stored in the OBSERVATION table. Accepted gender concepts. | integer | Yes | No | Yes | CONCEPT | Gender | |
year_of_birth | For data sources with date of birth, the year should be extracted. For data sources where the year of birth is not available, the approximate year of birth could be derived based on age group categorization, if available. | integer | Yes | No | No | ||||
month_of_birth | For data sources that provide the precise date of birth, the month should be extracted and stored in this field. | integer | No | No | No | ||||
day_of_birth | For data sources that provide the precise date of birth, the day should be extracted and stored in this field. | integer | No | No | No | ||||
birth_datetime | Compute age using birth_datetime. | For data sources that provide the precise datetime of birth, that value should be stored in this field. If birth_datetime is not provided in the source, use the following logic to infer the date: If day_of_birth is null and month_of_birth is not null, use the first of the month in that year. If month_of_birth is null (whether or not day_of_birth is also null) and the person has records during their year of birth, use the date of the earliest record; otherwise use the 15th of June of that year. If time of birth is not given, use midnight (00:00:00). | datetime | No | No | No | |||
race_concept_id | This field captures race or ethnic background of the person. | Only use this field if you have information about race or ethnic background. The Vocabulary contains Concepts about the main races and ethnic backgrounds in a hierarchical system. Due to the imprecise nature of human races and ethnic backgrounds, this is not a perfect system. Mixed races are not supported. If a clear race or ethnic background cannot be established, use Concept_Id 0. | integer | Yes | No | Yes | CONCEPT | Race | |
ethnicity_concept_id | This field captures Ethnicity as defined by the Office of Management and Budget (OMB) of the US Government: it distinguishes only between “Hispanic” and “Not Hispanic”. Races and Ethnic backgrounds are not stored here. | Only use this field if you have US-based data and a source of this information. Do not attempt to infer Ethnicity from the race or ethnic background of the Person. | integer | Yes | No | Yes | CONCEPT | Ethnicity | |
location_id | The location refers to the physical address of the person. This field should capture the last known location of the person. Any prior locations are captured in the LOCATION_HISTORY table. | Put the location_id from the LOCATION table here that represents the most granular location information for the person. This could represent anything from postal code or parts thereof, state, or county for example. Since many databases contain de-identified data, it is common that the precision of the location is reduced to prevent re-identification. This field should capture the last known location. Any prior locations are captured in the LOCATION_HISTORY table. | integer | No | No | Yes | LOCATION | ||
provider_id | The Provider refers to the last known primary care provider (General Practitioner). | Put the provider_id from the PROVIDER table of the last known general practitioner of the person. If there are multiple providers, it is up to the business to decide which to put here. | integer | No | No | Yes | PROVIDER | ||
care_site_id | The Care Site refers to where the Provider typically provides the primary care. | integer | No | No | Yes | CARE_SITE | |||
person_source_value | Use this field to link back to persons in the source data. This is typically used for error checking of ETL logic. | Some use cases require the ability to link back to persons in the source data. This field allows for the storing of the person value as it appears in the source. This field is not required but strongly recommended. | varchar(50) | No | No | No | |||
gender_source_value | This field is used to store the biological sex of the person from the source data. It is not intended for use in standard analytics but for reference only. | Put the biological sex of the person as it appears in the source data. | varchar(50) | No | No | No | |||
gender_source_concept_id | Due to the small number of options, this tends to be zero. | If the source data codes biological sex in a non-standard vocabulary, store the concept_id here. | Integer | No | No | Yes | CONCEPT | ||
race_source_value | This field is used to store the race of the person from the source data. It is not intended for use in standard analytics but for reference only. | Put the race of the person as it appears in the source data. | varchar(50) | No | No | No | |||
race_source_concept_id | Due to the small number of options, this tends to be zero. | If the source data codes race in an OMOP supported vocabulary store the concept_id here. | Integer | No | No | Yes | CONCEPT | ||
ethnicity_source_value | This field is used to store the ethnicity of the person from the source data. It is not intended for use in standard analytics but for reference only. | If the person has an ethnicity other than the OMB standard of “Hispanic” or “Not Hispanic” store that value from the source data here. | varchar(50) | No | No | No | |||
ethnicity_source_concept_id | Due to the small number of options, this tends to be zero. | If the source data codes ethnicity in an OMOP supported vocabulary, store the concept_id here. | Integer | No | No | Yes | CONCEPT |
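
The inference rules for birth_datetime described in the table above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name and the `earliest_record_date` parameter are hypothetical, and building the value from year, month and day at midnight when all three are known is an assumption based on the requirement that BIRTH_DATETIME be equivalent to the three birth fields.

```python
from datetime import date, datetime
from typing import Optional


def infer_birth_datetime(
    year_of_birth: int,
    month_of_birth: Optional[int] = None,
    day_of_birth: Optional[int] = None,
    earliest_record_date: Optional[date] = None,
) -> datetime:
    """Infer birth_datetime when the source gives no precise datetime.

    earliest_record_date is a hypothetical input: the date of the Person's
    earliest record in the database, if any.
    """
    # Assumption: with year, month and day all known, use that date at midnight.
    if month_of_birth is not None and day_of_birth is not None:
        return datetime(year_of_birth, month_of_birth, day_of_birth)
    # Day unknown, month known: first of the month in that year.
    if month_of_birth is not None:
        return datetime(year_of_birth, month_of_birth, 1)
    # Month unknown: use the earliest record if it falls in the birth year.
    if earliest_record_date is not None and earliest_record_date.year == year_of_birth:
        return datetime(
            earliest_record_date.year,
            earliest_record_date.month,
            earliest_record_date.day,
        )
    # Otherwise fall back to the 15th of June of the birth year.
    return datetime(year_of_birth, 6, 15)
```

The time component is always midnight here because no time of birth is passed in.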
Table Description
This table contains records which define spans of time during which two conditions are expected to hold: (i) Clinical Events that happened to the Person are recorded in the Event tables, and (ii) absence of records indicates such Events did not occur during this span of time.
User Guide
For each Person, one or more OBSERVATION_PERIOD records may be present, but they will not overlap or be back to back to each other. Events may exist outside all of the time spans of the OBSERVATION_PERIOD records for a patient, however, absence of an Event outside these time spans cannot be construed as evidence of absence of an Event. Incidence or prevalence rates should only be calculated for the time of active OBSERVATION_PERIOD records. When constructing cohorts, outside Events can be used for inclusion criteria definition, but without any guarantee for the performance of these criteria. Also, OBSERVATION_PERIOD records can be as short as a single day, greatly disturbing the denominator of any rate calculation as part of cohort characterizations. To avoid that, apply minimal observation time as a requirement for any cohort definition.
ETL Conventions
Each Person needs to have at least one OBSERVATION_PERIOD record, which should represent time intervals with a high capture rate of Clinical Events. Some source data have very similar concepts, such as enrolment periods in insurance claims data. In other source data, such as most EHR systems, these time spans need to be inferred under a set of assumptions. It is at the discretion of the ETL developer to define these assumptions. In many ETL solutions the start date of the first occurrence or the first high quality occurrence of a Clinical Event (Condition, Drug, Procedure, Device, Measurement, Visit) is defined as the start of the OBSERVATION_PERIOD record, and the end date of the last occurrence or last high quality occurrence of a Clinical Event, or the end of the database period, becomes the end of the OBSERVATION_PERIOD for each Person. If a Person only has a single Clinical Event the OBSERVATION_PERIOD record can be as short as one day. Depending on these definitions it is possible that Clinical Events fall outside the time spans defined by OBSERVATION_PERIOD records. Family history or history of Clinical Events generally are not used to generate OBSERVATION_PERIOD records around the time they are referring to. Any two overlapping or adjacent OBSERVATION_PERIOD records have to be merged into one.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
observation_period_id | A Person can have multiple discrete Observation Periods which are identified by the Observation_Period_Id. | Assign a unique observation_period_id to each discrete Observation Period for a Person. | integer | Yes | Yes | No | |||
person_id | The Person ID of the PERSON record for which the Observation Period is recorded. | integer | Yes | No | Yes | PERSON | |||
observation_period_start_date | Use this date to determine the start date of the Observation Period | It is often the case that the idea of Observation Periods does not exist in source data. In those cases, the observation_period_start_date can be inferred as the earliest Event date available for the Person. In insurance claim data, the Observation Period can be considered as the time period the Person is enrolled with a payer. If a Person switches plans but stays with the same payer, and therefore capturing of data continues, that change would be captured in PAYER_PLAN_PERIOD. | date | Yes | No | No | |||
observation_period_end_date | Use this date to determine the end date of the period for which we can assume that all events for a Person are recorded. | It is often the case that the idea of Observation Periods does not exist in source data. In those cases, the observation_period_end_date can be inferred as the last Event date available for the Person. In insurance claim data, the Observation Period can be considered as the time period the Person is enrolled with a payer. | date | Yes | No | No | |||
period_type_concept_id | This field can be used to determine the provenance of the Observation Period as in whether the period was determined from an insurance enrollment file, EHR healthcare encounters, or other sources. | Choose the period_type_concept_id that best represents how the period was determined. | Integer | Yes | No | Yes | CONCEPT | Type Concept |
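
The requirement that overlapping or adjacent OBSERVATION_PERIOD records be merged can be illustrated with a short sketch. It assumes periods are given as inclusive `(start_date, end_date)` pairs for a single Person, and it interprets "adjacent" as one period starting no later than the day after the previous one ends; both the function name and that interpretation are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import List, Tuple

Period = Tuple[date, date]  # inclusive (start_date, end_date)


def merge_observation_periods(periods: List[Period]) -> List[Period]:
    """Merge overlapping or back-to-back periods for one Person."""
    merged: List[Period] = []
    for start, end in sorted(periods):
        # Overlapping, or starting the day after the previous period ends.
        if merged and start <= merged[-1][1] + timedelta(days=1):
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


periods = [
    (date(2020, 1, 1), date(2020, 1, 31)),
    (date(2020, 2, 1), date(2020, 3, 15)),   # back to back with the first
    (date(2020, 3, 10), date(2020, 4, 1)),   # overlaps the second
    (date(2021, 1, 1), date(2021, 1, 1)),    # separate single-day period
]
merged = merge_observation_periods(periods)
```

With this input the first three periods collapse into one span from 2020-01-01 to 2020-04-01, and the single-day period in 2021 stays separate.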
Table Description
This table contains Events where Persons engage with the healthcare system for a duration of time. They are often also called “Encounters”. Visits are defined by a configuration of circumstances under which they occur, such as (i) whether the patient comes to a healthcare institution, the other way around, or the interaction is remote, (ii) whether and what kind of trained medical staff is delivering the service during the Visit, and (iii) whether the Visit is transient or for a longer period involving a stay in bed.
User Guide
The configurations defining the Visit are described by Concepts in the Visit Domain, which form a hierarchical structure that rolls up to generally familiar Visits adopted in most healthcare systems worldwide.
The Visit duration, or ‘length of stay’, is defined as VISIT_END_DATE - VISIT_START_DATE. For all Visits this is <1 day, except Inpatient Visits and Non-hospital institution Visits. The CDM also contains the VISIT_DETAIL table where additional information about the Visit is stored, for example, transfers between units during an inpatient Visit.
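
As a small worked example of the length-of-stay definition above, the sketch below subtracts the two dates and returns whole days. The function name is hypothetical, and rejecting an end date earlier than the start date is an added safeguard, not something the text specifies.

```python
from datetime import date


def length_of_stay_days(visit_start_date: date, visit_end_date: date) -> int:
    """Length of stay as VISIT_END_DATE - VISIT_START_DATE, in whole days."""
    if visit_end_date < visit_start_date:
        # Safeguard added for this sketch; not part of the convention text.
        raise ValueError("visit_end_date precedes visit_start_date")
    return (visit_end_date - visit_start_date).days
```

A same-day visit yields 0 under this definition, which matches the statement that most Visits last less than one day.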
ETL Conventions
Visits can be derived easily if the source data contain coding systems for Place of Service or Procedures, like CPT codes for well visits. In those cases, the codes can be looked up and mapped to a Standard Visit Concept. Otherwise, Visit Concepts have to be identified in the ETL process. This table will contain concepts in the Visit domain. These concepts are arranged in a hierarchical structure to facilitate cohort definitions by rolling up to generally familiar Visits adopted in most healthcare systems worldwide. Visits can be adjacent to each other, i.e. the end date of one can be identical with the start date of the other. As a consequence, more than one one-day Visit (or their descendants) can be recorded for the same day. Multi-day visits must not overlap, i.e. share days other than start and end days. Some logic usually has to be written to define visits and to assign the Visit_Concept_Id. For example, in US claims, outpatient visits that appear to occur within the time period of an inpatient visit can be rolled into one with the same Visit_Occurrence_Id. In EHR data, inpatient visits that are within one day of each other may be strung together to create one visit. It all depends on the source data and how encounter records should be translated to visit occurrences. Providers can be associated with a Visit through the PROVIDER_ID field, or indirectly through PROCEDURE_OCCURRENCE records linked both to the VISIT and PROVIDER tables.
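
The rule that multi-day visits may touch at their start and end dates but must not otherwise share days could be checked along the following lines. This is only a sketch: the tuple layout `(visit_id, start_date, end_date)` and the function name are assumptions, and one-day visits are excluded because the rule is stated for multi-day visits only.

```python
from datetime import date
from itertools import combinations
from typing import List, Tuple

Visit = Tuple[int, date, date]  # hypothetical layout: (visit_id, start_date, end_date)


def overlapping_multiday_visits(visits: List[Visit]) -> List[Tuple[int, int]]:
    """Return id pairs of multi-day visits sharing days other than start/end days."""
    multiday = [v for v in visits if v[2] > v[1]]
    conflicts = []
    for (id_a, start_a, end_a), (id_b, start_b, end_b) in combinations(multiday, 2):
        # Strict comparisons allow adjacency (end of one == start of the other).
        if start_a < end_b and start_b < end_a:
            conflicts.append((id_a, id_b))
    return conflicts


visits = [
    (1, date(2021, 1, 1), date(2021, 1, 5)),
    (2, date(2021, 1, 5), date(2021, 1, 8)),   # adjacent to visit 1: allowed
    (3, date(2021, 1, 7), date(2021, 1, 10)),  # shares 7 and 8 January with visit 2
    (4, date(2021, 1, 3), date(2021, 1, 3)),   # one-day visit: ignored by the check
]
conflicts = overlapping_multiday_visits(visits)
```

Only visits 2 and 3 are reported here, since they share days beyond a common boundary date.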
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
visit_occurrence_id | Use this to identify unique interactions between a person and the health care system. This identifier links across the other CDM event tables to associate events with a visit. | This should be populated by creating a unique identifier for each unique interaction between a person and the healthcare system where the person receives a medical good or service over a span of time. | integer | Yes | Yes | No | |||
person_id | integer | Yes | No | Yes | PERSON | ||||
visit_concept_id | This field contains a concept id representing the kind of visit, like inpatient or outpatient. All concepts in this field should be standard and belong to the Visit domain. | Populate this field based on the kind of visit that took place for the person. For example this could be “Inpatient Visit”, “Outpatient Visit”, “Ambulatory Visit”, etc. This table will contain standard concepts in the Visit domain. These concepts are arranged in a hierarchical structure to facilitate cohort definitions by rolling up to generally familiar Visits adopted in most healthcare systems worldwide. | integer | Yes | No | Yes | CONCEPT | Visit | |
visit_start_date | For inpatient visits, the start date is typically the admission date. For outpatient visits the start date and end date will be the same. | When populating visit_start_date, you should think about the patient experience to make decisions on how to define visits. In the case of an inpatient visit this should be the date the patient was admitted to the hospital or institution. In all other cases this should be the date of the patient-provider interaction. | date | Yes | No | No | |||
visit_start_datetime | If no time is given for the start date of a visit, set it to midnight (00:00:00). | datetime | No | No | No | ||||
visit_end_date | For inpatient visits the end date is typically the discharge date. | Visit end dates are mandatory. If end dates are not provided in the source, derive them as follows. Outpatient Visit: visit_end_datetime = visit_start_datetime. Emergency Room Visit: visit_end_datetime = visit_start_datetime. Inpatient Visit: usually there is information about discharge; if not, you should be able to derive the end date from the sudden decline of activity or from the absence of inpatient procedures/drugs. Non-hospital institution Visits: particularly for claims data, if end dates are not provided assume the visit lasts for the duration of the month in which it occurs. For Inpatient Visits ongoing at the date of ETL, put the date of processing the data into visit_end_datetime and set visit_type_concept_id to 32220 “Still patient” to identify the visit as incomplete. All other Visits: visit_end_datetime = visit_start_datetime. If this is a one-day visit the end date should match the start date. | date | Yes | No | No | |||
visit_end_datetime | If no time is given for the end date of a visit, set it to midnight (00:00:00). | datetime | No | No | No | ||||
visit_type_concept_id | Use this field to understand the provenance of the visit record, or where the record comes from. | Populate this field based on the provenance of the visit record, as in whether it came from an EHR record or billing claim. | Integer | Yes | No | Yes | CONCEPT | Type Concept | |
provider_id | There will only be one provider per visit record and the ETL document should clearly state how they were chosen (attending, admitting, etc.). If there are multiple providers associated with a visit in the source, this can be reflected in the event tables (CONDITION_OCCURRENCE, PROCEDURE_OCCURRENCE, etc.) or in the VISIT_DETAIL table. | If there are multiple providers associated with a visit, you will need to choose which one to put here. The additional providers can be stored in the visit_detail table. | integer | No | No | Yes | PROVIDER | ||
care_site_id | This field provides information about the care site where the visit took place. | There should only be one care site associated with a visit. | integer | No | No | Yes | CARE_SITE | ||
visit_source_value | This field houses the verbatim value from the source data representing the kind of visit that took place (inpatient, outpatient, emergency, etc.) | If there is information about the kind of visit in the source data that value should be stored here. If a visit is an amalgamation of visits from the source then use a hierarchy to choose the visit source value, such as IP -> ER-> OP. This should line up with the logic chosen to determine how visits are created. | varchar(50) | No | No | No | |||
visit_source_concept_id | If the visit source value is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here. | integer | No | No | Yes | CONCEPT | |||
admitting_source_concept_id | Use this field to determine where the patient was admitted from. This concept is part of the visit domain and can indicate if a patient was admitted to the hospital from a long-term care facility, for example. | If available, map the admitting_source_value to a standard concept in the visit domain. | integer | No | No | Yes | CONCEPT | Visit | |
admitting_source_value | This information may be called something different in the source data but the field is meant to contain a value indicating where a person was admitted from. Typically this applies only to visits that have a length of stay, like inpatient visits or long-term care visits. | varchar(50) | No | No | No | ||||
discharge_to_concept_id | Use this field to determine where the patient was discharged to after a visit. This concept is part of the visit domain and can indicate if a patient was discharged to home or sent to a long-term care facility, for example. | If available, map the discharge_to_source_value to a standard concept in the visit domain. | integer | No | No | Yes | CONCEPT | Visit | |
discharge_to_source_value | This information may be called something different in the source data but the field is meant to contain a value indicating where a person was discharged to after a visit, as in they went home or were moved to long-term care. Typically this applies only to visits that have a length of stay of a day or more. | varchar(50) | No | No | No | ||||
preceding_visit_occurrence_id | Use this field to find the visit that occurred for the person prior to the given visit. There could be a few days or a few years in between. | The preceding_visit_occurrence_id can be used to link a visit immediately preceding the current visit. Note this is not symmetrical, and there is no such thing as a “following_visit_id”. | integer | No | No | Yes | VISIT_OCCURRENCE |
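
The conventions for deriving a missing visit end date, given in the visit_end_date row above, can be sketched as a single function. Several things here are assumptions made for illustration: the `visit_kind` strings stand in for whatever visit_concept_id lookup an ETL actually uses, `last_inpatient_activity_date` is a hypothetical proxy for "the sudden decline of activity", and "the duration of the month" is read as ending on the last day of the start month. Handling of visits still ongoing at ETL time is left out.

```python
import calendar
from datetime import date
from typing import Optional


def derive_visit_end_date(
    visit_kind: str,
    visit_start_date: date,
    source_end_date: Optional[date] = None,
    last_inpatient_activity_date: Optional[date] = None,
) -> date:
    """Derive visit_end_date when the source may not provide one."""
    # An end date present in the source always wins.
    if source_end_date is not None:
        return source_end_date
    if visit_kind == "inpatient":
        # Hypothetical proxy: last date with inpatient procedures or drugs.
        if last_inpatient_activity_date is None:
            raise ValueError("cannot derive an inpatient end date without activity data")
        return last_inpatient_activity_date
    if visit_kind == "non_hospital_institution":
        # Assumption: the visit runs to the last day of the month it starts in.
        last_day = calendar.monthrange(visit_start_date.year, visit_start_date.month)[1]
        return date(visit_start_date.year, visit_start_date.month, last_day)
    # Outpatient, emergency room and all other visits: end date = start date.
    return visit_start_date
```

For example, a non-hospital institution visit starting on 2021-02-10 with no source end date would end on 2021-02-28 under these assumptions.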
Table Description
This table contains records of Events of a Person suggesting the presence of a disease or medical condition stated as a diagnosis, a sign, or a symptom, which is either observed by a Provider or reported by the patient.
User Guide
Conditions span a time interval from start to end, but are typically recorded as single snapshot records with no end date. The reason is twofold: (i) At the time of the recording the duration is not known and later not recorded, and (ii) the Persons typically cease interacting with the healthcare system when they feel better, which leads to incomplete capture of resolved Conditions. The CONDITION_ERA table addresses this issue. Conditions are defined by Concepts from the Condition domain, which form a complex hierarchy. As a result, the same Person with the same disease may have multiple Condition records, which belong to the same hierarchical family. Most Condition records are mapped from diagnostic codes, but recorded signs, symptoms and summary descriptions also contribute to this table. Rule-out diagnoses should not be recorded in this table, but in reality their negating nature is not always captured in the source data, and other precautions must be taken when identifying Persons who should suffer from the recorded Condition.
ETL Conventions
Source codes and source text fields mapped to Standard Concepts of the Condition Domain have to be recorded here. Family history and past diagnoses (‘history of’) are not recorded in this table. Instead, they are listed in the OBSERVATION table. Codes written in the process of establishing the diagnosis, such as ‘question of’ and ‘rule out’, should not be represented here. Instead, they should be recorded in the OBSERVATION table, if they are used for analyses. However, this information is not always available.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
condition_occurrence_id | The unique key given to a condition record for a person. Refer to the ETL for how duplicate conditions during the same visit were handled. | Each instance of a condition present in the source data should be assigned this unique key. In some cases, a person can have multiple records of the same condition within the same visit. It is valid to keep these duplicates and assign them individual, unique, CONDITION_OCCURRENCE_IDs, though it is up to the ETL how they should be handled. | bigint | Yes | Yes | No | |||
person_id | The PERSON_ID of the PERSON for whom the condition is recorded. | bigint | Yes | No | Yes | PERSON | |||
condition_concept_id | The CONDITION_CONCEPT_ID field is recommended for primary use in analyses, and must be used for network studies. This is the standard concept mapped from the source value which represents a condition | The CONCEPT_ID that the CONDITION_SOURCE_VALUE maps to. Only records whose source values map to concepts with a domain of “Condition” should go in this table. | integer | Yes | No | Yes | CONCEPT | Condition | |
condition_start_date | Use this date to determine the start date of the condition | Most often data sources do not have the idea of a start date for a condition. Rather, if a source only has one date associated with a condition record it is acceptable to use that date for both the CONDITION_START_DATE and the CONDITION_END_DATE. | date | Yes | No | No | |||
condition_start_datetime | If a source does not specify datetime the convention is to set the time to midnight (00:00:00) | datetime | No | No | No | ||||
condition_end_date | Use this date to determine the end date of the condition | Most often data sources do not have the idea of a start date for a condition. Rather, if a source only has one date associated with a condition record it is acceptable to use that date for both the CONDITION_START_DATE and the CONDITION_END_DATE. | date | No | No | No | |||
condition_end_datetime | If a source does not specify datetime the convention is to set the time to midnight (00:00:00) | datetime | No | No | No | ||||
condition_type_concept_id | This field can be used to determine the provenance of the Condition record, as in whether the condition was from an EHR system, insurance claim, registry, or other sources. | Choose the condition_type_concept_id that best represents the provenance of the record. | integer | Yes | No | Yes | CONCEPT | Type Concept | |
condition_status_concept_id | This concept represents the point during the visit the diagnosis was given (admitting diagnosis, final diagnosis), whether the diagnosis was determined due to laboratory findings, if the diagnosis was exclusionary, or if it was a preliminary diagnosis, among others. | Presently, there is no designated vocabulary, domain, or class that represents condition status. The concepts with a relationship_id of “subsumes” with CONCEPT_ID 4021918 “Qualifier for type of diagnosis” should be used. These include admitting diagnosis, principal diagnosis, and secondary diagnosis. | integer | No | No | Yes | CONCEPT | ||
stop_reason | The Stop Reason indicates why a Condition is no longer valid with respect to the purpose within the source data. Note that a Stop Reason does not necessarily imply that the condition is no longer occurring. | This information is often not populated in source data and it is a valid ETL choice to leave it blank if the information does not exist. | varchar(20) | No | No | No | |||
provider_id | The provider associated with condition record. | The ETL may need to make a choice as to which PROVIDER_ID to put here. Based on what is available this may or may not be different than the provider associated with the overall VISIT_OCCURRENCE record. | integer | No | No | Yes | PROVIDER | ||
visit_occurrence_id | The visit during which the condition was diagnosed. | Depending on the structure of the source data, this may have to be determined based on dates. | integer | No | No | Yes | VISIT_OCCURRENCE | ||
visit_detail_id | The VISIT_DETAIL record during which the condition was diagnosed. For example, if the person was in the ICU at the time of the diagnosis the VISIT_OCCURRENCE record would reflect the overall hospital stay and the VISIT_DETAIL record would reflect the ICU stay during the hospital visit. | integer | No | No | Yes | VISIT_DETAIL | |||
condition_source_value | This field is discouraged from use in analysis because it is not required to contain Standard Concepts that are used across the OHDSI community, and should only be used when Standard Concepts do not adequately represent the source detail for the Condition necessary for a given analytic use case. Consider using CONDITION_CONCEPT_ID instead to enable standardized analytics that can be consistent across the network. | This code is mapped to a Standard Condition Concept in the Standardized Vocabularies and the original code is stored here for reference. | varchar(50) | No | No | No | |||
condition_source_concept_id | If the CONDITION_SOURCE_VALUE is coded in the source data using an OMOP supported vocabulary put the concept id representing the source value here. | integer | No | No | Yes | CONCEPT | |||
condition_status_source_value | This information may be called something different in the source data but the field is meant to contain a value indicating when and how a diagnosis was given to a patient. This source value is mapped to a standard concept which is stored in the CONDITION_STATUS_CONCEPT_ID field. | varchar(50) | No | No | No |
Table Description
This table captures records about the exposure to a Drug ingested or otherwise introduced into the body. A Drug is a biochemical substance formulated in such a way that when administered to a Person it will exert a certain biochemical effect on the metabolism.
User Guide
Drugs include prescription and over-the-counter medicines, vaccines, and large-molecule biologic therapies.
ETL Conventions
When the Drug Source Value of the code cannot be translated into Standard Drug Concept IDs, a Drug exposure entry is stored with only the corresponding SOURCE_CONCEPT_ID and DRUG_SOURCE_VALUE and a DRUG_CONCEPT_ID of 0. The Drug Concept with the most detailed content of information is preferred during the mapping process. These are indicated in the CONCEPT_CLASS_ID field of the Concept and are recorded in the following order of precedence: ‘Branded Pack’, ‘Clinical Pack’, ‘Branded Drug’, ‘Clinical Drug’, ‘Branded Drug Component’, ‘Clinical Drug Component’, ‘Branded Drug Form’, ‘Clinical Drug Form’, and only if no other information is available ‘Ingredient’. Note: If only the drug class is known, the DRUG_CONCEPT_ID field should contain 0.
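
The order of precedence above can be turned into a simple selection routine. The sketch below assumes the candidate Standard Concepts for a source code are already available as `(concept_id, concept_class_id)` pairs; the function name and the concept IDs in the example are placeholders, not real vocabulary entries.

```python
from typing import List, Tuple

# Order of precedence as listed in the ETL conventions, most detailed first.
CLASS_PRECEDENCE = [
    "Branded Pack",
    "Clinical Pack",
    "Branded Drug",
    "Clinical Drug",
    "Branded Drug Component",
    "Clinical Drug Component",
    "Branded Drug Form",
    "Clinical Drug Form",
    "Ingredient",
]


def pick_drug_concept(candidates: List[Tuple[int, str]]) -> int:
    """Pick the most detailed candidate, or 0 if none has a listed class."""
    rank = {cls: i for i, cls in enumerate(CLASS_PRECEDENCE)}
    eligible = [c for c in candidates if c[1] in rank]
    if not eligible:
        # Unmappable, or only a drug class is known: DRUG_CONCEPT_ID of 0.
        return 0
    return min(eligible, key=lambda c: rank[c[1]])[0]


# Placeholder concept IDs, for illustration only.
chosen = pick_drug_concept([(111, "Ingredient"), (222, "Clinical Drug"), (333, "Branded Drug Form")])
```

Here the "Clinical Drug" candidate wins because it ranks above both "Branded Drug Form" and "Ingredient".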
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
drug_exposure_id | Use this to identify unique dispensings or administrations of a drug product to a person | This should be populated by creating a unique identifier for each unique instance where a person receives a dispensing or administration of a drug. | bigint | Yes | Yes | No | |||
person_id | bigint | Yes | No | Yes | PERSON | ||||
drug_concept_id | The DRUG_CONCEPT_ID field is recommended for primary use in analyses, and must be used for network studies. This is the standard concept mapped from the source value which represents a drug product or molecule otherwise introduced to the body. | Map source values to standard concepts. All concepts in the DRUG_EXPOSURE table should be in the ‘Drug’ domain. | integer | Yes | No | Yes | CONCEPT | Drug | |
drug_exposure_start_date | Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded. | date | Yes | No | No | ||||
drug_exposure_start_datetime | If time is unknown set it to midnight. | datetime | No | No | No | ||||
drug_exposure_end_date | The DRUG_EXPOSURE_END_DATE denotes the day the drug exposure ended for the patient. This could be that the duration of DAYS_SUPPLY was reached, or because the exposure was stopped (medication changed, medication discontinued, etc.). To populate this field, start first with DAYS_SUPPLY using the calculation DRUG_EXPOSURE_END_DATE = DRUG_EXPOSURE_START_DATE + DAYS_SUPPLY -1 day. If DAYS_SUPPLY is not available then use the VERBATIM_END_DATE as it is given in the source data. If there is no verbatim end date then set DRUG_EXPOSURE_END_DATE equal to DRUG_EXPOSURE_START_DATE. When the native data suggests a drug exposure has a days supply less than 0, drop the record as it is unknown if a person has received the drug or not (THEMIS issue #24). If a patient has multiple records on the same day for the same drug or procedures the ETL should not de-dupe them unless there is probable reason to believe the item is a true data duplicate (THEMIS issue #14). Depending on different sources, it could be a known or an inferred date and denotes the last day at which the patient was still exposed to Drug. | date | Yes | No | No | ||||
drug_exposure_end_datetime | If time is unknown set it to midnight. | datetime | No | No | No | ||||
verbatim_end_date | This is the end date as it appears in the source data, if it is given. | Put the end date for the drug exposure as it appears in the source data. This may or may not be the same as DRUG_EXPOSURE_END_DATE given the logic for assigning DRUG_EXPOSURE_END_DATE. | date | No | No | No | |||
drug_type_concept_id | You can use the TYPE_CONCEPT_ID to delineate between prescriptions written vs. prescriptions dispensed vs. medication history vs. patient-reported exposure | This field is meant to preserve the provenance of the record. Any standard concepts in the ‘Type Concept’ domain are valid here. | integer | Yes | No | Yes | CONCEPT | Type Concept | |
stop_reason | Reason a person stopped a medication. Reasons include regimen completed, changed, removed, etc. | varchar(20) | No | No | No | ||||
refills | The content of the refills field determines the current refill number, not the number of remaining refills. For example, for a drug prescription with 2 refills, the values of this field for the 3 Drug Exposure events are null, 1 and 2. | integer | No | No | No | ||||
quantity | float | No | No | No | |||||
days_supply | integer | No | No | No | |||||
sig | (and printed on the container) | varchar(MAX) | No | No | No | ||||
route_concept_id | Route information can also be inferred from the Drug product itself by determining the Drug Form of the Concept, creating some partial overlap of the same type of information. Therefore, route information should be stored in DRUG_CONCEPT_ID (as a drug with corresponding Dose Form). The ROUTE_CONCEPT_ID could be used for storing more granular forms e.g. ‘Intraventricular cardiac’. | integer | No | No | Yes | CONCEPT | Route | ||
lot_number | varchar(50) | No | No | No | |||||
provider_id | integer | No | No | Yes | PROVIDER | ||||
visit_occurrence_id | integer | No | No | Yes | VISIT_OCCURRENCE | ||||
visit_detail_id | integer | No | No | Yes | VISIT_DETAIL | ||||
drug_source_value | This code is mapped to a Standard Drug concept in the Standardized Vocabularies and the original code is stored here for reference. | varchar(50) | No | No | No | ||||
drug_source_concept_id | integer | No | No | Yes | CONCEPT | ||||
route_source_value | varchar(50) | No | No | No | |||||
dose_unit_source_value | varchar(50) | No | No | No |
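The fallback order described for DRUG_EXPOSURE_END_DATE can be sketched in Python as follows. This is a minimal illustration, not an official ETL implementation: the function name and signature are hypothetical, and treating a DAYS_SUPPLY of 0 like a missing value is an assumption, since the convention above does not cover that case.

```python
from datetime import date, timedelta
from typing import Optional


def derive_drug_exposure_end_date(
    start_date: date,
    days_supply: Optional[int] = None,
    verbatim_end_date: Optional[date] = None,
) -> Optional[date]:
    """Return DRUG_EXPOSURE_END_DATE, or None if the record should be dropped."""
    # A negative days supply means the record is dropped (THEMIS issue #24).
    if days_supply is not None and days_supply < 0:
        return None
    # First choice: start date + days supply - 1 day.
    # Assumption: a days supply of 0 is handled like a missing value.
    if days_supply is not None and days_supply > 0:
        return start_date + timedelta(days=days_supply - 1)
    # Second choice: the end date exactly as given in the source.
    if verbatim_end_date is not None:
        return verbatim_end_date
    # Last resort: the exposure ends on the day it starts.
    return start_date
```

With this sketch, a 30-day supply starting on 2020-01-01 ends on 2020-01-30, because the start day itself counts as the first day of exposure.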
Table Description
The PROCEDURE_OCCURRENCE table contains records of activities or processes ordered by, or carried out by, a healthcare provider on the patient for a diagnostic or therapeutic purpose.
ETL Conventions
Procedures are expected to be carried out within one day and therefore have no end date. When dealing with duplicate records, the ETL must determine whether to sum them up into one record or keep them separate. Things to consider are:
- Same Procedure
- Same PROCEDURE_DATETIME
- Same Visit Occurrence or Visit Detail
- Same Provider
- Same Modifier for Procedures
- Same COST_ID
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
procedure_occurrence_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
procedure_concept_id | integer | Yes | No | Yes | CONCEPT | Procedure | |||
procedure_date | date | Yes | No | No | |||||
procedure_datetime | datetime | No | No | No | |||||
procedure_type_concept_id | integer | Yes | No | Yes | CONCEPT | Type Concept | |||
modifier_concept_id | These concepts are typically distinguished by ‘Modifier’ concept classes (e.g., ‘CPT4 Modifier’ as part of the ‘CPT4’ vocabulary). | integer | No | No | Yes | CONCEPT | |||
quantity | If the quantity value is omitted, a single procedure is assumed. | If a Procedure has a quantity of ‘0’ in the source, this should default to ‘1’ in the ETL. If there is a record in the source it can be assumed the exposure occurred at least once (THEMIS issue #26). | integer | No | No | No | |||
provider_id | integer | No | No | Yes | PROVIDER | ||||
visit_occurrence_id | integer | No | No | Yes | VISIT_OCCURRENCE | ||||
visit_detail_id | integer | No | No | Yes | VISIT_DETAIL | ||||
procedure_source_value | This code is mapped to a standard procedure Concept in the Standardized Vocabularies and the original code is stored here for reference. Procedure source codes are typically ICD-9-Proc, CPT-4, HCPCS or OPCS-4 codes. | varchar(50) | No | No | No | ||||
procedure_source_concept_id | integer | No | No | Yes | CONCEPT | ||||
modifier_source_value | varchar(50) | No | No | No |
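The two quantity rules for procedures (a source value of 0 becomes 1 in the ETL, and an omitted value is read as a single procedure at analysis time) could be expressed as below. Both function names are hypothetical and serve only to illustrate the convention.

```python
from typing import Optional


def etl_procedure_quantity(source_quantity: Optional[int]) -> Optional[int]:
    """Value to write to PROCEDURE_OCCURRENCE.quantity during the ETL."""
    # A source quantity of 0 defaults to 1 (THEMIS issue #26).
    if source_quantity == 0:
        return 1
    # Missing values stay missing; other values pass through unchanged.
    return source_quantity


def effective_procedure_quantity(stored_quantity: Optional[int]) -> int:
    """Quantity to assume in an analysis when reading the table."""
    # An omitted quantity means a single procedure is assumed.
    return 1 if stored_quantity is None else stored_quantity
```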
Table Description
The Device domain captures information about a person’s exposure to a foreign physical object or instrument which is used for diagnostic or therapeutic purposes through a mechanism beyond chemical action. Devices include implantable objects (e.g. pacemakers, stents, artificial joints), medical equipment and supplies (e.g. bandages, crutches, syringes), other instruments used in medical procedures (e.g. sutures, defibrillators) and material used in clinical care (e.g. adhesives, body material, dental material, surgical material).
User Guide
The distinction between Devices or supplies and Procedures is sometimes blurry, but the former are physical objects while the latter are actions, often to apply a Device or supply.
ETL Conventions
When dealing with duplicate records, the ETL must determine whether to sum them up into one record or keep them separate. Things to consider are:
- Same Device/Procedure
- Same DEVICE_EXPOSURE_START_DATETIME
- Same Visit Occurrence or Visit Detail
- Same Provider
- Same Modifier for Procedures
- Same COST_ID
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
device_exposure_id | bigint | Yes | Yes | No | |||||
person_id | bigint | Yes | No | Yes | PERSON | ||||
device_concept_id | integer | Yes | No | Yes | CONCEPT | Device | |||
device_exposure_start_date | date | Yes | No | No | |||||
device_exposure_start_datetime | datetime | No | No | No | |||||
device_exposure_end_date | date | No | No | No | |||||
device_exposure_end_datetime | datetime | No | No | No | |||||
device_type_concept_id | integer | Yes | No | Yes | CONCEPT | Type Concept | |||
unique_device_id | For medical devices that are regulated by the FDA, a Unique Device Identification (UDI) is provided if available in the data source and is recorded in the UNIQUE_DEVICE_ID field. | varchar(50) | No | No | No | ||||
quantity | integer | No | No | No | |||||
provider_id | integer | No | No | Yes | PROVIDER | ||||
visit_occurrence_id | integer | No | No | Yes | VISIT_OCCURRENCE | ||||
visit_detail_id | integer | No | No | Yes | VISIT_DETAIL | ||||
device_source_value | varchar(50) | No | No | No | |||||
device_source_concept_id | integer | No | No | Yes | CONCEPT |
Table Description
The MEASUREMENT table contains records of Measurements, i.e. structured values (numerical or categorical) obtained through systematic and standardized examination or testing of a Person or Person’s sample. The MEASUREMENT table contains both orders and results of such Measurements as laboratory tests, vital signs, quantitative findings from pathology reports, etc. Measurements are stored as attribute value pairs, with the attribute as the Measurement Concept and the value representing the result. The value can be a Concept (stored in VALUE_AS_CONCEPT_ID), or a numerical value (VALUE_AS_NUMBER) with a Unit (UNIT_CONCEPT_ID).
User Guide
Measurements differ from Observations in that they require a standardized test or some other activity to generate a quantitative or qualitative result. For example, LOINC 1755-8 concept_id 3027035 ‘Albumin [Mass/time] in 24 hour Urine’ is the lab test to measure a certain chemical in a urine sample. Even though each Measurement always has a result, the fields VALUE_AS_NUMBER and VALUE_AS_CONCEPT_ID are not mandatory. When the result is not known, the Measurement record represents just the fact that the corresponding Measurement was carried out, which in itself is already useful information for some use cases.
ETL Conventions
Even though each Measurement always has a result, the fields VALUE_AS_NUMBER and VALUE_AS_CONCEPT_ID are not mandatory. When the result is not known, the Measurement record represents just the fact that the corresponding Measurement was carried out, which in itself is already useful information for some use cases. For some Measurement Concepts, the result is included in the test. For example, ICD10 concept_id 45595451 ‘Presence of alcohol in blood, level not specified’ indicates a Measurement and the result (present). In those situations, the CONCEPT_RELATIONSHIP table in addition to the ‘Maps to’ record contains a second record with the relationship_id set to ‘Maps to value’. In this example, the ‘Maps to’ relationship points to 4041715 ‘Blood ethanol measurement’, and a ‘Maps to value’ record points to 4181412 ‘Present’.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
measurement_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
measurement_concept_id | integer | Yes | No | Yes | CONCEPT | Measurement | |||
measurement_date | date | Yes | No | No | |||||
measurement_datetime | datetime | No | No | No | |||||
measurement_time | This is present for backwards compatibility and will be deprecated in an upcoming version | varchar(10) | No | No | No | ||||
measurement_type_concept_id | integer | Yes | No | Yes | CONCEPT | Type Concept | |||
operator_concept_id | The meaning of Concept 4172703 for ‘=’ is identical to omission of an OPERATOR_CONCEPT_ID value. Since the use of this field is rare, it’s important when devising analyses not to forget to test the content of this field for values different from =. | Operators are <, <=, =, >=, > and these concepts belong to the ‘Meas Value Operator’ domain. If there is a negative value coming from the source, set the VALUE_AS_NUMBER to NULL, with the exception of the following Measurements (listed as LOINC codes): 1925-7 Base excess in Arterial blood by calculation; 1927-3 Base excess in Venous blood by calculation; 8632-2 QRS-Axis; 11555-0 Base excess in Blood by calculation; 1926-5 Base excess in Capillary blood by calculation; 28638-5 Base excess in Arterial cord blood by calculation; 28639-3 Base excess in Venous cord blood by calculation (THEMIS issue #16). | integer | No | No | Yes | CONCEPT | ||
value_as_number | float | No | No | No | |||||
value_as_concept_id | integer | No | No | Yes | CONCEPT | ||||
unit_concept_id | integer | No | No | Yes | CONCEPT | Unit | |||
range_low | Ranges have the same unit as the VALUE_AS_NUMBER. | If reference ranges for upper and lower limit of normal are provided (typically by a laboratory), these are stored in the RANGE_HIGH and RANGE_LOW fields. Ranges have the same unit as the VALUE_AS_NUMBER. | float | No | No | No | |||
range_high | Ranges have the same unit as the VALUE_AS_NUMBER. | float | No | No | No | ||||
provider_id | integer | No | No | Yes | PROVIDER | ||||
visit_occurrence_id | integer | No | No | Yes | VISIT_OCCURRENCE | ||||
visit_detail_id | integer | No | No | Yes | VISIT_DETAIL | ||||
measurement_source_value | varchar(50) | No | No | No | |||||
measurement_source_concept_id | integer | No | No | Yes | CONCEPT | ||||
unit_source_value | varchar(50) | No | No | No | |||||
value_source_value | varchar(50) | No | No | No |
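The negative-value rule listed in the table above (THEMIS issue #16) amounts to a lookup against a fixed set of LOINC codes. The sketch below assumes the LOINC code of the Measurement is available as a string during the ETL; the function and constant names are hypothetical.

```python
from typing import Optional

# LOINC codes for which a negative VALUE_AS_NUMBER is kept (THEMIS issue #16).
NEGATIVE_ALLOWED_LOINC = frozenset({
    "1925-7",   # Base excess in Arterial blood by calculation
    "1927-3",   # Base excess in Venous blood by calculation
    "8632-2",   # QRS-Axis
    "11555-0",  # Base excess in Blood by calculation
    "1926-5",   # Base excess in Capillary blood by calculation
    "28638-5",  # Base excess in Arterial cord blood by calculation
    "28639-3",  # Base excess in Venous cord blood by calculation
})


def clean_value_as_number(value: Optional[float], loinc_code: str) -> Optional[float]:
    """Set negative results to None (NULL) unless the LOINC code is exempt."""
    if value is not None and value < 0 and loinc_code not in NEGATIVE_ALLOWED_LOINC:
        return None
    return value
```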
Table Description
The VISIT_DETAIL table is an optional table used to represent details of each record in the parent visit_occurrence table. For every record in the visit_occurrence table there may be 0 or more records in the visit_detail table, i.e. a 1:n relationship where n may be 0. The visit_detail table is structurally very similar to the visit_occurrence table and belongs to the same domain as the visit.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
visit_detail_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
visit_detail_concept_id | integer | Yes | No | Yes | CONCEPT | Visit | |||
visit_detail_start_date | date | Yes | No | No | |||||
visit_detail_start_datetime | datetime | No | No | No | |||||
visit_detail_end_date | date | Yes | No | No | |||||
visit_detail_end_datetime | datetime | No | No | No | |||||
visit_detail_type_concept_id | integer | Yes | No | Yes | CONCEPT | Type Concept | |||
provider_id | integer | No | No | Yes | PROVIDER | ||||
care_site_id | integer | No | No | Yes | CARE_SITE | ||||
visit_detail_source_value | varchar(50) | No | No | No | |||||
visit_detail_source_concept_id | integer | No | No | Yes | CONCEPT | ||||
admitting_source_value | varchar(50) | No | No | No | |||||
admitting_source_concept_id | integer | No | No | Yes | CONCEPT | Visit | |||
discharge_to_source_value | varchar(50) | No | No | No | |||||
discharge_to_concept_id | integer | No | No | Yes | CONCEPT | Visit | |||
preceding_visit_detail_id | integer | No | No | Yes | VISIT_DETAIL | ||||
visit_detail_parent_id | integer | No | No | Yes | VISIT_DETAIL | ||||
visit_occurrence_id | integer | Yes | No | Yes | VISIT_OCCURRENCE |
Table Description
The NOTE table captures unstructured information that was recorded by a provider about a patient in free text notes on a given date.
ETL Conventions
The NOTE table contains free text (in ASCII, or preferably in UTF8 format). The type of note_text is CLOB or varchar(MAX) depending on RDBMS.
Mapping of clinical documents to Clinical Document Ontology (CDO) and standard terminology
HL7/LOINC CDO is a standard for consistent naming of documents to support a range of use cases: retrieval, organization, display, and exchange. It guides the creation of LOINC codes for clinical notes. CDO annotates each document with 5 dimensions:
- Kind of Document: Characterizes the general structure of the document at a macro level (e.g. Anesthesia Consent)
- Type of Service: Characterizes the kind of service or activity (e.g. evaluations, consultations, and summaries). The notion of time sequence, e.g., at the beginning (admission) or at the end (discharge), is subsumed in this axis. Example: Discharge Teaching.
- Setting: Setting is an extension of CMS’s definitions (e.g. Inpatient, Outpatient)
- Subject Matter Domain (SMD): Characterizes the subject matter domain of a note (e.g. Anesthesiology)
- Role: Characterizes the training or professional level of the author of the document, but does not break down to specialty or subspecialty (e.g. Physician)
Each combination of these 5 dimensions rolls up to a unique LOINC code.
According to CDO requirements, only 2 of the 5 dimensions are required to properly annotate a document: Kind of Document and any one of the other 4 dimensions. However, not all the permutations of the CDO dimensions will necessarily yield an existing LOINC code. The HL7/LOINC workforce is committed to establishing new LOINC codes for each newly encountered combination of CDO dimensions. The full document ontology as it exists in the Vocabulary is too extensive to list here, but it is possible to explore it through the ATHENA tool, starting with the ‘LOINC Document Ontology - Type of Service and Kind of Document’ and walking through the ‘Is a’/‘Subsumes’ relationship hierarchies.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
note_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
note_date | date | Yes | No | No | |||||
note_datetime | datetime | No | No | No | |||||
note_type_concept_id | integer | Yes | No | Yes | CONCEPT | Type Concept | |||
note_class_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
note_title | varchar(250) | No | No | No | |||||
note_text | varchar(MAX) | Yes | No | No | |||||
encoding_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
language_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
provider_id | integer | No | No | Yes | PROVIDER | ||||
visit_occurrence_id | integer | No | No | Yes | VISIT_OCCURRENCE | ||||
visit_detail_id | integer | No | No | Yes | VISIT_DETAIL | ||||
note_source_value | varchar(50) | No | No | No |
Table Description
The NOTE_NLP table will encode all output of NLP on clinical notes. Each row represents a single extracted term from a note.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
note_nlp_id | integer | Yes | Yes | No | |||||
note_id | integer | Yes | No | No | |||||
section_concept_id | integer | No | No | Yes | CONCEPT | ||||
snippet | varchar(250) | No | No | No | |||||
offset | varchar(50) | No | No | No | |||||
lexical_variant | varchar(250) | Yes | No | No | |||||
note_nlp_concept_id | integer | No | No | Yes | CONCEPT | ||||
note_nlp_source_concept_id | integer | No | No | Yes | CONCEPT | ||||
nlp_system | varchar(250) | No | No | No | |||||
nlp_date | date | Yes | No | No | |||||
nlp_datetime | datetime | No | No | No | |||||
term_exists | Term_exists is defined as a flag that indicates if the patient actually has or had the condition. Any of the following modifiers would make Term_exists false: Negation = true; Subject = [anything other than the patient]; Conditional = true; Rule_out = true; Uncertain = very low certainty or any lower certainties. A complete lack of modifiers would make Term_exists true. | varchar(1) | No | No | No | ||||
term_temporal | Term_temporal is to indicate if a condition is ‘present’ or just in the ‘past’. The following would be past: History = true; Concept_date = anything before the time of the report. | varchar(50) | No | No | No | ||||
term_modifiers | For Term_exists to be true, any modifiers that are present must have these values: Negation = false; Subject = patient; Conditional = false; Rule_out = false; Uncertain = true or high or moderate or even low (could argue about low). Term_modifiers will concatenate all modifiers for different types of entities (conditions, drugs, labs etc) into one string. Lab values will be saved as one of the modifiers. A list of allowable modifiers (e.g., signature for medications) and their possible values will be standardized later. | varchar(2000) | No | No | No |
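The Term_exists rules in the table above can be read as a short chain of checks over the modifiers an NLP system emitted for a term. The sketch below is only an illustration: it assumes the modifiers arrive as a dictionary with lowercase string keys and values, and the label "very low" for the lowest certainty level is a placeholder, since the list of allowable modifiers and values has not been standardized.

```python
from typing import Dict

# Assumption: certainty labels that count as "very low or lower".
TOO_UNCERTAIN = frozenset({"very low"})


def term_exists(modifiers: Dict[str, str]) -> bool:
    """Return True if the patient actually has or had the extracted term."""
    if modifiers.get("negation", "false") == "true":
        return False
    if modifiers.get("subject", "patient") != "patient":
        return False
    if modifiers.get("conditional", "false") == "true":
        return False
    if modifiers.get("rule_out", "false") == "true":
        return False
    if modifiers.get("uncertain") in TOO_UNCERTAIN:
        return False
    # No disqualifying modifier, including the case of no modifiers at all.
    return True
```

How the boolean is encoded in the varchar(1) column is left to the ETL; the table above does not prescribe the stored values.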
Table Description
The OBSERVATION table captures clinical facts about a Person obtained in the context of examination, questioning or a procedure. Any data that cannot be represented by any other domains, such as social and lifestyle facts, medical history, family history, etc. are recorded here.
ETL Conventions
Observations differ from Measurements in that they do not require a standardized test or some other activity to generate a clinical fact. Typical observations are medical history, family history, the stated need for certain treatment, social circumstances, lifestyle choices, healthcare utilization patterns, etc. If generating the clinical fact requires standardized testing such as lab testing or imaging and leads to a standardized result, the data item is recorded in the MEASUREMENT table. If the clinical fact observed determines a sign, symptom, diagnosis of a disease or other medical condition, it is recorded in the CONDITION_OCCURRENCE table. Observations can be stored as attribute value pairs, with the attribute as the Observation Concept and the value representing the clinical fact. This fact can be a Concept (stored in VALUE_AS_CONCEPT_ID), a numerical value (VALUE_AS_NUMBER), a verbatim string (VALUE_AS_STRING), or a datetime (VALUE_AS_DATETIME). Even though Observations do not have an explicit result, the clinical fact can be stated separately from the type of Observation in the VALUE_AS_* fields. It is recommended that Observations that are suggestive statements of positive assertion have a value of ‘Yes’ (concept_id=4188539) recorded, even though the null value is the equivalent.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
observation_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
observation_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
observation_date | date | Yes | No | No | |||||
observation_datetime | datetime | No | No | No | |||||
observation_type_concept_id | integer | Yes | No | Yes | CONCEPT | Type Concept | |||
value_as_number | float | No | No | No | |||||
value_as_string | varchar(60) | No | No | No | |||||
value_as_concept_id | Note that the value of VALUE_AS_CONCEPT_ID may be provided through mapping from a source Concept which contains the content of the Observation. In those situations, the CONCEPT_RELATIONSHIP table in addition to the ‘Maps to’ record contains a second record with the relationship_id set to ‘Maps to value’. For example, ICD9CM V17.5 concept_id 44828510 ‘Family history of asthma’ has a ‘Maps to’ relationship to 4167217 ‘Family history of clinical finding’ as well as a ‘Maps to value’ record to 317009 ‘Asthma’. | integer | No | No | Yes | CONCEPT | |||
qualifier_concept_id | integer | No | No | Yes | CONCEPT | ||||
unit_concept_id | integer | No | No | Yes | CONCEPT | Unit | |||
provider_id | integer | No | No | Yes | PROVIDER | ||||
visit_occurrence_id | integer | No | No | Yes | VISIT_OCCURRENCE | ||||
visit_detail_id | integer | No | No | Yes | VISIT_DETAIL | ||||
observation_source_value | varchar(50) | No | No | No | |||||
observation_source_concept_id | integer | No | No | Yes | CONCEPT | ||||
unit_source_value | varchar(50) | No | No | No | |||||
qualifier_source_value | varchar(50) | No | No | No |
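The ‘Maps to’ / ‘Maps to value’ pairing described for VALUE_AS_CONCEPT_ID can be illustrated with a small in-memory stand-in for CONCEPT_RELATIONSHIP. The tuples below contain only the ‘Family history of asthma’ example from the table above; the function name is hypothetical, and returning 0 for an unmapped source concept is an assumption of this sketch.

```python
from typing import Iterable, Optional, Tuple

# (concept_id_1, relationship_id, concept_id_2), as in CONCEPT_RELATIONSHIP.
Relationship = Tuple[int, str, int]

EXAMPLE_RELATIONSHIPS = [
    (44828510, "Maps to", 4167217),       # Family history of clinical finding
    (44828510, "Maps to value", 317009),  # Asthma
]


def map_source_concept(
    source_concept_id: int, relationships: Iterable[Relationship]
) -> Tuple[int, Optional[int]]:
    """Return (observation_concept_id, value_as_concept_id) for a source concept."""
    concept_id = 0  # assumption: 0 marks "no standard mapping found"
    value_as_concept_id = None
    for concept_id_1, relationship_id, concept_id_2 in relationships:
        if concept_id_1 != source_concept_id:
            continue
        if relationship_id == "Maps to":
            concept_id = concept_id_2
        elif relationship_id == "Maps to value":
            value_as_concept_id = concept_id_2
    return concept_id, value_as_concept_id
```

A real ETL would read these rows from the vocabulary tables and would also need to handle source concepts with more than one ‘Maps to’ target, which this sketch does not.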
Table Description
The specimen domain contains the records identifying biological samples from a person.
ETL Conventions
Anatomic site is coded at the most specific level of granularity possible, such that higher level classifications can be derived using the Standardized Vocabularies.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
specimen_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
specimen_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
specimen_type_concept_id | integer | Yes | No | Yes | CONCEPT | Type Concept | |||
specimen_date | date | Yes | No | No | |||||
specimen_datetime | datetime | No | No | No | |||||
quantity | float | No | No | No | |||||
unit_concept_id | integer | No | No | Yes | CONCEPT | ||||
anatomic_site_concept_id | integer | No | No | Yes | CONCEPT | ||||
disease_status_concept_id | integer | No | No | Yes | CONCEPT | ||||
specimen_source_id | varchar(50) | No | No | No | |||||
specimen_source_value | varchar(50) | No | No | No | |||||
unit_source_value | varchar(50) | No | No | No | |||||
anatomic_site_source_value | varchar(50) | No | No | No | |||||
disease_status_source_value | varchar(50) | No | No | No |
Table Description
The FACT_RELATIONSHIP table contains records about the relationships between facts stored as records in any table of the CDM. Relationships can be defined between facts from the same domain, or different domains. Examples of Fact Relationships include: Person relationships (parent-child), care site relationships (hierarchical organizational structure of facilities within a health system), indication relationship (between drug exposures and associated conditions), usage relationships (of devices during the course of an associated procedure), or facts derived from one another (measurements derived from an associated specimen).
ETL Conventions
All relationships are directional, and each relationship is represented twice symmetrically within the FACT_RELATIONSHIP table. For example, if person_id = 1 is the mother of person_id = 2, two records are in the FACT_RELATIONSHIP table (all strings are in fact concept_id records in the Concept table):
- Person, 1, Person, 2, parent of
- Person, 2, Person, 1, child of
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
domain_concept_id_1 | integer | Yes | No | Yes | CONCEPT | ||||
fact_id_1 | integer | Yes | No | No | |||||
domain_concept_id_2 | integer | Yes | No | Yes | CONCEPT | ||||
fact_id_2 | integer | Yes | No | No | |||||
relationship_concept_id | integer | Yes | No | Yes | CONCEPT |
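Because every relationship is stored twice, an ETL typically emits both directions at once. The following sketch builds the two FACT_RELATIONSHIP rows for one relationship; the function name is hypothetical, and the caller is assumed to supply the concept_ids for both domains and for both directions of the relationship.

```python
from typing import Dict, List


def symmetric_fact_rows(
    domain_concept_id_1: int,
    fact_id_1: int,
    domain_concept_id_2: int,
    fact_id_2: int,
    relationship_concept_id: int,
    inverse_relationship_concept_id: int,
) -> List[Dict[str, int]]:
    """Return the forward and the mirrored FACT_RELATIONSHIP rows."""
    forward = {
        "domain_concept_id_1": domain_concept_id_1,
        "fact_id_1": fact_id_1,
        "domain_concept_id_2": domain_concept_id_2,
        "fact_id_2": fact_id_2,
        "relationship_concept_id": relationship_concept_id,
    }
    # The mirrored row swaps both sides and uses the inverse relationship.
    reverse = {
        "domain_concept_id_1": domain_concept_id_2,
        "fact_id_1": fact_id_2,
        "domain_concept_id_2": domain_concept_id_1,
        "fact_id_2": fact_id_1,
        "relationship_concept_id": inverse_relationship_concept_id,
    }
    return [forward, reverse]


# Placeholder integers for illustration only; these are NOT real concept_ids.
PERSON_DOMAIN, PARENT_OF, CHILD_OF = 9001, 9002, 9003
rows = symmetric_fact_rows(PERSON_DOMAIN, 1, PERSON_DOMAIN, 2, PARENT_OF, CHILD_OF)
```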
Table Description
The LOCATION table represents a generic way to capture physical location or address information of Persons and Care Sites.
User Guide
For standardized geospatial visualization and analysis, addresses need to be geocoded, at a minimum, into latitude and longitude.
ETL Conventions
Each address or Location is unique and is present only once in the table. Locations do not contain names, such as the name of a hospital. In order to construct a full address that can be used in the postal service, the address information from the Location needs to be combined with information from the Care Site.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
location_id | integer | Yes | Yes | No | |||||
address_1 | varchar(50) | No | No | No | |||||
address_2 | varchar(50) | No | No | No | |||||
city | varchar(50) | No | No | No | |||||
state | varchar(2) | No | No | No | |||||
zip | Zip codes are handled as strings of up to 9 characters length. For US addresses, these represent either a 3-digit abbreviated Zip code as provided by many sources for patient protection reasons, the full 5-digit Zip or the 9-digit (ZIP + 4) codes. Unless there are specific reasons to do otherwise, analytical methods should expect and use only the first 3 digits. For international addresses, different rules apply. | varchar(9) | No | No | No | ||||
county | varchar(20) | No | No | No | |||||
location_source_value | varchar(50) | No | No | No |
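Since US Zip codes may arrive with 3, 5 or 9 digits, analyses that follow the guidance above usually reduce them to a common 3-digit prefix first. A minimal sketch, assuming US addresses only and a hypothetical helper name:

```python
from typing import Optional


def zip3(zip_code: Optional[str]) -> Optional[str]:
    """Return the first 3 digits of a US Zip code, or None if unavailable."""
    if zip_code is None:
        return None
    # Keep digits only, so stray spaces or separators do not shift the prefix.
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    return digits[:3] if len(digits) >= 3 else None
```

The value is kept as a string so that leading zeros survive.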
Table Description
The CARE_SITE table contains a list of uniquely identified institutional (physical or organizational) units where healthcare delivery is practiced (offices, wards, hospitals, clinics, etc.).
ETL Conventions
Care site is a unique combination of location_id and place_of_service_source_value. Care site does not take into account the provider (human) information such as specialty. Many source data do not make a distinction between individual and institutional providers. The CARE_SITE table contains the institutional providers. If the source, instead of uniquely identifying individual Care Sites, only provides limited information such as Place of Service, generic or “pooled” Care Site records are listed in the CARE_SITE table. There can be hierarchical and business relationships between Care Sites. For example, wards can belong to clinics or departments, which can in turn belong to hospitals, which in turn can belong to hospital systems, which in turn can belong to HMOs. The relationships between Care Sites are defined in the FACT_RELATIONSHIP table.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
care_site_id | Assign an id to each unique combination of location_id and place_of_service_source_value | integer | Yes | Yes | No | ||||
care_site_name | The name of the care_site as it appears in the source data | varchar(255) | No | No | No | ||||
place_of_service_concept_id | This is a high-level way of characterizing a Care Site. Typically, however, Care Sites can provide care in multiple settings (inpatient, outpatient, etc.) and this granularity should be reflected in the visit. | Choose the concept in the visit domain that best represents the setting in which healthcare is provided in the Care Site. If most visits in a Care Site are Inpatient, then the place_of_service_concept_id should represent Inpatient. If information is present about a unique Care Site (e.g. Pharmacy) then a Care Site record should be created. | integer | No | No | Yes | CONCEPT | ||
location_id | The location_id from the LOCATION table representing the physical location of the care_site. | integer | No | No | Yes | LOCATION | |||
care_site_source_value | The identifier of the care_site as it appears in the source data. This could be an identifier separate from the name of the care_site. | varchar(50) | No | No | No | ||||
place_of_service_source_value | Put the place of service of the care_site as it appears in the source data. | varchar(50) | No | No | No |
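The rule that a care_site_id is assigned to each unique combination of location_id and place_of_service_source_value can be sketched as a simple key-to-id dictionary. The function name and the choice of sequential ids starting at 1 are assumptions for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

CareSiteKey = Tuple[Optional[int], Optional[str]]


def assign_care_site_ids(rows: Iterable[CareSiteKey]) -> Dict[CareSiteKey, int]:
    """Map each unique (location_id, place_of_service_source_value) to an id."""
    ids: Dict[CareSiteKey, int] = {}
    for location_id, place_of_service_source_value in rows:
        key = (location_id, place_of_service_source_value)
        if key not in ids:
            # Assumption: ids are handed out sequentially in order of first appearance.
            ids[key] = len(ids) + 1
    return ids
```

Repeated combinations reuse the id assigned on first appearance, so the resulting CARE_SITE table holds one row per combination.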
Table Description
The PROVIDER table contains a list of uniquely identified healthcare providers. These are individuals providing hands-on healthcare to patients, such as physicians, nurses, midwives, physical therapists etc.
User Guide
Many sources do not make a distinction between individual and institutional providers. The PROVIDER table contains the individual providers. If the source, instead of uniquely identifying individual providers, only provides limited information such as specialty, generic or ‘pooled’ Provider records are listed in the PROVIDER table.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
provider_id | It is assumed that every provider with a different unique identifier is in fact a different person and should be treated independently. | This identifier can be the original id from the source data provided it is an integer, otherwise it can be an autogenerated number. | integer | Yes | Yes | No | |||
provider_name | This field is optional, because the actual identity of the Provider is not needed. Rather, the idea is to uniquely and anonymously identify providers of care across the database. | varchar(255) | No | No | No | ||||
npi | This is the National Provider Identifier issued to health care providers in the US by the Centers for Medicare and Medicaid Services (CMS). | varchar(20) | No | No | No | ||||
dea | This is the identifier issued by the DEA, a US federal agency, that allows a provider to write prescriptions for controlled substances. | varchar(20) | No | No | No | ||||
specialty_concept_id | This field either represents the most common specialty that occurs in the data or the most specific concept that represents all specialties listed, should the provider have more than one. This includes physician specialties such as internal medicine, emergency medicine, etc. and allied health professionals such as nurses, midwives, and pharmacists. | If a Provider has more than one Specialty, there are two options: 1. Choose a concept_id which is a common ancestor to the multiple specialties, or, 2. Choose the specialty that occurs most often for the provider. Concepts in this field should be Standard with a domain of Provider. | integer | No | No | Yes | CONCEPT | ||
care_site_id | This is the CARE_SITE_ID for the location that the provider primarily practices in. | If a Provider has more than one Care Site, the main or most frequently used CARE_SITE_ID should be recorded. | integer | No | No | Yes | CARE_SITE | ||
year_of_birth | integer | No | No | No | |||||
gender_concept_id | This field represents the recorded gender of the provider in the source data. | If given, put a concept from the gender domain representing the recorded gender of the provider. | integer | No | No | Yes | CONCEPT | Gender | |
provider_source_value | Use this field to link back to providers in the source data. This is typically used for error checking of ETL logic. | Some use cases require the ability to link back to providers in the source data. This field allows for the storing of the provider identifier as it appears in the source. | varchar(50) | No | No | No | |||
specialty_source_value | This is the kind of provider or specialty as it appears in the source data. This includes physician specialties such as internal medicine, emergency medicine, etc. and allied health professionals such as nurses, midwives, and pharmacists. | Put the kind of provider as it appears in the source data. This field is up to the discretion of the ETL-er as to whether this should be the coded value from the source or the text description of the lookup value. | varchar(50) | No | No | No | |||
specialty_source_concept_id | This is often zero as many sites use proprietary codes to store physician specialty. | If the source data codes provider specialty in an OMOP supported vocabulary store the concept_id here. | integer | No | No | Yes | CONCEPT | ||
gender_source_value | This is the provider’s gender as it appears in the source data. | Put the provider’s gender as it appears in the source data. This field is up to the discretion of the ETL-er as to whether this should be the coded value from the source or the text description of the lookup value. | varchar(50) | No | No | No | |||
gender_source_concept_id | This is often zero as many sites use proprietary codes to store provider gender. | If the source data codes provider gender in an OMOP supported vocabulary store the concept_id here. | integer | No | No | Yes | CONCEPT |
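
The second option in the specialty_concept_id convention (choose the specialty that occurs most often for the provider) can be sketched as below. This is a minimal illustration, not a prescribed implementation: the function name, the tie-break rule, and the concept_ids in the usage note are assumptions made for the example.

```python
from collections import Counter


def most_frequent_specialty(specialty_concept_ids):
    """Return the specialty concept_id seen most often for one provider.

    Ties are broken by the smallest concept_id so the result is
    deterministic (an assumption; the convention names no tie-break rule).
    Returns None when no specialty was recorded, since the field is optional.
    """
    if not specialty_concept_ids:
        return None
    counts = Counter(specialty_concept_ids)
    best = max(counts.values())
    return min(cid for cid, n in counts.items() if n == best)
```

For a provider whose source records list the placeholder ids `[101, 202, 101]`, the function returns `101`.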
Table Description
The PAYER_PLAN_PERIOD table captures details of the period of time that a Person is continuously enrolled under a specific health Plan benefit structure from a given Payer. Each Person receiving healthcare is typically covered by a health benefit plan, which pays for (fully or partially), or directly provides, the care. These benefit plans are provided by payers, such as health insurers or state or government agencies. In each plan the details of the health benefits are defined for the Person or her family, and the health benefit Plan might change over time, typically with increasing utilization (reaching certain cost thresholds such as deductibles), plan availability and purchasing choices of the Person. The unique combinations of Payer organizations, health benefit Plans and time periods in which they are valid for a Person are recorded in this table.
User Guide
A Person can have multiple, overlapping Payer_Plan_Periods in this table. For example, medical and drug coverage in the US can be represented by two Payer_Plan_Periods. The details of the benefit structure of the Plan are rarely known; the idea is just to identify that the Plans are different.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
payer_plan_period_id | A unique identifier for each unique combination of a Person, Payer, Plan, and Period of time. | integer | Yes | Yes | No | ||||
person_id | The Person covered by the Plan. | A single Person can have multiple, overlapping, PAYER_PLAN_PERIOD records | integer | Yes | No | Yes | PERSON | ||
payer_plan_period_start_date | Start date of Plan coverage. | date | Yes | No | No | ||||
payer_plan_period_end_date | End date of Plan coverage. | date | Yes | No | No | ||||
payer_concept_id | This field represents the organization that reimburses the provider that administers care to the Person. | Map the Payer directly to a standard CONCEPT_ID. If one does not exist, please contact the vocabulary team. There is no global controlled vocabulary available for this information. The point is to stratify on this information and identify if Persons have the same payer, though the name of the Payer is not necessary. | integer | No | No | Yes | CONCEPT | ||
payer_source_value | This is the Payer as it appears in the source data. | varchar(50) | No | No | No | ||||
payer_source_concept_id | If the source data codes the Payer in an OMOP supported vocabulary store the concept_id here. | integer | No | No | Yes | CONCEPT | |||
plan_concept_id | This field represents the specific health benefit Plan the Person is enrolled in. | Map the Plan directly to a standard CONCEPT_ID. If one does not exist, please contact the vocabulary team. There is no global controlled vocabulary available for this information. The point is to stratify on this information and identify if Persons have the same health benefit Plan, though the name of the Plan is not necessary. | integer | No | No | Yes | CONCEPT | ||
plan_source_value | This is the health benefit Plan of the Person as it appears in the source data. | varchar(50) | No | No | No | ||||
plan_source_concept_id | If the source data codes the Plan in an OMOP supported vocabulary store the concept_id here. | integer | No | No | Yes | CONCEPT | |||
sponsor_concept_id | This field represents the sponsor of the Plan who finances the Plan. This includes self-insured, small group health plan and large group health plan. | Map the sponsor directly to a standard CONCEPT_ID. If one does not exist, please contact the vocabulary team. There is no global controlled vocabulary available for this information. The point is to stratify on this information and identify if Persons have the same sponsor, though the name of the sponsor is not necessary. | integer | No | No | Yes | CONCEPT | ||
sponsor_source_value | The Plan sponsor as it appears in the source data. | varchar(50) | No | No | No | ||||
sponsor_source_concept_id | If the source data codes the sponsor in an OMOP supported vocabulary store the concept_id here. | integer | No | No | Yes | CONCEPT | |||
family_source_value | The common identifier for all people (often a family) that are covered by the same policy. | Often these are the common digits of the enrollment id of the policy members. | varchar(50) | No | No | No | |||
stop_reason_concept_id | This field represents the reason the Person left the Plan, if known. | Map the stop reason directly to a standard CONCEPT_ID. If one does not exist, please contact the vocabulary team. There is no global controlled vocabulary available for this information. | integer | No | No | Yes | CONCEPT | ||
stop_reason_source_value | The Plan stop reason as it appears in the source data. | varchar(50) | No | No | No | ||||
stop_reason_source_concept_id | If the source data codes the stop reason in an OMOP supported vocabulary store the concept_id here. | integer | No | No | Yes | CONCEPT |
Table Description
The COST table captures records containing the cost of any medical event recorded in one of the OMOP clinical event tables such as DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, VISIT_OCCURRENCE, VISIT_DETAIL, DEVICE_OCCURRENCE, OBSERVATION or MEASUREMENT.
Each record in the COST table accounts for the amount of money transacted for the clinical event. So, the COST table may be used to represent both receivables (charges) and payments (paid), each transaction type represented by its COST_CONCEPT_ID. The COST_TYPE_CONCEPT_ID field will use concepts in the Standardized Vocabularies to designate the source (provenance) of the cost data. A reference to the health plan information in the PAYER_PLAN_PERIOD table is stored in the record, as this information is used by the adjudication system to determine the Person’s benefit for the clinical event.
User Guide
When dealing with summary costs, the cost of the goods or services the provider provides is often not known directly, but derived from the hospital charges multiplied by an average cost-to-charge ratio.
ETL Conventions
One cost record is generated for each response by a payer. In a claims database, the payment and payment terms reported by the payer for the goods or services billed will generate one cost record. If the source data has payment information for more than one payer (e.g. primary insurance and secondary insurance payment for one entity), then a cost record is created for each reporting payer. Therefore, it is possible for one procedure to have multiple cost records, one for each payer, but typically there is one or no record per entity. Payer reimbursement cost records will be identified by using the PAYER_PLAN_PERIOD_ID field. Drug costs are composed of the ingredient cost (the amount charged by the wholesale distributor or manufacturer), the dispensing fee (the amount charged by the pharmacy), and the sales tax.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
cost_id | INTEGER | Yes | Yes | No | |||||
cost_event_id | INTEGER | Yes | No | No | |||||
cost_domain_id | VARCHAR(20) | Yes | No | Yes | DOMAIN | ||||
cost_type_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
currency_concept_id | integer | No | No | Yes | CONCEPT | ||||
total_charge | FLOAT | No | No | No | |||||
total_cost | FLOAT | No | No | No | |||||
total_paid | FLOAT | No | No | No | |||||
paid_by_payer | FLOAT | No | No | No | |||||
paid_by_patient | FLOAT | No | No | No | |||||
paid_patient_copay | FLOAT | No | No | No | |||||
paid_patient_coinsurance | FLOAT | No | No | No | |||||
paid_patient_deductible | FLOAT | No | No | No | |||||
paid_by_primary | FLOAT | No | No | No | |||||
paid_ingredient_cost | FLOAT | No | No | No | |||||
paid_dispensing_fee | FLOAT | No | No | No | |||||
payer_plan_period_id | INTEGER | No | No | No | |||||
amount_allowed | FLOAT | No | No | No | |||||
revenue_code_concept_id | integer | No | No | Yes | CONCEPT | ||||
revenue_code_source_value | Revenue codes are a method to charge for a class of procedures and conditions in the U.S. hospital system. | VARCHAR(50) | No | No | No | ||||
drg_concept_id | integer | No | No | Yes | CONCEPT | ||||
drg_source_value | Diagnosis Related Groups are US codes used to classify hospital cases into one of approximately 500 groups. | VARCHAR(3) | No | No | No |
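
Two of the conventions above lend themselves to a short illustration: one cost record per responding payer, and a summary cost derived from charges through a cost-to-charge ratio. The sketch below is an assumption-laden example rather than an official ETL routine; `CostRecord`, `cost_records_for_event`, `estimate_total_cost`, and all ids and amounts are hypothetical, and only a few COST fields are modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CostRecord:
    """A small subset of COST fields, enough for the illustration."""
    cost_event_id: int
    cost_domain_id: str
    payer_plan_period_id: Optional[int]
    paid_by_payer: float


def cost_records_for_event(
    cost_event_id: int,
    cost_domain_id: str,
    payer_responses: List[Tuple[Optional[int], float]],
) -> List[CostRecord]:
    """Create one CostRecord per payer response for a single clinical event.

    payer_responses holds (payer_plan_period_id, amount paid) pairs, for
    example one pair for primary and one for secondary insurance.
    """
    return [
        CostRecord(cost_event_id, cost_domain_id, plan_period_id, paid)
        for plan_period_id, paid in payer_responses
    ]


def estimate_total_cost(total_charge: float, cost_to_charge_ratio: float) -> float:
    """Derive a summary cost as charges multiplied by an average ratio."""
    return total_charge * cost_to_charge_ratio
```

A procedure paid by a primary and a secondary payer would yield two records that share the same `cost_event_id` and differ in `payer_plan_period_id`.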
Table Description
A Drug Era is defined as a span of time when the Person is assumed to be exposed to a particular active ingredient. A Drug Era is not the same as a Drug Exposure: Exposures are individual records corresponding to the source when a Drug was delivered to the Person, while successive periods of Drug Exposures are combined under certain rules to produce continuous Drug Eras.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
drug_era_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
drug_concept_id | integer | Yes | No | Yes | CONCEPT | Drug | Ingredient | ||
drug_era_start_date | The Drug Era Start Date is the start date of the first Drug Exposure for a given ingredient. (Note: this definition is flagged as incorrect and is pending revision.) | datetime | Yes | No | No | ||||
drug_era_end_date | The Drug Era End Date is the end date of the last Drug Exposure. The End Date of each Drug Exposure is either taken from the field drug_exposure_end_date or, as it is typically not available, inferred using the following rules: For pharmacy prescription data, the date when the drug was dispensed plus the number of days of supply are used to extrapolate the End Date for the Drug Exposure. Depending on the country-specific healthcare system, this supply information is either explicitly provided in the days_supply field or inferred from package size or similar information. For Procedure Drugs, usually the drug is administered on a single date (i.e., the administration date). A standard Persistence Window of 30 days (gap, slack) is permitted between two subsequent such extrapolated DRUG_EXPOSURE records for them to be merged into a single Drug Era. (Open question: should the use of DRUG_EXPOSURE_END_DATE now be required here?) | datetime | Yes | No | No | ||||
drug_exposure_count | integer | No | No | No | |||||
gap_days | The Gap Days determine how many total drug-free days are observed between all Drug Exposure events that contribute to a DRUG_ERA record. It is assumed that the drugs are “not stockpiled” by the patient, i.e. that if a new drug prescription or refill is observed (a new DRUG_EXPOSURE record is written), the remaining supply from the previous events is abandoned. The difference between Persistence Window and Gap Days is that the former is the maximum drug-free time allowed between two subsequent DRUG_EXPOSURE records, while the latter is the sum of actual drug-free days for the given Drug Era under the above assumption of non-stockpiling. | integer | No | No | No |
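
The gap_days definition can be made concrete with a small sketch. It assumes the exposures have already been grouped into a single Drug Era, that end dates are inclusive, and that the non-stockpiling rule means each exposure is compared only with the one immediately before it; `summarize_drug_era` and its output keys are names chosen for this example.

```python
from datetime import date


def summarize_drug_era(exposures):
    """Summarize one Drug Era from its contributing exposures.

    exposures: non-empty list of (start_date, end_date) tuples that already
    belong to the same era; end dates are treated as inclusive (assumption).
    Under non-stockpiling, supply left over from the previous exposure is
    abandoned when a new one starts, so overlaps never create negative gaps.
    """
    ordered = sorted(exposures)
    gap_days = 0
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # Drug-free days strictly between the previous end and the next start.
        gap_days += max(0, (next_start - prev_end).days - 1)
    return {
        "drug_era_start_date": ordered[0][0],
        "drug_era_end_date": ordered[-1][1],
        "drug_exposure_count": len(ordered),
        "gap_days": gap_days,
    }
```

With exposures covering 1 to 30 January, 10 February to 10 March, and 5 March to 3 April, the only drug-free stretch is 31 January to 9 February, so gap_days is 10 and drug_exposure_count is 3.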
Table Description
A Dose Era is defined as a span of time when the Person is assumed to be exposed to a constant dose of a specific active ingredient.
ETL Conventions
Dose Eras will be derived from records in the DRUG_EXPOSURE table and the Dose information from the DRUG_STRENGTH table using a standardized algorithm. Dose Form information is not taken into account. So, if the patient changes between different formulations, or different manufacturers with the same formulation, the Dose Era still spans the entire time of exposure to the Ingredient. The total dose of a DRUG_EXPOSURE record is calculated with the help of the DRUG_STRENGTH table containing the dosage information for each drug as follows:
1. Tablets and other fixed amount formulations
   - Example: Acetaminophen (Paracetamol) 500 mg, 20 tablets.
   - DRUG_STRENGTH: The denominator_unit is empty.
   - DRUG_EXPOSURE: The quantity refers to number of pieces, e.g. tablets. In the example: 20.
   - Ingredient dose = quantity x amount_value [amount_unit_concept_id]
   - Acetaminophen dose = 20 x 500mg = 10,000mg
2. Puffs of an inhaler
   - Note: There is no difference to use case 1 besides that the DRUG_STRENGTH table may put {actuat} in the denominator unit. In this case the strength is provided in the numerator.
   - DRUG_STRENGTH: The denominator_unit is {actuat}.
   - DRUG_EXPOSURE: The quantity refers to the number of pieces, e.g. puffs.
   - Ingredient dose = quantity x numerator_value [numerator_unit_concept_id]
3. Quantified Drugs which are formulated as a concentration
   - Example: The Clinical Drug is Acetaminophen 250 mg/mL in a 5mL oral suspension. The Quantified Clinical Drug would have 1250 mg / 5 ml in the DRUG_STRENGTH table. Two suspensions are dispensed.
   - DRUG_STRENGTH: The denominator_unit is either mg or mL. The denominator_value might be different from 1.
   - DRUG_EXPOSURE: The quantity refers to a fraction or multiple of the pack. Example: 2.
   - Ingredient dose = quantity x numerator_value [numerator_unit_concept_id]
   - Acetaminophen dose = 2 x 1250mg = 2500mg
4. Drugs with the total amount provided in quantity, e.g. chemotherapeutics
   - Example: 42799258 “Benzyl Alcohol 0.1 ML/ML / Pramoxine hydrochloride 0.01 MG/MG Topical Gel” dispensed in a 1.25oz pack.
   - DRUG_STRENGTH: The denominator_unit is either mg or mL. Example: Benzyl Alcohol in mL and Pramoxine hydrochloride in mg.
   - DRUG_EXPOSURE: The quantity refers to mL or g. Example: 1.25 x 30 (conversion factor oz -> mL) = 37.
   - Ingredient dose = quantity x numerator_value [numerator_unit_concept_id]
   - Benzyl Alcohol dose = 37 x 0.1mL = 3.7mL
   - Pramoxine hydrochloride dose = 37 x 0.01mg x 1000 = 370mg
   - Note: The analytical side should check the denominator in the DRUG_STRENGTH table. As mg is used for the second ingredient the factor 1000 will be applied to convert between g and mg.
5. Compounded drugs
   - Example: Ibuprofen 20%/Piroxicam 1% Cream, 30ml in 5ml tubes.
   - DRUG_STRENGTH: We need entries for the ingredients of Ibuprofen and Piroxicam, probably with an amount_value of 1 and a unit of mg.
   - DRUG_EXPOSURE: The quantity refers to the total amount of the compound. Use one record in the DRUG_EXPOSURE table for each compound. Example: 20% Ibuprofen of 30ml = 6mL, 1% Piroxicam of 30ml = 0.3mL.
   - Ingredient dose = Depends on the drugs involved: one of the use cases above.
   - Ibuprofen dose = 6 x 1mg x 1000 = 6000mg
   - Piroxicam dose = 0.3 x 1mg x 1000 = 300mg
   - Note: The analytical side determines that the denominator for both ingredients in the DRUG_STRENGTH table is mg and applies the factor 1000 to convert between mL/g and mg.
6. Drugs with the active ingredient released over time, e.g. patches
   - Example: Ethinyl Estradiol 0.000833 MG/HR / norelgestromin 0.00625 MG/HR Weekly Transdermal Patch.
   - DRUG_STRENGTH: The denominator units refer to hour. Example: Ethinyl Estradiol 0.000833 mg/h / norelgestromin 0.00625 mg/h.
   - DRUG_EXPOSURE: The quantity refers to the number of pieces. Example: 1 patch.
   - Ingredient rate = numerator_value [numerator_unit_concept_id]
   - Ethinyl Estradiol rate = 0.000833 mg/h
   - norelgestromin rate = 0.00625 mg/h
   - Note: This can be converted to a daily dosage by multiplying it by 24 (assuming 1 patch at a time for at least 24 hours).
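
The arithmetic for the tablet, quantified-concentration and patch cases above can be checked with a minimal sketch. The function names and the rule "use amount_value when present, otherwise numerator_value" are assumptions made for this illustration; unit conversion (such as the factor 1000 between g and mg) is deliberately left out.

```python
def ingredient_dose(quantity, amount_value=None, numerator_value=None):
    """Total ingredient dose for one DRUG_EXPOSURE record.

    Fixed-amount formulations (e.g. tablets) carry amount_value; the other
    cases carry the strength in numerator_value. The result is expressed in
    the unit of whichever strength value was used.
    """
    strength = amount_value if amount_value is not None else numerator_value
    if strength is None:
        raise ValueError("DRUG_STRENGTH row has neither amount nor numerator")
    return quantity * strength


def daily_dose_from_hourly_rate(rate_per_hour):
    """Convert an hourly release rate (e.g. a patch) into a 24-hour dose."""
    return rate_per_hour * 24


# Acetaminophen 500 mg, 20 tablets
tablets_mg = ingredient_dose(20, amount_value=500)
# Two packs of the quantified suspension holding 1250 mg each
suspension_mg = ingredient_dose(2, numerator_value=1250)
```

Here `tablets_mg` is 10000 and `suspension_mg` is 2500, matching the worked examples in the text.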
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
dose_era_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
drug_concept_id | integer | Yes | No | Yes | CONCEPT | Drug | Ingredient | ||
unit_concept_id | integer | Yes | No | Yes | CONCEPT | Unit | |||
dose_value | float | Yes | No | No | |||||
dose_era_start_date | datetime | Yes | No | No | |||||
dose_era_end_date | datetime | Yes | No | No |
Table Description
A Condition Era is defined as a span of time when the Person is assumed to have a given condition. Similar to Drug Eras, Condition Eras are chronological periods of Condition Occurrence. Combining individual Condition Occurrences into a single Condition Era serves two purposes:
- It allows aggregation of chronic conditions that require frequent ongoing care, instead of treating each Condition Occurrence as an independent event.
- It allows aggregation of multiple, closely timed doctor visits for the same Condition to avoid double-counting the Condition Occurrences. For example, consider a Person who visits her Primary Care Physician (PCP) and who is referred to a specialist. At a later time, the Person visits the specialist, who confirms the PCP’s original diagnosis and provides the appropriate treatment to resolve the condition. These two independent doctor visits should be aggregated into one Condition Era.
ETL Conventions
Each Condition Era corresponds to one or many Condition Occurrence records that form a continuous interval. The condition_concept_id field contains Concepts that are identical to those of the CONDITION_OCCURRENCE table records that make up the Condition Era. In contrast to Drug Eras, Condition Eras are not aggregated to contain Conditions of different hierarchical layers. The Condition Era Start Date is the start date of the first Condition Occurrence. The Condition Era End Date is the end date of the last Condition Occurrence. Condition Eras are built with a Persistence Window of 30 days: if no further occurrence of the same condition_concept_id happens within 30 days of an occurrence, the end date of that occurrence becomes the condition_era_end_date.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
condition_era_id | integer | Yes | Yes | No | |||||
person_id | integer | Yes | No | Yes | PERSON | ||||
condition_concept_id | integer | Yes | No | Yes | CONCEPT | Condition | |||
condition_era_start_date | datetime | Yes | No | No | |||||
condition_era_end_date | datetime | Yes | No | No | |||||
condition_occurrence_count | integer | No | No | No |
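
The 30-day Persistence Window rule can be sketched as a simple merge over sorted occurrences. This is an illustrative sketch only: `build_condition_eras` is a hypothetical name, the input is assumed to be restricted to one Person and one condition_concept_id, and treating a gap of exactly 30 days as still inside the window is an assumption.

```python
from datetime import date, timedelta


def build_condition_eras(occurrences, persistence_days=30):
    """Merge Condition Occurrences into Condition Eras.

    occurrences: list of (start_date, end_date) tuples for a single Person
    and a single condition_concept_id.
    Returns a list of (era_start, era_end, condition_occurrence_count).
    """
    eras = []
    for start, end in sorted(occurrences):
        window_end = eras[-1][1] + timedelta(days=persistence_days) if eras else None
        if eras and start <= window_end:
            # Still within the persistence window: extend the current era.
            era_start, era_end, count = eras[-1]
            eras[-1] = (era_start, max(era_end, end), count + 1)
        else:
            # Window elapsed (or first record): open a new era.
            eras.append((start, end, 1))
    return eras
```

A PCP visit on 5 January followed by a specialist visit on 25 January collapses into one era with a count of 2, while a further occurrence on 1 June starts a new era.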
Table Description
The Standardized Vocabularies contain records, or Concepts, that uniquely identify each fundamental unit of meaning used to express clinical information in all domain tables of the CDM. Concepts are derived from vocabularies, which represent clinical information across a domain (e.g. conditions, drugs, procedures) through the use of codes and associated descriptions. Some Concepts are designated Standard Concepts, meaning these Concepts can be used as normative expressions of a clinical entity within the OMOP Common Data Model and within standardized analytics. Each Standard Concept belongs to one domain, which defines the location where the Concept would be expected to occur within data tables of the CDM.
Concepts can represent broad categories (like ‘Cardiovascular disease’), detailed clinical elements (‘Myocardial infarction of the anterolateral wall’) or modifying characteristics and attributes that define Concepts at various levels of detail (severity of a disease, associated morphology, etc.).
Records in the Standardized Vocabularies tables are derived from national or international vocabularies such as SNOMED-CT, RxNorm, and LOINC, or custom Concepts defined to cover various aspects of observational data analysis. For a detailed description of these vocabularies, their use in the OMOP CDM and their relationships to each other please refer to the specifications.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
concept_id | integer | Yes | Yes | No | |||||
concept_name | varchar(255) | Yes | No | No | |||||
domain_id | varchar(20) | Yes | No | Yes | DOMAIN | ||||
vocabulary_id | varchar(20) | Yes | No | Yes | VOCABULARY | ||||
concept_class_id | varchar(20) | Yes | No | Yes | CONCEPT_CLASS | ||||
standard_concept | varchar(1) | No | No | No | |||||
concept_code | varchar(50) | Yes | No | No | |||||
valid_start_date | date | Yes | No | No | |||||
valid_end_date | date | Yes | No | No | |||||
invalid_reason | varchar(1) | No | No | No |
Table Description
The VOCABULARY table includes a list of the Vocabularies collected from various sources or created de novo by the OMOP community. This reference table is populated with a single record for each Vocabulary source and includes a descriptive name and other associated attributes for the Vocabulary.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
vocabulary_id | varchar(20) | Yes | Yes | No | |||||
vocabulary_name | varchar(255) | Yes | No | No | |||||
vocabulary_reference | varchar(255) | Yes | No | No | |||||
vocabulary_version | varchar(255) | No | No | No | |||||
vocabulary_concept_id | integer | Yes | No | Yes | CONCEPT |
Table Description
The DOMAIN table includes a list of OMOP-defined Domains the Concepts of the Standardized Vocabularies can belong to. A Domain defines the set of allowable Concepts for the standardized fields in the CDM tables. For example, the “Condition” Domain contains Concepts that describe a condition of a patient, and these Concepts can only be stored in the condition_concept_id field of the CONDITION_OCCURRENCE and CONDITION_ERA tables. This reference table is populated with a single record for each Domain and includes a descriptive name for the Domain.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
domain_id | varchar(20) | Yes | Yes | No | |||||
domain_name | varchar(255) | Yes | No | No | |||||
domain_concept_id | integer | Yes | No | Yes | CONCEPT |
Table Description
The CONCEPT_CLASS table is a reference table, which includes a list of the classifications used to differentiate Concepts within a given Vocabulary. This reference table is populated with a single record for each Concept Class.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
concept_class_id | varchar(20) | Yes | Yes | No | |||||
concept_class_name | varchar(255) | Yes | No | No | |||||
concept_class_concept_id | integer | Yes | No | Yes | CONCEPT |
Table Description
The CONCEPT_RELATIONSHIP table contains records that define direct relationships between any two Concepts and the nature or type of the relationship. Each type of a relationship is defined in the RELATIONSHIP table.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
concept_id_1 | integer | Yes | No | Yes | CONCEPT | ||||
concept_id_2 | integer | Yes | No | Yes | CONCEPT | ||||
relationship_id | varchar(20) | Yes | No | Yes | RELATIONSHIP | ||||
valid_start_date | date | Yes | No | No | |||||
valid_end_date | date | Yes | No | No | |||||
invalid_reason | varchar(1) | No | No | No |
Table Description
The RELATIONSHIP table provides a reference list of all types of relationships that can be used to associate any two concepts in the CONCEPT_RELATIONSHIP table.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
relationship_id | varchar(20) | Yes | Yes | No | |||||
relationship_name | varchar(255) | Yes | No | No | |||||
is_hierarchical | varchar(1) | Yes | No | No | |||||
defines_ancestry | varchar(1) | Yes | No | No | |||||
reverse_relationship_id | varchar(20) | Yes | No | No | |||||
relationship_concept_id | integer | Yes | No | Yes | CONCEPT |
Table Description
The CONCEPT_SYNONYM table is used to store alternate names and descriptions for Concepts.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
concept_id | integer | Yes | No | Yes | CONCEPT | ||||
concept_synonym_name | varchar(1000) | Yes | No | No | |||||
language_concept_id | integer | Yes | No | Yes | CONCEPT |
Table Description
The CONCEPT_ANCESTOR table is designed to simplify observational analysis by providing the complete hierarchical relationships between Concepts. Only direct parent-child relationships between Concepts are stored in the CONCEPT_RELATIONSHIP table. To determine higher level ancestry connections, all individual direct relationships would have to be navigated at analysis time. The CONCEPT_ANCESTOR table includes records for all parent-child relationships, as well as grandparent-grandchild relationships and those of any other level of lineage. Using the CONCEPT_ANCESTOR table allows for querying for all descendants of a hierarchical concept. For example, drug ingredients and drug products are all descendants of a drug class ancestor.
This table is entirely derived from the CONCEPT, CONCEPT_RELATIONSHIP and RELATIONSHIP tables.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
ancestor_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
descendant_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
min_levels_of_separation | integer | Yes | No | No | |||||
max_levels_of_separation | integer | Yes | No | No |
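
How ancestor rows follow from direct parent-child relationships can be illustrated with a small traversal. The sketch below is not the procedure used to build the published table: it assumes the hierarchical relationships have already been reduced to (parent, child) pairs, that the hierarchy has no cycles, and it emits rows only for pairs of distinct concepts. The name `build_concept_ancestor` and the ids in the usage note are placeholders.

```python
from collections import defaultdict


def build_concept_ancestor(parent_child_pairs):
    """Expand direct parent-child pairs into ancestor-descendant rows.

    Returns sorted tuples of (ancestor_concept_id, descendant_concept_id,
    min_levels_of_separation, max_levels_of_separation).
    Assumes an acyclic hierarchy; a cycle would recurse without end.
    """
    children = defaultdict(set)
    for parent, child in parent_child_pairs:
        children[parent].add(child)

    levels = {}  # (ancestor, descendant) -> (min_levels, max_levels)

    def walk(ancestor, node, depth):
        # Visit every path below the ancestor and record its length.
        for child in children.get(node, ()):
            key = (ancestor, child)
            lo, hi = levels.get(key, (depth + 1, depth + 1))
            levels[key] = (min(lo, depth + 1), max(hi, depth + 1))
            walk(ancestor, child, depth + 1)

    for ancestor in list(children):
        walk(ancestor, ancestor, 0)
    return sorted((a, d, lo, hi) for (a, d), (lo, hi) in levels.items())
```

For the pairs `(1, 2)`, `(2, 3)` and `(1, 3)`, concept 3 is reachable from concept 1 both directly and through concept 2, so its row carries a minimum of 1 and a maximum of 2 levels of separation.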
Table Description
The source to concept map table is a legacy data structure within the OMOP Common Data Model, recommended for use in ETL processes to maintain local source codes which are not available as Concepts in the Standardized Vocabularies, and to establish mappings for each source code into a Standard Concept as target_concept_ids that can be used to populate the Common Data Model tables. The SOURCE_TO_CONCEPT_MAP table is no longer populated with content within the Standardized Vocabularies published to the OMOP community.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
source_code | varchar(50) | Yes | No | No | |||||
source_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
source_vocabulary_id | varchar(20) | Yes | No | No | |||||
source_code_description | varchar(255) | No | No | No | |||||
target_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
target_vocabulary_id | varchar(20) | Yes | No | Yes | VOCABULARY | ||||
valid_start_date | date | Yes | No | No | |||||
valid_end_date | date | Yes | No | No | |||||
invalid_reason | varchar(1) | No | No | No |
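
A lookup against SOURCE_TO_CONCEPT_MAP rows during ETL might look like the sketch below. It is illustrative only: rows are modelled as plain dictionaries keyed by the field names above, `map_source_code` is a hypothetical helper, and treating an empty invalid_reason together with the validity dates as the test for an active mapping is an assumption.

```python
from datetime import date


def map_source_code(rows, source_code, source_vocabulary_id, on_date):
    """Return the target_concept_ids mapped from a local source code.

    Only mappings that are valid on on_date and carry no invalid_reason
    are returned; an empty list means the code has no active mapping.
    """
    return [
        row["target_concept_id"]
        for row in rows
        if row["source_code"] == source_code
        and row["source_vocabulary_id"] == source_vocabulary_id
        and row["invalid_reason"] is None
        and row["valid_start_date"] <= on_date <= row["valid_end_date"]
    ]
```

An ETL step could call this once per source record and fall back to concept_id 0 when the returned list is empty; that fallback is likewise a choice for the ETL designer rather than something this sketch enforces.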
Table Description
The DRUG_STRENGTH table contains structured content about the amount or concentration and associated units of a specific ingredient contained within a particular drug product. This table is supplemental information to support standardized analysis of drug utilization.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
drug_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
ingredient_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
amount_value | float | No | No | No | |||||
amount_unit_concept_id | integer | No | No | Yes | CONCEPT | ||||
numerator_value | float | No | No | No | |||||
numerator_unit_concept_id | integer | No | No | Yes | CONCEPT | ||||
denominator_value | float | No | No | No | |||||
denominator_unit_concept_id | integer | No | No | Yes | CONCEPT | ||||
box_size | integer | No | No | No | |||||
valid_start_date | date | Yes | No | No | |||||
valid_end_date | date | Yes | No | No | |||||
invalid_reason | varchar(1) | No | No | No |
Table Description
The COHORT_DEFINITION table contains records that define a Cohort derived from the data through an associated description and syntax; upon instantiation (execution of the algorithm), the Cohort is placed into the COHORT table. A Cohort is a set of subjects that satisfy a given combination of inclusion criteria for a duration of time. The COHORT_DEFINITION table provides a standardized structure for maintaining the rules governing the inclusion of a subject into a cohort, and can store operational programming code to instantiate the cohort within the OMOP Common Data Model.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
cohort_definition_id | integer | Yes | No | No | |||||
cohort_definition_name | varchar(255) | Yes | No | No | |||||
cohort_definition_description | varchar(MAX) | No | No | No | |||||
definition_type_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
cohort_definition_syntax | varchar(MAX) | No | No | No | |||||
subject_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
cohort_initiation_date | date | No | No | No |
Table Description
The ATTRIBUTE_DEFINITION table contains records to define each attribute through an associated description and syntax. Attributes are derived elements that can be selected or calculated for a subject within a cohort. The ATTRIBUTE_DEFINITION table provides a standardized structure for maintaining the rules governing the calculation of covariates for a subject in a cohort, and can store operational programming code to instantiate the attributes for a given cohort within the OMOP Common Data Model.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
attribute_definition_id | integer | Yes | No | No | |||||
attribute_name | varchar(255) | Yes | No | No | |||||
attribute_description | varchar(MAX) | No | No | No | |||||
attribute_type_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
attribute_syntax | varchar(MAX) | No | No | No |
Table Description
The METADATA table contains metadata about a dataset that has been transformed to the OMOP Common Data Model.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
metadata_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
metadata_type_concept_id | integer | Yes | No | Yes | CONCEPT | ||||
name | varchar(250) | Yes | No | No | |||||
value_as_string | varchar(250) | No | No | No | |||||
value_as_concept_id | integer | No | No | Yes | CONCEPT | ||||
metadata_date | date | No | No | No | |||||
metadata_datetime | datetime | No | No | No |
Table Description
The CDM_SOURCE table contains detail about the source database and the process used to transform the data into the OMOP Common Data Model.
CDM Field | User Guide | ETL Conventions | Datatype | Required | Primary Key | Foreign Key | FK Table | FK Domain | FK Class |
---|---|---|---|---|---|---|---|---|---|
cdm_source_name | varchar(255) | Yes | No | No | |||||
cdm_source_abbreviation | varchar(25) | No | No | No | |||||
cdm_holder | varchar(255) | No | No | No | |||||
source_description | varchar(MAX) | No | No | No | |||||
source_documentation_reference | varchar(255) | No | No | No | |||||
cdm_etl_reference | varchar(255) | No | No | No | |||||
source_release_date | date | No | No | No | |||||
cdm_release_date | date | No | No | No | |||||
cdm_version | varchar(10) | No | No | No | |||||
vocabulary_version | varchar(20) | No | No | No |
Table Description
The death domain contains the clinical event for how and when a Person dies. A person can have up to one record if the source system contains evidence about the Death, such as: