From 7411c499d9ff1fe93ebe71446d441aeb1fd6342a Mon Sep 17 00:00:00 2001 From: Erica Voss Date: Thu, 11 Oct 2018 09:53:27 -0400 Subject: [PATCH] CDM v6.0 updates --- Background/Background.md | 8 +- Background/Data-Model-Conventions.md | 79 ++--- Background/Design-Principles.md | 6 +- Frequently-Asked-Questions.md | 18 +- License.md | 2 +- .../COHORT.md | 0 .../COHORT_DEFINITION.md | 0 ResultsSchema/Results-Schema.md | 4 + .../CONDITION_OCCURRENCE.md | 56 ++-- StandardizedClinicalDataTables/DEATH.md | 30 +- .../DEVICE_EXPOSURE.md | 52 ++-- .../DRUG_EXPOSURE.md | 76 ++--- .../FACT_RELATIONSHIP.md | 9 +- StandardizedClinicalDataTables/MEASUREMENT.md | 45 +-- StandardizedClinicalDataTables/NOTE.md | 142 ++------- StandardizedClinicalDataTables/NOTE_NLP.md | 42 +-- StandardizedClinicalDataTables/OBSERVATION.md | 39 ++- .../OBSERVATION_PERIOD.md | 19 +- StandardizedClinicalDataTables/PERSON.md | 35 ++- .../PROCEDURE_OCCURRENCE.md | 30 +- StandardizedClinicalDataTables/SPECIMEN.md | 13 +- .../SURVEY_CONDUCT.md | 40 +++ .../Standardized-Clinical-Data-Tables.md | 15 +- .../VISIT_DETAIL.md | 53 ++-- .../VISIT_OCCURRENCE.md | 50 ++- .../COHORT_ATTRIBUTE.md | 17 - StandardizedDerivedElements/CONDITION_ERA.md | 16 +- StandardizedDerivedElements/DOSE_ERA.md | 32 +- StandardizedDerivedElements/DRUG_ERA.md | 26 +- .../Standardized-Derived-Elements.md | 2 - StandardizedHealthEconomicsDataTables/COST.md | 107 ++++--- .../PAYER_PLAN_PERIOD.md | 30 +- .../CARE_SITE.md | 23 +- .../LOCATION.md | 18 +- .../LOCATION_HISTORY.md | 23 ++ .../PROVIDER.md | 21 +- .../Standardized-Health-System-Data-Tables.md | 3 +- StandardizedMetadata/CDM_SOURCE.md | 8 +- StandardizedMetadata/METADATA.md | 4 +- StandardizedMetadata/Standardized-Metadata.md | 7 +- .../ATTRIBUTE_DEFINITION.md | 14 - StandardizedVocabularies/CONCEPT.md | 34 +- StandardizedVocabularies/CONCEPT_ANCESTOR.md | 8 +- StandardizedVocabularies/CONCEPT_CLASS.md | 120 +------- .../CONCEPT_RELATIONSHIP.md | 15 +- 
StandardizedVocabularies/CONCEPT_SYNONYM.md | 8 +- StandardizedVocabularies/DOMAIN.md | 12 +- StandardizedVocabularies/DRUG_STRENGTH.md | 24 +- StandardizedVocabularies/RELATIONSHIP.md | 291 +----------------- .../SOURCE_TO_CONCEPT_MAP.md | 20 +- .../Standardized-Vocabularies.md | 4 +- StandardizedVocabularies/VOCABULARY.md | 82 +---- _Footer.md | 2 +- _Sidebar.md | 22 +- images/Thumbs.db | Bin 0 -> 50688 bytes images/entity_diagram.png | Bin 0 -> 735285 bytes 56 files changed, 724 insertions(+), 1132 deletions(-) rename {StandardizedDerivedElements => ResultsSchema}/COHORT.md (100%) rename {StandardizedVocabularies => ResultsSchema}/COHORT_DEFINITION.md (100%) create mode 100644 ResultsSchema/Results-Schema.md create mode 100644 StandardizedClinicalDataTables/SURVEY_CONDUCT.md delete mode 100644 StandardizedDerivedElements/COHORT_ATTRIBUTE.md create mode 100644 StandardizedHealthSystemDataTables/LOCATION_HISTORY.md delete mode 100644 StandardizedVocabularies/ATTRIBUTE_DEFINITION.md create mode 100644 images/Thumbs.db create mode 100644 images/entity_diagram.png diff --git a/Background/Background.md b/Background/Background.md index afb3004..d7b9388 100644 --- a/Background/Background.md +++ b/Background/Background.md @@ -4,9 +4,9 @@ The Observational Medical Outcomes Partnership (OMOP) was a public-private partnership established to inform the appropriate use of observational healthcare databases for studying the effects of medical products. 
Over the course of the 5-year project and through its community of researchers from industry, government, and academia, OMOP successfully achieved its aims to: - - Conduct methodological research to empirically evaluate the performance of various analytical methods on their ability to identify true associations and avoid false findings, - - Develop tools and capabilities for transforming, characterizing, and analyzing disparate data sources across the health care delivery spectrum, and - - Establish a shared resource so that the broader research community can collaboratively advance the science. + - Conduct methodological research to empirically evaluate the performance of various analytical methods on their ability to identify true associations and avoid false findings + - Develop tools and capabilities for transforming, characterizing, and analysing disparate data sources across the health care delivery spectrum + - Establish a shared resource so that the broader research community can collaboratively advance the science The results of OMOP's research has been widely published and presented at scientific conferences, including [annual symposia](https://www.ohdsi.org/events/2018-ohdsi-symposium/). @@ -16,4 +16,4 @@ The community is actively using the OMOP Common Data Model for their various res The Observational Health Data Sciences and Informatics (OHDSI) has been established as a multi-stakeholder, interdisciplinary collaborative to create open-source solutions that bring out the value of observational health data through large-scale analytics. The OHDSI collaborative includes all of the original OMOP research investigators, and will develop its tools using the OMOP Common Data Model. Learn more at [ohdsi.org](http://ohdsi.org). -The OMOP Common Data Model will continue to be an open-source, community standard for observational healthcare data. 
The model specifications and associated work products will be placed in the public domain, and the entire research community is encouraged to use these tools to support everybody's own research activities. +The OMOP Common Data Model will continue to be an open-source community standard for observational healthcare data. The model specifications and associated work products will be placed in the public domain, and the entire research community is encouraged to use these tools to support everybody's own research activities. diff --git a/Background/Data-Model-Conventions.md b/Background/Data-Model-Conventions.md index d003d2c..f4a8f22 100644 --- a/Background/Data-Model-Conventions.md +++ b/Background/Data-Model-Conventions.md @@ -1,10 +1,14 @@ -There are a number of implicit and explicit conventions that have been adopted in the CDM. Developers of methods that run methods against the CDM need to understand these conventions. +There are a number of implicit and explicit conventions that have been adopted in the CDM. Developers of methods that run against the CDM need to understand these conventions. + +### General conventions of schemas + +New to CDM v6.0 is the concept of schemas. This allows for more separation between read-only and writeable tables. The clinical data, event, and vocabulary tables are in the 'CDM' schema and tables that need to be manipulated by web-based tools or end users have moved to the 'Results' schema. Currently the only two tables in the 'Results' schema are COHORT and COHORT_DEFINITION, though likely more will be added over the course of v6.0 point releases. ### General conventions of data tables -The CDM is platform-independent. Data types are defined generically using ANSI SQL data types (VARCHAR, INTEGER, FLOAT, DATE, TIME, CLOB). Precision is provided only for VARCHAR. It reflects the minimal required string length and can be expanded within a CDM instantiation. The CDM does not prescribe the date and time format.
Standard queries against CDM may vary for local instantiations and date/time configurations. +The CDM is platform-independent. Data types are defined generically using ANSI SQL data types (VARCHAR, INTEGER, FLOAT, DATE, DATETIME, CLOB). Precision is provided only for VARCHAR. It reflects the minimal required string length and can be expanded within a CDM instantiation. The CDM does not prescribe the date and datetime format. Standard queries against CDM may vary for local instantiations and date/datetime configurations. -In most cases, the first field in each table ends in "_id", containing a record identifier that can be used as a foreign key in another table. +In most cases, the first field in each table ends in '_ID', containing a record identifier that can be used as a foreign key in another table. ### General conventions of fields @@ -12,74 +16,75 @@ Variable names across all tables follow one convention: Notation|Description ---------------------|-------------------------------------------------- -|_SOURCE_VALUE|Verbatim information from the source data, typically used in ETL to map to CONCEPT_ID, and not to be used by any standard analytics. For example, condition_source_value = '787.02' was the ICD-9 code captured as a diagnosis from the administrative claim| -|_ID|Unique identifiers for key entities, which can serve as foreign keys to establish relationships across entities For example, person_id uniquely identifies each individual. visit_occurrence_id uniquely identifies a PERSON encounter at a point of care.| -|_CONCEPT_ID|Foreign key into the Standardized Vocabularies (i.e. 
the standard_concept attribute for the corresponding term is true), which serves as the primary basis for all standardized analytics For example, condition_concept_id = 31967 contains reference value for SNOMED concept of 'Nausea'| -|_SOURCE_CONCEPT_ID|Foreign key into the Standardized Vocabularies representing the concept and terminology used in the source data, when applicable For example, condition_source_concept_id = 35708202 denotes the concept of 'Nausea' in the MedDRA terminology; the analogous condition_concept_id might be 31967, since SNOMED-CT is the Standardized Vocabularies for most clinical diagnoses and findings.| -|_TYPE_CONCEPT_ID|Delineates the origin of the source information, standardized within the Standardized Vocabularies For example, drug_type_concept_id can allow analysts to discriminate between 'Pharmacy dispensing' and 'Prescription written'| +|_SOURCE_VALUE|Verbatim information from the source data, typically used in ETL to map to CONCEPT_ID, and not to be used by any standard analytics. For example, CONDITION_SOURCE_VALUE = '787.02' was the ICD-9 code captured as a diagnosis from the administrative claim.| +|_ID|Unique identifiers for key entities, which can serve as foreign keys to establish relationships across entities. For example, PERSON_ID uniquely identifies each individual. VISIT_OCCURRENCE_ID uniquely identifies a PERSON encounter at a point of care.| +|_CONCEPT_ID|Foreign key into the Standardized Vocabularies (i.e. the standard_concept attribute for the corresponding term is true), which serves as the primary basis for all standardized analytics. For example, CONDITION_CONCEPT_ID = [31967](http://athena.ohdsi.org/search-terms/terms/31967) contains the reference value for the SNOMED concept of 'Nausea'| +|_SOURCE_CONCEPT_ID|Foreign key into the Standardized Vocabularies representing the concept and terminology used in the source data, when applicable. 
For example, CONDITION_SOURCE_CONCEPT_ID = [45431665](http://athena.ohdsi.org/search-terms/terms/45431665) denotes the concept of 'Nausea' in the Read terminology; the analogous CONDITION_CONCEPT_ID might be 31967, since SNOMED-CT is the Standardized Vocabulary for most clinical diagnoses and findings.| +|_TYPE_CONCEPT_ID|Delineates the origin of the source information, standardized within the Standardized Vocabularies. For example, DRUG_TYPE_CONCEPT_ID can allow analysts to discriminate between 'Pharmacy dispensing' and 'Prescription written'| ### Representation of content through Concepts -In CDM data tables the meaning of the content of each record is represented using Concepts. Concepts are stored with their concept_id as foreign keys to the CONCEPT table in the Standardized Vocabularies, which contains Concepts necessary to describe the healthcare experience of a patient. If a Standard Concept does not exist or cannot be identified, the Concept with the concept_id 0 is used, representing a non-existing or unmappable concept. +In CDM data tables the meaning of the content of each record is represented using Concepts. Concepts are stored with their CONCEPT_ID as foreign keys to the CONCEPT table in the Standardized Vocabularies, which contains Concepts necessary to describe the healthcare experience of a patient. If a Standard Concept does not exist or cannot be identified, the Concept with the CONCEPT_ID 0 is used, representing a non-existing or un-mappable concept. -Records in the CONCEPT table contain all the detailed information about it (name, relationships, types etc.). Concepts, Concept Relationships and other information relating to Concepts contained in the tables of the Standardized Vocabularies. +Records in the CONCEPT table contain all the detailed information about it (name, domain, class etc.). Concepts, Concept Relationships and other information relating to Concepts is contained in the tables of the Standardized Vocabularies. 
### Difference between Concept IDs and Source Values Many tables contain equivalent information multiple times: As a Source Value, a Source Concept and as a Standard Concept. - * Source Values contains the codes from public code systems such as ICD-9-CM, NDC, CPT-4 etc. or local controlled vocabularies (such as F for female and M for male) copied from the source data. Source Values are stored in the _source_value field in the data tables. - * Concepts are CDM-specific entities that represent the meaning of a clinical fact. Most concepts are based on code systems used in healthcare (called Source Concepts), while others were created de-novo (concept_code = "OMOP generated"). Concepts have unique IDs across all domains. - * Source Concepts are the concepts that represent the code used in the source. Source Concepts are only used for common healthcare code systems, but not for OMOP-generated Concepts. Source Concepts are stored in the source_concept_id field in the data tables. - * Standard Concepts are those concepts that are used to define the unique meaning of a clinical entity. For each entity there is one Standard Concept. Standard Concepts are typically drawn from existing public vocabulary sources. Concepts that have the equivalent meaning to a Standard Concept are mapped to the Standard Concept. Standard Concepts are referred to in the concept_id field of the data tables. + * Source Values contain the codes from public code systems such as ICD-9-CM, NDC, CPT-4 etc. or locally controlled vocabularies (such as F for female and M for male) copied from the source data. Source Values are stored in the *_SOURCE_VALUE fields in the data tables. + * Concepts are CDM-specific entities that represent the meaning of a clinical fact. Most concepts are based on code systems used in healthcare (called Source Concepts), while others were created de-novo (CONCEPT_CODE = 'OMOP generated'). Concepts have unique IDs across all domains. 
+ * Source Concepts are the concepts that represent the code used in the source. Source Concepts are only used for common healthcare code systems, not for OMOP-generated Concepts. Source Concepts are stored in the *_SOURCE_CONCEPT_ID field in the data tables. + * Standard Concepts are those concepts that are used to define the unique meaning of a clinical entity. For each entity there is one Standard Concept. Standard Concepts are typically drawn from existing public vocabulary sources. Concepts that have the equivalent meaning to a Standard Concept are mapped to the Standard Concept. Standard Concepts are referred to in the _CONCEPT_ID field of the data tables. Source Values are only provided for convenience and quality assurance (QA) purposes. Source Values and Source Concepts are optional, while Standard Concepts are mandatory. Source Values may contain information that is only meaningful in the context of a specific data source. ### Difference between general Concepts and Type Concepts -Type Concepts (ending in _type_concept_id) and general Concepts (ending in _concept_id) are part of many tables. The former are special Concepts with the purpose of indicating where the data are derived from in the source. For example, the Type Concept field can be used to distinguish a DRUG_EXPOSURE record that is derived from a pharmacy-dispensing claim from one indicative of a prescription written in an electronic health record (EHR). +Type Concepts (ending in _TYPE_CONCEPT_ID) and general Concepts (ending in _CONCEPT_ID) are part of many tables. The former are special Concepts with the purpose of indicating where the data are derived from in the source. For example, the Type Concept field can be used to distinguish a DRUG_EXPOSURE record that is derived from a pharmacy-dispensing claim from one indicative of a prescription written in an electronic health record (EHR). 
### Time span of available data -Data tables for clinical data contain a date stamp (ending in _date, _start_date or _end_date), indicating when that clinical event occurred. As a rule, no record can be outside of a valid OBSERVATION_PERIOD time period. Clinical information that relates to events happened prior the first OBSERVATION_PERIOD, it will be captured as a record in the OBSERVATION table of 'Medical history' (concept_id = 43054928), with the observation_date set to the first observation_period_start_date of that patient, and the value_as_concept_id set to the corresponding concept_id for the condition/drug/procedure that occurred in the past. No data occurring after the last observation_period_end_date can be valid records in the CDM. +Data tables for clinical data contain a datetime stamp (ending in _DATETIME, _START_DATETIME or _END_DATETIME), indicating when that clinical event occurred. As a rule, no record can be outside of a valid OBSERVATION_PERIOD time period. Clinical information that relates to events that happened prior to the first OBSERVATION_PERIOD will be captured as a record in the OBSERVATION table as 'Medical history' (CONCEPT_ID = 43054928), with the OBSERVATION_DATETIME set to the first OBSERVATION_PERIOD_START_DATE of that patient, and the VALUE_AS_CONCEPT_ID set to the corresponding CONCEPT_ID for the condition/drug/procedure that occurred in the past. No data occurring after the last OBSERVATION_PERIOD_END_DATE can be valid records in the CDM. + * When mapping source data to the CDM, if time is unknown the default time of 00:00:00 should be chosen. If a time of 00:00:00 is given in the source data it should be shifted to 00:00:01 ([THEMIS issue #10](https://github.com/OHDSI/Themis/issues/10)). ### Content of each table -For the tables of the main domains of the CDM it is imperative that used concepts are strictly limited to the domain. 
For example, the CONDITION_OCCURRENCE table contains only information about conditions (diagnoses, signs, symptoms), but no information about procedures. Not all source coding schemes adhere to such rules. For example, ICD-9-CM codes, which contain mostly diagnoses of human disease, also contain information about the status of patients having received a procedure: V20.3 "Newborn health supervision" defines a continuous procedure and is therefore stored in the PROCEDURE_OCCURRENCE table. +For the tables of the main domains of the CDM it is imperative that concepts used are strictly limited to the domain. For example, the CONDITION_OCCURRENCE table contains only information about conditions (diagnoses, signs, symptoms), but no information about procedures. Not all source coding schemes adhere to such rules. For example, ICD-9-CM codes, which contain mostly diagnoses of human disease, also contain information about the status of patients having received a procedure. The ICD-9-CM code V20.3 'Newborn health supervision' defines a continuous procedure and is therefore stored in the PROCEDURE_OCCURRENCE table. -### Differentiating between source values, source concept ids, and standard concept ids +### Differentiating between Source Values, Source Concept Ids, and Standard Concept Ids -Each table contains fields for source values, source concept ids, and standard concept ids. +Each table contains fields for Source Values, Source Concept Ids, and Standard Concept Ids. - * Source values are fields to maintain the verbatim information from the source database, are stored as unstructured text, and are generally not to be used by any standardized analytics. - * Source concept ids provide a repeatable representation of the source concept, when the source data are drawn from a commonly-used internationally-recognized vocabulary that has been distributed with the OMOP Common Data Model. 
Specific use cases where source vocabulary-specific analytics are required can be accommodated by the use of the source concept id fields, but these are generally not applicable across disparate data sources. The standard concept id fields are **strongly suggested** to be used in all standardized analytics, as specific vocabularies have been established within each data domain to facilitate standardization of both structure and content within the OMOP Common Data Model. + * Source Values are fields to maintain the verbatim information from the source database, stored as unstructured text, and are generally not to be used by any standardized analytics. There is no standardization for these fields and these columns can be used as the local CDM builders see fit. A typical example would be an ICD-9 code without the decimal from an administrative claim as condition_source_value = '78702' which is how it appeared in the source ([THEMIS issue #15](https://github.com/OHDSI/Themis/issues/15)). + * Source Concept Ids provide a repeatable representation of the source concept, when the source data are drawn from a commonly-used, internationally-recognized vocabulary that has been distributed with the OMOP Common Data Model. Specific use cases where source vocabulary-specific analytics are required can be accommodated by the use of the *_SOURCE_CONCEPT_ID fields, but these are generally not applicable across disparate data sources. The standard *_CONCEPT_ID fields are **strongly suggested** to be used in all standardized analytics, as specific vocabularies have been established within each data domain to facilitate standardization of both structure and content within the OMOP Common Data Model. 
The following provide conventions for processing source data using these three fields in each domain: When processing data where the source value is either free text or a reference to a coding scheme that is not contained within the Standardized Vocabularies: - - Map all source values directly to standard concept_ids. Store these mappings in the SOURCE_TO_CONCEPT_MAP table. - - If the source code is not mappable to a vocabulary term, the source_concept_id field is set to 0 + - Map all Source Values to the corresponding Standard CONCEPT_IDs. Store the CONCEPT_IDs in the TARGET_CONCEPT_ID field of the SOURCE_TO_CONCEPT_MAP table. + - If a CONCEPT_ID is not available for the source code, the TARGET_CONCEPT_ID field is set to 0. -When processing your data where source value is a reference to a coding scheme contained within the Standardized Vocabularies: +When processing your data where Source Value is a reference to a coding scheme contained within the Standardized Vocabularies: - - Map all your source values to the corresponding concept_ids in the source vocabulary. Store the result in the source_concept_id field. + - Find all CONCEPT_IDs in the Source Vocabulary that correspond to your Source Values. Store the result in the SOURCE_CONCEPT_ID field. - If the source code follows the same formatting as the distributed vocabulary, the mapping can be directly obtained from the CONCEPT table using the CONCEPT_CODE field. - - If the source code uses alternative formatting (ex. format has removed decimal point from ICD-9 codes), you will need to perform the formatting transformation within the ETL. In this case, you may wish to store the mappings from original codes to source concept ids in the SOURCE_TO_CONCEPT_MAP table. - - If the source code is not mappable to a vocabulary term, the source_concept_id field is set to 0 - - Use the CONCEPT_RELATIONSHIP table to identify the standard concept_id that corresponds to the source_concept_id in the domain. 
- - Each source_concept_id can have 1 or more Standard concept_id mapped to it. Each Standard concept_id belongs to only one primary domain, but when a source concept_id maps to multiple standard concept_ids, it is possible for that source_concept_id to result in records being produced across multiple domains. For example, ICD10CM Z34.00 'Encounter for supervision of normal first pregnancy, unspecified trimester' will be mapped to the SNOMED concept in the procedure domain 'Routine antenatal care' and the concept in the condition domain 'Primagravida'. It is also possible for one source_concept_id to map to multiple standard concept_ids within the same domain. For example, ICD-9 for 'Viral hepatitis with hepatic coma' maps to SNOMED 'Viral hepatitis' and a different concept for 'hepatic coma' in which case multiple condition_occurrence records will be generated for the one source value record. - - If the source_concept_id is not mappable to any standard concept_id, the concept_id field is set to 0. - - Write the data record into table(s) corresponding to the domain of the standard concept_id(s). - - If the source value is mapped to source_concept_id, but the source_concept_id is not mapped to a standard concept_id, then the domain for the data record, and hence it's table location, is determined by the domain_id field of the CONCEPT record the source_concept_id refers to. The standard concept_id is set to 0. - - If the source value cannot be mapped to a source_concept_id or standard concept_id, then direct the data record to the most appropriate CDM domain based on your local knowledge of the intent of the source data and associated value. For example, if the unmappable source_value came from a 'diagnosis' table, then in the absence of other information, you may choose to record that fact in the CONDITION_OCCURRENCE table. + - If the source code uses alternative formatting (ex. 
format has removed decimal point from ICD-9 codes), you will need to perform the formatting transformation within the ETL. In this case, you may wish to store the mappings from original codes to SOURCE_CONCEPT_IDs in the SOURCE_TO_CONCEPT_MAP table. + - If the source code is not found in a vocabulary, the SOURCE_CONCEPT_ID field is set to 0. + - Use the CONCEPT_RELATIONSHIP table to identify the Standard CONCEPT_ID that corresponds to the SOURCE_CONCEPT_ID in the domain. + - Each SOURCE_CONCEPT_ID can have 1 or more Standard CONCEPT_IDs mapped to it. Each Standard CONCEPT_ID belongs to only one primary domain, but when a SOURCE_CONCEPT_ID maps to multiple Standard CONCEPT_IDs, it is possible for that SOURCE_CONCEPT_ID to result in records being produced across multiple domains. For example, ICD-10-CM code Z34.00 'Encounter for supervision of normal first pregnancy, unspecified trimester' will be mapped to the SNOMED concept 'Routine antenatal care' in the procedure domain and the SNOMED concept 'Primigravida' in the condition domain. It is also possible for one SOURCE_CONCEPT_ID to map to multiple Standard CONCEPT_IDs within the same domain. For example, ICD-9-CM code 070.43 'Hepatitis E with hepatic coma' maps to the SNOMED concept for 'Acute hepatitis E' and a second SNOMED concept for 'Hepatic coma', in which case multiple CONDITION_OCCURRENCE records will be generated for the one source value record. + - If the SOURCE_CONCEPT_ID is not mappable to any Standard CONCEPT_ID, the _CONCEPT_ID field is set to 0. + - Write the data record into the table(s) corresponding to the domain of the Standard CONCEPT_ID(s). + - If the Source Value has a SOURCE_CONCEPT_ID but the SOURCE_CONCEPT_ID is not mapped to a Standard CONCEPT_ID, then the domain for the data record, and hence its table location, is determined by the DOMAIN_ID field of the CONCEPT record the SOURCE_CONCEPT_ID refers to. The Standard _CONCEPT_ID is set to 0.
+ - If the Source Value cannot be mapped to a SOURCE_CONCEPT_ID or Standard CONCEPT_ID, then direct the data record to the most appropriate CDM domain based on your local knowledge of the intent of the source data and associated value. For example, if the un-mappable Source Value came from a 'diagnosis' table then, in the absence of other information, you may choose to record that fact in the CONDITION_OCCURRENCE table. -Each standard concept_id field has a set of allowable concept_id values. The allowable values are defined by the domain of the concepts. For example, there is a domain concept of 'Gender', for which there are only two allowable standard concepts of practical use (8507- 'Male', 8532- 'Female') and one allowable generic concept to represent a standard notion of 'no information' (concept_id = 0). +Each Standard CONCEPT_ID field has a set of allowable CONCEPT_ID values. The allowable values are defined by the domain of the concepts. For example, there is a domain concept of 'Gender', for which there are only two allowable standard concepts of practical use (8507 - 'Male', 8532 - 'Female') and one allowable generic concept to represent a standard notion of 'no information' (CONCEPT_ID = 0). This 'no information' concept should be used when there is no mapping to a standard concept available or if there is no information available for that field. The exceptions are MEASUREMENT.VALUE_AS_CONCEPT_ID, OBSERVATION.VALUE_AS_CONCEPT_ID, MEASUREMENT.UNIT_CONCEPT_ID, OBSERVATION.UNIT_CONCEPT_ID, MEASUREMENT.OPERATOR_CONCEPT_ID, and OBSERVATION.MODIFIER_CONCEPT_ID, which can be NULL if the data do not contain the information ([THEMIS issue #11](https://github.com/OHDSI/Themis/issues/11)). -There is no constraint on allowed concept_ids within the source_concept_id fields. +There is no constraint on allowed CONCEPT_IDs within the SOURCE_CONCEPT_ID fields.
-### Custom source_to_concept_maps +### Custom SOURCE_TO_CONCEPT_MAPs -When the source data uses coding systems that are not currently in the Standardized Vocabularies (e.g. ICPC codes for diagnoses), the convention is to store the mapping of such source codes to Standard Concepts in the SOURCE_TO_CONCEPT_MAP table. The codes used in the data source can be recorded in the source_value fields, but no source_concept_id will be available. +When the source data uses coding systems that are not currently in the Standardized Vocabularies (e.g. ICPC codes for diagnoses), the convention is to store the mapping of such source codes to Standard Concepts in the SOURCE_TO_CONCEPT_MAP table. The codes used in the data source can be recorded in the SOURCE_VALUE fields, but no SOURCE_CONCEPT_ID will be available. Custom source codes are not allowed to map to Standard Concepts that are marked as invalid. diff --git a/Background/Design-Principles.md b/Background/Design-Principles.md index 090a345..278e902 100644 --- a/Background/Design-Principles.md +++ b/Background/Design-Principles.md @@ -2,13 +2,13 @@ The CDM is designed to include all observational health data elements (experienc Therefore, the CDM is designed to store observational data to allow for research, under the following principles: - - **Suitability for purpose:** The CDM aims at providing data organized in a way optimal for analysis, rather than for the purpose of operational needs of health care providers or payers. + - **Suitability for purpose:** The CDM aims to provide data organized in a way optimal for analysis, rather than for the purpose of addressing the operational needs of health care providers or payers. - **Data protection:** All data that might jeopardize the identity and protection of patients, such as names, precise birthdays etc. are limited. Exceptions are possible where the research expressly requires more detailed information, such as precise birth dates for the study of infants. 
- **Design of domains:** The domains are modeled in a person-centric relational data model, where for each record the identity of the person and a date is captured as a minimum. - - **Rationale for domains:** Domains are identified and separately defined in an Entity-relationship model if they have an analysis use case and the domain has specific attributes that are not otherwise applicable. All other data can be preserved as an observation in an entity-attribute-value structure. + - **Rationale for domains:** Domains are identified and separately defined in an entity-relationship model if they have an analysis use case and the domain has specific attributes that are not otherwise applicable. All other data can be preserved as an observation in an entity-attribute-value structure. - **Standardized Vocabularies:** To standardize the content of those records, the CDM relies on the Standardized Vocabularies containing all necessary and appropriate corresponding standard healthcare concepts. - **Reuse of existing vocabularies:** If possible, these concepts are leveraged from national or industry standardization or vocabulary definition organizations or initiatives, such as the National Library of Medicine, the Department of Veterans' Affairs, the Center of Disease Control and Prevention, etc. - **Maintaining source codes:** Even though all codes are mapped to the Standardized Vocabularies, the model also stores the original source code to ensure no information is lost. - **Technology neutrality:** The CDM does not require a specific technology. It can be realized in any relational database, such as Oracle, SQL Server etc., or as SAS analytical datasets. - **Scalability:** The CDM is optimized for data processing and computational analysis to accommodate data sources that vary in size, including databases with up to hundreds of millions of persons and billions of clinical observations. 
- - **Backwards compatibility:** All changes from previous CDMs are clearly delineated. Older versions of the CDM can be easily created from this CDMv5, and no information is lost that was present previously. + - **Backwards compatibility:** All changes from previous CDMs are clearly delineated in the [GitHub repository](https://github.com/OHDSI/CommonDataModel). Older versions of the CDM can be easily created from the CDMv5, and no information is lost that was present previously. diff --git a/Frequently-Asked-Questions.md b/Frequently-Asked-Questions.md index 249fb28..1c79bc7 100644 --- a/Frequently-Asked-Questions.md +++ b/Frequently-Asked-Questions.md @@ -31,9 +31,13 @@ Year_of_birth, month_of_birth, day_of_birth and birth_datetime are all fields in Standard Concepts are used to denote all clinical entities throughout the OMOP common data model, including gender, race, and ethnicity. Source values are mapped to Standard Concepts during the extract, transform, load (ETL) process of converting a database to the OMOP Common Data Model. These are then stored in the Gender_concept_id, Race_concept_id and Ethnicity_concept_id fields in the Person table. Because the standard concepts span across all clinical domains, and in keeping with Cimino’s ‘Desiderata for Controlled Medical Vocabularies in the Twenty-First Century’, the identifiers are unique, persistent nonsematic identifiers. Gender, for example, is stored as either 8532 (female) or 8507 (male) in gender_concept_id while the original value from the source is stored in gender_source_value (M, male, F, etc).. +**7. Are there conditions/procedures/drugs or other domains that should be masked or hidden in the CDM?** + +The masking of information related to a person depends on the organization's privacy policies and may vary by data asset ([THEMIS issue #21](https://github.com/OHDSI/Themis/issues/21)). + **7.
How is time-varying patient information such as location of residence addressed in the model?** -The OMOP common data model has been pragmatically defined based on the desired analytic use cases of the community, as well as the available types of data that community members have access to. Currently in the model, Each each person record has associated demographic attributes which are assumed to be constant for the patient throughout the course of their periods of observation. For example, the location or primary care provider is expected to have a unique value per person, even though in life these data may change over time. Typically, the most recent information is chosen though it is up to the person performing the transformation which value to store. +The OMOP common data model has been pragmatically defined based on the desired analytic use cases of the community, as well as the available types of data that community members have access to. Prior to CDM v6.0, each person record had associated demographic attributes which are assumed to be constant for the patient throughout the course of their periods of observation, like location and primary care provider. With the release of CDM v6.0, the Location_History table is now available to track the movements of people, care sites, and providers over time. Only the most recent location_id should be stored in the Person table to eliminate duplication, while the person's movements are stored in Location_History. Something like marital status is a little different as it is considered to be an observation rather than a demographic attribute. This means that it is housed in the Observation table rather than the Person table, giving the opportunity to store each change in status as a unique record. @@ -57,7 +61,7 @@ An observation period is considered as the time at which a patient is at-risk to If your data use any of the 55 source vocabularies that are currently supported, the mappings have been done for you. 
The full list is available from the open-source [ATHENA](http://athena.ohdsi.org/search-terms/terms) tool under the download tab (see below). You can choose to download the ten [vocabulary tables](https://github.com/OHDSI/CommonDataModel/wiki/Standardized-Vocabularies) from there as well – you will need a copy in your environment if you plan on building a CDM. -![](https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/Athena_download_box.png) +![](https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/images/Athena_download_box.png) The [ATHENA](http://athena.ohdsi.org/search-terms/terms) tool also allows you to explore the vocabulary before downloading it if you are curious about the mappings or if you have a specific code in mind and would like to know which standard concept it is associated with; just click on the search tab and type in a keyword to begin searching. @@ -65,7 +69,7 @@ The [ATHENA](http://athena.ohdsi.org/search-terms/terms) tool also allows you to Yes, all mappings are available in the [Concept_relationship](https://github.com/OHDSI/CommonDataModel/wiki/CONCEPT_RELATIONSHIP) table (which can be downloaded from [ATHENA](http://athena.ohdsi.org/search-terms/terms)). Each value in a supported source terminology is assigned a Concept_id (which is considered non-standard). Each Source_concept_id will have a mapping to a Standard_concept_id. For example: -![](https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/Sepsis_to_SNOMED.png) +![](https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/images/Sepsis_to_SNOMED.png) In this case the standard SNOMED concept 201826 for type 2 diabetes mellitus would be stored in the Condition_occurrence table as the Condition_concept_id and the ICD10CM concept 1567956 for type 2 diabetes mellitus would be stored as the Condition_source_concept_id. 
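The source-to-standard lookup described in the answer above can be sketched as a query against CONCEPT_RELATIONSHIP. This is only a minimal illustration, assuming an in-memory SQLite table trimmed to three columns and holding just the one example row from the text. `'Maps to'` is the relationship_id the OMOP vocabulary uses for source-to-standard mappings; the helper name `standard_concept_for` is hypothetical, and returning 0 for an unmapped code follows the convention that an unmapped concept_id is set to 0.

```python
import sqlite3

# Assumption: a trimmed-down CONCEPT_RELATIONSHIP table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept_relationship ("
    " concept_id_1 INTEGER, concept_id_2 INTEGER, relationship_id TEXT)"
)
# Single illustrative row: ICD10CM concept 1567956 maps to SNOMED concept 201826.
conn.execute(
    "INSERT INTO concept_relationship VALUES (1567956, 201826, 'Maps to')"
)


def standard_concept_for(source_concept_id):
    """Return the Standard Concept a source concept maps to, or 0 if unmapped."""
    row = conn.execute(
        "SELECT concept_id_2 FROM concept_relationship"
        " WHERE concept_id_1 = ? AND relationship_id = 'Maps to'",
        (source_concept_id,),
    ).fetchone()
    return row[0] if row else 0


# The source concept goes to condition_source_concept_id,
# the mapped Standard Concept goes to condition_concept_id.
condition_source_concept_id = 1567956
condition_concept_id = standard_concept_for(condition_source_concept_id)
```

In a real ETL the full table downloaded from ATHENA would be used instead, and a source concept may have more than one `'Maps to'` row, so `fetchone()` is a simplification.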
@@ -79,9 +83,11 @@ Yes, that is the beauty of the community! If you find a mapping in the vocabular **15. What if I have source codes that are specific to my site? How would these be mapped?** -We have a tool called [Usagi](https://github.com/OHDSI/Usagi) (pictured below) that is designed to create mappings between coding systems and the Vocabulary Standard Concepts by using concept names and synonyms to find potential matches. + In the OMOP Vocabulary there is an empty table called the Source_to_concept_map. It is a simple table structure that allows you to establish mapping(s) for each source code with a standard concept in the OMOP Vocabulary (TARGET_CONCEPT_ID). This work can be facilitated by the OHDSI tool [Usagi](https://github.com/OHDSI/Usagi) (pictured below) which searches for text similarity between your source code descriptions and the OMOP Vocabulary and exports mappings in a SOURCE_TO_CONCEPT_MAP table structure. Example Source_to_concept_map files can be found [here](https://github.com/OHDSI/ETL-CDMBuilder/tree/master/man/VOCABULARY_ADDITIONS). These generated Source_to_concept_map files are then loaded into the OMOP Vocabulary's empty Source_to_concept_map prior to processing the native data into the CDM so that the CDM builder can use them in a build. + +![](https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/images/Usagi.png) -![](https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/Usagi.png) +If a source code is not supported by the OMOP Vocabulary, one can create new records in the CONCEPT table; however, the CONCEPT_IDs should start above 2000000000 so that it is easy to distinguish the OMOP Vocabulary concepts from the site-specific concepts. Once those concepts exist, CONCEPT_RELATIONSHIP records can be generated to assign them to a standard terminology; Usagi can facilitate this process as well ([THEMIS issue #22](https://github.com/OHDSI/Themis/issues/22)). **16.
How are one-to-many mappings applied?** @@ -129,7 +135,7 @@ The community! All the tools are open source meaning that anyone can submit an i **24. Do the current tools allow a user to define a treatment gap (persistence window) of any value when creating treatment episodes?** Yes – the ATLAS tool allows you to specify a persistence window between drug exposures when defining a cohort (see image below). -![](https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/ATLAS_Persistence_Window.PNG) +![](https://github.com/OHDSI/CommonDataModel/blob/master/Documentation/CommonDataModel_Wiki_Files/images/ATLAS_Persistence_Window.PNG) **25. Can the current tools identify medication use during pregnancy?** diff --git a/License.md b/License.md index 64f5177..834478c 100644 --- a/License.md +++ b/License.md @@ -4,4 +4,4 @@ This work is based on work by the Observational Medical Outcomes Partnership (OM All derivative work after the OMOP CDM v4 specification is dedicated to the public domain. Observational Health Data Sciences and Informatics (OHDSI) has waived all copyright and related or neighboring rights to the extent allowed by law. 
-[![](http://www.ohdsi.org/web/wiki/lib/exe/fetch.php?cache=&w=88&h=31&tok=3977bb&media=documentation:cdm:cdm:public_domain.png)](http://creativecommons.org/publicdomain/zero/1.0/) +[![](http://www.ohdsi.org/web/wiki/lib/exe/fetch.php?cache=&w=88&h=31&tok=3977bb&media=documentation:cdm:cdm:public_domain.png)](http://creativecommons.org/publicdomain/zero/1.0/) diff --git a/StandardizedDerivedElements/COHORT.md b/ResultsSchema/COHORT.md similarity index 100% rename from StandardizedDerivedElements/COHORT.md rename to ResultsSchema/COHORT.md diff --git a/StandardizedVocabularies/COHORT_DEFINITION.md b/ResultsSchema/COHORT_DEFINITION.md similarity index 100% rename from StandardizedVocabularies/COHORT_DEFINITION.md rename to ResultsSchema/COHORT_DEFINITION.md diff --git a/ResultsSchema/Results-Schema.md b/ResultsSchema/Results-Schema.md new file mode 100644 index 0000000..07528b9 --- /dev/null +++ b/ResultsSchema/Results-Schema.md @@ -0,0 +1,4 @@ +[COHORT](https://github.com/OHDSI/CommonDataModel/wiki/COHORT) +[COHORT_DEFINITION](https://github.com/OHDSI/CommonDataModel/wiki/COHORT_DEFINITION) + +New to CDM v6.0 is the concept of schemas. This allows for more separation between read-only and writeable tables. The clinical data, event, and vocabulary tables are in the 'CDM' schema, and tables that need to be manipulated by web-based tools or end users have moved to the 'Results' schema. Currently the only two tables in the 'Results' schema are COHORT and COHORT_DEFINITION, though more will likely be added over the course of v6.0 point releases.
\ No newline at end of file diff --git a/StandardizedClinicalDataTables/CONDITION_OCCURRENCE.md b/StandardizedClinicalDataTables/CONDITION_OCCURRENCE.md index 4c52173..25dcb6e 100644 --- a/StandardizedClinicalDataTables/CONDITION_OCCURRENCE.md +++ b/StandardizedClinicalDataTables/CONDITION_OCCURRENCE.md @@ -1,41 +1,39 @@ -Conditions are records of a Person suggesting the presence of a disease or medical condition stated as a diagnosis, a sign or a symptom, which is either observed by a Provider or reported by the patient. Conditions are recorded in different sources and levels of standardization, for example: +Conditions are records of a Person suggesting the presence of a disease or medical condition stated as a diagnosis, a sign, or a symptom, which is either observed by a Provider or reported by the patient. Conditions are recorded in different sources and levels of standardization, for example: - * Medical claims data include diagnoses coded in ICD-9-CM that are submitted as part of a reimbursement claim for health services and - * EHRs may capture Person Conditions in the form of diagnosis codes or symptoms. + * Medical claims data include diagnoses coded in Source Vocabularies such as ICD-9-CM that are submitted as part of a reimbursement claim for health services + * EHRs may capture Person conditions in the form of diagnosis codes or symptoms Field|Required|Type|Description :--------------------------------|:--------|:------------|:------------------------------------------------------------ -| condition_occurrence_id | Yes | integer | A unique identifier for each Condition Occurrence event. | -| person_id | Yes | integer | A foreign key identifier to the Person who is experiencing the condition. The demographic details of that Person are stored in the PERSON table. | -| condition_concept_id | Yes | integer | A foreign key that refers to a Standard Condition Concept identifier in the Standardized Vocabularies. 
| -| condition_start_date | Yes | date | The date when the instance of the Condition is recorded. | -| condition_start_datetime | No | datetime | The date and time when the instance of the Condition is recorded. | +| condition_occurrence_id | Yes | bigint | A unique identifier for each Condition Occurrence event. | +| person_id | Yes | bigint | A foreign key identifier to the Person who is experiencing the condition. The demographic details of that Person are stored in the PERSON table. | +| condition_concept_id | Yes | integer | A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies belonging to the 'Condition' domain. | +| condition_start_date | No | date | The date when the instance of the Condition is recorded. | +| condition_start_datetime | Yes | datetime | The date and time when the instance of the Condition is recorded. | | condition_end_date | No | date | The date when the instance of the Condition is considered to have ended. | | condition_end_datetime | No | datetime | The date when the instance of the Condition is considered to have ended. | -| condition_type_concept_id | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the source data from which the condition was recorded, the level of standardization, and the type of occurrence. | -| stop_reason | No | varchar(20) | The reason that the condition was no longer present, as indicated in the source data. | +| condition_type_concept_id | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the source data from which the Condition was recorded, the level of standardization, and the type of occurrence. 
These belong to the 'Condition Type' vocabulary | +| condition_status_concept_id | Yes | integer | A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies reflecting the point of care at which the Condition was diagnosed. | +| stop_reason | No | varchar(20) | The reason that the Condition was no longer present, as indicated in the source data. | | provider_id | No | integer | A foreign key to the Provider in the PROVIDER table who was responsible for capturing (diagnosing) the Condition. | | visit_occurrence_id | No | integer | A foreign key to the visit in the VISIT_OCCURRENCE table during which the Condition was determined (diagnosed). | | visit_detail_id | No | integer | A foreign key to the visit in the VISIT_DETAIL table during which the Condition was determined (diagnosed). | -| condition_source_value | No | varchar(50) | The source code for the condition as it appears in the source data. This code is mapped to a standard condition concept in the Standardized Vocabularies and the original code is stored here for reference. | -| condition_source_concept_id | No | integer | A foreign key to a Condition Concept that refers to the code used in the source. | -| condition_status_source_value | No | varchar(50) | The source code for the condition status as it appears in the source data. | -| condition_status_concept_id | No | integer | A foreign key to the predefined Concept in the Standard Vocabulary reflecting the condition status | +| condition_source_value | No | varchar(50) | The source code for the Condition as it appears in the source data. This code is mapped to a Standard Condition Concept in the Standardized Vocabularies and the original code is stored here for reference. | +| condition_source_concept_id | Yes | integer | A foreign key to a Condition Concept that refers to the code used in the source. 
| +| condition_status_source_value | No | varchar(50) | The source code for the condition status as it appears in the source data. This code is mapped to a Standard Concept in the Standardized Vocabularies and the original code is stored here for reference. | + ### Conventions - * Valid Condition Concepts belong to the "Condition" domain. - * Condition records are typically inferred from diagnostic codes recorded in the source data. Such code system, like ICD-9-CM, ICD-10-CM, Read etc., provide a comprehensive coverage of conditions. However, if the diagnostic code in the source does not define a condition, but rather an observation or a procedure, then such information is not stored in the CONDITION_OCCURRENCE table, but in the respective tables instead. - * Source Condition identifiers are mapped to Standard Concepts for Conditions in the Standardized Vocabularies. When the source code cannot be translated into a Standard Concept, a CONDITION_OCCURRENCE entry is stored with only the corresponding source_concept_id and source_value, while the condition_concept_id is set to 0. - * Family history and past diagnoses ("history of") are not recorded in the CONDITION_OCCURRENCE table. Instead, they are listed in the OBSERVATION table. - * Codes written in the process of establishing the diagnosis, such as "question of" of and "rule out", are not represented here. Instead, they are listed in the OBSERVATION table, if they are used for analyses. - * A Condition Occurrence Type is assigned based on the data source and type of condition attribute, for example: - * ICD-9-CM Primary Diagnosis from inpatient and outpatient Claims - * ICD-9-CM Secondary Diagnoses from inpatient and outpatient Claims - * Diagnoses or problems recorded in an EHR. - * The Stop Reason indicates why a Condition is no longer valid with respect to the purpose within the source data. Typical values include "Discharged", "Resolved", etc. 
Note that a Stop Reason does not necessarily imply that the condition is no longer occurring. - * Condition source codes are typically ICD-9-CM, Read or ICD-10 diagnosis codes from medical claims or discharge status/visit diagnosis codes from EHRs. - * Presently, there is no designated vocabulary, domain, or class that represents condition status. The following concepts from SNOMED are recommended: - * Admitting diagnosis: 4203942 - * Final diagnosis: 4230359 (should also be used for discharge diagnosis) - * Preliminary diagnosis: 4033240 +No.|Convention Description +:--------|:------------------------------------ +| 1 | Valid Condition Concepts belong to the 'Condition' domain. +| 2 | Condition records are typically inferred from diagnostic codes recorded in the source data. Such code systems, like ICD-9-CM, ICD-10-CM, Read etc., provide comprehensive coverage of conditions. However, if the diagnostic code in the source does not define a condition, but rather an observation or a procedure, then such information is not stored in the CONDITION_OCCURRENCE table, but in the respective tables indicated by the domain. +| 3 | Source Condition identifiers are mapped to Standard Concepts for Conditions in the Standardized Vocabularies. When the source code cannot be translated into a Standard Concept, a CONDITION_OCCURRENCE entry is stored with only the corresponding SOURCE_CONCEPT_ID and SOURCE_VALUE, while the CONDITION_CONCEPT_ID is set to 0. +| 4 | Family history and past diagnoses ('history of') are not recorded in the CONDITION_OCCURRENCE table. Instead, they are listed in the OBSERVATION table. +| 5 | Codes written in the process of establishing the diagnosis, such as 'question of' and 'rule out', are not represented here. Instead, they are listed in the OBSERVATION table, if they are used for analyses. +| 6 | A Condition Occurrence Type is assigned based on the data source and type of condition attribute, for example:
  • ICD-9-CM Primary Diagnosis from inpatient and outpatient claims
  • ICD-9-CM Secondary Diagnoses from inpatient and outpatient claims
  • Diagnoses or problems recorded in an EHR.
| +| 7 | Valid Condition Occurrence Type Concepts belong to the 'Condition Type' vocabulary in the 'Type Concept' domain. +| 8 | The Stop Reason indicates why a Condition is no longer valid with respect to the purpose within the source data. Typical values include 'Discharged', 'Resolved', etc. Note that a Stop Reason does not necessarily imply that the condition is no longer occurring. +| 9 | Condition source codes are typically ICD-9-CM, Read or ICD-10-CM diagnosis codes from medical claims or discharge status/visit diagnosis codes from EHRs. +| 10 | Presently, there is no designated vocabulary, domain, or class that represents condition status. The following concepts from SNOMED are recommended:
  • Admitting diagnosis: 4203942
  • Final diagnosis: 4230359 (should also be used for discharge diagnosis)
  • Preliminary diagnosis: 4033240
| diff --git a/StandardizedClinicalDataTables/DEATH.md b/StandardizedClinicalDataTables/DEATH.md index 091329c..9167ec5 100644 --- a/StandardizedClinicalDataTables/DEATH.md +++ b/StandardizedClinicalDataTables/DEATH.md @@ -1,21 +1,21 @@ -The death domain contains the clinical event for how and when a Person dies. A person can have up to one record if the source system contains evidence about the Death, such as: +As of OMOP CDM v6.0, the DEATH table has been deprecated in favor of storing the cause of death in the CONDITION_OCCURRENCE table, any observations relating to death stored in the OBSERVATION table, and a singular death date will be chosen and stored in the PERSON table. + +The 'Death' domain contains the clinical events surrounding how and when a Person dies. A Person can have information in the source system containing evidence about the Death, such as: * Condition Code in the Header or Detail information of claims * Status of enrollment into a health plan * Explicit record in EHR data -Field|Required|Type|Description -:-------------------------|:--------|:-----|:---------------------------------------------- -|person_id|Yes|integer|A foreign key identifier to the deceased person. The demographic details of that person are stored in the person table.| -|death_date |Yes|date|The date the person was deceased. If the precise date including day or month is not known or not allowed, December is used as the default month, and the last day of the month the default day.| -|death_datetime |No|datetime|The date and time the person was deceased. 
If the precise date including day or month is not known or not allowed, December is used as the default month, and the last day of the month the default day.| -|death_type_concept_id|Yes|integer|A foreign key referring to the predefined concept identifier in the Standardized Vocabularies reflecting how the death was represented in the source data.| -|cause_concept_id|No|integer|A foreign key referring to a standard concept identifier in the Standardized Vocabularies for conditions.| -|cause_source_value|No|varchar(50)|The source code for the cause of death as it appears in the source data. This code is mapped to a standard concept in the Standardized Vocabularies and the original code is, stored here for reference.| -|cause_source_concept_id|No|integer|A foreign key to the concept that refers to the code used in the source. Note, this variable name is abbreviated to ensure it will be allowable across database platforms.| - ### Conventions - * Living patients should not contain any information in the DEATH table. - * Each Person may have more than one record of death in the source data. It is the task of the ETL to pick the most plausible or most accurate records to be aggregated and stored as a single record in the DEATH table. - * If the Death Date cannot be precisely determined from the data, the best approximation should be used. - * Valid Concepts for the cause_concept_id have domain_id='Condition'. \ No newline at end of file + +No.|Convention Description +:--------|:------------------------------------ +| 1 | Living patients should not have a value in PERSON.DEATH_DATETIME, nor should they have any records relating to death in either the CONDITION_OCCURRENCE or OBSERVATION tables. +| 2 | Only one death date per individual can be used. If a patient has clinical activity (e.g. prescriptions filled, labs performed, etc.) more than 60 days after death, you may want to drop the death record as it may have been falsely reported.
If multiple records of death exist on multiple days, you may select the death that you deem most reliable (e.g. death at discharge) or select the latest death date ([THEMIS issue #6](https://github.com/OHDSI/Themis/issues/6)). +| 3 | If multiple death records occur, the date and the person have to be the same, but the cause can be different. The records can be reported by different sources as well ([THEMIS issue #5](https://github.com/OHDSI/Themis/issues/5)). +| 4 | If PERSON.DEATH_DATETIME cannot be precisely determined from the data, the best approximation should be used. +| 5 | Any cause of death should be stored in the CONDITION_OCCURRENCE table, using the CONDITION_TYPE vocabulary with the DEATH_TYPE concept class. +| 6 | All observations relating to death should be stored in the OBSERVATION table, including the concept [4306655](http://athena.ohdsi.org/search-terms/terms/4306655). +| 7 | The DEATH_DATETIME in the PERSON table should not be used as the way to find all deaths:
  • `select * from PERSON where death_datetime is not null` should not be the practice
  • Rather, deaths should be found through the OBSERVATION table and the PERSON table is only used to determine which death date should be used in analysis
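
Convention 7 can be sketched as a query that starts from OBSERVATION and only consults PERSON for the chosen death date. This is a minimal illustration, assuming an in-memory SQLite database with trimmed-down PERSON and OBSERVATION tables and made-up rows; concept 4306655 is the death concept named in convention 6, and restricting the search to that single concept is a simplification.

```python
import sqlite3

# Assumption: trimmed-down PERSON and OBSERVATION tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER, death_datetime TEXT);
CREATE TABLE observation (person_id INTEGER, observation_concept_id INTEGER,
                          observation_datetime TEXT);
-- Person 1 has a death reported by two sources; person 2 is alive.
INSERT INTO person VALUES (1, '2018-03-02 00:00:00'), (2, NULL);
INSERT INTO observation VALUES
  (1, 4306655, '2018-03-02 00:00:00'),
  (1, 4306655, '2018-03-02 00:00:00');
""")

# Find deaths through OBSERVATION; PERSON only supplies the death date
# that was selected for analysis.
deaths = conn.execute("""
SELECT DISTINCT o.person_id, p.death_datetime
FROM observation o
JOIN person p ON p.person_id = o.person_id
WHERE o.observation_concept_id = 4306655
""").fetchall()
```

`DISTINCT` collapses the two death observations for the same person into one result row, so each deceased person appears once with the single date stored in PERSON.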
+ + \ No newline at end of file diff --git a/StandardizedClinicalDataTables/DEVICE_EXPOSURE.md b/StandardizedClinicalDataTables/DEVICE_EXPOSURE.md index fdfd3a6..efb2fad 100644 --- a/StandardizedClinicalDataTables/DEVICE_EXPOSURE.md +++ b/StandardizedClinicalDataTables/DEVICE_EXPOSURE.md @@ -1,29 +1,33 @@ -The device exposure domain captures information about a person's exposure to a foreign physical object or instrument that which is used for diagnostic or therapeutic purposes through a mechanism beyond chemical action. Devices include implantable objects (e.g. pacemakers, stents, artificial joints), medical equipment and supplies (e.g. bandages, crutches, syringes), other instruments used in medical procedures (e.g. sutures, defibrillators) and material used in clinical care (e.g. adhesives, body material, dental material, surgical material). +The 'Device' domain captures information about a person's exposure to a foreign physical object or instrument which is used for diagnostic or therapeutic purposes through a mechanism beyond chemical action. Devices include implantable objects (e.g. pacemakers, stents, artificial joints), medical equipment and supplies (e.g. bandages, crutches, syringes), other instruments used in medical procedures (e.g. sutures, defibrillators) and material used in clinical care (e.g. adhesives, body material, dental material, surgical material). Field|Required|Type|Description :--------------------------------|:--------|:------------|:-------------------------------------------- -|device_exposure_id|Yes|integer|A system-generated unique identifier for each Device Exposure.| -|person_id|Yes|integer|A foreign key identifier to the Person who is subjected to the Device. 
The demographic details of that person are stored in the Person table.| -|device_concept_id|Yes|integer|A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies for the Device concept.| -|device_exposure_start_date|Yes|date|The date the Device or supply was applied or used.| -|device_exposure_start_datetime|No|datetime|The date and time the Device or supply was applied or used.| -|device_exposure_end_date|No|date|The date the Device or supply was removed from use.| -|device_exposure_end_datetime|No|datetime|The date and time the Device or supply was removed from use.| -|device_type_concept_id|Yes|integer|A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Device Exposure recorded. It indicates how the Device Exposure was represented in the source data.| -|unique_device_id |No|varchar(50)|A UDI or equivalent identifying the instance of the Device used in the Person.| -|quantity|No|integer|The number of individual Devices used for the exposure.| -|provider_id|No|integer|A foreign key to the provider in the PROVIDER table who initiated of administered the Device.| -|visit_occurrence_id|No|integer|A foreign key to the visit in the VISIT_OCCURRENCE table during which the device was used.| -|visit_detail_id|No|integer|A foreign key to the visit detail in the VISIT_DETAIL table during which the Drug Exposure was initiated.| -|device_source_value|No|varchar(50)|The source code for the Device as it appears in the source data. This code is mapped to a standard Device Concept in the Standardized Vocabularies and the original code is stored here for reference.| -|device_source_concept_id|No|integer|A foreign key to a Device Concept that refers to the code used in the source.| +| device_exposure_id | Yes | bigint | A system-generated unique identifier for each Device Exposure. | +| person_id | Yes | bigint | A foreign key identifier to the Person who is subjected to the Device. 
The demographic details of that Person are stored in the PERSON table. | +| device_concept_id | Yes | integer | A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies belonging to the 'Device' domain. | +| device_exposure_start_date | No | date | The date the Device or supply was applied or used. | +| device_exposure_start_datetime| Yes | datetime | The date and time the Device or supply was applied or used. | +| device_exposure_end_date | No | date | The date use of the Device or supply was ceased. | +| device_exposure_end_datetime | No | datetime | The date and time use of the Device or supply was ceased. | +| device_type_concept_id | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Device Exposure recorded. It indicates how the Device Exposure was represented in the source data and belongs to the 'Device Type' domain.| +| unique_device_id | No | varchar(50)| A UDI or equivalent identifying the instance of the Device used in the Person. | +| quantity | No | integer | The number of individual Devices used in the exposure. | +| provider_id | No | integer | A foreign key to the provider in the PROVIDER table who initiated or administered the Device. | +| visit_occurrence_id | No | integer | A foreign key to the visit in the VISIT_OCCURRENCE table during which the Device was used. | +| visit_detail_id | No | integer | A foreign key to the visit detail record in the VISIT_DETAIL table during which the Device was used. | +| device_source_value | No | varchar(50)| The source code for the Device as it appears in the source data. 
This code is mapped to a Standard Device Concept in the Standardized Vocabularies and the original code is stored here for reference.| +| device_source_concept_id | Yes | integer | A foreign key to a Device Concept that refers to the code used in the source.| -### Conventions +### Conventions - * The distinction between Devices or supplies and procedures are sometimes blurry, but the former are physical objects while the latter are actions, often to apply a Device or supply. - * For medical devices that are regulated by the FDA, if a Unique Device Identification (UDI) is provided if available in the data source, and is recorded in the unique_device_id field. - * Valid Device Concepts belong to the "Device" domain. The Concepts of this domain are derived from the DI portion of a UDI or based on other source vocabularies, like HCPCS. - * A Device Type is assigned to each Device Exposure to track from what source the information was drawn or inferred. The valid domain_id for these Concepts is "Device Type". - * The Visit during which the Device was first used is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available. - * The Visit Detail during which the Device was first used is recorded through a reference to the VISIT_DETAIL table. This information is not always available. - * The Provider exposing the patient to the Device is recorded through a reference to the PROVIDER table. This information is not always available. 
+No.|Convention Description +:--------|:------------------------------------ +| 1 | The distinction between Devices or supplies and Procedures is sometimes blurry, but the former are physical objects while the latter are actions, often to apply a Device or supply.| +| 2 | For medical devices that are regulated by the FDA, a Unique Device Identification (UDI), if available in the data source, is recorded in the UNIQUE_DEVICE_ID field.| +| 3 | Valid Device Concepts belong to the 'Device' domain. The Concepts of this domain are derived from the DI portion of a UDI or based on other source vocabularies, like HCPCS.| +| 4 | A Device Type is assigned to each Device Exposure to track from what source the information was drawn or inferred. The valid vocabulary for these Concepts is 'Device Type'.| +| 5 | The Visit during which the Device was first used is recorded through a reference to the VISIT_OCCURRENCE table. | +| 6 | The Visit Detail during which the Device was first used is recorded through a reference to the VISIT_DETAIL table.| +| 7 | The Provider exposing the patient to the Device is recorded through a reference to the PROVIDER table.| +| 8 | When dealing with duplicate records, the ETL must determine whether to sum them up into one record or keep them separate. Things to consider are:
  • Same Device/Procedure
  • Same DEVICE_EXPOSURE_START_DATETIME
  • Same Visit Occurrence or Visit Detail
  • Same Provider
  • Same Modifier for Procedures
  • Same COST_ID
[THEMIS issue #27](https://github.com/OHDSI/Themis/issues/27) | +| 9 | If a Device Exposure has a quantity of '0' in the source, this should default to '1' in the ETL. If there is a record in the source it can be assumed the exposure occurred at least once ([THEMIS issue #26](https://github.com/OHDSI/Themis/issues/26)). | \ No newline at end of file diff --git a/StandardizedClinicalDataTables/DRUG_EXPOSURE.md b/StandardizedClinicalDataTables/DRUG_EXPOSURE.md index ee4afbe..028f86f 100644 --- a/StandardizedClinicalDataTables/DRUG_EXPOSURE.md +++ b/StandardizedClinicalDataTables/DRUG_EXPOSURE.md @@ -1,47 +1,51 @@ -The drug exposure domain captures records about the utilization of a Drug when ingested or otherwise introduced into the body. A Drug is a biochemical substance formulated in such a way that when administered to a Person it will exert a certain physiological effect. Drugs include prescription and over-the-counter medicines, vaccines, and large-molecule biologic therapies. Radiological devices ingested or applied locally do not count as Drugs. +The 'Drug' domain captures records about the utilization of a Drug when ingested or otherwise introduced into the body. A Drug is a biochemical substance formulated in such a way that when administered to a Person it will exert a certain physiological effect. Drugs include prescription and over-the-counter medicines, vaccines, and large-molecule biologic therapies. Radiological devices ingested or applied locally do not count as Drugs. 
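As a rough illustration of Device Exposure convention 9 (a source quantity of '0' defaults to '1' in the ETL), the rule could be sketched as below. The function name `normalize_device_quantity` is hypothetical, and passing a missing quantity through unchanged is an assumption, since the convention only addresses the value 0.

```python
from typing import Optional


def normalize_device_quantity(source_quantity: Optional[int]) -> Optional[int]:
    """Apply the 'quantity of 0 defaults to 1' convention for DEVICE_EXPOSURE.

    A record in the source implies the exposure occurred at least once,
    so a quantity of 0 is replaced by 1. A missing quantity (None) is
    passed through unchanged; that behaviour is an assumption of this
    sketch, not something the convention states.
    """
    if source_quantity == 0:
        return 1
    return source_quantity
```

An ETL job would apply such a step to each source row before writing the QUANTITY field.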
Drug Exposure is inferred from clinical events associated with orders, prescriptions written, pharmacy dispensings, procedural administrations, and other patient-reported information, for example: - * The "Prescription" section of an EHR captures prescriptions written by physicians or from electronic ordering systems - * The "Medication list" section of an EHR for both non-prescription products and medications prescribed by other providers + * The 'Prescription' section of an EHR captures prescriptions written by physicians or from electronic ordering systems + * The 'Medication list' section of an EHR for both non-prescription products and medications prescribed by other providers * Prescriptions filled at dispensing providers such as pharmacies, and then captured in reimbursement claim systems * Drugs administered as part of a Procedure, such as chemotherapy or vaccines. Field|Required|Type|Description :------------------------------|:--------|:------------|:------------------------------------------------ -|drug_exposure_id|Yes|integer|A system-generated unique identifier for each Drug utilization event.| -|person_id|Yes|integer|A foreign key identifier to the person who is subjected to the Drug. The demographic details of that person are stored in the person table.| -|drug_concept_id|Yes|integer|A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies for the Drug concept.| -|drug_exposure_start_date|Yes|date|The start date for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.| -|drug_exposure_start_datetime|No|datetime|The start date and time for the current instance of Drug utilization. 
Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.| -|drug_exposure_end_date|Yes|date|The end date for the current instance of Drug utilization. It is not available from all sources.| -|drug_exposure_end_datetime|No|datetime|The end date and time for the current instance of Drug utilization. It is not available from all sources.| -|verbatim_end_date|No|date|The known end date of a drug_exposure as provided by the source| -|drug_type_concept_id|Yes|integer| A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Drug Exposure recorded. It indicates how the Drug Exposure was represented in the source data.| -|stop_reason|No|varchar(20)|The reason the Drug was stopped. Reasons include regimen completed, changed, removed, etc.| -|refills|No|integer|The number of refills after the initial prescription. The initial prescription is not counted, values start with null.| -|quantity |No|float|The quantity of drug as recorded in the original prescription or dispensing record.| -|days_supply|No|integer|The number of days of supply of the medication as recorded in the original prescription or dispensing record.| -|sig|No|varchar(MAX)|The directions ("signetur") on the Drug prescription as recorded in the original prescription (and printed on the container) or dispensing record.| -|route_concept_id|No|integer|A foreign key to a predefined concept in the Standardized Vocabularies reflecting the route of administration.| -|lot_number|No|varchar(50)|An identifier assigned to a particular quantity or lot of Drug product from the manufacturer.| -|provider_id|No|integer|A foreign key to the provider in the PROVIDER table who initiated (prescribed or administered) the Drug Exposure.| -|visit_occurrence_id|No|integer|A foreign key to the Visit in the VISIT_OCCURRENCE table during which the Drug Exposure was initiated.| 
-|visit_detail_id|No|integer|A foreign key to the Visit Detail in the VISIT_DETAIL table during which the Drug Exposure was initiated.| -|drug_source_value|No|varchar(50)|The source code for the Drug as it appears in the source data. This code is mapped to a Standard Drug concept in the Standardized Vocabularies and the original code is, stored here for reference.| -|drug_source_concept_id|No|integer|A foreign key to a Drug Concept that refers to the code used in the source.| -|route_source_value|No|varchar(50)|The information about the route of administration as detailed in the source.| -|dose_unit_source_value|No|varchar(50)|The information about the dose unit as detailed in the source.| +| drug_exposure_id | Yes | bigint | A system-generated unique identifier for each Drug utilization event. | +|person_id |Yes |bigint |A foreign key identifier to the Person who is subjected to the Drug. The demographic details of that Person are stored in the PERSON table. | +|drug_concept_id |Yes |integer |A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies belonging to the 'Drug' domain. | +|drug_exposure_start_date |No |date |The start date for the current instance of Drug utilization. Valid entries include a start date of a prescription, the date a prescription was filled, or the date on which a Drug administration procedure was recorded.| +|drug_exposure_start_datetime |Yes |datetime |The start date and time for the current instance of Drug utilization. Valid entries include a start datetime of a prescription, the date and time a prescription was filled, or the date and time on which a Drug administration procedure was recorded.| +|drug_exposure_end_date |No |date |The end date for the current instance of Drug utilization. Depending on different sources, it could be a known or an inferred date and denotes the last day at which the patient was still exposed to Drug. 
| +|drug_exposure_end_datetime |No |datetime |The end date and time for the current instance of Drug utilization. Depending on different sources, it could be a known or an inferred date and time and denotes the last day at which the patient was still exposed to Drug. | +|verbatim_end_date |No |date |The known end date of a drug_exposure as provided by the source. | +|drug_type_concept_id |Yes |integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of Drug Exposure recorded. It indicates how the Drug Exposure was represented in the source data and belongs to the 'Drug Type' vocabulary.| +|stop_reason |No |varchar(20)|The reason the Drug was stopped. Reasons include regimen completed, changed, removed, etc. | +|refills |No |integer |The number of refills after the initial prescription. The initial prescription is not counted, values start with null. | +|quantity |No |float |The quantity of drug as recorded in the original prescription or dispensing record. | +|days_supply |No |integer |The number of days of supply of the medication as prescribed. This reflects the intention of the provider for the length of exposure. | +|sig |No |varchar(MAX)|The directions ('signetur') on the Drug prescription as recorded in the original prescription (and printed on the container) or dispensing record. | +|route_concept_id |Yes |integer |A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies reflecting the route of administration and belonging to the 'Route' domain. | +|lot_number |No |varchar(50)|An identifier assigned to a particular quantity or lot of Drug product from the manufacturer. 
| +|provider_id |No |integer|A foreign key to the provider in the PROVIDER table who initiated (prescribed or administered) the Drug Exposure.| +|visit_occurrence_id |No |integer|A foreign key to the Visit in the VISIT_OCCURRENCE table during which the Drug Exposure was initiated.| +|visit_detail_id |No |integer|A foreign key to the Visit Detail in the VISIT_DETAIL table during which the Drug Exposure was initiated.| +|drug_source_value |No |varchar(50)|The source code for the Drug as it appears in the source data. This code is mapped to a Standard Drug concept in the Standardized Vocabularies and the original code is, stored here for reference.| +|drug_source_concept_id |Yes |integer|A foreign key to a Drug Concept that refers to the code used in the source.| +|route_source_value |No |varchar(50)|The information about the route of administration as detailed in the source.| +|dose_unit_source_value |No |varchar(50)|The information about the dose unit as detailed in the source.| ### Conventions - * Valid Concepts for the drug_concept_id field belong to the "Drug" domain. Most Concepts in the Drug domain are based on RxNorm, but some may come from other sources. Concepts are members of the Clinical Drug or Pack, Branded Drug or Pack, Drug Component or Ingredient classes. - * Source drug identifiers, including NDC codes, Generic Product Identifiers, etc. are mapped to Standard Drug Concepts in the Standardized Vocabularies (e.g., based on RxNorm). When the Drug Source Value of the code cannot be translated into standard Drug Concept IDs, a Drug exposure entry is stored with only the corresponding source_concept_id and drug_source_value and a drug_concept_id of 0. - * The Drug Concept with the most detailed content of information is preferred during the mapping process. 
These are indicated in the concept_class_id field of the Concept and are recorded in the following order of precedence: "Branded Pack", "Clinical Pack", "Branded Drug", "Clinical Drug", "Branded Drug Component", "Clinical Drug Component", "Branded Drug Form", "Clinical Drug Form", and only if no other information is available "Ingredient". Note: If only the drug class is known, the drug_concept_id should contain 0. - * A Drug Type is assigned to each Drug Exposure to track from what source the information was drawn or inferred from. The valid concept_class_id for these Concepts is "Drug Type". - * The content of the refills field determines the current number of refills, not the number of remaining refills. For example, for a drug prescription with 2 refills, the content of this field for the 3 Drug Exposure events are null, 1 and 2. - * The route_concept_id refers to a Standard Concepts of the "Route" domain. Note: Route information can also be inferred from the Drug product itself by determining the Drug Form of the Concept, creating some partial overlap of the same type of information. Therefore, route information should be stored in standard drug concept_id (as a drug with corresponding Dose Form). The route_concept_id could be used for storing more granular forms e.g. 'Intraventricular cardiac'. - * The lot_number field contains an identifier assigned from the manufacturer of the Drug product. - * If possible, the visit in which the drug was prescribed or delivered is recorded in the visit_occurrence_id field through a reference to the visit table. - * If possible, the prescribing or administering provider (physician or nurse) is recorded in the provider_id field through a reference to the provider table. - * The drug_exposure_end_date denotes the day the drug exposure ended for the patient. 
This could be that the duration of drug_supply was reached (in which case drug_exposure_end_date = drug_exposure_start_date + days_supply -1), or because the exposure was stopped (medication changed, medication discontinued, etc.) +No.|Convention Description +:--------|:------------------------------------ +| 1 | Valid Concepts for the DRUG_CONCEPT_ID field belong to the 'Drug' domain. Most Concepts in the Drug domain are based on RxNorm, but some may come from other sources. Concepts are members of the Clinical Drug or Pack, Branded Drug or Pack, Drug Component or Ingredient classes. | +| 2 | Source drug identifiers, including NDC codes, Generic Product Identifiers, etc. are mapped to Standard Drug Concepts in the Standardized Vocabularies (e.g., based on RxNorm). When the Drug Source Value of the code cannot be translated into Standard Drug Concept IDs, a Drug exposure entry is stored with only the corresponding SOURCE_CONCEPT_ID and DRUG_SOURCE_VALUE and a DRUG_CONCEPT_ID of 0.| +| 3 | The Drug Concept with the most detailed content of information is preferred during the mapping process. These are indicated in the CONCEPT_CLASS_ID field of the Concept and are recorded in the following order of precedence: 'Branded Pack', 'Clinical Pack', 'Branded Drug', 'Clinical Drug', 'Branded Drug Component', 'Clinical Drug Component', 'Branded Drug Form', 'Clinical Drug Form', and only if no other information is available 'Ingredient'. Note: If only the drug class is known, the DRUG_CONCEPT_ID field should contain 0.| +| 4 | A Drug Type is assigned to each Drug Exposure to track from what source the information was drawn or inferred. The valid CONCEPT_CLASS_ID for these Concepts is 'Drug Type'. | +| 5 | The content of the refills field determines the current number of refills, not the number of remaining refills.
For example, for a drug prescription with 2 refills, the values of this field for the 3 Drug Exposure events are null, 1 and 2.| +| 6 | The ROUTE_CONCEPT_ID refers to Standard Concepts of the 'Route' domain. Note: Route information can also be inferred from the Drug product itself by determining the Drug Form of the Concept, creating some partial overlap of the same type of information. Therefore, route information should be stored in DRUG_CONCEPT_ID (as a drug with corresponding Dose Form). The ROUTE_CONCEPT_ID could be used for storing more granular forms e.g. 'Intraventricular cardiac'.| +| 7 | The LOT_NUMBER field contains an identifier assigned by the manufacturer of the Drug product. | +| 8 | If possible, the visit in which the drug was prescribed or delivered is recorded in the VISIT_OCCURRENCE_ID field through a reference to the VISIT_OCCURRENCE table.| +| 9 | If possible, the prescribing or administering provider (physician or nurse) is recorded in the PROVIDER_ID field through a reference to the PROVIDER table.| +| 10 | The DRUG_EXPOSURE_END_DATE denotes the day the drug exposure ended for the patient.
This could be because the duration given by DAYS_SUPPLY was reached (in which case DRUG_EXPOSURE_END_DATETIME = DRUG_EXPOSURE_START_DATETIME + DAYS_SUPPLY - 1 day), or because the exposure was stopped (medication changed, medication discontinued, etc.).| +| 11 | When the native data suggest a drug exposure has a days supply less than 0, drop the record, because it is unknown whether the person received the drug ([THEMIS issue #24](https://github.com/OHDSI/Themis/issues/24)).| +| 12 | If a patient has multiple records on the same day for the same drug or procedure, the ETL should not de-duplicate them unless there is good reason to believe the item is a true data duplicate ([THEMIS issue #14](https://github.com/OHDSI/Themis/issues/14)).| diff --git a/StandardizedClinicalDataTables/FACT_RELATIONSHIP.md b/StandardizedClinicalDataTables/FACT_RELATIONSHIP.md index 6660069..b94525b 100644 --- a/StandardizedClinicalDataTables/FACT_RELATIONSHIP.md +++ b/StandardizedClinicalDataTables/FACT_RELATIONSHIP.md @@ -1,4 +1,4 @@ -The FACT_RELATIONSHIP table contains records about the relationships between facts stored as records in any table of the CDM. Relationships can be defined between facts from the same domain (table), or different domains. Examples of Fact Relationships include: Person relationships (parent-child), care site relationships (hierarchical organizational structure of facilities within a health system), indication relationship (between drug exposures and associated conditions), usage relationships (of devices during the course of an associated procedure), or facts derived from one another (measurements derived from an associated specimen). +The FACT_RELATIONSHIP table contains records about the relationships between facts stored as records in any table of the CDM. Relationships can be defined between facts from the same domain, or different domains.
Examples of Fact Relationships include: Person relationships (parent-child), care site relationships (hierarchical organizational structure of facilities within a health system), indication relationship (between drug exposures and associated conditions), usage relationships (of devices during the course of an associated procedure), or facts derived from one another (measurements derived from an associated specimen). Field|Required|Type|Description :-------------------------|:--------|:------------|:-------------------------------------------------------------- @@ -9,6 +9,7 @@ Field|Required|Type|Description |relationship_concept_id |Yes|integer|A foreign key to a Standard Concept ID of relationship in the Standardized Vocabularies.| ### Conventions - * All relationships are directional, and each relationship is represented twice symmetrically within the FACT_RELATIONSHIP table. For example, two persons if person_id = 1 is the mother of person_id = 2 two records are in the FACT_RELATIONSHIP table (all strings in fact concept_id records in the Concept table: - * Person, 1, Person, 2, parent of - * Person, 2, Person, 1, child of + +No.|Convention Description +:--------|:------------------------------------ +| 1 | All relationships are directional, and each relationship is represented twice symmetrically within the FACT_RELATIONSHIP table. For example, if person_id = 1 is the mother of person_id = 2, two records are stored in the FACT_RELATIONSHIP table (all strings shown are in fact concept_id records in the CONCEPT table):
  • Person, 1, Person, 2, parent of
  • Person, 2, Person, 1, child of
| diff --git a/StandardizedClinicalDataTables/MEASUREMENT.md b/StandardizedClinicalDataTables/MEASUREMENT.md index 6a8b468..0df513e 100644 --- a/StandardizedClinicalDataTables/MEASUREMENT.md +++ b/StandardizedClinicalDataTables/MEASUREMENT.md @@ -4,37 +4,40 @@ Field|Required|Type|Description :----------------------------------|:--------|:------------|:------------------------------------------------ |measurement_id|Yes|integer|A unique identifier for each Measurement.| |person_id|Yes|integer|A foreign key identifier to the Person about whom the measurement was recorded. The demographic details of that Person are stored in the PERSON table.| -|measurement_concept_id|Yes|integer|A foreign key to the standard measurement concept identifier in the Standardized Vocabularies.| -|measurement_date|Yes|date|The date of the Measurement.| -|measurement_datetime|No|datetime|The date and time of the Measurement. Some database systems don't have a datatype of time. To accomodate all temporal analyses, datatype datetime can be used (combining measurement_date and measurement_time [forum discussion](http://forums.ohdsi.org/t/date-time-and-datetime-problem-and-the-world-of-hours-and-1day/314))| +|measurement_concept_id|Yes|integer|A foreign key to the standard measurement concept identifier in the Standardized Vocabularies. These belong to the 'Measurement' domain, but could overlap with the 'Observation' domain (see #3 below).| +|measurement_date|No|date|The date of the Measurement.| +|measurement_datetime|Yes|datetime|The date and time of the Measurement. Some database systems don't have a datatype of time. To accommodate all temporal analyses, datatype datetime can be used (combining measurement_date and measurement_time [forum discussion](http://forums.ohdsi.org/t/date-time-and-datetime-problem-and-the-world-of-hours-and-1day/314))| |measurement_time |No|varchar(10)|The time of the Measurement. 
This is present for backwards compatibility and will be deprecated in an upcoming version| -|measurement_type_concept_id|Yes|integer|A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the provenance from where the Measurement record was recorded.| -|operator_concept_id|No|integer|A foreign key identifier to the predefined Concept in the Standardized Vocabularies reflecting the mathematical operator that is applied to the value_as_number. Operators are <, <=, =, >=, >.| +|measurement_type_concept_id|Yes|integer|A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the provenance from where the Measurement record was recorded. These belong to the 'Meas Type' vocabulary| +|operator_concept_id|No|integer|A foreign key identifier to the predefined Concept in the Standardized Vocabularies reflecting the mathematical operator that is applied to the value_as_number. Operators are <, <=, =, >=, > and these concepts belong to the 'Meas Value Operator' domain.| |value_as_number|No|float|A Measurement result where the result is expressed as a numeric value.| -|value_as_concept_id|No|integer|A foreign key to a Measurement result represented as a Concept from the Standardized Vocabularies (e.g., positive/negative, present/absent, low/high, etc.).| -|unit_concept_id|No|integer|A foreign key to a Standard Concept ID of Measurement Units in the Standardized Vocabularies.| +|value_as_concept_id|No|integer|A foreign key to a Measurement result represented as a Concept from the Standardized Vocabularies (e.g., positive/negative, present/absent, low/high, etc.). These belong to the 'Meas Value' domain| +|unit_concept_id|No|integer|A foreign key to a Standard Concept ID of Measurement Units in the Standardized Vocabularies that belong to the 'Unit' domain.| |range_low|No|float|The lower limit of the normal range of the Measurement result. 
The lower range is assumed to be of the same unit of measure as the Measurement value.| |range_high|No|float|The upper limit of the normal range of the Measurement. The upper range is assumed to be of the same unit of measure as the Measurement value.| |provider_id|No|integer|A foreign key to the provider in the PROVIDER table who was responsible for initiating or obtaining the measurement.| |visit_occurrence_id|No|integer|A foreign key to the Visit in the VISIT_OCCURRENCE table during which the Measurement was recorded.| |visit_detail_id|No|integer|A foreign key to the Visit Detail in the VISIT_DETAIL table during which the Measurement was recorded. | |measurement_source_value|No|varchar(50)|The Measurement name as it appears in the source data. This code is mapped to a Standard Concept in the Standardized Vocabularies and the original code is stored here for reference.| -|measurement_source_concept_id|No|integer|A foreign key to a Concept in the Standard Vocabularies that refers to the code used in the source.| +|measurement_source_concept_id|Yes|integer|A foreign key to a Concept in the Standard Vocabularies that refers to the code used in the source.| |unit_source_value|No|varchar(50)|The source code for the unit as it appears in the source data. This code is mapped to a standard unit concept in the Standardized Vocabularies and the original code is stored here for reference.| |value_source_value|No|varchar(50)|The source value associated with the content of the value_as_number or value_as_concept_id as stored in the source data.| ### Conventions - * Measurements differ from Observations in that they require a standardized test or some other activity to generate a quantitative or qualitative result. For example, LOINC 1755-8 concept_id 3027035 'Albumin [Mass/time] in 24 hour Urine' is the lab test to measure a certain chemical in a urine sample. 
- * Even though each Measurement always have a result, the fields value_as_number and value_as_concept_id are not mandatory. When the result is not known, the Measurement record represents just the fact that the corresponding Measurement was carried out, which in itself is already useful information for some use cases. - * Valid Measurement Concepts (measurement_concept_id) belong to the 'Measurement' domain, but could overlap with the 'Observation' domain. This is due to the fact that there is a continuum between systematic examination or testing (Measurement) and a simple determination of fact (Observation). When the Measurement Source Value of the code cannot be translated into a standard Measurement Concept ID, a Measurement entry is stored with only the corresponding source_concept_id and measurement_source_value and a measurement_concept_id of 0. - * Measurements are stored as attribute value pairs, with the attribute as the Measurement Concept and the value representing the result. The value can be a Concept (stored in value_as_concept), or a numerical value (value_as_number) with a Unit (unit_concept_id). - * Valid Concepts for the value_as_concept field belong to the 'Meas Value' domain. - * For some Measurement Concepts, the result is included in the test. For example, ICD10 concept_id 45595451 "Presence of alcohol in blood, level not specified" indicates a Measurement and the result (present). In those situations, the CONCEPT_RELATIONSHIP table in addition to the "Maps to" record contains a second record with the relationship_id set to "Maps to value". In this example, the "Maps to" relationship directs to 4041715 "Blood ethanol measurement" as well as a "Maps to value" record to 4181412 "Present". - * The operator_concept_id is optionally given for relative Measurements where the precise value is not available but its relation to a certain benchmarking value is. For example, this can be used for minimal detection thresholds of a test. 
- * The meaning of Concept 4172703 for '=' is identical to omission of a operator_concept_id value. Since the use of this field is rare, it's important when devising analyses to not to forget testing for the content of this field for values different from =. - * Valid Concepts for the operator_concept_id field belong to the 'Meas Value Operator' domain. - * The Unit is optional even if a value_as_number is provided. - * If reference ranges for upper and lower limit of normal as provided (typically by a laboratory) these are stored in the range_high and range_low fields. Ranges have the same unit as the value_as_number. - * The Visit during which the observation was made is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available. - * The Provider making the observation is recorded through a reference to the PROVIDER table. This information is not always available. +No.|Convention Description +:--------|:------------------------------------ +| 1 | Measurements differ from Observations in that they require a standardized test or some other activity to generate a quantitative or qualitative result. For example, LOINC 1755-8 concept_id 3027035 'Albumin [Mass/time] in 24 hour Urine' is the lab test to measure a certain chemical in a urine sample.| +| 2 | Even though each Measurement always has a result, the fields VALUE_AS_NUMBER and VALUE_AS_CONCEPT_ID are not mandatory. When the result is not known, the Measurement record represents just the fact that the corresponding Measurement was carried out, which in itself is already useful information for some use cases.| +| 3 | Valid Measurement Concepts (MEASUREMENT_CONCEPT_ID) belong to the 'Measurement' domain, but could overlap with the 'Observation' domain. This is due to the fact that there is a continuum between systematic examination or testing (Measurement) and a simple determination of fact (Observation).
When the Measurement Source Value of the code cannot be translated into a standard Measurement Concept ID, a Measurement entry is stored with only the corresponding SOURCE_CONCEPT_ID and MEASUREMENT_SOURCE_VALUE and a MEASUREMENT_CONCEPT_ID of 0.| +| 4 | Measurements are stored as attribute value pairs, with the attribute as the Measurement Concept and the value representing the result. The value can be a Concept (stored in VALUE_AS_CONCEPT_ID), or a numerical value (VALUE_AS_NUMBER) with a Unit (UNIT_CONCEPT_ID). | +| 5 | Valid Concepts for the VALUE_AS_CONCEPT_ID field belong to the 'Meas Value' domain. | +| 6 | For some Measurement Concepts, the result is included in the test. For example, ICD10 concept_id 45595451 'Presence of alcohol in blood, level not specified' indicates a Measurement and the result (present). In those situations, the CONCEPT_RELATIONSHIP table in addition to the 'Maps to' record contains a second record with the relationship_id set to 'Maps to value'. In this example, the 'Maps to' relationship directs to 4041715 'Blood ethanol measurement' as well as a 'Maps to value' record to 4181412 'Present'.| +| 7 | The OPERATOR_CONCEPT_ID is optionally given for relative Measurements where the precise value is not available but its relation to a certain benchmarking value is. For example, this can be used for minimal detection thresholds of a test.| +| 8 | The meaning of Concept 4172703 for '=' is identical to omission of an OPERATOR_CONCEPT_ID value. Since the use of this field is rare, it is important when devising analyses not to forget to test this field for values different from '='.| +| 9 | Valid Concepts for the OPERATOR_CONCEPT_ID field belong to the 'Meas Value Operator' domain.| +| 10 | The Unit is optional even if a VALUE_AS_NUMBER is provided.| +| 11 | If reference ranges for upper and lower limit of normal are provided (typically by a laboratory), these are stored in the RANGE_HIGH and RANGE_LOW fields.
Ranges have the same unit as the VALUE_AS_NUMBER.| +| 12 | The Visit during which the observation was made is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available.| +| 13 | The Provider making the observation is recorded through a reference to the PROVIDER table. This information is not always available.| +| 14 | If there is a negative value coming from the source, set the VALUE_AS_NUMBER to NULL, with the exception of the following Measurements (listed as LOINC codes):
  • 1925-7 Base excess in Arterial blood by calculation
  • 1927-3 Base excess in Venous blood by calculation
  • 8632-2 QRS-Axis
  • 11555-0 Base excess in Blood by calculation
  • 1926-5 Base excess in Capillary blood by calculation
  • 28638-5 Base excess in Arterial cord blood by calculation
  • 28639-3 Base excess in Venous cord blood by calculation
[THEMIS issue #16](https://github.com/OHDSI/Themis/issues/16) | diff --git a/StandardizedClinicalDataTables/NOTE.md b/StandardizedClinicalDataTables/NOTE.md index e639113..baeb409 100644 --- a/StandardizedClinicalDataTables/NOTE.md +++ b/StandardizedClinicalDataTables/NOTE.md @@ -2,131 +2,45 @@ The NOTE table captures unstructured information that was recorded by a provider Field|Required|Type|Description :--------------------|:--------|:------------|:-------------------------------------------------------- -|note_id |Yes|integer|A unique identifier for each note.| -|person_id |Yes|integer|A foreign key identifier to the Person about whom the Note was recorded. The demographic details of that Person are stored in the PERSON table.| -|note_date |Yes|date|The date the note was recorded.| -|note_datetime |No|datetime|The date and time the note was recorded.| -|note_type_concept_id |Yes|integer|A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the type, origin or provenance of the Note.| -|note_class_concept_id |Yes| integer| A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the HL7 LOINC Document Type Vocabulary classification of the note.| -|note_title |No| varchar(250)| The title of the Note as it appears in the source.| -|note_text |Yes|varchar(MAX)|The content of the Note.| -|encoding_concept_id |Yes |integer| A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the note character encoding type| -|language_concept_id |Yes |integer |A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the language of the note| -|provider_id |No|integer|A foreign key to the Provider in the PROVIDER table who took the Note.| -|visit_occurrence_id |No|integer|A foreign key to the Visit in the VISIT_OCCURRENCE table when the Note was taken.| -|visit_detail_id |No|integer|A foreign key to the Visit in the VISIT_DETAIL table when the Note was taken.| 
-|note_source_value |No|varchar(50)|The source value associated with the origin of the Note| +|note_id |Yes|integer|A unique identifier for each note.| +|person_id |Yes|integer|A foreign key identifier to the Person about whom the Note was recorded. The demographic details of that Person are stored in the PERSON table.| +|note_event_id |No |integer|A foreign key identifier to the event (e.g. Measurement, Procedure, Visit, Drug Exposure, etc) record during which the note was recorded.| +|note_event_field_concept_id |No|integer|A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the field to which the note_event_id is referring. | +|note_date |No|date|The date the note was recorded.| +|note_datetime |Yes|datetime|The date and time the note was recorded.| +|note_type_concept_id |Yes|integer|A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the type, origin or provenance of the Note. These belong to the 'Note Type' vocabulary| +|note_class_concept_id |Yes| integer| A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the HL7 LOINC Document Type Vocabulary classification of the note.| +|note_title |No| varchar(250)| The title of the Note as it appears in the source.| +|note_text |Yes|varchar(MAX)|The content of the Note.| +|encoding_concept_id |Yes |integer| A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the note character encoding type| +|language_concept_id |Yes |integer |A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the language of the note| +|provider_id |No|integer|A foreign key to the Provider in the PROVIDER table who took the Note.| +|visit_occurrence_id |No|integer|A foreign key to the Visit in the VISIT_OCCURRENCE table when the Note was taken.| +|visit_detail_id |No|integer|A foreign key to the Visit in the VISIT_DETAIL table when the Note was taken.| +|note_source_value |No|varchar(50)|The 
source value associated with the origin of the Note| ### Conventions - * The NOTE table contains free text (in ASCII, or preferably in UTF8 format) taken by a healthcare Provider. - * The Visit during which the note was written is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available. - * The Provider making the note is recorded through a reference to the PROVIDER table. This information is not always available. - * The type of note_text is CLOB or varchar(MAX) depending on RDBMS - * note_class_concept_id is a foreign key to the CONCEPT table to describe a standardized combination of five LOINC axes (role, domain, setting, type of service, and document kind). See below for description. + +No.|Convention Description +:--------|:------------------------------------ +| 1 | The NOTE table contains free text (in ASCII, or preferably in UTF8 format) taken by a healthcare Provider.| +| 2 | The Visit during which the note was written is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available.| +| 3 | The Provider making the note is recorded through a reference to the PROVIDER table. This information is not always available.| +| 4 | The type of note_text is CLOB or varchar(MAX) depending on RDBMS| +| 5 | NOTE_CLASS_CONCEPT_ID is a foreign key to the CONCEPT table to describe a standardized combination of five LOINC axes (role, domain, setting, type of service, and document kind). See below for description.| ### Mapping of clinical documents to Clinical Document Ontology (CDO) and standard terminology HL7/LOINC CDO is a standard for consistent naming of documents to support a range of use cases: retrieval, organization, display, and exchange. It guides the creation of LOINC codes for clinical notes. CDO annotates each document with 5 dimensions: -* **Kind of Document:** Characterizes the generalc structure of the document at a macro level (e.g. 
Anesthesia Consent) +* **Kind of Document:** Characterizes the general structure of the document at a macro level (e.g. Anesthesia Consent) * **Type of Service**: Characterizes the kind of service or activity (e.g. evaluations, consultations, and summaries). The notion of time sequence, e.g., at the beginning (admission) at the end (discharge) is subsumed in this axis. Example: Discharge Teaching. * **Setting:** Setting is an extension of CMS's definitions (e.g. Inpatient, Outpatient) * **Subject Matter Domain (SMD):** Characterizes the subject matter domain of a note (e.g. Anesthesiology) * **Role:** Characterizes the training or professional level of the author of the document, but does not break down to specialty or subspecialty (e.g. Physician) -Each combination of these 5 dimensions should roll up to a unique LOINC code. For example, Dentistry Hygienist Outpatient Progress note (LOINC code 34127-1) has the following dimensions: +Each combination of these 5 dimensions rolls up to a unique LOINC code. * According to CDO requirements, only 2 of the 5 dimensions are required to properly annotate a document: Kind of Document and any one of the other 4 dimensions. -* However, not all the permutations of the CDO dimensions will necessarily yield an existing LOINC code.2 HL7/LOINC workforce is committed to establish new LOINC codes for each new encountered combination of CDO dimensions. 3 - -Automation of mapping of clinical notes to a standard terminology based on the note title is possible when it is driven by ontology (aka CDO). Mapping to individual LOINC codes which may or may not exist for a particular note type cannot be fully automated. To support mapping of clinical notes to CDO in OMOP CDM, we propose the following approach: - -#### 1. Add all LOINC concepts representing 5 CDO dimensions to the Concept table. 
For example: - -Field | Record 1 | Record 2 -:-- | :-- | :-- -concept_id | 55443322132 | 55443322175 -concept_name | Administrative note | Against medical advice note -concept_code | LP173418-7 | LP173388-2 -vocabulary_id | LOINC | LOINC - -#### 2. Represent CDO hierarchy in the Concept_Relationship table using the 'Subsumes'/'Is a' relationship pair. For example: - -Field | Record 1 | Record 2 -:-- | :-- | :-- -concept_id_1 | 55443322132 | 55443322175 -concept_id_2 | 55443322175 | 55443322132 -relationship_id | Subsumes | Is a - -#### 3. Add LOINC document codes to the Concept table (e.g. Dentistry Hygienist Outpatient Progress note, LOINC code 34127-1). For example: - -Field | Record 1 | Record 2 -:-- | :-- | :-- -concept_id | 193240 | 193241 -concept_name | Dentistry Hygienist Outpatient Progress note | Consult note -concept_code | 34127-1 | 11488-4 -vocabulary_id | LOINC | LOINC - -#### 4. Represent dimensions of each document concept in Concept_Relationship table by its relationships to the respective concepts from CDO. - -* Use the 'Member Of'/'Has Member' (new) relationship pair. 
-* Using example from the Dentistry Hygienist Outpatient Progress note (LOINC code 34127-1): - -concept_id_1 | concept_id_1 | relationship_id -:-- | :-- | :-- -193240 | 55443322132 | Member Of -55443322132 | 193240 | Has Member -193240 | 55443322175 | Member Of -55443322175 | 193240 | Has Member -193240 | 55443322166 | Member Of -55443322166 | 193240 | Has Member -193240 | 55443322107 | Member Of -55443322107 | 193240 | Has Member -193240 | 55443322146 | Member Of -55443322146 | 193240 | Has Member - -Where concept codes represent the following concepts: - -Content | Description -:---------- | :-------------------------------------------------------------------- -193240 | Corresponds to LOINC 34127-1, Dentistry Hygienist Outpatient Progress note -55443322132 | Corresponds to LOINC LP173418-7, Kind of Document = Note -55443322175 | Corresponds to LOINC LP173213-2, Type of Service = Progress -55443322166 | Corresponds to LOINC LP173051-6, Setting = Outpatient -55443322107 | Corresponds to LOINC LP172934-4, Subject Matter Domain = Dentistry -55443322146 | Corresponds to LOINC LP173071-4, Role = Hygienist - -Most of the codes will not have all 5 dimensions. Therefore, they may be represented by 2-5 relationship pairs. - -#### 5. If LOINC does not have a code corresponding to a permutation of the 5 CDO encountered in the source, this code will be generated as OMOP vocabulary code. - -* Its relationships to the CDO dimensions will be represented exactly as those of existing LOINC concepts (as described above). If/when a proper LOINC code for this permutation is released, the old code should be deprecated. Transition between the old and new codes should be represented by 'Concept replaces'/'Concept replaced by' pairs. - -#### 6. Mapping from the source data will be performed to the 2-5 CDO dimensions. 
- -Query below finds LOINC code for Dentistry Hygienist Outpatient Progress note (see example above) that has all 5 dimensions: - -```sql - SELECT - FROM Concept_Relationship - WHERE relationship_id = 'Has Member' AND - (concept_id_1 = 55443322132 - OR concept_id_1 = 55443322175 - OR concept_id_1 = 55443322166 - OR concept_id_1 = 55443322107 - OR concept_id_1 = 55443322146) - GROUP BY concept_ID_2 -``` - -If less than 5 dimensions are available, `HAVING COUNT(n)` clause should be added to get a unique record at the intersection of these dimensions. n is the number of dimensions available: - -```sql - SELECT - FROM Concept_Relationship - WHERE relationship_id = 'Has Member' AND - (concept_id_1 = 55443322132 - OR concept_id_1 = 55443322175 - OR concept_id_1 = 55443322146) - GROUP BY concept_ID_2 - HAVING COUNT(*) = 3 -``` \ No newline at end of file +* However, not all permutations of the CDO dimensions will necessarily yield an existing LOINC code. The HL7/LOINC workforce is committed to establishing new LOINC codes for each newly encountered combination of CDO dimensions. +* The full document ontology as it exists in the Vocabulary is too extensive to list here, but it can be explored through the ATHENA tool, starting with ['LOINC Document Ontology - Type of Service and Kind of Document'](http://athena.ohdsi.org/search-terms/terms/36209248) and walking through the 'Is a'/'Subsumes' relationship hierarchies. 
diff --git a/StandardizedClinicalDataTables/NOTE_NLP.md b/StandardizedClinicalDataTables/NOTE_NLP.md index 7722b51..9b198f2 100644 --- a/StandardizedClinicalDataTables/NOTE_NLP.md +++ b/StandardizedClinicalDataTables/NOTE_NLP.md @@ -4,12 +4,12 @@ Field | Required | Type | Description :------------------------------- | :-------- | :------------ | :--------------------------------------------------- |note_nlp_id | Yes | integer | A unique identifier for each term extracted from a note.| |note_id | Yes | integer | A foreign key to the Note table note the term was |extracted from.| -|section_concept_id | No | integer | A foreign key to the predefined Concept in the Standardized |Vocabularies representing the section of the extracted term.| +|section_concept_id | Yes | integer | A foreign key to the predefined Concept in the Standardized Vocabularies representing the section of the extracted term.| |snippet | No | varchar(250) | A small window of text surrounding the term.| -|offset | No | varchar(50) | Character offset of the extracted term in the |input note.| +|offset | No | varchar(50) | Character offset of the extracted term in the input note.| |lexical_variant | Yes | varchar(250) | Raw text extracted from the NLP tool.| -|note_nlp_concept_id | No | integer | A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the normalized concept for the extracted term. Domain of the term is represented as part of the Concept table.| -|note_nlp_source_concept_id | No | integer | A foreign key to a Concept that refers to the code in the source vocabulary used by the NLP system| +|note_nlp_concept_id | Yes | integer | A foreign key to the predefined Concept in the Standardized Vocabularies reflecting the normalized concept for the extracted term. 
Domain of the term is represented as part of the Concept table.| +|note_nlp_source_concept_id | Yes | integer | A foreign key to a Concept that refers to the code in the source vocabulary used by the NLP system| |nlp_system | No | varchar(250) | Name and version of the NLP system that extracted the term.Useful for data provenance.| |nlp_date | Yes | date | The date of the note processing.Useful for data provenance.| |nlp_datetime | No | datetime | The date and time of the note processing. Useful for data provenance.| @@ -19,32 +19,8 @@ Field | Required | Type | Description ### Conventions -**Term_exists** -Term_exists is defined as a flag that indicates if the patient actually has or had the condition. Any of the following modifiers would make Term_exists false: - -* Negation = true -* Subject = [anything other than the patient] -* Conditional = true -* Rule_out = true -* Uncertain = very low certainty or any lower certainties - -A complete lack of modifiers would make Term_exists true. - -For the modifiers that are there, they would have to have these values: - -* Negation = false -* Subject = patient -* Conditional = false -* Rule_out = false -* Uncertain = true or high or moderate or even low (could argue about low) - -**Term_temporal** -Term_temporal is to indicate if a condition is “present” or just in the “past”. - -The following would be past: - -* History = true -* Concept_date = anything before the time of the report - -**Term_modifiers** -Term_modifiers will concatenate all modifiers for different types of entities (conditions, drugs, labs etc) into one string. Lab values will be saved as one of the modifiers. A list of allowable modifiers (e.g., signature for medications) and their possible values will be standardized later. +No.|Convention Description +:--------|:------------------------------------ +| 1 | Term_exists is defined as a flag that indicates if the patient actually has or had the condition. 
Any of the following modifiers would make Term_exists false:
  • Negation = true
  • Subject = [anything other than the patient]
  • Conditional = true
  • Rule_out = true
  • Uncertain = very low certainty or any lower certainties
A complete lack of modifiers would make Term_exists true.

For Term_exists to be true, any modifiers that are present must have these values:
  • Negation = false
  • Subject = patient
  • Conditional = false
  • Rule_out = false
  • Uncertain = true, high, moderate, or even low (whether low should qualify is open to debate)
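As an illustration only, the Term_exists rules above can be sketched in Python. The dictionary keys, the lowercase string values and the `"very low"` certainty label are assumptions made for this example; the NOTE_NLP conventions do not define how modifiers are encoded.

```python
from typing import Mapping

# Assumption: certainty labels that count as "very low or lower".
# Extend this set to match the labels your NLP system actually emits.
VERY_LOW_OR_LOWER = {"very low"}


def term_exists(modifiers: Mapping[str, str]) -> bool:
    """Return True when no modifier rules out the term (hypothetical helper).

    A missing modifier is treated as its permissive default, so an
    empty mapping yields True, matching the convention text.
    """
    if modifiers.get("negation", "false") == "true":
        return False
    if modifiers.get("subject", "patient") != "patient":
        return False
    if modifiers.get("conditional", "false") == "true":
        return False
    if modifiers.get("rule_out", "false") == "true":
        return False
    if modifiers.get("uncertain", "true") in VERY_LOW_OR_LOWER:
        return False
    return True
```

Here a term with no modifiers at all exists, while a single disqualifying modifier such as `{"negation": "true"}` is enough to make it false.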
| +| 2 | Term_temporal indicates whether a condition is “present” or only in the “past”. The following would be past:
  • History = true
  • Concept_date = anything before the time of the report
| +| 3 | Term_modifiers will concatenate all modifiers for different types of entities (conditions, drugs, labs etc) into one string. Lab values will be saved as one of the modifiers. A list of allowable modifiers (e.g., signature for medications) and their possible values will be standardized later. | diff --git a/StandardizedClinicalDataTables/OBSERVATION.md b/StandardizedClinicalDataTables/OBSERVATION.md index 45c4d15..7eddbee 100644 --- a/StandardizedClinicalDataTables/OBSERVATION.md +++ b/StandardizedClinicalDataTables/OBSERVATION.md @@ -5,8 +5,8 @@ Field|Required|Type|Description |observation_id |Yes|integer|A unique identifier for each observation.| |person_id |Yes|integer|A foreign key identifier to the Person about whom the observation was recorded. The demographic details of that Person are stored in the PERSON table.| |observation_concept_id |Yes|integer|A foreign key to the standard observation concept identifier in the Standardized Vocabularies.| -|observation_date|Yes|date|The date of the observation.| -|observation_datetime|No|datetime|The date and time of the observation.| +|observation_date|No|date|The date of the observation.| +|observation_datetime|Yes|datetime|The date and time of the observation.| |observation_type_concept_id|Yes|integer|A foreign key to the predefined concept identifier in the Standardized Vocabularies reflecting the type of the observation.| |value_as_number|No|float|The observation result stored as a number. This is applicable to observations where the result is expressed as a numeric value.| |value_as_string|No|varchar(60)|The observation result stored as a string. 
This is applicable to observations where the result is expressed as verbatim text.| @@ -17,20 +17,31 @@ Field|Required|Type|Description |visit_occurrence_id|No|integer|A foreign key to the visit in the VISIT_OCCURRENCE table during which the observation was recorded.| |visit_detail_id|No|integer|A foreign key to the visit in the VISIT_DETAIL table during which the observation was recorded.| |observation_source_value|No|varchar(50)|The observation code as it appears in the source data. This code is mapped to a Standard Concept in the Standardized Vocabularies and the original code is, stored here for reference.| -|observation_source_concept_id|No|integer|A foreign key to a Concept that refers to the code used in the source.| +|observation_source_concept_id|Yes|integer|A foreign key to a Concept that refers to the code used in the source.| |unit_source_value|No|varchar(50)|The source code for the unit as it appears in the source data. This code is mapped to a standard unit concept in the Standardized Vocabularies and the original code is, stored here for reference.| |qualifier_source_value|No|varchar(50)|The source value associated with a qualifier to characterize the observation| +|observation_event_id| No | integer| A foreign key to an event table (e.g., PROCEDURE_OCCURRENCE_ID). | +|obs_event_field_concept_id| Yes | integer| A foreign key that refers to a Standard Concept identifier in the Standardized Vocabularies referring to the field represented in the OBSERVATION_EVENT_ID. | +|value_as_datetime| No | integer| The observation result stored as a datetime value. This is applicable to observations where the result is expressed as a point in time.| ### Conventions - * Observations differ from Measurements in that they do not require a standardized test or some other activity to generate clinical fact. 
Typical observations are medical history, family history, the stated need for certain treatment, social circumstances, lifestyle choices, healthcare utilization patterns, etc. If the generation clinical facts requires a standardized testing such as lab testing or imaging and leads to a standardized result, the data item is recorded in the MEASUREMENT table. If the clinical fact observed determines a sign, symptom, diagnosis of a disease or other medical condition, it is recorded in the CONDITION_OCCURRENCE table. - * Valid Observation Concepts are not enforced to be from any domain. They still should be Standard Concepts, and they typically belong to the "Observation" or sometimes "Measurement" domain. - * Observation can be stored as attribute value pairs, with the attribute as the Observation Concept and the value representing the clinical fact. This fact can be a Concept (stored in value_as_concept), a numerical value (value_as_number) or a verbatim string (value_as_string). Even though Observations do not have an explicit result, the clinical fact can be stated separately from the type of Observation in the value_as_ fields. - * It is recommended for observations that are suggestive statements of positive assertion should have a value of "Yes" (concept_id=4188539), recorded, even though the null value is the equivalent. - * Valid Concepts of the value_as_concept field are not enforced, but typically belong to the "Meas Value" domain. - * For numerical facts a Unit can be provided in the unit_concept_id. - * For facts represented as Concepts no domain membership is enforced. - * Note that the value of value_as_concept_id may be provided through mapping from a source Concept which contains the content of the Observation. In those situations, the CONCEPT_RELATIONSHIP table in addition to the "Maps to" record contains a second record with the relationship_id set to "Maps to value". 
For example, ICD9CM V17.5 concept_id 44828510 "Family history of asthma" has a "Maps to" relationship to 4167217 "Family history of clinical finding" as well as a "Maps to value" record to 317009 "Asthma". - * The qualifier_concept_id field contains all attributes specifying the clinical fact further, such as as degrees, severities, drug-drug interaction alerts etc. - * The Visit during which the observation was made is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available. - * The Provider making the observation is recorded through a reference to the PROVIDER table. This information is not always available. +No.|Convention Description +:--------|:------------------------------------ +| 1 | Observations differ from Measurements in that they do not require a standardized test or some other activity to generate clinical fact. Typical observations are medical history, family history, the stated need for certain treatment, social circumstances, lifestyle choices, healthcare utilization patterns, etc. If the generation clinical facts requires a standardized testing such as lab testing or imaging and leads to a standardized result, the data item is recorded in the MEASUREMENT table. If the clinical fact observed determines a sign, symptom, diagnosis of a disease or other medical condition, it is recorded in the CONDITION_OCCURRENCE table. | +| 2 | Valid Observation Concepts are not enforced to be from any domain. They still should be Standard Concepts, and they typically belong to the 'Observation' or sometimes 'Measurement' domain. | +| 3 | Observations can be stored as attribute value pairs, with the attribute as the Observation Concept and the value representing the clinical fact. This fact can be a Concept (stored in VALUE_AS_CONCEPT), a numerical value (VALUE_AS_NUMBER), a verbatim string (VALUE_AS_STRING), or a datetime (VALUE_AS_DATETIME). 
Even though Observations do not have an explicit result, the clinical fact can be stated separately from the type of Observation in the VALUE_AS_* fields. | +| 4 | It is recommended that Observations that are suggestive statements of positive assertion have a value of 'Yes' (concept_id=4188539) recorded, even though the null value is equivalent. | +| 5 | Valid Concepts of the VALUE_AS_CONCEPT_ID field are not enforced, but typically belong to the 'Meas Value' domain.| +| 6 | For numerical facts a Unit can be provided in the UNIT_CONCEPT_ID.| +| 7 | For facts represented as Concepts no domain membership is enforced.| +| 8 | Note that the value of VALUE_AS_CONCEPT_ID may be provided through mapping from a source Concept which contains the content of the Observation. In those situations, the CONCEPT_RELATIONSHIP table in addition to the 'Maps to' record contains a second record with the relationship_id set to 'Maps to value'. For example, ICD9CM V17.5 concept_id 44828510 'Family history of asthma' has a 'Maps to' relationship to 4167217 'Family history of clinical finding' as well as a 'Maps to value' record to 317009 'Asthma'. | +| 9 | The QUALIFIER_CONCEPT_ID field contains all attributes specifying the clinical fact further, such as degrees, severities, drug-drug interaction alerts etc. | +| 10 | The Visit during which the Observation was made is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available.| +| 11 | The Visit Detail during which the Observation was made is recorded through a reference to the VISIT_DETAIL table. This information is not always available.| +| 12 | The Provider making the observation is recorded through a reference to the PROVIDER table. This information is not always available. 
| +| 13 | When storing patient responses to survey questions, each record in the OBSERVATION table represents a single question/response pair and is linked to a specific survey/questionnaire using OBSERVATION.OBSERVATION_EVENT_ID and SURVEY_CONDUCT.SURVEY_CONDUCT_ID. | +| 14 | Each survey response record is the response to a specific question identified by the OBSERVATION_CONCEPT_ID. This concept ID identifies a unique question contained in the CONCEPT table. | +| 15 | An individual survey question can have multiple responses (e.g. which of these items relate to you, a,b,c,...?). Each response is stored as a separate record in the OBSERVATION table.| +| 16 | The question/answer OBSERVATION record is linked to the patient questionnaire used for collecting the data using two new fields in the OBSERVATION table: OBS_EVENT_FIELD_CONCEPT_ID and OBSERVATION_EVENT_ID.
  • For any survey-related observation, OBS_EVENT_FIELD_CONCEPT_ID contains the concept that refers to the field SURVEY_CONDUCT_ID, and OBSERVATION_EVENT_ID contains the actual SURVEY_CONDUCT_ID of the specific survey
  • This construct can be used for other observation groupings
| +| 17 | The OBSERVATION table can also store survey scoring results. Many validated PRO questionnaires have scoring algorithms (many of which are proprietary) that return an overall patient score based on the answers provided.
  • Survey scores are identified by their OBSERVATION_CONCEPT_ID and are linked back to the scored survey using the same EVENT_FIELD construct described above. In the name/value pair model, the name (question) is stored as OBSERVATION_CONCEPT_ID and the value (answer) is stored as VALUE_AS_CONCEPT_ID where the answer is categorical and is defined as a concept in the CONCEPT table, VALUE_AS_NUMBER where the answer is numeric, VALUE_AS_STRING where the answer is a free text string, or VALUE_AS_DATETIME where the answer is a point in time.
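A minimal Python sketch of the name/value pair model described above, assuming the VALUE_AS_* column names listed in the OBSERVATION field table. The function name, its parameters and any concept IDs passed to it are hypothetical placeholders, not part of the CDM.

```python
from datetime import datetime
from typing import Any, Dict


def survey_observation(
    question_concept_id: int,
    answer: Any,
    survey_conduct_id: int,
    survey_field_concept_id: int,
    answer_is_concept: bool = False,
) -> Dict[str, Any]:
    """Build one OBSERVATION row (as a dict) for a question/response pair."""
    row: Dict[str, Any] = {
        "observation_concept_id": question_concept_id,  # the question
        "observation_event_id": survey_conduct_id,  # links to SURVEY_CONDUCT
        "obs_event_field_concept_id": survey_field_concept_id,
        "value_as_concept_id": None,
        "value_as_number": None,
        "value_as_string": None,
        "value_as_datetime": None,
    }
    # Exactly one VALUE_AS_* field is filled, chosen by the answer type.
    if answer_is_concept:
        row["value_as_concept_id"] = int(answer)
    elif isinstance(answer, datetime):
        row["value_as_datetime"] = answer
    elif isinstance(answer, (int, float)) and not isinstance(answer, bool):
        row["value_as_number"] = float(answer)
    else:
        row["value_as_string"] = str(answer)
    return row
```

A question with several responses would call the function once per response, so each response becomes its own OBSERVATION record.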
| diff --git a/StandardizedClinicalDataTables/OBSERVATION_PERIOD.md b/StandardizedClinicalDataTables/OBSERVATION_PERIOD.md index 72a6520..d0442f8 100644 --- a/StandardizedClinicalDataTables/OBSERVATION_PERIOD.md +++ b/StandardizedClinicalDataTables/OBSERVATION_PERIOD.md @@ -6,14 +6,17 @@ Field|Required|Type|Description |person_id|Yes|integer|A foreign key identifier to the person for whom the observation period is defined. The demographic details of that person are stored in the person table.| |observation_period_start_date|Yes|date|The start date of the observation period for which data are available from the data source.| |observation_period_end_date|Yes|date|The end date of the observation period for which data are available from the data source.| -|period_type_concept_id|Yes|Integer|A foreign key identifier to the predefined concept in the Standardized Vocabularies reflecting the source of the observation period information| +|period_type_concept_id|Yes|Integer|A foreign key identifier to the predefined concept in the Standardized Vocabularies reflecting the source of the observation period information, belonging to the 'Obs Period Type' vocabulary| ### Conventions - * One Person may have one or more disjoint observation periods, during which times analyses may assume that clinical events would be captured if observed, and outside of which no clinical events may be recorded. - * Each Person can have more than one valid OBSERVATION_PERIOD record, but no two observation periods can overlap in time for a given person. - * As a general assumption, during an Observation Period any clinical event that happens to the patient is expected to be recorded. Conversely, the absence of data indicates that no clinical events occurred to the patient. - * Both the _start_date and the _end_date of the clinical event has to be between observation_period_start_date and observation_period_end_date. - * No clinical data are valid outside an active Observation Period. 
Clinical data that refer to a time outside (diagnoses of previous conditions such as "Old MI" or medical history) of an active Observation Period are recorded as Observations. The date of the Observation is the first day of the first Observation Period of a patient. - * For claims data, observation periods are inferred from the enrollment periods to a health benefit plan. - * For EHR data, the observation period cannot be determined explicitly, because patients usually do not announce their departure from a certain healthcare provider. The ETL will have to apply some heuristic to make a reasonable guess on what the observation_period should be. Refer to the ETL documentation for details. +No.|Convention Description +:--------|:------------------------------------ +| 1 | Each Person has to have at least one observation period.| +| 2 | One Person may have one or more disjoint observation periods, during which times analyses may assume that clinical events would be captured if observed.| +| 3 | Each Person can have more than one valid OBSERVATION_PERIOD record, but no two observation periods can overlap in time for a given person.| +| 4 | As a general assumption, during an Observation Period any clinical event that happens to the patient is expected to be recorded. Conversely, the absence of data indicates that no clinical events occurred to the patient. | +| 5 | Both the _START_DATE and the _END_DATE of the clinical event have to be between observation_period_start_date and observation_period_end_date. | +| 6 | Events CAN fall outside of an observation period though they should fall in a valid payer plan period, such as Medicare Part D, which can overlap an observation period. However, time outside of an observation period cannot be used to identify people. To ensure quality, events outside of an observation period should not be used for analysis. 
[THEMIS issue #23](https://github.com/OHDSI/Themis/issues/23) | +| 7 | For claims data, observation periods are inferred from the enrollment periods to a health benefit plan.| +| 8 | For EHR data, the observation period cannot be determined explicitly, because patients usually do not announce their departure from a certain healthcare provider. The ETL will have to apply some heuristic to make a reasonable guess on what the observation_period should be. Refer to the ETL documentation for details. | diff --git a/StandardizedClinicalDataTables/PERSON.md b/StandardizedClinicalDataTables/PERSON.md index 1fb76f8..537c179 100644 --- a/StandardizedClinicalDataTables/PERSON.md +++ b/StandardizedClinicalDataTables/PERSON.md @@ -8,25 +8,36 @@ Field|Required|Type|Description |month_of_birth|No|integer|The month of birth of the person. For data sources that provide the precise date of birth, the month is extracted and stored in this field.| |day_of_birth|No|integer|The day of the month of birth of the person. 
For data sources that provide the precise date of birth, the day is extracted and stored in this field.| |birth_datetime|No|datetime|The date and time of birth of the person.| -|race_concept_id|Yes|integer|A foreign key that refers to an identifier in the CONCEPT table for the unique race of the person.| -|ethnicity_concept_id|Yes|integer|A foreign key that refers to the standard concept identifier in the Standardized Vocabularies for the ethnicity of the person.| +|death_datetime|No|datetime|The date and time of death of the person.| +|race_concept_id|Yes|integer|A foreign key that refers to an identifier in the CONCEPT table for the unique race of the person, belonging to the 'Race' vocabulary.| +|ethnicity_concept_id|Yes|integer|A foreign key that refers to the standard concept identifier in the Standardized Vocabularies for the ethnicity of the person, belonging to the 'Ethnicity' vocabulary.| |location_id|No|integer|A foreign key to the place of residency for the person in the location table, where the detailed address information is stored.| |provider_id|No|integer|A foreign key to the primary care provider the person is seeing in the provider table.| |care_site_id|No|integer|A foreign key to the site of primary care in the care_site table, where the details of the care site are stored.| |person_source_value|No|varchar(50)|An (encrypted) key derived from the person identifier in the source data. This is necessary when a use case requires a link back to the person data at the source dataset.| |gender_source_value|No|varchar(50)|The source code for the gender of the person as it appears in the source data. 
The person’s gender is mapped to a standard gender concept in the Standardized Vocabularies; the original value is stored here for reference.| -|gender_source_concept_id|No|Integer|A foreign key to the gender concept that refers to the code used in the source.| +|gender_source_concept_id|Yes|Integer|A foreign key to the gender concept that refers to the code used in the source.| |race_source_value|No|varchar(50)|The source code for the race of the person as it appears in the source data. The person race is mapped to a standard race concept in the Standardized Vocabularies and the original value is stored here for reference.| -|race_source_concept_id|No|Integer|A foreign key to the race concept that refers to the code used in the source.| +|race_source_concept_id|Yes|Integer|A foreign key to the race concept that refers to the code used in the source.| |ethnicity_source_value|No|varchar(50)|The source code for the ethnicity of the person as it appears in the source data. The person ethnicity is mapped to a standard ethnicity concept in the Standardized Vocabularies and the original code is, stored here for reference.| -|ethnicity_source_concept_id|No|Integer|A foreign key to the ethnicity concept that refers to the code used in the source.| +|ethnicity_source_concept_id|Yes|Integer|A foreign key to the ethnicity concept that refers to the code used in the source.| ### Conventions - * All tables representing patient-related Domains have a foreign-key reference to the person_id field in the PERSON table. - * Each person record has associated demographic attributes which are assumed to be constant for the patient throughout the course of their periods of observation. For example, the location or gender is expected to have a unique value per person, even though in life these data may change over time. - * Valid Gender, Race and Ethnicity Concepts each belong to their own Domain. 
- * Ethnicity in the OMOP CDM follows the OMB Standards for Data on Race and Ethnicity: Only distinctions between Hispanics and Non-Hispanics are made. - * Additional information is stored through references to other tables, such as the home address (location_id) or the primary care provider. - * The Provider refers to the primary care provider (General Practitioner). - * The Care Site refers to where the Provider typically provides the primary care. \ No newline at end of file +No.|Convention Description +:--------|:------------------------------------ +| 1 | All tables representing patient-related Domains have a foreign-key reference to the person_id field in the PERSON table.| +| 2 | Each person record has associated demographic attributes which are assumed to be constant for the patient throughout the course of their periods of observation. For example, the location or gender is expected to have a unique value per person, even though in life these data may change over time. +| 3 | The GENDER_CONCEPT_ID should store what is believed to be the biological sex or sex assigned at birth. If the data set does have gender identification information, this should be stored in the OBSERVATION table (using the gender concepts 8532-Female or 8507-Male in OBSERVATION_CONCEPT_ID) ([THEMIS issue #32](https://github.com/OHDSI/Themis/issues/32)).| +| 4 | If we do not know the month or day of birth, we do not guess. A person can exist without a month or day of birth. If a person lacks a birth year that person should be dropped ([THEMIS issue #30](https://github.com/OHDSI/Themis/issues/30)).| +| 5 | Living patients should not have a value in PERSON.DEATH_DATETIME, nor should they have any records relating to death in either the CONDITION_OCCURRENCE or OBSERVATION tables. +| 6 | Only one death date per individual can be used. If a patient has clinical activity (e.g. 
prescriptions filled, labs performed, etc.) more than 60 days after death you may want to drop the death record as it may have been falsely reported. If multiple records of death exist on multiple days you may select the death that you deem most reliable (e.g. death at discharge) or select the latest death date. +| 7 | If multiple death records occur, the date and the person have to be the same, but the cause can be different. Death can be reported by different sources as well. +| 8 | If PERSON.DEATH_DATETIME cannot be precisely determined from the data, the best approximation should be used. +| 9 | The DEATH_DATETIME in the PERSON table should not be used as the way to find all deaths
  • `select * from PERSON where death_datetime is not null` should not be the practice
  • Rather, deaths should be found through the OBSERVATION table; the PERSON table is used only to determine which death date should be used in analysis
+| 10 | Valid Gender, Race and Ethnicity Concepts each belong to their own Domain. +| 11 | Ethnicity in the OMOP CDM follows the OMB Standards for Data on Race and Ethnicity: Only distinctions between Hispanics and Non-Hispanics are made. +| 12 | Additional information is stored through references to other tables, such as the home address (location_id) or the primary care provider. +| 13 | The Provider refers to the primary care provider (General Practitioner). When the primary provider is unknown for a person then leave the PROVIDER_ID blank ([THEMIS issue #36](https://github.com/OHDSI/Themis/issues/36)). +| 14 | The Care Site refers to where the Provider typically provides the primary care. When care site for the primary provider is unknown then leave the CARE_SITE_ID blank. +| 15 | It is not required that all subjects from the raw data be carried over to the CDM; in fact, removing people that are not of high enough quality may help researchers using the CDM. Example scenarios to remove subjects include: a person’s year of birth or age is unreasonable (e.g. born in year 0, 1800, 2999 or just lacking a year of birth), person lacks health benefits in claims database (i.e. you do not have a complete picture of their record), or raw data states that the person may not be of high research quality (e.g. CPRD will actually suggest which people not to use within research). Removal of a patient is not required and should be made in consideration of the raw data source. Reasons for removal of persons should be documented in the ETL documentation and METADATA table (insert row in METADATA where metadata.name='count of removed persons' and metadata.value_as_string='xyz' where xyz is a number (e.g., 12)).
An ETL should not delete persons who contribute time but have no health care utilization (e.g. an individual who is enrolled in insurance but does not visit a doctor or pharmacy). Such an individual still contributes to analysis as a healthy / non-care-seeking individual ([THEMIS issue #9](https://github.com/OHDSI/Themis/issues/9)).| diff --git a/StandardizedClinicalDataTables/PROCEDURE_OCCURRENCE.md b/StandardizedClinicalDataTables/PROCEDURE_OCCURRENCE.md index a306882..b99fdd4 100644 --- a/StandardizedClinicalDataTables/PROCEDURE_OCCURRENCE.md +++ b/StandardizedClinicalDataTables/PROCEDURE_OCCURRENCE.md @@ -8,25 +8,29 @@ Field|Required|Type|Description |procedure_occurrence_id|Yes|integer|A system-generated unique identifier for each Procedure Occurrence.| |person_id|Yes|integer|A foreign key identifier to the Person who is subjected to the Procedure. The demographic details of that Person are stored in the PERSON table.| |procedure_concept_id|Yes|integer|A foreign key that refers to a standard procedure Concept identifier in the Standardized Vocabularies.| -|procedure_date|Yes|date|The date on which the Procedure was performed.| -|procedure_datetime|No|datetime|The date and time on which the Procedure was performed.| -|procedure_type_concept_id|Yes|integer|A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of source data from which the procedure record is derived.| -|modifier_concept_id|No|integer|A foreign key to a Standard Concept identifier for a modifier to the Procedure (e.g. 
bilateral)| +|procedure_date|No|date|The date on which the Procedure was performed.| +|procedure_datetime|Yes|datetime|The date and time on which the Procedure was performed.| +|procedure_type_concept_id|Yes|integer|A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the type of source data from which the procedure record is derived, belonging to the 'Procedure Type' vocabulary.| +|modifier_concept_id|Yes|integer|A foreign key to a Standard Concept identifier for a modifier to the Procedure (e.g. bilateral). These concepts are typically distinguished by 'Modifier' concept classes (e.g., 'CPT4 Modifier' as part of the 'CPT4' vocabulary).| |quantity|No|integer|The quantity of procedures ordered or administered.| |provider_id|No|integer|A foreign key to the provider in the PROVIDER table who was responsible for carrying out the procedure.| |visit_occurrence_id|No|integer|A foreign key to the Visit in the VISIT_OCCURRENCE table during which the Procedure was carried out.| |visit_detail_id|No|integer|A foreign key to the Visit Detail in the VISIT_DETAIL table during which the Procedure was carried out.| |procedure_source_value|No|varchar(50)|The source code for the Procedure as it appears in the source data. This code is mapped to a standard procedure Concept in the Standardized Vocabularies and the original code is, stored here for reference. Procedure source codes are typically ICD-9-Proc, CPT-4, HCPCS or OPCS-4 codes.| -|procedure_source_concept_id|No|integer|A foreign key to a Procedure Concept that refers to the code used in the source.| +|procedure_source_concept_id|Yes|integer|A foreign key to a Procedure Concept that refers to the code used in the source.| |modifier_source_value|No|varchar(50)|The source code for the qualifier as it appears in the source data.| ### Conventions - * Valid Procedure Concepts belong to the "Procedure" domain. 
Procedure Concepts are based on a variety of vocabularies: SNOMED-CT, ICD-9-Proc, CPT-4, HCPCS and OPCS-4, but also atypical Vocabularies such as ICD-9-CM or MedDRA. - * Procedures are expected to be carried out within one day and therefore have no end date. - * Procedures could involve the application of a drug, in which case the procedural component is recorded in the procedure table and simultaneously the administered drug in the drug exposure table when both the procedural component and drug are identifiable. - * If the quantity value is omitted, a single procedure is assumed. - * The Procedure Type defines from where the Procedure Occurrence is drawn or inferred. For administrative claims records the type indicates whether a Procedure was primary or secondary and their relative positioning within a claim. - * The Visit during which the procedure was performed is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available. - * The Visit Detail during with the procedure was performed is recorded through a reference to the VISIT_DETAIL table. This information is not always available. - * The Provider carrying out the procedure is recorded through a reference to the PROVIDER table. This information is not always available. +No.|Convention Description +:--------|:------------------------------------ +| 1 | Valid Procedure Concepts belong to the 'Procedure' domain. Procedure Concepts are based on a variety of vocabularies: SNOMED-CT, ICD-9-Proc, CPT-4, HCPCS and OPCS-4, but also atypical Vocabularies such as ICD-9-CM or MedDRA. +| 2 | Procedures are expected to be carried out within one day and therefore have no end date. +| 3 | Procedures could involve the application of a drug, in which case the procedural component is recorded in the procedure table and simultaneously the administered drug in the drug exposure table when both the procedural component and drug are identifiable. 
+| 4 | If the quantity value is omitted, a single procedure is assumed. +| 5 | The Procedure Type defines from where the Procedure Occurrence is drawn or inferred. For administrative claims records the type indicates whether a Procedure was primary or secondary and their relative positioning within a claim. +| 6 | The Visit during which the procedure was performed is recorded through a reference to the VISIT_OCCURRENCE table. This information is not always available. +| 7 | The Visit Detail during which the procedure was performed is recorded through a reference to the VISIT_DETAIL table. This information is not always available. +| 8 | The Provider carrying out the procedure is recorded through a reference to the PROVIDER table. This information is not always available. +| 9 | When dealing with duplicate records, the ETL must determine whether to sum them up into one record or keep them separate. Things to consider are:
  • Same Procedure
  • Same PROCEDURE_DATETIME
  • Same Visit Occurrence or Visit Detail
  • Same Provider
  • Same Modifier for Procedures
  • Same COST_ID
[THEMIS issue #27](https://github.com/OHDSI/Themis/issues/27) | +| 10 | If a Procedure has a quantity of '0' in the source, this should default to '1' in the ETL. If there is a record in the source it can be assumed the exposure occurred at least once ([THEMIS issue #26](https://github.com/OHDSI/Themis/issues/26)).| \ No newline at end of file diff --git a/StandardizedClinicalDataTables/SPECIMEN.md b/StandardizedClinicalDataTables/SPECIMEN.md index 95ef507..b65e3d7 100644 --- a/StandardizedClinicalDataTables/SPECIMEN.md +++ b/StandardizedClinicalDataTables/SPECIMEN.md @@ -6,12 +6,12 @@ Field|Required|Type|Description |person_id|Yes|integer|A foreign key identifier to the Person for whom the Specimen is recorded.| |specimen_concept_id|Yes|integer|A foreign key referring to a Standard Concept identifier in the Standardized Vocabularies for the Specimen.| |specimen_type_concept_id|Yes|integer|A foreign key referring to the Concept identifier in the Standardized Vocabularies reflecting the system of record from which the Specimen was represented in the source data.| -|specimen_date|Yes|date|The date the specimen was obtained from the Person.| -|specimen_datetime|No|datetime|The date and time on the date when the Specimen was obtained from the person.| +|specimen_date|No|date|The date the specimen was obtained from the Person.| +|specimen_datetime|Yes|datetime|The date and time on the date when the Specimen was obtained from the person.| |quantity|No|float|The amount of specimen collection from the person during the sampling procedure.| |unit_concept_id|No|integer|A foreign key to a Standard Concept identifier for the Unit associated with the numeric quantity of the Specimen collection.| -|anatomic_site_concept_id|No|integer|A foreign key to a Standard Concept identifier for the anatomic location of specimen collection.| -|disease_status_concept_id|No|integer|A foreign key to a Standard Concept identifier for the Disease Status of specimen collection.| 
+|anatomic_site_concept_id|Yes|integer|A foreign key to a Standard Concept identifier for the anatomic location of specimen collection.| +|disease_status_concept_id|Yes|integer|A foreign key to a Standard Concept identifier for the Disease Status of specimen collection.| |specimen_source_id|No|varchar(50)|The Specimen identifier as it appears in the source data.| |specimen_source_value|No|varchar(50)|The Specimen value as it appears in the source data. This value is mapped to a Standard Concept in the Standardized Vocabularies and the original code is, stored here for reference.| |unit_source_value|No|varchar(50)|The information about the Unit as detailed in the source.| @@ -19,4 +19,7 @@ Field|Required|Type|Description |disease_status_source_value|No|varchar(50)|The information about the disease status as detailed in the source.| ### Conventions - * Anatomic site is coded at the most specific level of granularity possible, such that higher level classifications can be derived using the Standardized Vocabularies. \ No newline at end of file + +No.|Convention Description +:--------|:------------------------------------ +| 1 | Anatomic site is coded at the most specific level of granularity possible, such that higher level classifications can be derived using the Standardized Vocabularies. \ No newline at end of file diff --git a/StandardizedClinicalDataTables/SURVEY_CONDUCT.md b/StandardizedClinicalDataTables/SURVEY_CONDUCT.md new file mode 100644 index 0000000..722b265 --- /dev/null +++ b/StandardizedClinicalDataTables/SURVEY_CONDUCT.md @@ -0,0 +1,40 @@ +# SURVEY_CONDUCT + +The SURVEY_CONDUCT table is used to store an instance of a completed survey or questionnaire. It captures details of the individual questionnaire such as who completed it, when it was completed and to which patient treatment or visit it relates (if any). Each SURVEY has a SURVEY_CONCEPT_ID, a concept in the CONCEPT table identifying the questionnaire e.g. EQ5D, VR12, SF12. 
Each questionnaire should exist in the CONCEPT table. Each SURVEY can be optionally related to a specific patient visit in order to link it both to the visit during which it was completed and any subsequent visit where treatment was assigned based on the patient's responses. + +Field | Required | Type | Description +:----------------|:-----------------|:------------|:-----------------------------------| +SURVEY_CONDUCT_ID | Yes | integer | Unique identifier for each completed survey. +PERSON_ID | Yes | integer | A foreign key identifier to the Person in the PERSON table about whom the survey was completed. +SURVEY_CONCEPT_ID | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the name and identity of the survey. +SURVEY_START_DATE | No | date | Date on which the survey was started. +SURVEY_START_DATETIME | No | datetime | Date and time the survey was started. +SURVEY_END_DATE | Yes | date | Date on which the survey was completed. +SURVEY_END_DATETIME | No | datetime | Date and time the survey was completed. +PROVIDER_ID | No  | integer  | A foreign key to the provider in the provider table who was associated with the survey completion. +ASSISTED_CONCEPT_ID | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies indicating whether the survey was completed with assistance. +RESPONDENT_TYPE_CONCEPT_ID | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the respondent type. Example: Research Associate, Patient. +TIMING_CONCEPT_ID | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies that refers to a certain timing. Example: 3 month follow-up, 6 month follow-up. +COLLECTION_METHOD_CONCEPT_ID | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the data collection method (e.g. 
Paper, Telephone, Electronic Questionnaire). +ASSISTED_SOURCE_VALUE | No | varchar(50) | Source value representing whether patient required assistance to complete the survey. Example: “Completed without assistance”, “Completed with assistance”. +RESPONDENT_TYPE_SOURCE_VALUE | No | varchar(100) | Source code representing role of person who completed the survey. +TIMING_SOURCE_VALUE | No | varchar(100) | Text string representing the timing of the survey. Example: Baseline, 6-month follow-up. +COLLECTION_METHOD_SOURCE_VALUE | No | varchar(100) | The collection method as it appears in the source data. +SURVEY_SOURCE_VALUE | No | varchar(100) | The survey name/title as it appears in the source data. +SURVEY_SOURCE_CONCEPT_ID | Yes | integer | A foreign key to a predefined Concept that refers to the code for the survey name/title used in the source. +SURVEY_SOURCE_IDENTIFIER | No | varchar(100) | Unique identifier for each completed survey in source system. +VALIDATED_SURVEY_CONCEPT_ID | Yes | integer | A foreign key to the predefined Concept identifier in the Standardized Vocabularies reflecting the validation status of the survey. +VALIDATED_SURVEY_SOURCE_VALUE | No | integer | Source value representing the validation status of the survey. +SURVEY_VERSION_NUMBER | No | varchar(20) | Version number of the questionnaire or survey used. +VISIT_OCCURRENCE_ID | No | integer | A foreign key to the visit in the VISIT_OCCURRENCE table during which the survey was completed. +RESPONSE_VISIT_OCCURRENCE_ID | No | integer | A foreign key to the visit in the VISIT_OCCURRENCE table during which treatment was carried out that relates to this survey. +### Conventions + +No.|Convention Description +:--------|:------------------------------------ +| 1 | Patient responses to survey questions are stored in the OBSERVATION table. 
Each record in the OBSERVATION table represents a single question/response pair and is linked to a specific SURVEY/questionnaire using OBSERVATION.DOMAIN_OCCURRENCE_ID and SURVEY.SURVEY_OCCURRENCE_ID. +| 2 | Each response record is the response to a specific question identified by the OBSERVATION_CONCEPT_ID. This concept ID identifies a unique question contained in the CONCEPT table. +| 3 | An individual survey question can have multiple responses (e.g. which of these items relate to you: a, b, c, …?). Each response is stored as a separate record in the OBSERVATION table.