Added TableofContents.md

clairblacketer 2017-06-14 11:14:15 -04:00
parent 7dfa389682
commit 5d65dde09c
5 changed files with 140 additions and 3 deletions

@ -2,13 +2,15 @@
<br>*Authors: Christian Reich, Patrick Ryan, Rimma Belenkaya, Karthik Natarajan, Clair Blacketer*
<br>***Release date needed***
[Back to Table of Contents](TableofContents.md)
[Back to Table of Contents](Documentation/TableofContents.md)
---
# 1 Background
[1.1 The Role of the Common Data Model](TheRoleoftheCommonDataModel.md)
[1.2 Design Principles](DesignPrinciples.md)
[1.3 Data Model Conventions](DataModelConventions.md)
The Observational Medical Outcomes Partnership (OMOP) was a public-private partnership established to inform the appropriate use of observational healthcare databases for studying the effects of medical products. Over the course of the 5-year project and through its community of researchers from industry, government, and academia, OMOP successfully achieved its aims to:

@ -0,0 +1,92 @@
*OMOP Common Data Model v5.1 Specifications*
<br>*Authors: Christian Reich, Patrick Ryan, Rimma Belenkaya, Karthik Natarajan, Clair Blacketer*
<br>***Release date needed***
[Back to Table of Contents](Documentation/TableofContents.md)
<br>[Back to Background](Background.md)
---
# 1.3 Data Model Conventions
There are a number of implicit and explicit conventions that have been adopted in the CDM. Developers of methods that run against the CDM need to understand these conventions.
## General conventions of data tables
The CDM is platform-independent. Data types are defined generically using ANSI SQL data types (VARCHAR, INTEGER, FLOAT, DATE, TIME, CLOB). Precision is provided only for VARCHAR. It reflects the minimal required string length and can be expanded within a CDM instantiation. The CDM does not prescribe the date and time format. Standard queries against CDM may vary for local instantiations and date/time configurations.
In most cases, the first field in each table ends in "_id", containing a record identifier that can be used as a foreign key in another table.
## General conventions of fields
Variable names across all tables follow one convention:
^Notation^Description^
|<entity>_SOURCE_VALUE|Verbatim information from the source data, typically used in ETL to map to CONCEPT_ID, and not to be used by any standard analytics. For example, condition_source_value = 787.02 is the ICD-9 code captured as a diagnosis from the administrative claim.|
|<entity>_ID|Unique identifiers for key entities, which can serve as foreign keys to establish relationships across entities. For example, person_id uniquely identifies each individual. visit_occurrence_id uniquely identifies a PERSON encounter at a point of care.|
|<entity>_CONCEPT_ID|Foreign key into the Standardized Vocabularies (i.e. the standard_concept attribute for the corresponding term is true), which serves as the primary basis for all standardized analytics. For example, condition_concept_id = 31967 contains the reference value for the SNOMED concept of Nausea.|
|<entity>_SOURCE_CONCEPT_ID|Foreign key into the Standardized Vocabularies representing the concept and terminology used in the source data, when applicable. For example, condition_source_concept_id = 35708202 denotes the concept of Nausea in the MedDRA terminology; the analogous condition_concept_id might be 31967, since SNOMED-CT is the Standardized Vocabulary for most clinical diagnoses and findings.|
|<entity>_TYPE_CONCEPT_ID|Delineates the origin of the source information, standardized within the Standardized Vocabularies. For example, drug_type_concept_id can allow analysts to discriminate between Pharmacy dispensing and Prescription written.|
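
The naming convention in the table above can be checked mechanically. The following is a minimal sketch, not part of the CDM specification: the function name `classify_field` and the role labels are made up for illustration. It only shows that suffixes have to be tested from most to least specific.

```python
# Suffixes are checked from most to least specific, because
# "_source_concept_id" and "_type_concept_id" also end in "_concept_id",
# and every "_concept_id" also ends in "_id".
# The role labels on the right are illustrative, not CDM terms.
FIELD_ROLES = [
    ("_source_concept_id", "source concept"),
    ("_type_concept_id", "type concept"),
    ("_concept_id", "standard concept"),
    ("_source_value", "source value"),
    ("_id", "identifier"),
]


def classify_field(field_name: str) -> str:
    """Return the role a CDM field plays, judged only by its name suffix."""
    name = field_name.lower()
    for suffix, role in FIELD_ROLES:
        if name.endswith(suffix):
            return role
    return "other"


if __name__ == "__main__":
    for field in ("person_id", "condition_concept_id",
                  "condition_source_concept_id", "drug_type_concept_id",
                  "condition_source_value", "condition_start_date"):
        print(field, "->", classify_field(field))
```
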
## Representation of content through Concepts
In CDM data tables the meaning of the content of each record is represented using Concepts. Concepts are stored with their concept_id as foreign keys to the CONCEPT table in the Standardized Vocabularies, which contains Concepts necessary to describe the healthcare experience of a patient. If a Standard Concept does not exist or cannot be identified, the Concept with the concept_id 0 is used, representing a non-existing or unmappable concept.
Records in the CONCEPT table contain all the detailed information about each Concept (name, relationships, types etc.). Concepts, Concept Relationships and other information relating to Concepts are contained in the tables of the Standardized Vocabularies.
## Difference between Concept IDs and Source Values
Many tables contain equivalent information multiple times: as a Source Value, as a Source Concept and as a Standard Concept.
* Source Values contain the codes from public code systems such as ICD-9-CM, NDC, CPT-4 etc. or local controlled vocabularies (such as F for female and M for male) copied from the source data. Source Values are stored in the _source_value field in the data tables.
* Concepts are CDM-specific entities that represent the meaning of a clinical fact. Most concepts are based on code systems used in healthcare (called Source Concepts), while others were created de-novo (concept_code = "OMOP generated"). Concepts have unique IDs across all domains.
* Source Concepts are the concepts that represent the code used in the source. Source Concepts are only used for common healthcare code systems, but not for OMOP-generated Concepts. Source Concepts are stored in the source_concept_id field in the data tables.
* Standard Concepts are those concepts that are used to define the unique meaning of a clinical entity. For each entity there is one Standard Concept. Standard Concepts are typically drawn from existing public vocabulary sources. Concepts that have the equivalent meaning to a Standard Concept are mapped to the Standard Concept. Standard Concepts are referred to in the concept_id field of the data tables.
Source Values are only provided for convenience and quality assurance (QA) purposes. Source Values and Source Concepts are optional, while Standard Concepts are mandatory. Source Values may contain information that is only meaningful in the context of a specific data source.
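
As an illustration of the mandatory and optional parts, the sketch below models a single condition record. The class `ConditionRecord` and its `is_mapped` method are hypothetical helpers, not CDM definitions. The Nausea values (787.02 and 31967) are taken from the field convention table, and the source concept is left empty because this document does not give an ICD-9 source concept id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditionRecord:
    # Standard Concept: mandatory, 0 when no Standard Concept could be found.
    condition_concept_id: int
    # Source Value and Source Concept: optional, kept for convenience and QA.
    condition_source_value: Optional[str] = None
    condition_source_concept_id: Optional[int] = None

    def is_mapped(self) -> bool:
        """True when the record carries a real Standard Concept (not 0)."""
        return self.condition_concept_id != 0


# Nausea recorded as ICD-9 787.02 and mapped to the Standard Concept 31967.
nausea = ConditionRecord(condition_concept_id=31967,
                         condition_source_value="787.02")

# A local code that could not be mapped; "local-code-17" is a made-up value.
unmapped = ConditionRecord(condition_concept_id=0,
                           condition_source_value="local-code-17")

if __name__ == "__main__":
    print(nausea.is_mapped(), unmapped.is_mapped())
```
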
## Difference between general Concepts and Type Concepts
Type Concepts (ending in _type_concept_id) and general Concepts (ending in _concept_id) are part of many tables. The former are special Concepts with the purpose of indicating where the data are derived from in the source. For example, the Type Concept field can be used to distinguish a DRUG_EXPOSURE record that is derived from a pharmacy-dispensing claim from one indicative of a prescription written in an electronic health record (EHR).
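
A minimal sketch of how an analyst might use the Type Concept field follows. The two type concept ids and all record values are placeholders chosen for illustration, not values from the Standardized Vocabularies, and `filter_by_type` is a hypothetical helper.

```python
# Placeholder ids for illustration only; look up the real Type Concepts
# in the Standardized Vocabularies before using this pattern.
PHARMACY_DISPENSING = 9001
PRESCRIPTION_WRITTEN = 9002

# Toy DRUG_EXPOSURE rows; drug_concept_id values are also placeholders.
drug_exposures = [
    {"person_id": 1, "drug_concept_id": 111, "drug_type_concept_id": PHARMACY_DISPENSING},
    {"person_id": 1, "drug_concept_id": 111, "drug_type_concept_id": PRESCRIPTION_WRITTEN},
    {"person_id": 2, "drug_concept_id": 222, "drug_type_concept_id": PHARMACY_DISPENSING},
]


def filter_by_type(records, type_concept_id):
    """Keep only the records whose provenance matches the given Type Concept."""
    return [r for r in records if r["drug_type_concept_id"] == type_concept_id]


if __name__ == "__main__":
    dispensed = filter_by_type(drug_exposures, PHARMACY_DISPENSING)
    print(len(dispensed), "dispensing records")
```
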
## Time span of available data
Data tables for clinical data contain a date stamp (ending in _date, _start_date or _end_date), indicating when that clinical event occurred. As a rule, no record can be outside of a valid OBSERVATION_PERIOD time period. Clinical information relating to events that happened prior to the first OBSERVATION_PERIOD is captured as a record in the OBSERVATION table with the concept 'Medical history' (concept_id = 43054928), with the observation_date set to the first observation_period_start_date of that patient and the value_as_concept_id set to the corresponding concept_id for the condition/drug/procedure that occurred in the past. No data occurring after the last observation_period_end_date can be valid records in the CDM.
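
The rule above could be sketched in an ETL step roughly as follows. This is an illustration under assumptions, not a prescribed implementation: the function `place_event`, its return shape, and the use of `observation_concept_id` as the field holding 43054928 are assumptions. Events that fall into a gap between two observation periods are dropped here because the text states that no record can be outside a valid period.

```python
from datetime import date

MEDICAL_HISTORY_CONCEPT_ID = 43054928  # 'Medical history', as given in the text


def place_event(event_date, event_concept_id, periods):
    """Decide what to do with one clinical event.

    periods: non-empty list of (start_date, end_date) tuples for one person.
    Returns a dict with an "action" of "keep", "medical_history" or "drop".
    """
    if not periods:
        raise ValueError("person has no OBSERVATION_PERIOD")
    periods = sorted(periods)
    # Inside any observation period: the record is valid as it is.
    if any(start <= event_date <= end for start, end in periods):
        return {"action": "keep"}
    first_start = periods[0][0]
    # Before the first period: rewrite as a 'Medical history' observation.
    if event_date < first_start:
        return {
            "action": "medical_history",
            "observation": {
                "observation_concept_id": MEDICAL_HISTORY_CONCEPT_ID,  # assumed field name
                "observation_date": first_start,
                "value_as_concept_id": event_concept_id,
            },
        }
    # After the last period, or in a gap between periods: not a valid record.
    return {"action": "drop"}


if __name__ == "__main__":
    periods = [(date(2010, 1, 1), date(2012, 12, 31))]
    print(place_event(date(2009, 6, 1), 31967, periods))
```
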
## Content of each table
For the tables of the main domains of the CDM it is imperative that used concepts are strictly limited to the domain. For example, the CONDITION_OCCURRENCE table contains only information about conditions (diagnoses, signs, symptoms), but no information about procedures. Not all source coding schemes adhere to such rules. For example, ICD-9-CM codes, which contain mostly diagnoses of human disease, also contain information about the status of patients having received a procedure: V25.5 "Encounter for insertion of implantable subdermal contraceptive" defines a procedure and is therefore stored in the PROCEDURE_OCCURRENCE table.
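
One way to picture this rule is a lookup from the domain of a Standard Concept to the table that receives the record. The sketch below assumes domain names of "Condition", "Procedure" and "Drug", and uses a hypothetical concept id (900001) for the procedure that V25.5 would map to. Only 31967 (Nausea) comes from this document.

```python
# Assumed domain names and their target tables (not an exhaustive list).
DOMAIN_TO_TABLE = {
    "Condition": "CONDITION_OCCURRENCE",
    "Procedure": "PROCEDURE_OCCURRENCE",
    "Drug": "DRUG_EXPOSURE",
}

# Toy excerpt of Standard Concepts and their domains.
STANDARD_CONCEPT_DOMAIN = {
    31967: "Condition",    # SNOMED Nausea (id from the text)
    900001: "Procedure",   # hypothetical id for the concept V25.5 maps to
}


def target_table(concept_id: int) -> str:
    """Return the CDM table that a record with this Standard Concept belongs in."""
    domain = STANDARD_CONCEPT_DOMAIN[concept_id]
    return DOMAIN_TO_TABLE[domain]


if __name__ == "__main__":
    # The ICD-9-CM code V25.5 describes a procedure, so its record is routed
    # to PROCEDURE_OCCURRENCE even though ICD-9-CM is mostly diagnoses.
    print(target_table(900001))
```
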
## Differentiating between source values, source concept ids, and standard concept ids
Each table contains fields for source values, source concept ids, and standard concept ids.
* Source values are fields to maintain the verbatim information from the source database, are stored as unstructured text, and are generally not to be used by any standardized analytics.
* Source concept ids provide a repeatable representation of the source concept, when the source data are drawn from a commonly-used internationally-recognized vocabulary that has been distributed with the OMOP Common Data Model. Specific use cases where source vocabulary-specific analytics are required can be accommodated by the use of the source concept id fields, but these are generally not applicable across disparate data sources.
* Standard concept ids are **strongly suggested** for use in all standardized analytics, as specific vocabularies have been established within each data domain to facilitate standardization of both structure and content within the OMOP Common Data Model.
The following provide conventions for processing source data using these three fields in each domain:
When processing data where the source value is either free text or a reference to a coding scheme that is not contained within the Standardized Vocabularies:
- Map all source values directly to standard concept_ids. Store these mappings in the SOURCE_TO_CONCEPT_MAP table.
- If the source code is not mappable to a vocabulary term, the source_concept_id field is set to 0.
When processing your data where source value is a reference to a coding scheme contained within the Standardized Vocabularies:
- Map all your source values to the corresponding concept_ids in the source vocabulary. Store the result in the source_concept_id field.
- If the source code follows the same formatting as the distributed vocabulary, the mapping can be directly obtained from the CONCEPT table using the CONCEPT_CODE field.
- If the source code uses alternative formatting (e.g. the decimal point has been removed from ICD-9 codes), you will need to perform the formatting transformation within the ETL. In this case, you may wish to store the mappings from original codes to source concept ids in the SOURCE_TO_CONCEPT_MAP table.
- If the source code is not mappable to a vocabulary term, the source_concept_id field is set to 0.
- Use the CONCEPT_RELATIONSHIP table to identify the standard concept_id that corresponds to the source_concept_id in the domain.
- Each source_concept_id can have one or more standard concept_ids mapped to it. Each standard concept_id belongs to only one primary domain, but when a source_concept_id maps to multiple standard concept_ids, it is possible for that source_concept_id to result in records being produced across multiple domains. For example, an HCPCS code for infusion of a drug will map to a concept in the procedure domain for the infusion and a different concept in the drug domain for the product infused. It is also possible for one source_concept_id to map to multiple standard concept_ids within the same domain. For example, the ICD-9 code for viral hepatitis with hepatic coma maps to the SNOMED concept for viral hepatitis and a different concept for hepatic coma, in which case multiple condition_occurrence records will be generated for the one source value record.
- If the source_concept_id is not mappable to any standard concept_id, the concept_id field is set to 0.
- Write the data record into table(s) corresponding to the domain of the standard concept_id(s).
- If the source value is mapped to a source_concept_id, but the source_concept_id is not mapped to a standard concept_id, then the domain for the data record, and hence its table location, is determined by the domain_id field of the CONCEPT record the source_concept_id refers to. The standard concept_id is set to 0.
- If the source value cannot be mapped to a source_concept_id or standard concept_id, then direct the data record to the most appropriate CDM domain based on your local knowledge of the intent of the source data and associated value. For example, if the unmappable source_value came from a diagnosis table, then in the absence of other information, you may choose to record that fact in the CONDITION_OCCURRENCE table.
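
The steps above for a coding scheme contained in the Standardized Vocabularies can be sketched as follows. This is a simplified illustration, not the reference ETL. The dictionaries stand in for the CONCEPT and CONCEPT_RELATIONSHIP tables, and all ids in the 800000 and 900000 ranges, the "DEMO" vocabulary and its codes, the "ICD9CM" vocabulary name and the domain names are assumptions. The decimal-point rule covers only the common ICD-9 layouts.

```python
def restore_icd9_decimal(code: str) -> str:
    """Re-insert the decimal point into an ICD-9 code stored without one.

    Simplification: E-codes keep four characters before the point,
    all other codes keep three.
    """
    code = code.strip().upper()
    if "." in code:
        return code
    head = 4 if code.startswith("E") else 3
    return code if len(code) <= head else f"{code[:head]}.{code[head:]}"


# Stand-in for CONCEPT: (vocabulary_id, concept_code) -> (concept_id, domain_id)
SOURCE_CONCEPTS = {
    ("ICD9CM", "787.02"): (800001, "Condition"),   # hypothetical id
    ("DEMO", "INFUSION"): (800002, "Procedure"),   # stand-in for a drug infusion code
    ("DEMO", "ORPHAN"): (800003, "Observation"),   # has no standard mapping
}

# Stand-in for CONCEPT_RELATIONSHIP: source concept -> [(standard id, domain)]
MAPS_TO = {
    800001: [(31967, "Condition")],                      # Nausea (id from the text)
    800002: [(900001, "Procedure"), (900002, "Drug")],   # hypothetical targets
}


def map_source_code(vocabulary_id: str, source_code: str):
    """Return one output row per standard concept, following the conventions above."""
    if vocabulary_id == "ICD9CM":
        source_code = restore_icd9_decimal(source_code)
    source = SOURCE_CONCEPTS.get((vocabulary_id, source_code))
    if source is None:
        # Not mappable at all: both ids are 0, and the domain must be
        # chosen from local knowledge of the source data.
        return [{"source_concept_id": 0, "concept_id": 0, "domain_id": None}]
    source_concept_id, source_domain = source
    targets = MAPS_TO.get(source_concept_id)
    if not targets:
        # Source concept known but unmapped: the domain of the source
        # concept decides the table, and the standard concept_id is 0.
        return [{"source_concept_id": source_concept_id, "concept_id": 0,
                 "domain_id": source_domain}]
    return [{"source_concept_id": source_concept_id, "concept_id": cid,
             "domain_id": dom} for cid, dom in targets]


if __name__ == "__main__":
    for row in map_source_code("DEMO", "INFUSION"):
        print(row)
```

In this sketch the infusion code yields two rows, one per domain, which mirrors how a single source record can produce records in more than one table.
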
Each standard concept_id field has a set of allowable concept_id values. The allowable values are defined by the domain of the concepts. For example, there is a domain concept of Gender, for which there are only two allowable standard concepts of practical use (8507- Male, 8532- Female) and one allowable generic concept to represent a standard notion of no information (concept_id = 0).
There is no constraint on allowed concept_ids within the source_concept_id fields.
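
A data quality check for these two rules might look like the sketch below. The function `is_allowed` and the dictionary layout are hypothetical. The Gender values (8507, 8532 and 0) are the ones listed above.

```python
# Allowed standard concept_ids per field; only Gender is filled in here,
# using the values given in the text.
ALLOWED_STANDARD_IDS = {
    "gender_concept_id": {8507, 8532, 0},  # Male, Female, no information
}


def is_allowed(field_name: str, concept_id: int) -> bool:
    """Check a concept_id against the allowed set for its field."""
    # Source concept fields are unconstrained.
    if field_name.endswith("_source_concept_id"):
        return True
    allowed = ALLOWED_STANDARD_IDS.get(field_name)
    if allowed is None:
        raise KeyError(f"no allowed set defined for {field_name}")
    return concept_id in allowed


if __name__ == "__main__":
    print(is_allowed("gender_concept_id", 8507))
    print(is_allowed("gender_concept_id", 31967))
```
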
## Custom source_to_concept_maps
When the source data uses coding systems that are not currently in the Standardized Vocabularies (e.g. ICPC codes for diagnoses), the convention is to store the mapping of such source codes to Standard Concepts in the SOURCE_TO_CONCEPT_MAP table. The codes used in the data source can be recorded in the source_value fields, but no source_concept_id will be available.
Custom source codes are not allowed to map to Standard Concepts that are marked as invalid.
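
The sketch below illustrates a custom mapping and a check that no row targets an invalid Standard Concept. The column names (`source_code`, `source_vocabulary_id`, `target_concept_id`), the codes "DEMO1" and "DEMO2", the id 900009 and the helper names are assumptions made for illustration. How invalid concepts are flagged in a real vocabulary is not covered in this document, so a plain set of ids stands in for it.

```python
# Toy SOURCE_TO_CONCEPT_MAP rows; codes are placeholders, not real ICPC codes.
custom_map = [
    {"source_code": "DEMO1", "source_vocabulary_id": "ICPC", "target_concept_id": 31967},
    {"source_code": "DEMO2", "source_vocabulary_id": "ICPC", "target_concept_id": 900009},
]

# Stand-in for the set of Standard Concepts marked as invalid (hypothetical id).
INVALID_CONCEPT_IDS = {900009}


def find_invalid_mappings(rows, invalid_ids):
    """Return the rows that break the rule by targeting an invalid concept."""
    return [r for r in rows if r["target_concept_id"] in invalid_ids]


def lookup(rows, vocabulary_id, source_code):
    """Return (source_concept_id, concept_id) for a custom source code.

    The source_concept_id is always 0 because custom vocabularies have
    no Source Concepts; the concept_id is 0 when no mapping exists.
    """
    for r in rows:
        if (r["source_vocabulary_id"], r["source_code"]) == (vocabulary_id, source_code):
            return 0, r["target_concept_id"]
    return 0, 0


if __name__ == "__main__":
    print(find_invalid_mappings(custom_map, INVALID_CONCEPT_IDS))
    print(lookup(custom_map, "ICPC", "DEMO1"))
```
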

@ -2,7 +2,7 @@
<br>*Authors: Christian Reich, Patrick Ryan, Rimma Belenkaya, Karthik Natarajan, Clair Blacketer*
<br>***Release date needed***
[Back to Table of Contents](TableofContents.md)
[Back to Table of Contents](Documentation/TableofContents.md)
<br>[Back to Background](Background.md)
---
@ -15,7 +15,7 @@ Therefore, the CDM is designed to store observational data to allow for research
- **Suitability for purpose.** The CDM aims at providing data organized in a way optimal for analysis, rather than for the purpose of operational needs of health care providers or payers.
- **Data protection.** All data that might jeopardize the identity and protection of patients, such as names, precise birthdays etc. are limited. Exceptions are possible where the research expressly requires more detailed information, such as precise birth dates for the study of infants.
- **Design of domains.** The domains are modeled in a person-centric relational data model, where for each record the identity of the person and a date is captured as a minimum.
- **Rationale for domains. ** Domains are identified and separately defined in an Entity-relationship model if they have an analysis use case and the domain has specific attributes that are not otherwise applicable. All other data can be preserved as an observation in an entity-attribute-value structure.
- **Rationale for domains.** Domains are identified and separately defined in an Entity-relationship model if they have an analysis use case and the domain has specific attributes that are not otherwise applicable. All other data can be preserved as an observation in an entity-attribute-value structure.
- **Standardized Vocabularies.** To standardize the content of those records, the CDM relies on the Standardized Vocabularies containing all necessary and appropriate corresponding standard healthcare concepts.
- **Reuse of existing vocabularies.** If possible, these concepts are leveraged from national or industry standardization or vocabulary definition organizations or initiatives, such as the National Library of Medicine, the Department of Veterans' Affairs, the Center of Disease Control and Prevention, etc.
- **Maintaining source codes.** Even though all codes are mapped to the Standardized Vocabularies, the model also stores the original source code to ensure no information is lost.

@ -0,0 +1,31 @@
*OMOP Common Data Model v5.1 Specifications*
<br>*Authors: Christian Reich, Patrick Ryan, Rimma Belenkaya, Karthik Natarajan, Clair Blacketer*
<br>***Release date needed***
[Back to Table of Contents](TableofContents.md)
<br>[Back to Background](Background.md)
---
# 2 Glossary of Terms
^Term^Abbr.^Description^
|Ancestor| |The higher level Concept in a hierarchical relationship. Note that ancestors and descendants can be many levels apart from each other.|
|Average Wholesale Price|AWP|The price manufacturers set for prescription drugs to be purchased at the wholesale level by pharmacies and healthcare providers.|
|Centers for Disease Control and Prevention|CDC|The Centers for Disease Control and Prevention is a United States federal agency under the Department of Health and Human Services. It works to protect public health and safety by providing information to enhance health decisions.|
|Common Data Model|CDM|The CDM intends to facilitate observational analyses of disparate healthcare databases. The CDM defines table structures for each of the data entities (e.g., Persons, Visit Occurrence, Drug Exposure, Condition Occurrence, Observation, Procedure Occurrence, etc.). It includes observational data elements that are relevant to identifying exposure to various treatments and defining condition occurrence. The CDM includes both the Standardized Vocabularies of terms and the entity domain tables.|
|Concept| |A concept is the basic unit of information. Concepts may be grouped into a given domain. A concept is a unique term that has a unique and static identifier/name, belongs to a domain, and may exist in relation to other concepts. The vertical relationships consist of "is a" statements that form a logical hierarchy. In general, concepts above a given concept are referred to as ancestors and those below as descendants.|
|Conceptual Data Model| |A conceptual data model is a map of concepts and their relationships. This describes the semantics of an organization and represents a series of assertions about its nature. Specifically, it describes the things of significance to an organization (entity classes), about which it is inclined to collect information, and characteristics of (attributes) and associations between pairs of those things of significance (relationships).|
|Data mapping| |Data mapping is the process of creating data element mappings between two distinct data models, terminologies, or concepts. It is used as a first step for a wide variety of data integration tasks.|
|Demographics| |Demographics refer to selected characteristics of persons. Demographics may include data such as race, age, sex, date of birth, location, etc.|
|Descendant| |The lower level Concept in a hierarchical relationship. Note that ancestors and descendants can be many levels apart from each other.|
|Design Principle| |An organized arrangement of one or more elements or principles for a purpose. It identifies core principles and best practices to assist developers to produce software. Thoroughly understanding the goals of stakeholders and designing systems with those goals in mind are the best approaches to successfully deliver results.|
|Electronic Health Record|EHR|Electronic health record refers to an individual person's medical record in digital format. It may be made up of electronic medical records from many locations and/or sources. The EHR is a longitudinal electronic record of person health information generated by one or more encounters in any care delivery setting. Included in this information are person demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data and radiology reports.|
|Electronic Medical Record|EMR|An electronic medical record is a computerized medical record created in an organization that delivers care, such as a hospital or outpatient setting. Electronic medical records tend to be a part of a local stand-alone health information system that allows storage, retrieval and manipulation of records. This document uses the term EHR throughout, even if a specific data source internally uses the term EMR.|
|Extract Transform Load|ETL|Process of getting data out of one data store (Extract), modifying it (Transform), and inserting it into a different data store (Load).|
|Health Insurance Portability and Accountability Act|HIPAA|A federal law that was designed to allow portability of health insurance between jobs. In addition, it required the creation of a federal law to protect personally identifiable health information; if that did not occur by a specific date (which it did not), HIPAA directed the Department of Health and Human Services (DHHS) to issue federal regulations with the same purpose. DHHS has issued HIPAA privacy regulations (the HIPAA Privacy Rule) as well as other regulations under HIPAA.|
|Logical Data Model| |Logical data models are graphical representations of the business requirements. They describe the things of importance to an organization and how they relate to one another, as well as business definitions and examples. The logical data model can be validated and approved by a business representative, and can be the basis of physical database design.|
|Primary Care Provider|PCP|A health care provider designated as responsible to provide general medical care to a patient, including evaluation and treatment as well as referral to specialists.|
|Protected Health Information|PHI|Protected health information under HIPAA includes any individually identifiable health information. Identifiable refers not only to data that is explicitly linked to a particular individual (that's identified information). It also includes health information with data items which reasonably could be expected to allow individual identification. De-identified information is that from which all potentially identifying information has been removed.|
|Terminology| |Technical or special terms used in a business or special subject area.|
|Vocabulary| |A computerized list (as of items of data or words) used for reference (as for information retrieval or word processing).|

@ -0,0 +1,12 @@
*OMOP Common Data Model v5.1 Specifications*
<br>*Authors: Christian Reich, Patrick Ryan, Rimma Belenkaya, Karthik Natarajan, Clair Blacketer*
<br>***Release date needed***
---
# Table of Contents
[1 Background](Background/Background.md)
[1.1 The Role of the Common Data Model](TheRoleoftheCommonDataModel.md)
[1.2 Design Principles](DesignPrinciples.md)
[1.3 Data Model Conventions](DataModelConventions.md)