Merge branch 'master' of https://github.com/OHDSI/CommonDataModel.wiki
commit
88384f4455
|
@ -1 +1,75 @@
|
|||
This is where FAQs will go
|
||||
## Common Data Model
|
||||
|
||||
1. I understand that the common data model (CDM) is a way of organizing disparate data sources into the same relational database design, but how can it be effective since many databases use different coding schemes?
|
||||
|
||||
During the extract, transform, load (ETL) process of converting a data source into the OMOP common data model, we standardize the structure (e.g. tables, fields, data types), conventions (e.g. rules that govern how source data should be represented), and content (e.g. what common vocabularies are used to speak the same language across clinical domains). The common data model preserves all source data, including the original source vocabulary codes, but adds the standardized vocabularies to allow for network research across the entire OHDSI research community.
|
||||
|
||||
2. How does my data get transformed into the common data model?
|
||||
|
||||
You or someone in your organization will need to create a process to build your CDM. Don’t worry though, you are not alone! The open nature of the community means that much of the code that other participants have written to transform their own data is available for you to use. If you have a data license for a large administrative claims database like Truven Health MarketScan® or Optum’s Clinformatics® Extended Data Mart, chances are that someone has already done the legwork. Here is one example of a full builder freely available on [github](https://github.com/OHDSI/ETL-CDMBuilder) that has been written for a variety of data sources.
|
||||
|
||||
The [community forums](http://forums.ohdsi.org/) are also a great place to ask questions if you are stuck or need guidance on how to represent your data in the common data model. Members are usually very responsive!
|
||||
|
||||
3. Are any tables or fields optional?
|
||||
|
||||
It is expected that all tables will be present in a CDM though it is not a requirement that they are all populated. The two mandatory tables are:
|
||||
|
||||
* [Person](https://github.com/OHDSI/CommonDataModel/wiki/person): Contains records that uniquely identify each patient in the source data who is time at-risk to have clinical observations recorded within the source systems.
|
||||
* [Observation_period](https://github.com/OHDSI/CommonDataModel/wiki/observation_period): Contains records which uniquely define the spans of time for which a Person is at-risk to have clinical events recorded within the source systems.
|
||||
|
||||
It is then up to you which tables to populate, though the core event tables are generally agreed upon to be [Condition_occurrence](https://github.com/OHDSI/CommonDataModel/wiki/CONDITION_OCCURRENCE), [Procedure_occurrence](https://github.com/OHDSI/CommonDataModel/wiki/PROCEDURE_OCCURRENCE), [Drug_exposure](https://github.com/OHDSI/CommonDataModel/wiki/DRUG_EXPOSURE), [Measurement](https://github.com/OHDSI/CommonDataModel/wiki/MEASUREMENT), and [Observation](https://github.com/OHDSI/CommonDataModel/wiki/OBSERVATION). Each table has certain required fields, a full list of which can be found on the Common Data Model [wiki page](https://github.com/OHDSI/CommonDataModel/wiki/).
|
||||
|
||||
4. Does the data model include any derived information? Which tables or values are derived?
|
||||
|
||||
The common data model stores verbatim data from the source across various clinical domains, such as records for conditions, drugs, procedures, and measurements. In addition, to assist the analyst, the common data model also provides some derived tables, based on commonly used analytic procedures.
|
||||
For example, the [Condition_era](https://github.com/OHDSI/CommonDataModel/wiki/CONDITION_ERA) table is derived from the [Condition_occurrence](https://github.com/OHDSI/CommonDataModel/wiki/CONDITION_OCCURENCE) table and both the [Drug_era](https://github.com/OHDSI/CommonDataModel/wiki/DRUG_ERA) and [Dose_era](https://github.com/OHDSI/CommonDataModel/wiki/DOSE_ERA) tables are derived from the [Drug_exposure](https://github.com/OHDSI/CommonDataModel/wiki/DRUG_EXPOSURE) table. An era is defined as a span of time when a patient is assumed to have a given condition or exposure to a particular active ingredient. Members of the community have written code to create these tables and it is out on the [github](https://github.com/OHDSI/CommonDataModel/tree/master/CodeExcerpts/DerivedTables) if you choose to use it in your CDM build. It is important to reinforce, the analyst has the opportunity, but not the obligation, to use any of the derived tables and all of the source data is still available for direct use if the analysis calls for different assumptions.
|
||||
|
||||
5. How is age captured in the model?
|
||||
|
||||
Year_of_birth, month_of_birth, day_of_birth and birth_datetime are all fields in the Person table designed to capture some form of date of birth. While only year_of_birth is required, these fields allow for maximum flexibility over a wide range of data sources.
|
||||
|
||||
6. How are gender, race, and ethnicity captured in the model? Are they coded using values a human reader can understand?
|
||||
|
||||
Standard Concepts are used to denote all clinical entities throughout the OMOP common data model, including gender, race, and ethnicity. Source values are mapped to Standard Concepts during the extract, transform, load (ETL) process of converting a database to the OMOP Common Data Model. These are then stored in the Gender_concept_id, Race_concept_id and Ethnicity_concept_id fields in the Person table. Because the standard concepts span across all clinical domains, and in keeping with Cimino’s ‘Desiderata for Controlled Medical Vocabularies in the Twenty-First Century’, the identifiers are unique, persistent nonsematic identifiers. Gender, for example, is stored as either 8532 (female) or 8507 (male) in gender_concept_id while the original value from the source is stored in gender_source_value (M, male, F, etc)..
|
||||
|
||||
7. How is time-varying patient information such as location of residence addressed in the model?
|
||||
|
||||
The OMOP common data model has been pragmatically defined based on the desired analytic use cases of the community, as well as the available types of data that community members have access to. Currently in the model, Each each person record has associated demographic attributes which are assumed to be constant for the patient throughout the course of their periods of observation. For example, the location or primary care provider is expected to have a unique value per person, even though in life these data may change over time. Typically, the most recent information is chosen though it is up to the person performing the transformation which value to store.
|
||||
|
||||
Something like marital status is a little different as it is considered to be an observation rather than a demographic attribute. This means that it is housed in the Observation table rather than the Person table, giving the opportunity to store each change in status as a unique record.
|
||||
|
||||
If someone in the community had a use case for time-varying location of residence and also had source data that contains this information, we’d welcome participation in the CDM workgroup to evolve the model further.
|
||||
|
||||
8. How does the model denote the time period during which a Person’s information is valid?
|
||||
|
||||
The OMOP Common Data Model uses something called observation periods (stored in the [Observation_period](https://github.com/OHDSI/CommonDataModel/wiki/observation_period) table) as a way to define the time span during which a patient is at-risk to have a clinical event recorded. In administrative claims databases, for example, these observation periods are often analogous to the notion of ‘enrollment’.
|
||||
|
||||
9. How does the model capture start and stop dates for insurance coverage? What if a person’s coverage changes?
|
||||
|
||||
The [Payer_plan_period](https://github.com/OHDSI/CommonDataModel/wiki/payer_plan_period) table captures details of the period of time that a Person is continuously enrolled under a specific health Plan benefit structure from a given Payer. Payer plan periods, as opposed to observation periods, can overlap so as to demote the time when a Person is enrolled in multiple plans at the same time such as Medicare Part A and Medicare Part D.
|
||||
|
||||
10. What if I have EHR data? How would I create observation periods?
|
||||
|
||||
An observation period is considered as the time at which a patient is at-risk to have a clinical event recorded in the source system. Determining the appropriate observation period for each source data can vary, depending on what information the source contains. If a source does not provide information about a patient’s entry or exit from a system, then reasonable heuristics need to be developed and applied within the ETL.
|
||||
|
||||
## Vocabulary Mapping
|
||||
|
||||
11. Do I have to map my source codes to Standard Concepts myself? Are there vocabulary mappings that already exist for me to leverage?
|
||||
|
||||
If your data use any of the 55 source vocabularies that are currently supported, the mappings have been done for you. The full list is available from the open-source [ATHENA](http://athena.ohdsi.org/search-terms/terms) tool under the download tab (see below). You can choose to download the ten [vocabulary tables](https://github.com/OHDSI/CommonDataModel/wiki/Standardized-Vocabularies) from there as well – you will need a copy in your environment if you plan on building a CDM.
|
||||
|
||||
[!](CommonDataModel/Documentation/CommonDataModel_Wiki_Files/images/Athena_download_box.png)
|
||||
|
||||
The [ATHENA](http://athena.ohdsi.org/search-terms/terms) tool also allows you to explore the vocabulary before downloading it if you are curious about the mappings or if you have a specific code in mind and would like to know which standard concept it is associated with; just click on the search tab and type in a keyword to begin searching.
|
||||
|
||||
12. If I want to apply the mappings myself, can I do so? Are they transparent to all users?
|
||||
|
||||
Yes, all mappings are available in the [Concept_relationship](https://github.com/OHDSI/CommonDataModel/wiki/CONCEPT_RELATIONSHIP) table (which can be downloaded from [ATHENA](http://athena.ohdsi.org/search-terms/terms)). Each value in a supported source terminology is assigned a Concept_id (which is considered non-standard). Each Source_concept_id will have a mapping to a Standard_concept_id. For example:
|
||||
|
||||
[!](CommonDataModel/Documentation/CommonDataModel_Wiki_Files/images/Sepsis_to_SNOMED.png)
|
||||
|
||||
In this case the standard SNOMED concept 201826 for type 2 diabetes mellitus would be stored in the Condition_occurrence table as the Condition_concept_id and the ICD10CM concept 1567956 for type 2 diabetes mellitus would be stored as the Condition_source_concept_id.
|
||||
|
||||
13. Can RXNorm codes be stored in the model? Can I store multiple levels if I so choose? What if one collaborator uses a different level of RXNorm than I use when transforming their database?
|
||||
|
||||
In the OMOP Common Data Model RXNorm is considered the standard vocabulary for representing drug exposures. One of the great things about the Standardized Vocabulary is that the hierarchical nature of RXNorm is preserved to enable efficient querying. It is agreed upon best practice to store the lowest level RXNorm available and then use the Vocabulary to explore any pertinent relationships. Drug ingredients are the highest-level ancestors so a query for the descendants of an ingredient should turn up all drug products (Clinical Drug or Branded Drug) containing that ingredient. A query designed in this way will find drugs of interest in any CDM regardless of the level of RXNorm used.
|
||||
|
|
Loading…
Reference in New Issue