Fixing typos

Clair Blacketer 2024-03-28 14:37:46 -04:00
parent f774a0743b
commit 22fe755ae0
11 changed files with 133 additions and 91 deletions

View File

@ -497,10 +497,13 @@ v5.3 (previously v5.3.1). Each table is represented with a high-level
description and ETL conventions that should be followed. This is
continued with a discussion of each field in each table, any conventions
related to the field, and constraints that should be followed (like
primary key, foreign key, etc). Should you have questions please feel
free to visit the <a href="https://forums.ohdsi.org/">forums</a> or the
<a href="https://github.com/ohdsi/CommonDataModel/issues">github
issue</a> page.</p>
primary key, foreign key, etc). All tables should be instantiated in a
CDM instance but do not need to be populated. Similarly, fields that are
not required should exist in the CDM table but do not need to be
populated. Should you have questions please feel free to visit the <a
href="https://forums.ohdsi.org/">forums</a> or the <a
href="https://github.com/ohdsi/CommonDataModel/issues">github issue</a>
page.</p>
<p><em><strong>Special Note</strong> This documentation previously
referenced v5.3.1. During the OHDSI/CommonDataModel Hack-A-Thon that
occurred on August 18, 2021 the decision was made to align documentation
@ -3754,12 +3757,22 @@ No
days_supply
</td>
<td style="text-align:left;">
The number of days of supply of the medication as recorded in the
original prescription or dispensing record. Days supply can differ from
actual drug duration (i.e. prescribed days supply vs actual exposure).
</td>
<td style="text-align:left;">
Days supply of the drug. This should be the verbatim days_supply as
given on the prescription. If the drug is physician administered use
duration end date if given or set to 1 as default if duration is not
available.
The field should be left empty if the source data does not contain a
verbatim days_supply, and should not be calculated from other
fields.<br><br>Negative values are not allowed. If the source has
negative days supply the record should be dropped as it is unknown if
the patient actually took the drug. Several actions are possible: 1)
record is not trustworthy and we remove the record entirely. 2) we trust
the record and leave days_supply empty or 3) record needs to be combined
with other record (e.g. reversal of prescription). High values (&gt;365
days) should be investigated. If considered an error in the source data
(e.g. typo), the value needs to be excluded to prevent creation of
unrealistic long eras.
</td>
<td style="text-align:left;">
integer
@ -5396,7 +5409,8 @@ from =.
Operators are =, &gt; and these concepts belong to the Meas Value
Operator domain. <a
href="https://athena.ohdsi.org/search-terms/terms?domain=Meas+Value+Operator&amp;standardConcept=Standard&amp;page=1&amp;pageSize=15&amp;query=">Accepted
Concepts</a>.
Concepts</a>. Leave it NULL if there's an exact numeric value given
(instead of putting =) or there's no numeric value at all.
</td>
<td style="text-align:left;">
integer
@ -5476,7 +5490,10 @@ results for measurements, it is a valid ETL choice to preserve both
values. The continuous value should go in the VALUE_AS_NUMBER field and
the categorical value should be mapped to a standard concept in the
Meas Value domain and put in the VALUE_AS_CONCEPT_ID field. This is
also the destination for the Maps to value relationship.
also the destination for the Maps to value relationship. If there's no
categorical result in the source data, set value_as_concept_id to NULL; if
there is a categorical result in the source data but without mapping, set
value_as_concept_id to 0.
</td>
<td style="text-align:left;">
integer
@ -5510,7 +5527,10 @@ data.
<td style="text-align:left;">
There is no standardization requirement for units associated with
MEASUREMENT_CONCEPT_IDs, however, it is the responsibility of the ETL to
choose the most plausible unit.
choose the most plausible unit. If the source unit is NULL (applicable
to cases when there's no numerical value or when it doesn't require a
unit), keep unit_concept_id NULL as well. If there's no mapping of a
source unit, populate unit_concept_id with 0.
</td>
<td style="text-align:left;">
integer
@ -6149,7 +6169,10 @@ href="https://athena.ohdsi.org/search-terms/terms/4167217">4167217</a>
Family history of clinical finding as well as a Maps to value record
to <a
href="https://athena.ohdsi.org/search-terms/terms/134057">134057</a>
Disorder of cardiovascular system.
Disorder of cardiovascular system. If there's no categorical result in
the source data, set value_as_concept_id to NULL; if there is a categorical
result in the source data but without mapping, set value_as_concept_id to
0.
</td>
<td style="text-align:left;">
Integer
@ -6214,7 +6237,10 @@ data.
<td style="text-align:left;">
There is no standardization requirement for units associated with
OBSERVATION_CONCEPT_IDs, however, it is the responsibility of the ETL to
choose the most plausible unit.
choose the most plausible unit. If the source unit is NULL (applicable
to cases when there's no numerical value or when it doesn't require a
unit), keep unit_concept_id NULL as well. If there's no mapping of a
source unit, populate unit_concept_id with 0.
</td>
<td style="text-align:left;">
integer
@ -7845,7 +7871,10 @@ The unit for the quantity of the specimen.
<td style="text-align:left;">
Map the UNIT_SOURCE_VALUE to a Standard Concept in the Unit domain. <a
href="https://athena.ohdsi.org/search-terms/terms?domain=Unit&amp;standardConcept=Standard&amp;page=1&amp;pageSize=15&amp;query=">Accepted
Concepts</a>
Concepts</a>. If the source unit is NULL (applicable to cases when
there's no numerical value or when it doesn't require a unit), keep
unit_concept_id NULL as well. If there's no mapping of a source unit,
populate unit_concept_id with 0.
</td>
<td style="text-align:left;">
integer
@ -10361,7 +10390,8 @@ be exposed to a particular active ingredient. A Drug Era is not the same
as a Drug Exposure: Exposures are individual records corresponding to
the source when Drug was delivered to the Person, while successive
periods of Drug Exposures are combined under certain rules to produce
continuous Drug Eras.</p>
continuous Drug Eras. Every record in the DRUG_EXPOSURE table should be
part of a drug era based on the dates of exposure.</p>
<p><strong>User Guide</strong></p>
<p>NA</p>
<p><strong>ETL Conventions</strong></p>
@ -10456,7 +10486,9 @@ PERSON
drug_concept_id
</td>
<td style="text-align:left;">
The Concept Id representing the specific drug ingredient.
The drug_concept_id should conform to the concept class ingredient as
the drug_era is an era of time where a person is exposed to a particular
drug ingredient.
</td>
<td style="text-align:left;">
</td>
@ -10850,8 +10882,9 @@ No
<p><strong>Table Description</strong></p>
<p>A Condition Era is defined as a span of time when the Person is
assumed to have a given condition. Similar to Drug Eras, Condition Eras
are chronological periods of Condition Occurrence. Combining individual
Condition Occurrences into a single Condition Era serves two
are chronological periods of Condition Occurrence and every Condition
Occurrence record should be part of a Condition Era. Combining
individual Condition Occurrences into a single Condition Era serves two
purposes:</p>
<ul>
<li>It allows aggregation of chronic conditions that require frequent
@ -11484,7 +11517,7 @@ cdm_etl_reference
<td style="text-align:left;">
</td>
<td style="text-align:left;">
Put the link to the CDM version used.
Version of the ETL script used. e.g. link to the Git release
</td>
<td style="text-align:left;">
varchar(255)
@ -11508,7 +11541,9 @@ No
source_release_date
</td>
<td style="text-align:left;">
The release date of the source data.
The date the data was extracted from the source system. In some systems
that is the same as the date the ETL was run. Typically the latest event
date in the source is on the source_release_date.
</td>
<td style="text-align:left;">
</td>
@ -11534,7 +11569,8 @@ No
cdm_release_date
</td>
<td style="text-align:left;">
The release data of the CDM instance.
The date the ETL script was completed. Typically this is after the
source_release_date.
</td>
<td style="text-align:left;">
</td>
@ -11560,6 +11596,7 @@ No
cdm_version
</td>
<td style="text-align:left;">
Version of the OMOP CDM used as string. e.g. v5.4
</td>
<td style="text-align:left;">
</td>

View File

@ -498,7 +498,10 @@ table is represented with a high-level description and ETL conventions
that should be followed. This is continued with a discussion of each
field in each table, any conventions related to the field, and
constraints that should be followed (like primary key, foreign key,
etc). Should you have questions please feel free to visit the <a
etc). All tables should be instantiated in a CDM instance but do not
need to be populated. Similarly, fields that are not required should
exist in the CDM table but do not need to be populated. Should you have
questions please feel free to visit the <a
href="https://forums.ohdsi.org/">forums</a> or the <a
href="https://github.com/ohdsi/CommonDataModel/issues">github issue</a>
page.</p>

View File

@ -515,7 +515,10 @@ v6.0. Each table is represented with a high-level description and ETL
conventions that should be followed. This is continued with a discussion
of each field in each table, any conventions related to the field, and
constraints that should be followed (like primary key, foreign key,
etc). Should you have questions please feel free to visit the <a
etc). All tables should be instantiated in a CDM instance but do not
need to be populated. Similarly, fields that are not required should
exist in the CDM table but do not need to be populated. Should you have
questions please feel free to visit the <a
href="https://forums.ohdsi.org/">forums</a> or the <a
href="https://github.com/ohdsi/CommonDataModel/issues">github issue</a>
page.</p>
@ -3837,12 +3840,20 @@ No
days_supply
</td>
<td style="text-align:left;">
The number of days of supply of the medication as recorded in the
original prescription or dispensing record. Days supply can differ from
actual drug duration (i.e. prescribed days supply vs actual exposure).
</td>
<td style="text-align:left;">
Days supply of the drug. This should be the verbatim days_supply as
given on the prescription. If the drug is physician administered use
duration end date if given or set to 1 as default if duration is not
available.
The field should be left empty if the source data does not contain a
verbatim days_supply, and should not be calculated from other
fields. Negative values are not allowed. Several actions are possible: 1)
record is not trustworthy and we remove the record entirely. 2) we trust
the record and leave days_supply empty or 3) record needs to be combined
with other record (e.g. reversal of prescription). High values (&gt;365
days) should be investigated. If considered an error in the source data
(e.g. typo), the value needs to be excluded to prevent creation of
unrealistic long eras.
</td>
<td style="text-align:left;">
integer
@ -8847,7 +8858,7 @@ Source value representing the validation status of the survey.
<td style="text-align:left;">
</td>
<td style="text-align:left;">
integer
varchar(100)
</td>
<td style="text-align:left;">
No

View File

@ -7,7 +7,7 @@ person,day_of_birth,No,integer,NA,"For data sources that provide the precise dat
person,birth_datetime,No,datetime,NA,"This field is not required but highly encouraged for data sources that provide the precise datetime of birth. If birth_datetime is not provided in the source, use the following logic to infer the date: If day_of_birth is null and month_of_birth is not null then use the first of the month in that year. If month_of_birth is null or if day_of_birth AND month_of_birth are both null and the person has records during their year of birth then use the date of the earliest record, otherwise use the 15th of June of that year. If time of birth is not given use midnight (00:00:0000).",No,No,NA,NA,NA,NA,NA
person,race_concept_id,Yes,integer,This field captures race or ethnic background of the person.,"Only use this field if you have information about race or ethnic background. The Vocabulary contains Concepts about the main races and ethnic backgrounds in a hierarchical system. Due to the imprecise nature of human races and ethnic backgrounds, this is not a perfect system. Mixed races are not supported. If a clear race or ethnic background cannot be established, use Concept_Id 0. [Accepted Race Concepts](http://athena.ohdsi.org/search-terms/terms?domain=Race&standardConcept=Standard&page=1&pageSize=15&query=).",No,Yes,CONCEPT,CONCEPT_ID,Race,NA,NA
person,ethnicity_concept_id,Yes,integer,"This field captures Ethnicity as defined by the Office of Management and Budget (OMB) of the US Government: it distinguishes only between ""Hispanic"" and ""Not Hispanic"". Races and ethnic backgrounds are not stored here.",Only use this field if you have US-based data and a source of this information. Do not attempt to infer Ethnicity from the race or ethnic background of the Person. [Accepted ethnicity concepts](http://athena.ohdsi.org/search-terms/terms?domain=Ethnicity&standardConcept=Standard&page=1&pageSize=15&query=),No,Yes,CONCEPT,CONCEPT_ID,Ethnicity,NA,NA
person,location_id,No,integer,The location refers to the physical address of the person. This field should capture the last known location of the person.,"Put the location_id from the [LOCATION](https://ohdsi.github.io/CommonDataModel/cdm531.html#location) table here that represents the most granular location information for the person. This could represent anything from postal code or parts thereof, state, or county for example. Since many databases contain deidentified data, it is common that the precision of the location is reduced to prevent re-identification. This field should capture the last known location.",No,Yes,LOCATION,LOCATION_ID,NA,NA,NA
person,location_id,No,integer,The location refers to the physical address of the person. This field should capture the last known location of the person.,"Put the location_id from the [LOCATION](https://ohdsi.github.io/CommonDataModel/cdm54.html#LOCATION) table here that represents the most granular location information for the person. This could represent anything from postal code or parts thereof, state, or county for example. Since many databases contain deidentified data, it is common that the precision of the location is reduced to prevent re-identification. This field should capture the last known location.",No,Yes,LOCATION,LOCATION_ID,NA,NA,NA
person,provider_id,No,integer,The Provider refers to the last known primary care provider (General Practitioner).,"Put the provider_id from the [PROVIDER](https://ohdsi.github.io/CommonDataModel/cdm531.html#provider) table of the last known general practitioner of the person. If there are multiple providers, it is up to the ETL to decide which to put here.",No,Yes,PROVIDER,PROVIDER_ID,NA,NA,NA
person,care_site_id,No,integer,The Care Site refers to where the Provider typically provides the primary care.,NA,No,Yes,CARE_SITE,CARE_SITE_ID,NA,NA,NA
person,person_source_value,No,varchar(50),Use this field to link back to persons in the source data. This is typically used for error checking of ETL logic.,Some use cases require the ability to link back to persons in the source data. This field allows for the storing of the person value as it appears in the source. This field is not required but strongly recommended.,No,No,NA,NA,NA,NA,NA
@ -378,7 +378,7 @@ episode,episode_parent_id,No,integer,Use this field to find the Episode that sub
episode,episode_number,No,integer,"For sequences of episodes, this is used to indicate the order the episodes occurred. For example, lines of treatment could be indicated here.",Please see [article] for the details of how to count episodes.,No,No,NA,NA,NA,NA,NA
episode,episode_object_concept_id,Yes,integer,"A Standard Concept representing the disease phase, outcome, or other abstraction of which the episode consists. For example, if the EPISODE_CONCEPT_ID is [treatment regimen](https://athena.ohdsi.org/search-terms/terms/32531) then the EPISODE_OBJECT_CONCEPT_ID should contain the chemotherapy regimen concept, like [Afatinib monotherapy](https://athena.ohdsi.org/search-terms/terms/35804392).",Episode entries from the 'Disease Episode' concept class should have an episode_object_concept_id that comes from the Condition domain. Episode entries from the 'Treatment Episode' concept class should have an episode_object_concept_id that comes from the 'Procedure' domain or 'Regimen' concept class.,No,Yes,CONCEPT,CONCEPT_ID,"Procedure, Regimen",NA,NA
episode,episode_type_concept_id,Yes,integer,"This field can be used to determine the provenance of the Episode record, as in whether the episode was from an EHR system, insurance claim, registry, or other sources.",Choose the EPISODE_TYPE_CONCEPT_ID that best represents the provenance of the record. [Accepted Concepts](https://athena.ohdsi.org/search-terms/terms?domain=Type+Concept&standardConcept=Standard&page=1&pageSize=15&query=). A more detailed explanation of each Type Concept can be found on the [vocabulary wiki](https://github.com/OHDSI/Vocabulary-v5.0/wiki/Vocab.-TYPE_CONCEPT).,No,Yes,CONCEPT,CONCEPT_ID,Type Concept,NA,NA
episode,episode_source_value,No,varchar(50),The source code for the Episdoe as it appears in the source data. This code is mapped to a Standard Condition Concept in the Standardized Vocabularies and the original code is stored here for reference.,NA,No,No,NA,NA,NA,NA,NA
episode,episode_source_value,No,varchar(50),The source code for the Episode as it appears in the source data. This code is mapped to a Standard Condition Concept in the Standardized Vocabularies and the original code is stored here for reference.,NA,No,No,NA,NA,NA,NA,NA
episode,episode_source_concept_id,No,integer,A foreign key to a Episode Concept that refers to the code used in the source.,Given that the Episodes are user-defined it is unlikely that there will be a Source Concept available. If that is the case then set this field to zero.,No,Yes,CONCEPT,CONCEPT_ID,NA,NA,NA
episode_event,episode_id,Yes,integer,Use this field to link the EPISODE_EVENT record to its EPISODE.,Put the EPISODE_ID that subsumes the EPISODE_EVENT record here.,No,Yes,EPISODE,EPISODE_ID,NA,NA,NA
episode_event,event_id,Yes,integer,"This field is the primary key of the linked record in the database. For example, if the Episode Event is a Condition Occurrence, then the CONDITION_OCCURRENCE_ID of the linked record goes in this field.",Put the primary key of the linked record here.,No,No,NA,NA,NA,NA,NA

cdmTableName,cdmFieldName,isRequired,cdmDatatype,userGuidance,etlConventions,isPrimaryKey,isForeignKey,fkTableName,fkFieldName,fkDomain,fkClass,unique DQ identifiers

View File

@ -23,10 +23,6 @@ Below is the specification document for the OMOP Common Data Model, v5.3 (previo
*__Special Note__ This documentation previously referenced v5.3.1. During the OHDSI/CommonDataModel Hack-A-Thon that occurred on August 18, 2021 the decision was made to align documentation with the minor releases. Hot fixes and minor.minor release can be found through the searching of tags.*
--after regeneration of DDLs
link to csv of cdm
link to pdf of cdm documentation
link to forum on doc page
```{r docLoop53, echo=FALSE, results='asis'}
tableSpecs <- read.csv("../inst/csv/OMOP_CDMv5.3_Table_Level.csv", stringsAsFactors = FALSE)

View File

@ -32,7 +32,7 @@ The table below details which OHDSI tools support CDM v5.4. There are two levels
|--|--|--|--|
|**CDM R package**|This package can be downloaded from [https://github.com/OHDSI/CommonDataModel/](https://github.com/OHDSI/CommonDataModel/). It functions to dynamically create the OMOP CDM documentation and DDL scripts to instantiate the CDM tables. |`r emoji::emoji("heavy_check_mark")`|`r emoji::emoji("heavy_check_mark")`
|**Data Quality Dashboard**|This package can be downloaded from [https://github.com/OHDSI/DataQualityDashboard](https://github.com/OHDSI/DataQualityDashboard). It runs a set of > 3500 data quality checks against an OMOP CDM instance and is required to be run on all databases prior to participating in an OHDSI network research study.|`r emoji::emoji("heavy_check_mark")`| `r emoji::emoji("exclamation")`
|**Achilles**|This package can be downloaded from [https://github.com/OHDSI/Achilles](https://github.com/OHDSI/Achilles), performing a set of broad database characterizations agains an OMOP CDM instance. |`r emoji::emoji("heavy_check_mark")`|`r emoji::emoji("exclamation")`
|**Achilles**|This package can be downloaded from [https://github.com/OHDSI/Achilles](https://github.com/OHDSI/Achilles), performing a set of broad database characterizations against an OMOP CDM instance. |`r emoji::emoji("heavy_check_mark")`|`r emoji::emoji("exclamation")`
|**ARES**|This package can be downloaded from [https://github.com/OHDSI/Ares](https://github.com/OHDSI/Ares) and is designed to display the results from both the ACHILLES and DataQualityDashboard packages to support data quality and characterization research.|`r emoji::emoji("heavy_check_mark")`|`r emoji::emoji("exclamation")`
|**ATLAS**|ATLAS is an open source software tool for researchers to conduct scientific analyses on standardized observational data. [Demo](http://atlas-demo.ohdsi.org/) |`r emoji::emoji("heavy_check_mark")`|`r emoji::emoji("exclamation")`
|**Rabbit-In-A-Hat**|This package can be downloaded from [https://github.com/OHDSI/WhiteRabbit](https://github.com/OHDSI/WhiteRabbit) and is an application for interactive design of an ETL to the OMOP Common Data Model with the help of the scan report generated by White Rabbit.|`r emoji::emoji("heavy_check_mark")`|`r emoji::emoji("heavy_check_mark")`

View File

@ -23,12 +23,8 @@ library(stringr)
Please be aware that v6.0 of the OMOP CDM is **not** fully supported by the OHDSI suite of tools and methods. The major difference in CDM v5.3 and CDM v6.0 involves switching the \*_datetime fields to mandatory rather than optional. This switch radically changes the assumptions related to exposure and outcome timing. Rather than move forward with v6.0, CDM v5.4 was designed with additions to the model that have been requested by the community while retaining the date structure of medical events in v5.3. Please see the specifications for [CDM v5.4](http://ohdsi.github.io/CommonDataModel/cdm54.html) and detailed [changes from CDM v5.3](http://ohdsi.github.io/CommonDataModel/cdm54Changes.html). **For new collaborators to OHDSI, please transform your data to [CDM v5.4](https://github.com/OHDSI/CommonDataModel/releases/tag/v5.4.0) until such time that the v6 series of the CDM is ready for mainstream use.**
Below is the specification document for the OMOP Common Data Model, v6.0. Each table is represented with a high-level description and ETL conventions that should be followed. This is continued with a discussion of each field in each table, any conventions related to the field, and constraints that should be followed (like primary key, foreign key, etc). Should you have questions please feel free to visit the [forums](https://forums.ohdsi.org/) or the [github issue](https://github.com/ohdsi/CommonDataModel/issues) page.
Below is the specification document for the OMOP Common Data Model, v6.0. Each table is represented with a high-level description and ETL conventions that should be followed. This is continued with a discussion of each field in each table, any conventions related to the field, and constraints that should be followed (like primary key, foreign key, etc). All tables should be instantiated in a CDM instance but do not need to be populated. Similarly, fields that are not required should exist in the CDM table but do not need to be populated. Should you have questions please feel free to visit the [forums](https://forums.ohdsi.org/) or the [github issue](https://github.com/ohdsi/CommonDataModel/issues) page.
--after regeneration of DDLs
link to csv of cdm
link to pdf of cdm documentation
link to forum on doc page
## **Changes in v6.0**

View File

@ -145,7 +145,7 @@ not necessary to query for the existence of a relationship both in the concept_i
fields.
- Concept Relationships define direct relationships between Concepts. Indirect relationships through 3rd
Concepts are not captured in this table. However, the [CONCEPT_ANCESTOR](https://ohdsi.github.io/CommonDataModel/cdm531.html#concept_ancestor) table does this for
hierachical relationships over several “generations” of direct relationships.
hierarchical relationships over several “generations” of direct relationships.
- In previous versions of the CDM, the relationship_id used to be a numerical identifier. See the
[RELATIONSHIP](https://ohdsi.github.io/CommonDataModel/cdm531.html#relationship) table.

View File

@ -60,7 +60,7 @@ The table below details which OHDSI tools support CDM v5.4. There are two levels
|--|--|--|--|
|**CDM R package**|This package can be downloaded from [https://github.com/OHDSI/CommonDataModel/](https://github.com/OHDSI/CommonDataModel/). It functions to dynamically create the OMOP CDM documentation and DDL scripts to instantiate the CDM tables. |`r emo::ji("white heavy check mark")`|`r emo::ji("white heavy check mark")`
|**Data Quality Dashboard**|This package can be downloaded from [https://github.com/OHDSI/DataQualityDashboard](https://github.com/OHDSI/DataQualityDashboard). It runs a set of > 3500 data quality checks against an OMOP CDM instance and is required to be run on all databases prior to participating in an OHDSI network research study.|`r emo::ji("white heavy check mark")`| `r emo::ji("warning")`
|**Achilles**|This package can be downloaded from [https://github.com/OHDSI/Achilles](https://github.com/OHDSI/Achilles), performing a set of broad database characterizations agains an OMOP CDM instance. |`r emo::ji("white heavy check mark")`|`r emo::ji("warning")`
|**Achilles**|This package can be downloaded from [https://github.com/OHDSI/Achilles](https://github.com/OHDSI/Achilles), performing a set of broad database characterizations against an OMOP CDM instance. |`r emo::ji("white heavy check mark")`|`r emo::ji("warning")`
|**ARES**|This package can be downloaded from [https://github.com/OHDSI/Ares](https://github.com/OHDSI/Ares) and is designed to display the results from both the ACHILLES and DataQualityDashboard packages to support data quality and characterization research.|`r emo::ji("white heavy check mark")`|`r emo::ji("warning")`
|**ATLAS**|ATLAS is an open source software tool for researchers to conduct scientific analyses on standardized observational data. [Demo](http://atlas-demo.ohdsi.org/) |`r emo::ji("white heavy check mark")`|`r emo::ji("warning")`
|**Rabbit-In-A-Hat**|This package can be downloaded from [https://github.com/OHDSI/WhiteRabbit](https://github.com/OHDSI/WhiteRabbit) and is an application for interactive design of an ETL to the OMOP Common Data Model with the help of the scan report generated by White Rabbit.|`r emo::ji("white heavy check mark")`|`r emo::ji("white heavy check mark")`

View File

@ -1,48 +1,47 @@
---
title: "Indices, Primary Keys and Foreign Key Constraints"
output:
html_document:
toc: true
toc_depth: 5
toc_float: true
---
## Overview
Database indices improve the performance of queries against a database by organizing the data in a way that speeds up query execution.
This article was written to provide guidance on the setting of indices, primary keys and foreign keys for data that has been transformed into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The community that supports the design and development of the OHDSI/CommonDataModel GitHub repository is a diverse collaborative of healthcare and technical professionals who have limited database administration (DBA) experience. As a result, the comments below should be interpreted as suggestions and recommendations to help increase performance. Your team's needs may call for a modified configuration.
## General Recommendations
Should your database of choice support indexing, the OMOP CDM Working Group recommends:
* Indexing on all columns containing an "_id" (e.g. condition_occurrence_id, drug_exposure_id, measurement_id, procedure_occurrence_id, etc.)
* Indexing on primary and foreign keys
For all databases, regardless of custom index support, primary and foreign keys should be set. This is a step towards ensuring data integrity. Information on which table-level attributes should be set as primary and foreign keys can be found within the *_Field_Level.csv file(s) located in the [inst/csv directory](https://github.com/OHDSI/CommonDataModel/tree/v5.4/inst/csv).
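As a rough illustration of the recommendations above, the statements below sketch how keys and indices might be declared for two CDM tables on a database such as PostgreSQL. The schema name `cdm` and all constraint and index names are assumptions made for this example; the DDL scripts shipped with the package may use different names, and the field-level CSV files remain the reference for which fields are keys.

```{sql eval=FALSE, echo=TRUE}
-- Illustrative sketch only: the schema "cdm" and every constraint/index
-- name below are assumptions, not taken from the packaged DDL.

-- Primary keys: one per table, on the table's own "_id" column.
ALTER TABLE cdm.person
  ADD CONSTRAINT xpk_person PRIMARY KEY (person_id);

ALTER TABLE cdm.condition_occurrence
  ADD CONSTRAINT xpk_condition_occurrence PRIMARY KEY (condition_occurrence_id);

-- Foreign key: every CONDITION_OCCURRENCE row must point to an existing PERSON.
ALTER TABLE cdm.condition_occurrence
  ADD CONSTRAINT fpk_condition_occurrence_person_id
  FOREIGN KEY (person_id) REFERENCES cdm.person (person_id);

-- Indices on "_id" columns that are commonly used in joins and filters.
CREATE INDEX idx_condition_person_id
  ON cdm.condition_occurrence (person_id);

CREATE INDEX idx_condition_concept_id
  ON cdm.condition_occurrence (condition_concept_id);
```

On PostgreSQL, declaring a primary key creates a unique index on that column automatically, whereas foreign key columns are not indexed unless an index is created explicitly, which is why `person_id` gets its own index here.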
## Database support
The OHDSI/CommonDataModel package leverages OHDSI/SqlRender and, as a result, can only support the database platforms that OHDSI/SqlRender supports. The following databases are currently supported.
### Microsoft SQL Server
### Oracle
### PostgreSQL
### Amazon Redshift
On Amazon Redshift, it is important to ensure that your data is properly distributed and sorted across nodes. Compression on certain columns may also help. The provided DDL sets DISTKEYs in an effort to optimize performance. This configuration can be seen within the [Redshift-specific DDL](https://github.com/OHDSI/CommonDataModel/blob/v5.4/ddl/5.4/redshift/OMOPCDM_redshift_5.4_ddl.sql).
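To make the idea of distribution and sort keys concrete, here is a minimal sketch of a Redshift table definition. The schema name `cdm` is an assumption, the column list is abridged for readability, and the SORTKEY choice is purely illustrative; consult the linked Redshift DDL for the full table definition and the distribution keys actually used.

```{sql eval=FALSE, echo=TRUE}
-- Illustrative sketch only: abridged column list, assumed schema "cdm",
-- and a hypothetical SORTKEY that is not taken from the packaged DDL.
CREATE TABLE cdm.condition_occurrence (
  condition_occurrence_id integer NOT NULL,
  person_id               integer NOT NULL,
  condition_concept_id    integer NOT NULL,
  condition_start_date    date    NOT NULL
)
-- Rows with the same person_id are stored on the same node slice,
-- so joins between person-level tables can avoid redistributing data.
DISTKEY (person_id)
-- Rows are stored on disk in this order, which helps range-restricted scans.
SORTKEY (person_id, condition_start_date);
```

Distributing the large clinical event tables on `person_id` is a natural choice because most analyses join these tables to one another by person, but the best keys depend on your own query patterns.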
### Impala
### IBM Netezza
### Google BigQuery
Google BigQuery does not require manual optimization or sizing. It performs massively parallel full table scans and intensive caching, all under the hood.
[Reference](https://forums.ohdsi.org/t/iso-best-practices-of-cdm-indexing/10939/2)
### Microsoft Parallel Data Warehouse (PDW)
### SQLite
### Databricks
## References
[ISO Best Practices of CDM Indexing](https://forums.ohdsi.org/t/iso-best-practices-of-cdm-indexing/10939/2)

View File

@ -475,7 +475,7 @@ WHERE vocabulary_id = 'None';
### Visit Concept Roll-up
The query below uses the Visit Concept hierarchy to find the highest-level ancestors. When both the VISIT_OCCURRENCE and VISIT_DETAIL tables are populated, it is good practice (though not required) to use the highest-level ancestors as the VISIT_CONCEPT_IDs in the VISIT_OCCURRENCE table and their children as the VISIT_DETAIL_CONCEPT_IDs in the VISIT_DETAIL table. This relationship between the VISIT_OCCURRENCE and VISIT_DETAIL tables allows standardized Visit logic to be written, building Visits from Visit Details. For more information on how this can be done, please see the [Optum Extended ETL documentation](https://ohdsi.github.io/ETL-LambdaBuilder/Optum%20Clinformatics/Optum_visit_occurrence.html).
```{sql eval=FALSE, echo=TRUE}
SELECT concept_id, concept_name