Metadata and health system documentation for CDM v6.0

clairblacketer 2018-09-27 15:09:54 -04:00
parent 2686e73b79
commit 91d438af14
7 changed files with 55 additions and 21 deletions

@@ -9,12 +9,18 @@ Field|Required|Type|Description
|state|No|varchar(2)|The state field as it appears in the source data.|
|zip|No|varchar(9)|The zip or postal code.|
|county|No|varchar(20)|The county.|
|country|No|varchar(100)|The country.|
|location_source_value|No|varchar(50)|The verbatim information that is used to uniquely identify the location as it appears in the source data.|
|latitude|No|float|The geocoded latitude.|
|longitude|No|float|The geocoded longitude.|
### Conventions
* Each address or Location is unique and is present only once in the table.
* Locations do not contain names, such as the name of a hospital. To construct a full postal address, the address information from the Location needs to be combined with information from the Care Site. The PERSON table does not contain name information at all.
* All fields in the Location tables contain the verbatim data in the source; no mapping or normalization takes place. None of the fields are mandatory. If the source data have no Location information at all, all Locations are represented by a single record. Typically, source data contain full or partial zip or postal codes, or county or census district information.
* Zip codes are handled as strings of up to 9 characters. For US addresses, these represent either a 3-digit abbreviated ZIP code, as provided by many sources for patient protection reasons, the full 5-digit ZIP code, or the 9-digit (ZIP + 4) code. Unless there is a specific reason to do otherwise, analytical methods should expect and use only the first 3 digits. For international addresses, different rules apply.
* The county information can be provided and is not redundant with information from the zip codes as not all of these have an unambiguous county designation.
* No country information is expected as source data are always collected within a single country.
No.|Convention Description
:--------|:------------------------------------
| 1 | Each address or Location is unique and is present only once in the table. |
| 2 | Locations do not contain names, such as the name of a hospital. To construct a full postal address, the address information from the Location needs to be combined with information from the Care Site. The PERSON table does not contain name information at all. |
| 3 | All fields in the Location tables contain the verbatim data in the source; no mapping or normalization takes place. None of the fields are mandatory. If the source data have no Location information at all, all Locations are represented by a single record. Typically, source data contain full or partial zip or postal codes, or county or census district information. |
| 4 | Zip codes are handled as strings of up to 9 characters. For US addresses, these represent either a 3-digit abbreviated ZIP code, as provided by many sources for patient protection reasons, the full 5-digit ZIP code, or the 9-digit (ZIP + 4) code. Unless there is a specific reason to do otherwise, analytical methods should expect and use only the first 3 digits. For international addresses, different rules apply. |
| 5 | The county information can be provided and is not redundant with information from the zip codes as not all of these have an unambiguous county designation. |
| 6 | For standardized geospatial visualization and analysis, addresses need to be geocoded, at a minimum, into latitude and longitude, which allows each address to be placed as a point on a map. The latitude and longitude fields in the LOCATION table serve this purpose. |
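
Convention 4 above can be illustrated with a minimal sketch. It assumes US ZIP codes stored as strings and reduces them to the 3-digit prefix that analytical methods are expected to use. The function name `zip3_prefix` is hypothetical and not part of the CDM; international postal codes follow different rules and are out of scope here.

```python
from typing import Optional


def zip3_prefix(zip_value: Optional[str]) -> Optional[str]:
    """Return the first 3 digits of a US ZIP code, or None if unavailable.

    Hypothetical helper: accepts the 3-digit, 5-digit, and 9-digit (ZIP + 4)
    forms described in the conventions. Non-digit characters, such as a
    hyphen in a source value, are ignored.
    """
    if zip_value is None:
        return None
    digits = "".join(ch for ch in zip_value if ch.isdigit())
    if len(digits) < 3:
        # Too short to carry a usable 3-digit prefix.
        return None
    return digits[:3]


for raw in ("021", "02139", "021391234"):
    print(raw, "->", zip3_prefix(raw))
```

Because the Location fields hold verbatim source data, a reduction like this would belong in the analysis layer rather than in the ETL.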

@@ -0,0 +1,23 @@
## LOCATION_HISTORY
The LOCATION_HISTORY table stores relationships between Persons, Providers, or Care Sites and geographic locations over time.
Field|Required|Type|Description
:------------------------------|:--------|:------------|:----------------------------------------------
|location_id |Yes|integer|A foreign key to the location table.|
|relationship_type_concept_id |No|varchar(50)|The type of relationship between location and entity.|
|domain_id |Yes|varchar(50)|The domain of the entity that is related to the location. Either PERSON, PROVIDER, or CARE_SITE.|
|entity_id |Yes|integer|The unique identifier for the entity. References either person_id, provider_id, or care_site_id, depending on domain_id.|
|start_date |Yes|date|The date the relationship started.|
|end_date |No|date|The date the relationship ended.|
### Conventions
No.|Convention Description
:--------|:------------------------------------
| 1 | The entities (and permissible domains) with related locations are: Persons (PERSON), Providers (PROVIDER), and Care Sites (CARE_SITE). |
| 2 | DOMAIN_ID specifies the table to which ENTITY_ID refers. |
| 3 | Locations and entities are static. Relationships between locations and entities are dynamic. |
| 4 | When the domain is PERSON, the permissible values of relationship_type are: 'residence', 'work site', 'school'. |
| 5 | When the domain is CARE_SITE, the value of relationship_type is NULL. |
| 6 | When the domain is PROVIDER, the value of relationship_type is 'office'. |
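
The domain and relationship rules above could be checked with a sketch like the following. It assumes the relationship type is available as a plain string (or `None` for NULL) and treats the listed values as exhaustive for each domain. Both are assumptions of this example, as are the names `ENTITY_KEY_BY_DOMAIN`, `ALLOWED_RELATIONSHIP_TYPES`, and `check_location_history`.

```python
from typing import Dict, List, Optional, Set

# Key column that ENTITY_ID refers to for each permissible DOMAIN_ID.
ENTITY_KEY_BY_DOMAIN: Dict[str, str] = {
    "PERSON": "person_id",
    "PROVIDER": "provider_id",
    "CARE_SITE": "care_site_id",
}

# Permissible relationship types per domain; None stands for SQL NULL.
# Assumption: the values listed in the conventions are exhaustive.
ALLOWED_RELATIONSHIP_TYPES: Dict[str, Set[Optional[str]]] = {
    "PERSON": {"residence", "work site", "school"},
    "PROVIDER": {"office"},
    "CARE_SITE": {None},
}


def check_location_history(domain_id: str,
                           relationship_type: Optional[str]) -> List[str]:
    """Return a list of convention violations (empty if the row is valid)."""
    if domain_id not in ENTITY_KEY_BY_DOMAIN:
        return [f"unknown domain_id: {domain_id!r}"]
    if relationship_type not in ALLOWED_RELATIONSHIP_TYPES[domain_id]:
        return [
            f"relationship_type {relationship_type!r} "
            f"is not permitted for domain {domain_id}"
        ]
    return []


print(check_location_history("PERSON", "residence"))
print(check_location_history("PROVIDER", "residence"))
```

Looking up `ENTITY_KEY_BY_DOMAIN[domain_id]` gives the key column that `entity_id` should be joined against.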

@@ -17,8 +17,11 @@ Field|Required|Type|Description
|gender_source_concept_id|No|integer|A foreign key to a Concept that refers to the code used in the source.|
### Conventions
* Many sources do not make a distinction between individual and institutional providers. The PROVIDER table contains the individual providers.
* If the source, instead of uniquely identifying individual providers, only provides limited information such as specialty, generic or "pooled" Provider records are listed in the PROVIDER table.
* A single Provider cannot be listed twice (be duplicated) in the table. If a Provider has more than one Specialty, the main or most frequently practiced specialty should be recorded.
* Valid Specialty Concepts belong to the 'Specialty' domain.
* The care_site_id represents a fixed relationship between a Provider and her main Care Site. Providers are also linked to Care Sites through Condition, Procedure and Visit records.
No.|Convention Description
:--------|:------------------------------------
| 1 | Many sources do not make a distinction between individual and institutional providers. The PROVIDER table contains the individual providers. |
| 2 | If the source, instead of uniquely identifying individual providers, only provides limited information such as specialty, generic or 'pooled' Provider records are listed in the PROVIDER table. |
| 3 | A single Provider cannot be listed twice (be duplicated) in the table. If a Provider has more than one Specialty, the main or most frequently practiced specialty should be recorded. |
| 4 | Valid Specialty Concepts belong to the 'Specialty' domain. |
| 5 | The CARE_SITE_ID represents a fixed relationship between a Provider and her main Care Site. Providers are also linked to Care Sites through Condition, Procedure and Visit records. |

@@ -1,4 +1,5 @@
[LOCATION](https://github.com/OHDSI/CommonDataModel/wiki/LOCATION)
[LOCATION](https://github.com/OHDSI/CommonDataModel/wiki/LOCATION)
[LOCATION_HISTORY](https://github.com/OHDSI/CommonDataModel/wiki/LOCATION_HISTORY)
[CARE_SITE](https://github.com/OHDSI/CommonDataModel/wiki/CARE_SITE)
[PROVIDER](https://github.com/OHDSI/CommonDataModel/wiki/PROVIDER)

@@ -15,6 +15,8 @@ Field|Required|Type|Description
### Conventions
* If a source database is derived from multiple data feeds, the integration of those disparate sources is expected to be documented in the ETL specifications. The source information on each of the databases can be represented as separate records in the CDM_SOURCE table.
* Currently, there is no mechanism to link individual records in the CDM tables to their source record in the CDM_SOURCE table.
* The version of the vocabulary can be obtained from the vocabulary_name field in the VOCABULARY table for the record where vocabulary_id='None'.
No.|Convention Description
:--------|:------------------------------------
| 1 | If a source database is derived from multiple data feeds, the integration of those disparate sources is expected to be documented in the ETL specifications. The source information on each of the databases can be represented as separate records in the CDM_SOURCE table. |
| 2 | Currently, there is no mechanism to link individual records in the CDM tables to their source record in the CDM_SOURCE table. |
| 3 | The version of the vocabulary can be obtained from the vocabulary_name field in the VOCABULARY table for the record where vocabulary_id='None'. |
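
The vocabulary version lookup described above can be sketched as follows. The example uses an in-memory SQLite database with a two-column stand-in for the VOCABULARY table; the stored values are placeholders, not real vocabulary releases, and the function name `vocabulary_version` is hypothetical.

```python
import sqlite3
from typing import Optional


def vocabulary_version(conn: sqlite3.Connection) -> Optional[str]:
    """Read the vocabulary version from the record where vocabulary_id='None'."""
    row = conn.execute(
        "SELECT vocabulary_name FROM vocabulary WHERE vocabulary_id = 'None'"
    ).fetchone()
    return row[0] if row else None


# Toy stand-in for the VOCABULARY table (only the two columns used here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vocabulary (vocabulary_id TEXT PRIMARY KEY, vocabulary_name TEXT)"
)
conn.executemany(
    "INSERT INTO vocabulary VALUES (?, ?)",
    [
        ("None", "EXAMPLE-VERSION"),        # placeholder version string
        ("SNOMED", "placeholder name"),     # placeholder vocabulary record
    ],
)

print(vocabulary_version(conn))
```

Note that `'None'` is a literal string value of vocabulary_id, not a NULL.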

@@ -12,4 +12,6 @@ Field |Required |Type |Description
### Conventions
*
No.|Convention Description
:--------|:------------------------------------
| 1 | One record in the METADATA table is pre-populated by the DDL and indicates the CDM version of the database. |

@@ -1,8 +1,5 @@
[CDM_SOURCE](https://github.com/OHDSI/CommonDataModel/wiki/CDM_SOURCE)
[METADATA](https://github.com/OHDSI/CommonDataModel/wiki/METADATA)
All metadata about the data should be derived from the data themselves. However, the following contains a few key pieces of information that are convenient especially for software applications utilizing the CDM data.
The entity-relationship diagram below highlights the tables within the Standardized Metadata portion of the OMOP Common Data Model:
![Metadata entity-relationship diagram](http://www.ohdsi.org/web/wiki/lib/exe/fetch.php?media=documentation:cdm:standard_meta_data.png)\
All metadata about the data should be derived from the data themselves.