Merge pull request #20 from anthonysena/V5ConversionImprovement
Improvements to conversion scripts, documentation and DRG conversion.
commit
2caea197eb
|
@ -1,152 +0,0 @@
|
||||||
/*********************************************************************************
|
|
||||||
# Copyright 2015 Observational Health Data Sciences and Informatics
|
|
||||||
#
|
|
||||||
#
|
|
||||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
|
||||||
# you may not use this file except in compliance with the License.
|
|
||||||
# You may obtain a copy of the License at
|
|
||||||
#
|
|
||||||
# http://www.apache.org/licenses/LICENSE-2.0
|
|
||||||
#
|
|
||||||
# Unless required by applicable law or agreed to in writing, software
|
|
||||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
|
||||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
||||||
# See the License for the specific language governing permissions and
|
|
||||||
# limitations under the License.
|
|
||||||
********************************************************************************/
|
|
||||||
|
|
||||||
/*******************************************************************************
|
|
||||||
|
|
||||||
PURPOSE: This script is used to help perform Quality Assurance activities
|
|
||||||
after you convert your OMOP V4 common data model to CDM V5.
|
|
||||||
|
|
||||||
last revised: 01 July 2015
|
|
||||||
author: Anthony Sena
|
|
||||||
|
|
||||||
This script was authored against SQL Server and will require conversion to other
|
|
||||||
dialects. Please keep this in mind if you plan to use this against another RDBMS.
|
|
||||||
|
|
||||||
General Notes
|
|
||||||
---------------
|
|
||||||
|
|
||||||
This script will use the metadata tables from the V4 and V5 tables to get a list
|
|
||||||
of all of the tables from each database and the rowcounts for each table in an
|
|
||||||
effort to help you see how your data has changed through the conversion process.
|
|
||||||
|
|
||||||
In the results, we include a column to identify the tables that were part of
|
|
||||||
the migration process in an effort to home in on the key tables.
|
|
||||||
|
|
||||||
There is a Part 2 of this QA script which will also show you how data moved
|
|
||||||
amongst some specific tables.
|
|
||||||
|
|
||||||
|
|
||||||
INSTRUCTIONS
|
|
||||||
------------
|
|
||||||
|
|
||||||
1. This script has placeholders for your CDM V4 and CDMV5 database/schema.
|
|
||||||
In order to make this file work in your environment, you
|
|
||||||
should plan to do a global "FIND AND REPLACE" on this file to fill in the
|
|
||||||
file with values that pertain to your environment. The following are the
|
|
||||||
tokens you should use when doing your "FIND AND REPLACE" operation:
|
|
||||||
|
|
||||||
a. [SOURCE_CDMV4]
|
|
||||||
b. [TARGET_CDMV5]
|
|
||||||
|
|
||||||
2. Run the resulting script on your target RDBMS.
|
|
||||||
|
|
||||||
*********************************************************************************/
|
|
||||||
-- Placeholder token; filled in by the global FIND AND REPLACE described above
USE [TARGET_CDMV5]
|
|
||||||
GO
|
|
||||||
|
|
||||||
IF OBJECT_ID('tempdb..#v5_stats', 'U') IS NOT NULL
|
|
||||||
DROP TABLE #v5_stats;
|
|
||||||
|
|
||||||
SELECT
|
|
||||||
DB_NAME() as DBName,
|
|
||||||
t.NAME AS TableName,
|
|
||||||
p.[Rows]
|
|
||||||
INTO #v5_stats
|
|
||||||
FROM
|
|
||||||
sys.tables t
|
|
||||||
INNER JOIN
|
|
||||||
sys.indexes i ON t.OBJECT_ID = i.object_id
|
|
||||||
INNER JOIN
|
|
||||||
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
|
|
||||||
INNER JOIN
|
|
||||||
sys.allocation_units a ON p.partition_id = a.container_id
|
|
||||||
WHERE
|
|
||||||
t.NAME NOT LIKE 'dt%' AND
|
|
||||||
i.OBJECT_ID > 255 AND
|
|
||||||
i.index_id <= 1
|
|
||||||
GROUP BY
|
|
||||||
t.NAME, i.object_id, i.index_id, i.name, p.[Rows]
|
|
||||||
ORDER BY
|
|
||||||
object_name(i.object_id)
|
|
||||||
|
|
||||||
-- Placeholder token; filled in by the global FIND AND REPLACE described above
USE [SOURCE_CDMV4]
|
|
||||||
GO
|
|
||||||
|
|
||||||
IF OBJECT_ID('tempdb..#v4_stats', 'U') IS NOT NULL
|
|
||||||
DROP TABLE #v4_stats;
|
|
||||||
|
|
||||||
SELECT
|
|
||||||
DB_NAME() as DBName,
|
|
||||||
t.NAME AS TableName,
|
|
||||||
p.[Rows]
|
|
||||||
INTO #v4_stats
|
|
||||||
FROM
|
|
||||||
sys.tables t
|
|
||||||
INNER JOIN
|
|
||||||
sys.indexes i ON t.OBJECT_ID = i.object_id
|
|
||||||
INNER JOIN
|
|
||||||
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
|
|
||||||
INNER JOIN
|
|
||||||
sys.allocation_units a ON p.partition_id = a.container_id
|
|
||||||
WHERE
|
|
||||||
t.NAME NOT LIKE 'dt%' AND
|
|
||||||
i.OBJECT_ID > 255 AND
|
|
||||||
i.index_id <= 1
|
|
||||||
GROUP BY
|
|
||||||
t.NAME, i.object_id, i.index_id, i.name, p.[Rows]
|
|
||||||
ORDER BY
|
|
||||||
object_name(i.object_id)
|
|
||||||
|
|
||||||
DECLARE @MigrationTarget TABLE (TableName varchar(100))
|
|
||||||
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'care_site'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'condition_era'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'condition_occurrence'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'death'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'device_exposure'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'drug_cost'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'drug_era'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'drug_exposure'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'location'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'measurement'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'observation'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'observation_period'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'payer_plan_period'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'person'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'procedure_cost'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'procedure_occurrence'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'provider'
|
|
||||||
INSERT INTO @MigrationTarget SELECT 'visit_occurrence'
|
|
||||||
|
|
||||||
select
|
|
||||||
ISNULL(V4.DBName, 'No V4 Table Equivalent') as "V4 Database Name",
|
|
||||||
v4.TableName,
|
|
||||||
v4.rows,
|
|
||||||
ISNULL(V5.DBName, 'No V5 Table Equivalent') as "V5 Database Name",
|
|
||||||
v5.TableName,
|
|
||||||
v5.rows,
|
|
||||||
CASE WHEN mt.TableName IS NULL THEN 'N' ELSE 'Y' END AS "Migration Target",
|
|
||||||
ISNULL(v5.Rows, 0) - ISNULL(v4.Rows, 0) AS "Row Count Change"
|
|
||||||
from #v4_stats as v4
|
|
||||||
full outer join #v5_stats as v5 ON v4.TableName = v5.TableName
|
|
||||||
left join @MigrationTarget mt on v5.TableName = mt.TableName
|
|
||||||
order by v5.TableName
|
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -1,304 +0,0 @@
|
||||||
/*********************************************************************************
|
|
||||||
# Copyright 2015 Observational Health Data Sciences and Informatics
|
|
||||||
#
|
|
||||||
#
|
|
||||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
|
||||||
# you may not use this file except in compliance with the License.
|
|
||||||
# You may obtain a copy of the License at
|
|
||||||
#
|
|
||||||
# http://www.apache.org/licenses/LICENSE-2.0
|
|
||||||
#
|
|
||||||
# Unless required by applicable law or agreed to in writing, software
|
|
||||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
|
||||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
||||||
# See the License for the specific language governing permissions and
|
|
||||||
# limitations under the License.
|
|
||||||
********************************************************************************/
|
|
||||||
|
|
||||||
/*******************************************************************************
|
|
||||||
|
|
||||||
PURPOSE: This script is used to help perform Quality Assurance activities
|
|
||||||
after you convert your OMOP V4 common data model to CDM V5.
|
|
||||||
|
|
||||||
last revised: 01 July 2015
|
|
||||||
author: Anthony Sena
|
|
||||||
|
|
||||||
This script was authored against SQL Server and will require conversion to other
|
|
||||||
dialects. Please keep this in mind if you plan to use this against another RDBMS.
|
|
||||||
|
|
||||||
General Notes
|
|
||||||
---------------
|
|
||||||
|
|
||||||
The V4 to V5 conversion script utilizes the standard vocabularies to map data
|
|
||||||
from the V4 source tables to the V5 target tables. As a result, comparing the
|
|
||||||
rowcounts for the following tables is not advisable since we are expecting
|
|
||||||
certain entries to move from their source table to a different target:
|
|
||||||
|
|
||||||
Condition_Occurrence
|
|
||||||
Drug_Exposure
|
|
||||||
Observation
|
|
||||||
Measurement
|
|
||||||
Procedure_Occurrence
|
|
||||||
|
|
||||||
This script will produce 2 tables:
|
|
||||||
|
|
||||||
Table 1: This will contain the source table name (e.g. Condition_Occurrence),
|
|
||||||
the target domain_id from the V5 vocabulary and the expected rowcount
|
|
||||||
from the conversion. When there is no target defined in the V5 vocabulary,
|
|
||||||
the source data is carried over to the same target table in V5 with
|
|
||||||
a concept_id of 0. For example:
|
|
||||||
|
|
||||||
TableName Domain RowCount
|
|
||||||
--------- ------ --------
|
|
||||||
Condition_Occurrence condition 464849
|
|
||||||
Condition_Occurrence measurement 8416
|
|
||||||
Condition_Occurrence observation 31522
|
|
||||||
Condition_Occurrence procedure 24298
|
|
||||||
|
|
||||||
Table 2: This will contain a summary of the V5 Target Domains and Rowcounts.
|
|
||||||
I found this helpful to tie out the expected rowcounts and what actually
|
|
||||||
happened during the conversion.
|
|
||||||
|
|
||||||
INSTRUCTIONS
|
|
||||||
------------
|
|
||||||
|
|
||||||
1. This script has placeholders for your CDM V4 and CDMV5 database/schema.
|
|
||||||
In order to make this file work in your environment, you
|
|
||||||
should plan to do a global "FIND AND REPLACE" on this file to fill in the
|
|
||||||
file with values that pertain to your environment. The following are the
|
|
||||||
tokens you should use when doing your "FIND AND REPLACE" operation:
|
|
||||||
|
|
||||||
a. [SOURCE_CDMV4]
|
|
||||||
b. [TARGET_CDMV5]
|
|
||||||
|
|
||||||
2. Run the resulting script on your target RDBMS.
|
|
||||||
|
|
||||||
*********************************************************************************/
|
|
||||||
|
|
||||||
USE [TARGET_CDMV5]
|
|
||||||
GO
|
|
||||||
|
|
||||||
/*
|
|
||||||
* CONCEPT MAP
|
|
||||||
*/
|
|
||||||
IF OBJECT_ID('tempdb..#concept_map', 'U') IS NOT NULL
|
|
||||||
DROP TABLE #concept_map;
|
|
||||||
|
|
||||||
--standard concepts
|
|
||||||
SELECT concept_id AS source_concept_id
|
|
||||||
,concept_id AS target_concept_id
|
|
||||||
,domain_id
|
|
||||||
,NULL AS source_concept_mapping_occurrence
|
|
||||||
INTO #concept_map
|
|
||||||
FROM dbo.concept
|
|
||||||
WHERE standard_concept = 'S'
|
|
||||||
AND invalid_reason IS NULL
|
|
||||||
|
|
||||||
UNION
|
|
||||||
|
|
||||||
--concepts with 'map to' standard
|
|
||||||
SELECT DISTINCT c1.concept_id AS source_concept_id
|
|
||||||
,c2.concept_id AS target_concept_id
|
|
||||||
,c2.domain_id
|
|
||||||
,NULL
|
|
||||||
FROM (
|
|
||||||
SELECT concept_id
|
|
||||||
FROM dbo.concept
|
|
||||||
WHERE (
|
|
||||||
(
|
|
||||||
standard_concept <> 'S'
|
|
||||||
OR standard_concept IS NULL
|
|
||||||
)
|
|
||||||
OR invalid_reason IS NOT NULL
|
|
||||||
)
|
|
||||||
) c1
|
|
||||||
INNER JOIN dbo.concept_relationship cr1 ON c1.concept_id = cr1.concept_id_1
|
|
||||||
INNER JOIN dbo.concept c2 ON cr1.concept_id_2 = c2.concept_id
|
|
||||||
WHERE c2.standard_concept = 'S'
|
|
||||||
AND c2.invalid_reason IS NULL
|
|
||||||
AND cr1.relationship_id IN ('Maps to')
|
|
||||||
AND cr1.invalid_reason IS NULL
|
|
||||||
|
|
||||||
UNION
|
|
||||||
|
|
||||||
--concepts without 'map to' standard with another non 'is a' relation to standard
|
|
||||||
SELECT DISTINCT c1.concept_id AS source_concept_id
|
|
||||||
,c2.concept_id AS target_concept_id
|
|
||||||
,c2.domain_id
|
|
||||||
,NULL
|
|
||||||
FROM (
|
|
||||||
SELECT concept_id
|
|
||||||
FROM dbo.concept
|
|
||||||
WHERE (
|
|
||||||
(
|
|
||||||
standard_concept <> 'S'
|
|
||||||
OR standard_concept IS NULL
|
|
||||||
)
|
|
||||||
OR invalid_reason IS NOT NULL
|
|
||||||
)
|
|
||||||
AND concept_id NOT IN (
|
|
||||||
SELECT DISTINCT c1.concept_id
|
|
||||||
FROM (
|
|
||||||
SELECT concept_id
|
|
||||||
FROM dbo.concept
|
|
||||||
WHERE (
|
|
||||||
(
|
|
||||||
standard_concept <> 'S'
|
|
||||||
OR standard_concept IS NULL
|
|
||||||
)
|
|
||||||
OR invalid_reason IS NOT NULL
|
|
||||||
)
|
|
||||||
) c1
|
|
||||||
INNER JOIN dbo.concept_relationship cr1 ON c1.concept_id = cr1.concept_id_1
|
|
||||||
INNER JOIN dbo.concept c2 ON cr1.concept_id_2 = c2.concept_id
|
|
||||||
WHERE c2.standard_concept = 'S'
|
|
||||||
AND c2.invalid_reason IS NULL
|
|
||||||
AND cr1.relationship_id IN ('Maps to')
|
|
||||||
AND cr1.invalid_reason IS NULL
|
|
||||||
)
|
|
||||||
) c1
|
|
||||||
INNER JOIN dbo.concept_relationship cr1 ON c1.concept_id = cr1.concept_id_1
|
|
||||||
INNER JOIN dbo.concept c2 ON cr1.concept_id_2 = c2.concept_id
|
|
||||||
WHERE c2.standard_concept = 'S'
|
|
||||||
AND c2.invalid_reason IS NULL
|
|
||||||
AND cr1.relationship_id IN (
|
|
||||||
'RxNorm replaced by'
|
|
||||||
,'SNOMED replaced by'
|
|
||||||
,'UCUM replaced by'
|
|
||||||
,'Concept replaced by'
|
|
||||||
,'ICD9P replaced by'
|
|
||||||
,'LOINC replaced by'
|
|
||||||
,'Concept same_as to'
|
|
||||||
,'Concept was_a to'
|
|
||||||
,'Concept alt_to to'
|
|
||||||
)
|
|
||||||
AND cr1.invalid_reason IS NULL
|
|
||||||
|
|
||||||
UNION
|
|
||||||
|
|
||||||
--concepts without 'map to' standard with 'is a' relation to standard
|
|
||||||
SELECT DISTINCT c1.concept_id AS source_concept_id
|
|
||||||
,c2.concept_id AS target_concept_id
|
|
||||||
,c2.domain_id
|
|
||||||
,NULL
|
|
||||||
FROM (
|
|
||||||
SELECT concept_id
|
|
||||||
FROM dbo.concept
|
|
||||||
WHERE (
|
|
||||||
(
|
|
||||||
standard_concept <> 'S'
|
|
||||||
OR standard_concept IS NULL
|
|
||||||
)
|
|
||||||
OR invalid_reason IS NOT NULL
|
|
||||||
)
|
|
||||||
AND concept_id NOT IN (
|
|
||||||
SELECT DISTINCT c1.concept_id
|
|
||||||
FROM (
|
|
||||||
SELECT concept_id
|
|
||||||
FROM dbo.concept
|
|
||||||
WHERE (
|
|
||||||
(
|
|
||||||
standard_concept <> 'S'
|
|
||||||
OR standard_concept IS NULL
|
|
||||||
)
|
|
||||||
OR invalid_reason IS NOT NULL
|
|
||||||
)
|
|
||||||
) c1
|
|
||||||
INNER JOIN dbo.concept_relationship cr1 ON c1.concept_id = cr1.concept_id_1
|
|
||||||
INNER JOIN dbo.concept c2 ON cr1.concept_id_2 = c2.concept_id
|
|
||||||
WHERE c2.standard_concept = 'S'
|
|
||||||
AND c2.invalid_reason IS NULL
|
|
||||||
AND cr1.relationship_id IN (
|
|
||||||
'Maps to'
|
|
||||||
,'RxNorm replaced by'
|
|
||||||
,'SNOMED replaced by'
|
|
||||||
,'UCUM replaced by'
|
|
||||||
,'Concept replaced by'
|
|
||||||
,'ICD9P replaced by'
|
|
||||||
,'LOINC replaced by'
|
|
||||||
,'Concept same_as to'
|
|
||||||
,'Concept was_a to'
|
|
||||||
,'Concept alt_to to'
|
|
||||||
)
|
|
||||||
AND cr1.invalid_reason IS NULL
|
|
||||||
)
|
|
||||||
) c1
|
|
||||||
INNER JOIN dbo.concept_relationship cr1 ON c1.concept_id = cr1.concept_id_1
|
|
||||||
INNER JOIN dbo.concept c2 ON cr1.concept_id_2 = c2.concept_id
|
|
||||||
WHERE c2.standard_concept = 'S'
|
|
||||||
AND c2.invalid_reason IS NULL
|
|
||||||
AND cr1.relationship_id IN ('Is a')
|
|
||||||
AND cr1.invalid_reason IS NULL;
|
|
||||||
GO
|
|
||||||
|
|
||||||
-- Update the source_concept_mapping_occurrence column
|
|
||||||
-- to contain a count to indicate the number of target_concept_ids
|
|
||||||
-- that map to that source_concept_id. This will be used elsewhere in
|
|
||||||
-- the script to ensure that we generate new primary keys
|
|
||||||
-- for the target tables when applicable
|
|
||||||
UPDATE #concept_map
|
|
||||||
SET #concept_map.source_concept_mapping_occurrence = A.[Rowcount]
|
|
||||||
FROM #concept_map
|
|
||||||
,(
|
|
||||||
SELECT source_concept_id
|
|
||||||
,domain_id
|
|
||||||
,count(*) AS "rowcount"
|
|
||||||
FROM #concept_map
|
|
||||||
GROUP BY source_concept_id
|
|
||||||
,domain_id
|
|
||||||
) AS A
|
|
||||||
WHERE #concept_map.source_concept_id = A.source_concept_id
|
|
||||||
AND #concept_map.domain_id = A.domain_id
|
|
||||||
|
|
||||||
IF OBJECT_ID('tempdb..#concept_map_distinct', 'U') IS NOT NULL
|
|
||||||
DROP TABLE #concept_map_distinct;
|
|
||||||
|
|
||||||
SELECT DISTINCT source_concept_id
|
|
||||||
,domain_id
|
|
||||||
,COUNT(*) AS "rowcount"
|
|
||||||
INTO #concept_map_distinct
|
|
||||||
FROM #concept_map
|
|
||||||
GROUP BY source_concept_id
|
|
||||||
,domain_id
|
|
||||||
|
|
||||||
|
|
||||||
/*
|
|
||||||
* V4 - Condition_Occurrence summary and mapping to #concept_map
|
|
||||||
*/
|
|
||||||
IF OBJECT_ID('tempdb..#classification_map', 'U') IS NOT NULL
|
|
||||||
DROP TABLE #classification_map;
|
|
||||||
|
|
||||||
SELECT *
|
|
||||||
INTO #classification_map
|
|
||||||
FROM
|
|
||||||
(
|
|
||||||
SELECT 'Condition_Occurrence' as TableName, ISNULL(LOWER(cm.domain_id), 'condition') AS "Domain", COUNT(*) AS "RowCount"
|
|
||||||
FROM [SOURCE_CDMV4].[dbo].[Condition_Occurrence] as CO
|
|
||||||
LEFT JOIN #concept_map as CM ON co.condition_concept_id = cm.source_concept_id
|
|
||||||
GROUP BY ISNULL(LOWER(cm.domain_id), 'condition')
|
|
||||||
UNION
|
|
||||||
SELECT 'Drug_Exposure' as TableName, ISNULL(LOWER(cm.domain_id), 'drug') AS "Domain", COUNT(*) AS "RowCount"
|
|
||||||
FROM [SOURCE_CDMV4].[dbo].[Drug_Exposure] as de
|
|
||||||
LEFT JOIN #concept_map as CM ON de.drug_concept_id = cm.source_concept_id
|
|
||||||
GROUP BY ISNULL(LOWER(cm.domain_id), 'drug')
|
|
||||||
UNION
|
|
||||||
SELECT 'Observation' as TableName, ISNULL(LOWER(cm.domain_id), 'observation') AS "Domain", COUNT(*) AS "RowCount"
|
|
||||||
FROM [SOURCE_CDMV4].[dbo].[Observation] as o
|
|
||||||
LEFT JOIN #concept_map as CM ON o.observation_concept_id = cm.source_concept_id
|
|
||||||
GROUP BY ISNULL(LOWER(cm.domain_id), 'observation')
|
|
||||||
UNION
|
|
||||||
SELECT 'Procedure_Occurrence' as TableName, ISNULL(LOWER(cm.domain_id), 'procedure') AS "Domain", COUNT(*) AS "RowCount"
|
|
||||||
FROM [SOURCE_CDMV4].[dbo].[Procedure_Occurrence] as po
|
|
||||||
LEFT JOIN #concept_map as CM ON po.PROCEDURE_CONCEPT_ID = cm.source_concept_id
|
|
||||||
GROUP BY ISNULL(LOWER(cm.domain_id), 'procedure')
|
|
||||||
) AS A
|
|
||||||
ORDER by A.[TableName], A.[Domain]
|
|
||||||
|
|
||||||
select *
|
|
||||||
from #classification_map
|
|
||||||
order by [TableName], [Domain]
|
|
||||||
|
|
||||||
select domain, SUM([RowCount])
|
|
||||||
from #classification_map
|
|
||||||
group by domain
|
|
||||||
order by domain
|
|
File diff suppressed because it is too large
Binary file not shown.
|
@ -3,43 +3,56 @@ Conversion from CDM v4 to CDM v5
|
||||||
|
|
||||||
The scripts in this directory will aid you in moving your data from the Common Data Model (CDM) version 4 to version 5.
|
The scripts in this directory will aid you in moving your data from the Common Data Model (CDM) version 4 to version 5.
|
||||||
|
|
||||||
|
Overview
|
||||||
|
==============================================================
|
||||||
|
|
||||||
|
The resources in this folder provide you with a means for converting your CDM V4 database to CDM V5. The goal of these scripts is to provide a path for converting your data to the CDM V5 to take advantage of the other tools that are being built to support research on CDM V5. These scripts are **NOT** designed to replace a proper ETL from your source data to CDM V5.
|
||||||
|
|
||||||
|
One of the most important aspects of this conversion script is the use of the **[Standardized Vocabularies](http://www.ohdsi.org/web/wiki/doku.php?id=documentation:vocabulary:introduction "Standardized Vocabularies")** to map from tables in the V4 database to their corresponding V5 tables using the vocabulary **[domains](http://www.ohdsi.org/web/wiki/doku.php?id=documentation:vocabulary:domains "domains")**. At the beginning of the conversion script, we create a #concept\_map temporary table which holds a mapping from source\_concept\_ids to standard target\_concept\_ids for each of the domains. This table is then used throughout the remainder of the script to map rows from each of the source V4 tables (e.g. condition\_occurrence) to the proper table in the V5 data model. As a result, the number of rows in the V4 condition\_occurrence table may differ from the number in V5, since some rows may be converted to a different table based on the standard concept mapping.
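
As a rough sketch of the mapping described above, and not the conversion script's actual statement, the following query lists V4 condition\_occurrence rows whose concept maps to a standard concept outside the Condition domain. It assumes the #concept\_map temporary table already exists in the current session with the source\_concept\_id, target\_concept\_id and domain\_id columns, and that the V4 schema is dbo (an assumption; adjust for your environment).

```sql
-- Sketch only: V4 condition rows that the vocabulary routes to another V5 domain.
-- [SOURCE_CDMV4] is the placeholder token; the dbo schema is an assumption.
SELECT co.condition_occurrence_id,
       co.condition_concept_id,
       cm.target_concept_id,
       cm.domain_id AS target_domain
FROM [SOURCE_CDMV4].[dbo].[condition_occurrence] AS co
INNER JOIN #concept_map AS cm
        ON co.condition_concept_id = cm.source_concept_id
WHERE LOWER(cm.domain_id) <> 'condition'
ORDER BY cm.domain_id, co.condition_occurrence_id;
```

Rows returned here are the ones that would be expected to leave condition\_occurrence during the conversion, which is why the V4 and V5 row counts for that table can differ.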
|
||||||
|
|
||||||
Assumptions
|
Assumptions
|
||||||
==============================================================
|
==============================================================
|
||||||
|
|
||||||
We have created a directory per RDBMS that contains the conversion script for that database platform. All of the script assume the following:
|
We have created a directory per Relational Database Management System (RDBMS) that contains the conversion script for that database platform. All of the scripts have the same assumptions:
|
||||||
|
|
||||||
1. Your source CDM V4 database is on the same server as your target CDM v5 database.
|
1. Your source CDM V4 database is on the same server as your target CDM v5 database.
|
||||||
2. You have read rights to the CDM V4 database and database owner privileges on the target V5 database as this script will create an "ETL_WARNINGS" table in the process.
|
2. You have read rights to the CDM V4 database and database owner privileges on the target V5 database as this script will create an "ETL_WARNINGS" table in the process.
|
||||||
|
3. You have enough storage on your database server to perform the conversion.
|
||||||
|
|
||||||
Usage
|
Usage
|
||||||
=====
|
=====
|
||||||
|
|
||||||
1. The conversion script will hold a number of placeholders for your CDM V4 and CDMV5 database/schema. In order to make this file work in your environment, you should plan to do a global "FIND AND REPLACE" on the conversion script to fill in the file with values that pertain to your environment. The following are the tokens you should use when doing your "FIND AND REPLACE" operation:
|
1. **Create your V5 Target Database:** Create a CDM V5 database on the same server as your CDM V4 database by using the **[Common Data Model Scripts](https://github.com/OHDSI/CommonDataModel "Common Data Model Scripts")** for your RDBMS. **NOTE: Please review the data types that exist on your V4 database and ensure you carry forward any data type changes from V4 to V5. For example, if you converted columns from an INT to a BIGINT to accommodate tables with > 2.1 Billion Rows, you will need to make the corresponding changes in your V5 Database and potentially to this conversion script**
|
||||||
|
|
||||||
* [SOURCE_CDMV4]
|
2. **Download the conversion script:** The **[CDM V4 to V5 Conversion](https://github.com/OHDSI/CommonDataModel/tree/master/Version4%20To%20Version5%20Conversion "CDM V4 to V5 Conversion Directory")** folder has subfolders with scripts that will work on each RDBMS. In order to make this file work in your environment, you will need to perform a global "FIND AND REPLACE" on the conversion script to fill in the file with values that pertain to your environment. The following are the tokens you should use when doing your "FIND AND REPLACE" operation:
|
||||||
* [SOURCE_CDMV4].[SCHEMA]
|
|
||||||
* [TARGET_CDMV5]
|
|
||||||
* [TARGET_CDMV5].[SCHEMA]
|
|
||||||
|
|
||||||
2. Run the resulting script on your target RDBMS.
|
* [SOURCE_CDMV4] - Your V4 database name
|
||||||
|
* [SOURCE_CDMV4].[SCHEMA] - Your V4 database name + schema
|
||||||
|
* [TARGET_CDMV5] - Your V5 database name
|
||||||
|
* [TARGET_CDMV5].[SCHEMA] - Your V5 database name + schema
|
||||||
|
|
||||||
|
3. Run the resulting script on your target RDBMS. **NOTE:** If you are running the Oracle script via SQL Developer or similar, you may need to alter the script to include the appropriate "/" symbols to mark the end of the anonymous code blocks. This has been done in the Oracle script that has been provided in this repository.
|
||||||
|
4. At the end of the conversion process, several tables will be produced that will help you to understand how your data has changed as a result of the conversion process. This is described in the Quality Assurance section below.
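
The preparation steps above might look like the following on SQL Server. This is a sketch under stated assumptions: `my_cdm_v4`, `my_cdm_v5` and the `dbo` schema are hypothetical values standing in for your own database and schema names, and the `person` query is only an illustration of what a statement looks like once the tokens are filled in.

```sql
-- Step 1 (data type review): list V4 columns that were widened to BIGINT,
-- so the same change can be carried forward to the V5 database.
-- my_cdm_v4 is a hypothetical database name.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM my_cdm_v4.INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = 'bigint'
ORDER BY TABLE_NAME, COLUMN_NAME;

-- Step 2 (token replacement): a statement written with tokens, such as
--   SELECT COUNT(*) FROM [SOURCE_CDMV4].[SCHEMA].person;
-- reads as follows after replacing [SOURCE_CDMV4].[SCHEMA] with my_cdm_v4.dbo
-- and [TARGET_CDMV5].[SCHEMA] with my_cdm_v5.dbo (hypothetical values):
SELECT COUNT(*) AS v4_person_rows FROM my_cdm_v4.dbo.person;
SELECT COUNT(*) AS v5_person_rows FROM my_cdm_v5.dbo.person;
```

Because `[SOURCE_CDMV4]` is a prefix of `[SOURCE_CDMV4].[SCHEMA]`, replacing the longer token first avoids leaving a stray `.[SCHEMA]` behind; the same applies to the two `[TARGET_CDMV5]` tokens.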
|
||||||
|
|
||||||
** **NOTE** ** If you are running the Oracle script via Sql Developer or similar, you may need to alter the script to include the appropriate "/" symbols to mark the end of the anonymous code blocks.
|
|
||||||
|
|
||||||
Quality Assurance
|
Quality Assurance
|
||||||
===================
|
===================
|
||||||
|
|
||||||
We have included 2 scripts in the root of this directory that were used while doing quality assurance on the conversion scripts:
|
At the end of the conversion script, there are 3 queries which will provide information on the conversion process. For reference, this section of the conversion script has a header comment:
|
||||||
|
|
||||||
* Conversion-QA - Sql Server.sql
|
/**** QUALITY ASSURANCE OUTPUT ****/
|
||||||
* Conversion-QA-Part-2 - Sql Server.sql
|
|
||||||
|
|
||||||
As noted in the file names, these scripts were written specifically for Sql Server but should be a fairly easy port to your RDBMS target. The goals of these scripts were to measure the following:
|
The first query provides a means for comparing the table row counts between the V4 and V5 databases. As mentioned in the overview section above, table row counts will differ between V4 and V5 based on the way that the standard vocabulary maps the data. The next set of queries will help to tie out the row count changes in these tables.
|
||||||
|
|
||||||
* **Conversion-QA - Sql Server.sql**: provides row counts from each table in the V4 and V5 databases. It also includes a column called "Migration Target" which notes if that table was a target of the migration. The full list will help you to see if there were any tables in V4 that were either missed or are not targeted as part of the migration. Of particular note: **Cohort** and **Source\_To\_Concept\_Map** are not targeted for this migration.
|
The second query shows each source V4 table (e.g. condition\_occurrence) and how its row counts map to the V5 domains. This table is useful to understand how the data from the V4 source was distributed into the V5 tables. As a note, 1 record in the V4 table could map to multiple records in V5 as some concepts will map to multiple standard domains.
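
To see which concepts are responsible for one V4 record becoming several V5 records, a query along these lines may help. It is a sketch that assumes the #concept\_map temporary table created by the conversion script is still available in your session; it is not part of the script itself.

```sql
-- Sketch only: source concepts whose standard mappings fall in more than one domain.
SELECT source_concept_id,
       COUNT(DISTINCT domain_id) AS domain_count
FROM #concept_map
GROUP BY source_concept_id
HAVING COUNT(DISTINCT domain_id) > 1
ORDER BY domain_count DESC, source_concept_id;
```

Any V4 row using one of these concepts is counted once per target domain in the second query's output.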
|
||||||
|
|
||||||
* **Conversion-QA-Part-2 - Sql Server.sql**: provides 2 summary tables to help verify the output from the first script. The first summary table provides row counts for specific V4 tables and how the rows in the tables map to the V5 domains. This summary is useful to understand why the row counts for these tables will vary between the V4 and V5. The second summary table provides a row count sum by domain which should then match the V5 row counts for the corresponding V5 tables. The tables that are summarized in this script are: condition\_occurrence, drug\_exposure, observation, procedure\_occurrence.
|
The third query uses the information from the second query and provides a summary for each V5 domain. This is useful for tying out the rowcounts we'd expect from the script with the actual results observed in the first query.
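
A tie-out for a single domain could be sketched as below. The #classification\_map table and its Domain and RowCount columns are taken from the Sql Server QA script removed in this commit; whether the current conversion script keeps the same temporary table name is an assumption, as is the dbo schema.

```sql
-- Sketch only: compare the expected row count for the condition domain
-- with the actual row count in the V5 condition_occurrence table.
SELECT expected.expected_rows,
       actual.actual_rows,
       actual.actual_rows - expected.expected_rows AS difference
FROM (
        SELECT SUM([RowCount]) AS expected_rows
        FROM #classification_map
        WHERE [Domain] = 'condition'
     ) AS expected
CROSS JOIN (
        SELECT COUNT(*) AS actual_rows
        FROM [TARGET_CDMV5].[dbo].[condition_occurrence]
     ) AS actual;
```

A difference of 0 indicates that the rows expected for that domain all arrived in the corresponding V5 table; repeat with the other domain and table pairs as needed.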
|
||||||
|
|
||||||
Contributions
|
We have included a spreadsheet called "QA-Results.xlsx" which provides an example of how to utilize these 3 result queries to understand the results of the conversion process. The results of the first query go into the "Rowcounts" worksheet. The results of the second and third queries go into the "Classification Map Results" worksheet. If the conversion process worked as expected, all of the "Difference" columns should equal 0 in the "Classification Map Results" worksheet.
|
||||||
|
|
||||||
|
Getting Involved
|
||||||
==============================================================
|
==============================================================
|
||||||
|
Each script found in the RDBMS directory was generated from the OHDSI-SQL file: *OMOP CDMv4 to CDMv5 - OHDSI-SQL.sql* found in the root of this directory. If you would like to contribute to this script, we'd suggest you modify this script and use **[SqlRender](https://github.com/OHDSI/SqlRender "SqlRender")** to re-generate the specific RDBMS scripts. We have also supplied a basic R script in this directory to help re-generate the scripts using SqlRender.
|
||||||
|
|
||||||
Each script found in the RDBMS directory was generated from the template SQL file: *OMOP CDMv4 to CDMv5 - templateSQL.sql* found in the root of this directory. If you would like to contribute to this script, we'd suggest you modify this script and use **[SqlRender](https://github.com/OHDSI/SqlRender "SqlRender")** to re-generate the specific RDBMS scripts.
|
Developer questions/comments/feedback: OHDSI Forum
|
||||||
|
We use the GitHub issue tracker for all bugs/issues/enhancements
|