OMOP/Impala
clairblacketer 29b0899b74 Closes CDM proposal #73 2017-07-14 10:38:48 -04:00
DataImport Add scripts for creating a schema and importing data into Impala. 2016-10-12 11:36:40 +01:00
VocabImport Add scripts for creating a schema and importing data into Impala. 2016-10-12 11:36:40 +01:00
OMOP_CDM_ddl_Impala.sql Closes CDM proposal #73 2017-07-14 10:38:48 -04:00
OMOP_Parquet.sql Closes cdm proposal #71 and updates headers for CDM v5.2 2017-07-14 10:28:17 -04:00
OMOP_Parquet_v5.1.sql Change empty 'invalid_reason' field to null for compatibility with Atlas cohort generation. 2017-07-06 09:48:01 +01:00
OMOP_Parquet_v5.2.sql Closes CDM proposal #73 2017-07-14 10:38:48 -04:00
README.md Add a step to transform data into Parquet, and use TIMESTAMP type for dates. 2017-04-18 11:05:47 +01:00

Common-Data-Model / Impala

This folder contains the SQL scripts for Impala.

To create your own instance of the Common Data Model on Impala, we recommend the following steps:

  1. Create an empty schema.

     impala-shell -q 'CREATE DATABASE omop_cdm'

  2. Execute the script OMOP_CDM_ddl_Impala.sql to create the tables and fields.

     impala-shell -d omop_cdm -f OMOP_CDM_ddl_Impala.sql

  3. Load your data into the schema.

     a. Load the vocabulary tables.

     First, download the vocabulary data from http://www.ohdsi.org/web/athena/ and unzip it into a cdmv5vocab directory. Then copy the directory to HDFS, make it writable, and run the load script:

     hadoop fs -put cdmv5vocab cdmv5vocab
     hadoop fs -chmod +w cdmv5vocab
     impala-shell -d omop_cdm -f VocabImport/OMOP_CDM_vocabulary_load_Impala.sql --var=OMOP_VOCAB_PATH=/user/$USER/cdmv5vocab

     b. Load the patient data.

     For example, download the 1,000-person sample of simulated CMS SynPUF patient data from http://www.ltscomputingllc.com/downloads/ and unzip it into a synpuf directory. Then copy the directory to HDFS, make it writable, and run the load script:

     hadoop fs -put synpuf synpuf
     hadoop fs -chmod +w synpuf
     impala-shell -d omop_cdm -f DataImport/OMOP_CDM_synpuf_load_Impala.sql --var=OMOP_SYNPUF_PATH=/user/$USER/synpuf

  4. Convert the data to Parquet format.

     impala-shell -q 'CREATE DATABASE omop_cdm_parquet'
     impala-shell -f OMOP_Parquet.sql

  5. Run simple queries as a sanity check.

     impala-shell -d omop_cdm_parquet -q 'SELECT COUNT(1) FROM concept'
     impala-shell -d omop_cdm_parquet -q 'SELECT COUNT(1) FROM person'
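
The sanity check can be taken one step further by comparing row counts between the original database and the Parquet copy. The following is a minimal sketch, not part of the scripts in this folder: the function names `count_rows` and `compare_counts` and the `IMPALA_SHELL` override are hypothetical, and it assumes that `impala-shell -B --quiet` prints only the bare count on standard output.

```shell
#!/bin/sh
# Sketch: compare row counts between omop_cdm and omop_cdm_parquet.
# IMPALA_SHELL is an illustrative hook so the client can be swapped for a
# stub when testing; by default the real impala-shell is used.
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

count_rows() {
    # $1 = database, $2 = table.
    # -B requests plain delimited output; --quiet suppresses status messages.
    $IMPALA_SHELL -B --quiet -d "$1" -q "SELECT COUNT(1) FROM $2"
}

compare_counts() {
    # Prints one line per table; returns non-zero if any count differs.
    rc=0
    for table in "$@"; do
        src=$(count_rows omop_cdm "$table")
        dst=$(count_rows omop_cdm_parquet "$table")
        if [ "$src" = "$dst" ]; then
            echo "OK       $table: $dst rows"
        else
            echo "MISMATCH $table: omop_cdm=$src omop_cdm_parquet=$dst"
            rc=1
        fi
    done
    return $rc
}

# Example (requires a running Impala cluster):
#   compare_counts concept person
```

A mismatch on any table would suggest that the Parquet conversion did not copy every row, so it is worth checking before pointing downstream tools at omop_cdm_parquet.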