Common-Data-Model / Impala
This folder contains the SQL scripts for Impala.
To create your own instance of the Common Data Model, we recommend the following steps:
- Create an empty schema.
impala-shell -q 'CREATE DATABASE omop_cdm'
- Execute the script OMOP CDM impala ddl.txt (convert it to a .sql file first; the command below assumes the name OMOP_CDM_impala_ddl.sql) to create the tables and fields.
impala-shell -d omop_cdm -f OMOP_CDM_impala_ddl.sql
- Load your data into the schema.
a. Load the vocabulary tables.
First, download the vocabulary data from http://www.ohdsi.org/web/athena/ and unzip it into a cdmv5vocab directory, then run:
hadoop fs -put cdmv5vocab cdmv5vocab
hadoop fs -chmod +w cdmv5vocab
impala-shell -d omop_cdm -f VocabImport/OMOP_CDM_vocabulary_load_Impala.sql --var=OMOP_VOCAB_PATH=/user/$USER/cdmv5vocab
b. Load the patient data.
For example, download the 1000-person sample of simulated CMS SynPUF patient data from http://www.ltscomputingllc.com/downloads/ and unzip it into a synpuf directory, then run:
hadoop fs -put synpuf synpuf
hadoop fs -chmod +w synpuf
impala-shell -d omop_cdm -f DataImport/OMOP_CDM_synpuf_load_Impala.sql --var=OMOP_SYNPUF_PATH=/user/$USER/synpuf
- Convert to Parquet format.
impala-shell -q 'CREATE DATABASE omop_cdm_parquet'
impala-shell -f OMOP_Parquet.sql
- Run simple queries as a sanity check.
impala-shell -d omop_cdm_parquet -q 'SELECT COUNT(1) FROM concept'
impala-shell -d omop_cdm_parquet -q 'SELECT COUNT(1) FROM person'
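The steps above can be sketched as a single script. This is a minimal sketch, not part of this folder: the `run` and `build_omop_cdm` functions and the `DRY_RUN` variable are hypothetical names introduced here, and the script assumes that "converting" the DDL file is a plain copy under a .sql name, that the cdmv5vocab and synpuf directories are already unzipped in the current directory, and that it is started from this folder. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: replays the steps above in order.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the documented commands; stops at the first failure.
build_omop_cdm() {
    # HDFS home directory, matching the /user/$USER paths used above.
    hdfs_home="/user/${USER:-$(id -un)}"

    # Create the empty schema.
    run impala-shell -q 'CREATE DATABASE omop_cdm' || return 1

    # Assumption: "convert to a sql file" means a plain copy under a .sql name.
    run cp 'OMOP CDM impala ddl.txt' OMOP_CDM_impala_ddl.sql || return 1
    run impala-shell -d omop_cdm -f OMOP_CDM_impala_ddl.sql || return 1

    # Load the vocabulary tables.
    run hadoop fs -put cdmv5vocab cdmv5vocab || return 1
    run hadoop fs -chmod +w cdmv5vocab || return 1
    run impala-shell -d omop_cdm -f VocabImport/OMOP_CDM_vocabulary_load_Impala.sql \
        --var=OMOP_VOCAB_PATH="$hdfs_home/cdmv5vocab" || return 1

    # Load the patient data.
    run hadoop fs -put synpuf synpuf || return 1
    run hadoop fs -chmod +w synpuf || return 1
    run impala-shell -d omop_cdm -f DataImport/OMOP_CDM_synpuf_load_Impala.sql \
        --var=OMOP_SYNPUF_PATH="$hdfs_home/synpuf" || return 1

    # Convert to Parquet format.
    run impala-shell -q 'CREATE DATABASE omop_cdm_parquet' || return 1
    run impala-shell -f OMOP_Parquet.sql || return 1

    # Sanity checks on the Parquet schema.
    run impala-shell -d omop_cdm_parquet -q 'SELECT COUNT(1) FROM concept' || return 1
    run impala-shell -d omop_cdm_parquet -q 'SELECT COUNT(1) FROM person' || return 1
}
```

After sourcing the script, calling `build_omop_cdm` lists the commands without touching the cluster. Setting `DRY_RUN=0` before the call would execute them, which requires `impala-shell` and `hadoop` on the PATH. Dry-run output drops the original quoting, so it is meant for review rather than copy and paste.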