OMOP/R/WikiParser.R

# Copyright 2017 Observational Health Data Sciences and Informatics
#
# This file is part of DDLGeneratr
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


#' Parse Wiki files
#'
#' @description
#' Parses all .md files in the specified location (or any subfolders), extracting definitions
#' of the Common Data Model.
#'
#' @param mdFilesLocation Path to the root folder of the Wiki repository.
#' @param output_file     Path to where the output CSV file should be written.
#' @importFrom utils write.csv
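#' @examples
#' \dontrun{
#' # A minimal usage sketch, not a tested workflow. It assumes the wiki repository
#' # has been cloned next to the working directory; the output file name is hypothetical.
#' parseWiki(mdFilesLocation = "../CommonDataModel.wiki",
#'           output_file = "cdmDefinitions.csv")
#' }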
#' @export
parseWiki <- function(mdFilesLocation, output_file) {
  # Find every Markdown file under the wiki root, including subfolders
  files <- list.files(mdFilesLocation, pattern = "\\.md$", recursive = TRUE, full.names = TRUE)
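  # Convert one Markdown table row ("field | required | type | description") into a one-row data frame
  parseTableRow <- function(row) {
    cells <- stringr::str_trim(stringr::str_split(row, "\\|")[[1]])
    # A leading pipe produces an empty first cell, so skip it
    if (substr(row, 1, 1) == "|") {
      cells <- cells[2:5]
    }
    return(data.frame(field = tolower(cells[1]),
                      required = cells[2],
                      type = toupper(cells[3]),
                      description = cells[4],
                      stringsAsFactors = FALSE))
  }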

  parseMdFile <- function(file) {
    # Read the file line by line and strip surrounding whitespace
    lines <- stringr::str_trim(readLines(file, warn = FALSE))
    # Locate the header row of the field definition table (matched case-insensitively)
    tableStart <- grep("field\\s*\\|\\s*required\\s*\\|\\s*type\\s*\\|\\s*description", tolower(lines))
    if (length(tableStart) > 1)
      stop("More than one table definition found in ", file)

    if (length(tableStart) == 1) {
      tableName <- basename(file)
      tableName <- tolower(stringr::str_sub(tableName, 1, -4))
      writeLines(paste("Parsing table", tableName))
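      # Skip the header row and the separator row beneath it
      tableStart <- tableStart + 2
      # The table ends at the first blank line after its first data row, or at the end of the file
      tableEnd <- c(which(lines == ""), length(lines) + 1)
      tableEnd <- min(tableEnd[tableEnd > tableStart]) - 1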
      tableDefinition <- lapply(lines[tableStart:tableEnd], parseTableRow)
      tableDefinition <- do.call(rbind, tableDefinition)
      tableDefinition$table <- tableName
      return(tableDefinition)
    } else {
      return(NULL)
    }
  }
  # Parse every file; files without a table definition return NULL and are dropped by rbind
  tableDefinitions <- lapply(files, parseMdFile)
  tableDefinitions <- do.call(rbind, tableDefinitions)
  write.csv(tableDefinitions, output_file, row.names = FALSE)
}