diff --git a/README.md b/README.md
index f84a810..cfb7a75 100644
--- a/README.md
+++ b/README.md
@@ -12,9 +12,8 @@ We wrote this frame to be used in both command line or as a library within in yo
| -------- | --- |
|X12 claims/remits| parsing of {x12} claims/remittances into JSON format with human readible attributes|
|Multi Processing| capable of processing multiple files simultaneously to speed up processing|
-|Analytics support| descriptive statistical analytics : distribution, various counts|
|Process Recovery| capable of recovering interrupted runs|
-|Export to RDBMS| exports data to relational format (NoSQL -> SQL) 7 supported databases
* PostgreSQL,
* Redshift,
* Neteeza,
* Mysql,
* Mariadb,
* bigquery,
* sqlite3 |
+|Export to RDBMS| exports data to relational format (NoSQL -> SQL) 8 supported industry-standard databases
* PostgreSQL,
* Redshift,
* Netezza,
* MySQL,
* MariaDB,
* BigQuery,
* SQLite3,
* Databricks |
|**Issues and Bug reports**| info@the-phi.com
@@ -31,96 +30,26 @@ For advanced features visit [Healthcare/IO::Parser](https://healthcareio.the-phi
**Installation command**
- pip install --upgrade git+https://hiplab.mc.vanderbilt.edu/git/lab/parse-edi.git
+ pip install --upgrade git+https://hiplab.mc.vanderbilt.edu/git/hiplab/parser.git
## Usage
-Healthcare/IO is primarily intended to be used as a command line parser (for now). It is fully written in python 3+
+Healthcare/IO is primarily intended to be used as a command-line parser (for now). However, it can also be used as a library that you integrate into custom code. It is written entirely in Python 3 and released under the MIT License.
-**CLI :**
+**Learning More**
-1. signup to get parsing configuration
-
- The parser is driven by a configuration file that specifies fields to parse and how to parse them. You need by signing up, to get a copy of the configuration file.
- #
- # Use sqlite as data-store
- healthcare-io.py --signup [--store ]
-
-2. check version
-
- Occasionally the attributes in the configuration file may change, This function will determine if there is a new version available.
-
- healthcare-io.py --check-update
-
-3. parsing data stored in a folder
-
- The parser will recursively traverse a directory with claims and or remittances
-
- healthcare-io.py --parse --folder [--batch ] [--resume]
-
- with :
- --parse tells the engine what to parse claims or remits
- --folder location of the claims|remits
- --batch number of processes to spawn to parse the files
- --resume tells the parser to resume parsing
- if all files weren't processed or new files were added into the folder
-
-4. export data to a relational data-store
-
- The parser will export data into other data-stores as a relational tables allowing users to construct views to support a variety of studies.
-
- healthcare-io.py --export <835|837> --export-config
-
- with:
- --export-config configuration to support data-store
-
- **example**
- 1. Exporting to PostgreSQL
-
- {"provider":"postgresql","database":"healthcareio","schema":"foo"}
-
- **NOTE**
-
- The output generates a set of tables that are the result of transforming unstructured data to relational structure. The tables can be bound with the attribute **_id**
-
- The configuration file needed to implement export is modelled after the following template:
-
-
- {
- "provider":"",
- "database":"",["host":"server-name"],["port":5432],
- ["user":"me"],["password":"!@#z4qm"],["schema":"target-schema"]
-
- }
-
- **parameters:**
-
- provider postgresql,redshift,mysql or mariadb (supported providers)
- database name of the database
-
- **optional:**
-
- schema name of the target schema. If not provided we will assume the default
- host host of the database. If not provided assuming localhost
- port port value of the database if not provided the default will be used
- user database user name. If not provided we assume security settings to trust
- password password of database user. If not set we assume security settings to trust
+- More information is available at https://healthcareio.the-phi.com/
+- The source code and example code are at https://hiplab.mc.vanderbilt.edu/git/hiplab/parser
**Known Limitations**
1. By default it does NOT come with all {X12} Segments.
-2. Does not support an easy way to rename attributes it parses
+2. Renaming attributes requires writing plugins
3. Upgrade configuration may require dropping tables
-4. For now can only read {x12} from disk
+4. For now, it can only read {x12} from disk (or an S3 bucket)
There is support for additional features and attributes available at [Healthcare/IO::Parser](https://healthcareio.the-phi.com/parser).
-**In development**
-
-1. Wizard/UI to enable attribute renaming
-2. Dashboard for quick overview
-3. Reading {x12} from s3 and other cloud buckets
-4. Docker Image
## Credits
diff --git a/healthcareio/Dockerfile b/healthcareio/Dockerfile
index 23464ef..a4c265f 100644
--- a/healthcareio/Dockerfile
+++ b/healthcareio/Dockerfile
@@ -10,11 +10,11 @@ RUN ["apt-get","-y","install","apt-utils"]
RUN ["apt","update","--fix-missing"]
RUN ["apt-get","upgrade","-y"]
-RUN ["apt-get","install","-y","mongo","sqlite3","sqlite3-pcre","libsqlite3-dev","python3-dev","python3","python3-pip","git","python3-virtualenv","wget"]
+RUN ["apt-get","install","-y","mongo","sqlite3","sqlite3-pcre","libsqlite3-dev","python3-dev","python3","python3-pip","git","wget"]
#
#
RUN ["pip3","install","--upgrade","pip"]
-# RUN ["pip3","install","git+https://healthcare.the-phi.com/git/code/parser.git","botocore"]
+RUN ["pip3","install","healthcareio@git+https://healthcare.the-phi.com/git/code/parser.git"]
USER health-user
#
# This volume is where the data will be loaded from (otherwise it is assumed the user will have it in the container somehow)
@@ -24,6 +24,7 @@ VOLUME ["/data"]
# This is the port from which some degree of monitoring can/will happen
EXPOSE 80
# wget https://healthcareio.the-phi.com/git/code/parser.git/bootup.sh
+RUN ["wget","https://github.com/sosedoff/pgweb/releases/download/v0.14.2/pgweb_linux_amd64.zip"]
COPY bootup.sh bootup.sh
ENTRYPOINT ["bash","-C"]
CMD ["bootup.sh"]
diff --git a/healthcareio/__init__.py b/healthcareio/__init__.py
index 0c358ce..7be2258 100644
--- a/healthcareio/__init__.py
+++ b/healthcareio/__init__.py
@@ -24,3 +24,4 @@ import healthcareio.params as params
from healthcareio import logger
# from healthcareio import server
+import meta
\ No newline at end of file
diff --git a/healthcareio/export.json b/healthcareio/export.json
new file mode 100644
index 0000000..4446b24
--- /dev/null
+++ b/healthcareio/export.json
@@ -0,0 +1,3 @@
+{
+ "provider":"postgresql","database":"healthcareio","schema":"public","context":"write"
+}
diff --git a/healthcareio/healthcare-io.py.bak b/healthcareio/healthcare-io.py.bak
new file mode 100644
index 0000000..54d718e
--- /dev/null
+++ b/healthcareio/healthcare-io.py.bak
@@ -0,0 +1,397 @@
+#!/usr/bin/env python3
+"""
+(c) 2019 Claims Toolkit,
+Health Information Privacy Lab, Vanderbilt University Medical Center
+
+Steve L. Nyemba
+Khanhly Nguyen
+
+
+This code is intended to process and parse healthcare x12 837 (claims) and x12 835 (remittances) into human readable JSON format.
+The claims/output can be forwarded to a NoSQL data store such as CouchDB or MongoDB
+Usage :
+ Commandline :
+ python edi-parser --scope --config --folder --store <[mongo|disk|couch]> --
+
+ with :
+ --scope
+ --config path of the x12 to be parsed i.e it could be 835, or 837
+ --folder location of the files (they must be decompressed)
+ --store data store could be disk, mongodb, couchdb
+ --db|path name of the folder to store the output or the database name
+
+ Embedded in Code :
+
+ import edi.parser
+ import json
+
+ filename = '/data/claim_1.x12'
+ conf = json.loads(open('config/837.json').read())
+ edi.parser.get_content(filename,conf)
+"""
+from healthcareio.params import SYS_ARGS
+from transport import factory
+import requests
+from healthcareio import analytics
+from healthcareio import server
+
+
+from healthcareio.parser import get_content
+import os
+import json
+import sys
+import numpy as np
+from multiprocessing import Process
+import time
+from healthcareio import x12
+from healthcareio.export import export
+import smart
+import transport
+from healthcareio.server import proxy
+import pandas as pd
+
+PATH = os.sep.join([os.environ['HOME'],'.healthcareio'])
+OUTPUT_FOLDER = os.sep.join([os.environ['HOME'],'healthcare-io'])
+INFO = None
+URL = "https://healthcareio.the-phi.com"
+if not os.path.exists(PATH) :
+ os.mkdir(PATH)
+import platform
+import sqlite3 as lite
+# PATH = os.sep.join([os.environ['HOME'],'.edi-parser'])
+CONFIG_FILE = os.sep.join([PATH,'config.json']) if 'config' not in SYS_ARGS else SYS_ARGS['config']
+HELP_MESSAGE = """
+ cli:
+ #
+ # Signup, allows parsing configuration to be downloaded
+ #
+
+ # Support for SQLite3
+ healthcare-io.py --signup steve@the-phi.com --store sqlite
+
+ #or support for mongodb
+ healthcare-io.py --signup steve@the-phi.com --store mongo
+
+
+ healthcare-io.py --<[signup|init]> --store [--batch ]
+ healthcare-io.py --parse --folder [--batch ] [--resume]
+ healthcare-io.py --check-update
+ healthcare-io.py --export <835|837> --export-config
+ action :
+ --signup|init signup user and get configuration file
+ --parse starts parsing
+ --check-update checks for updates
+ --export export data of an 835 or 837 into another database
+ parameters :
+ --<[signup|init]> signup or get a configuration file from a parsing server
+ --folder location of the files (the program will recursively traverse it)
+ --store data store: mongo or sqlite
+ --resume will attempt to resume if there was an interruption
+ """
+def signup (**args) :
+ """
+ :email user's email address
+ :url url of the provider to signup
+ """
+
+ email = args['email']
+ url = args['url'] if 'url' in args else URL
+ folders = [PATH,OUTPUT_FOLDER]
+ for path in folders :
+ if not os.path.exists(path) :
+ os.mkdir(path)
+
+ #
+ #
+ store = args['store'] if 'store' in args else 'sqlite'
+ headers = {"email":email,"client":platform.node(),"store":store,"db":args['db']}
+ http = requests.session()
+ r = http.post(url,headers=headers)
+
+ #
+ # store = {"type":"disk.DiskWriter","args":{"path":OUTPUT_FOLDER}}
+ # if 'store' in args :
+ # store = args['store']
+ # filename = (os.sep.join([PATH,'config.json']))
+ filename = CONFIG_FILE
+ info = r.json() #{"parser":r.json(),"store":store}
+ info = dict({"owner":email},**info)
+ info['store']['args']['path'] =os.sep.join([OUTPUT_FOLDER,'healthcare-io.db3']) #-- sql
+ info['out-folder'] = OUTPUT_FOLDER
+
+ file = open( filename,'w')
+ file.write( json.dumps(info))
+ file.close()
+ _m = """
+ Thank you for signing up!
+ Your configuration file is stored in :path,
+ - For more information visit https://healthcareio.the-phi.com/parser
+ - Access the source https://healthcareio.the-phi.com/git/code/parser
+
+ """.replace(":path",CONFIG_FILE)
+ print (_m)
+ #
+ # Create the sqlite3 database to
+
+
+def log(**args):
+ """
+ This function will perform a log of anything provided to it
+ """
+ pass
+def init():
+ """
+ read all the configuration from disk.
+ Requirements for configuration file :
+ {out-folder,store,837,835 }
+ """
+ # filename = os.sep.join([PATH,'config.json'])
+ filename = CONFIG_FILE
+ info = None
+ if os.path.exists(filename):
+ #
+ # Loading the configuration file (JSON format)
+ file = open(filename)
+ info = json.loads(file.read())
+
+
+ if 'out-folder' not in info and not os.path.exists(OUTPUT_FOLDER) :
+ os.mkdir(OUTPUT_FOLDER)
+ elif 'out-folder' in info and not os.path.exists(info['out-folder']) :
+ os.mkdir(info['out-folder'])
+ # if 'type' in info['store'] :
+ lwriter = None
+ is_sqlite = False
+ if 'type' in info['store'] and info['store']['type'] == 'disk.SQLiteWriter' and not os.path.exists(info['store']['args']['path']) :
+ lwriter = transport.factory.instance(**info['store'])
+ is_sqlite = True
+ elif 'provider' in info['store'] and info['store']['provider'] == 'sqlite' :
+ lwriter = transport.instance(**info['store'])
+ is_sqlite = True
+ if lwriter and is_sqlite:
+ for key in info['schema'] :
+ if key != 'logs' :
+ _id = 'claims' if key == '837' else 'remits'
+ else:
+ _id = key
+
+ if not lwriter.has(table=_id) :
+ lwriter.apply(info['schema'][key]['create'])
+
+ # [lwriter.apply( info['schema'][key]['create']) for key in info['schema'] if not lwriter.has(table=key)]
+ lwriter.close()
+
+ return info
+
+def upgrade(**args):
+ """
+ :email provide us with who you are
+ :key upgrade key provided by the server for a given email
+ """
+ url = args['url'] if 'url' in args else URL+"/upgrade"
+ headers = {"key":args['key'],"email":args["email"],"url":url}
+def check(**_args):
+ """
+ This function will check if there is an update available (versions are in the configuration file)
+ :param url
+ """
+ url = _args['url'][:-1] if _args['url'].endswith('/') else _args['url']
+ url = url + "/version"
+ if 'version' not in _args :
+ version = {"_id":"version","current":0.0}
+ else:
+ version = _args['version']
+ http = requests.session()
+ r = http.get(url)
+ return r.json()
+
+if __name__ == '__main__' :
+ info = init()
+
+ if 'out-folder' in SYS_ARGS :
+ OUTPUT_FOLDER = SYS_ARGS['out-folder']
+ SYS_ARGS['url'] = SYS_ARGS['url'] if 'url' in SYS_ARGS else URL
+
+ if set(list(SYS_ARGS.keys())) & set(['signup','init']):
+ #
+ # This command will essentially get a new copy of the configurations
+ # @TODO: Tie the request to a version ?
+ #
+
+ email = SYS_ARGS['signup'].strip() if 'signup' in SYS_ARGS else SYS_ARGS['init']
+ url = SYS_ARGS['url'] if 'url' in SYS_ARGS else URL
+ store = SYS_ARGS['store'] if 'store' in SYS_ARGS else 'sqlite'
+ db='healthcareio' if 'db' not in SYS_ARGS else SYS_ARGS['db']
+ signup(email=email,url=url,store=store,db=db)
+ # else:
+ # m = """
+ # usage:
+ # healthcareio --signup --email myemail@provider.com [--url ]
+
+ # """
+ # print (m)
+ elif 'upgrade' in SYS_ARGS :
+ #
+ # perform an upgrade i.e some code or new parsers information will be provided
+ #
+
+ pass
+ elif 'parse' in SYS_ARGS and info:
+ """
+ In this section of the code we are expecting the user to provide :
+ :folder location of the files to process or file to process
+ :
+ """
+ files = []
+ if 'file' in SYS_ARGS :
+ files = [SYS_ARGS['file']] if not os.path.isdir(SYS_ARGS['file']) else []
+ if 'folder' in SYS_ARGS and os.path.exists(SYS_ARGS['folder']):
+ for root,_dir,f in os.walk(SYS_ARGS['folder']) :
+
+ if f :
+ files += [os.sep.join([root,name]) for name in f]
+
+ # names = os.listdir(SYS_ARGS['folder'])
+ # files += [os.sep.join([SYS_ARGS['folder'],name]) for name in names if not os.path.isdir(os.sep.join([SYS_ARGS['folder'],name]))]
+ else:
+ #
+ # raise an error
+
+ pass
+ #
+ # if the user has specified to resume, we should look into the logs and pull the files processed and those that haven't
+ #
+ if 'resume' in SYS_ARGS :
+ store_config = json.loads( (open(CONFIG_FILE)).read() )
+ files = proxy.get.resume(files,store_config )
+ # print (["Found ",len(files)," files unprocessed"])
+ #
+ # @TODO: Log this here so we know what is being processed or not
+ SCOPE = None
+
+ if files : #and ('claims' in SYS_ARGS['parse'] or 'remits' in SYS_ARGS['parse']):
+
+ BATCH_COUNT = 1 if 'batch' not in SYS_ARGS else int (SYS_ARGS['batch'])
+
+ files = np.array_split(files,BATCH_COUNT)
+ procs = []
+ index = 0
+ for row in files :
+
+ row = row.tolist()
+ # logger.write({"process":index,"parse":SYS_ARGS['parse'],"file_count":len(row)})
+ # proc = Process(target=apply,args=(row,info['store'],_info,))
+ # parser = x12.Parser(os.sep.join([PATH,'config.json']))
+
+ parser = x12.Parser(CONFIG_FILE)
+ parser.set.files(row)
+ parser.start()
+ procs.append(parser)
+ # index = index + 1
+ while len(procs) > 0 :
+ procs = [proc for proc in procs if proc.is_alive()]
+ time.sleep(2)
+ uri = OUTPUT_FOLDER
+ store_config = json.loads( (open(CONFIG_FILE)).read() )['store']
+ if 'type' in store_config :
+ uri = store_config['args']['host'] if 'host' in store_config['args'] else ( store_config['args']['path'] if 'path' in store_config['args'] else store_config['args']['database'])
+ if 'SQLite' in store_config['type']:
+ provider = 'sqlite'
+ elif 'sql' in store_config['type'] :
+ provider = 'SQL'
+ else:
+ provider = 'mongo'
+
+ else:
+ provider = store_config['provider']
+ _msg = """
+ Completed Parsing, The data is available in :provider database at :uri
+ Logs are equally available for errors and summary statistics to be compiled
+ """.replace(":provider",provider).replace(":uri",uri)
+ print (_msg)
+
+ pass
+ elif 'analytics' in SYS_ARGS :
+ PORT = int(SYS_ARGS['port']) if 'port' in SYS_ARGS else 5500
+ DEBUG= int(SYS_ARGS['debug']) if 'debug' in SYS_ARGS else 0
+ SYS_ARGS['context'] = SYS_ARGS['context'] if 'context' in SYS_ARGS else ''
+ #
+ #
+
+ # PATH= SYS_ARGS['config'] if 'config' in SYS_ARGS else os.sep.join([os.environ['HOME'],'.healthcareio','config.json'])
+
+ if os.path.exists(CONFIG_FILE) :
+ e = analytics.engine(CONFIG_FILE) #--@TODO: make the configuration file globally accessible
+ e.apply(type='claims',serialize=True)
+ SYS_ARGS['engine'] = e
+ SYS_ARGS['config'] = json.loads(open(CONFIG_FILE ).read())
+ else:
+ SYS_ARGS['config'] = {"owner":None,"store":None}
+
+ if 'args' not in SYS_ARGS['config'] :
+ SYS_ARGS['config']["args"] = {"batch":1,"resume":True}
+ #
+ # folder is mandatory
+ # SYS_ARGS['config']['args']['folder'] = SYS_ARGS['folder']
+
+ # pointer = lambda : server.app.run(host='0.0.0.0',port=PORT,debug=DEBUG,threaded=False)
+ # pthread = Process(target=pointer,args=())
+ # pthread.start()
+ elif 'check-update' in SYS_ARGS :
+ _args = {"url":SYS_ARGS['url']}
+ try:
+ if os.path.exists(CONFIG_FILE) :
+ SYS_ARGS['config'] = json.loads(open(CONFIG_FILE ).read())
+ else:
+ SYS_ARGS['config'] = {}
+ if 'version' in SYS_ARGS['config'] :
+ _args['version'] = SYS_ARGS['config']['version']
+ version = check(**_args)
+ _version = {"current":0.0} if 'version' not in SYS_ARGS['config'] else SYS_ARGS['config']['version']
+ if _version['current'] != version['current'] :
+ print ()
+ print ("You need to upgrade your system to version ",version['current'])
+ print ("\t- signup (for new configuration)")
+ print ("\t- use pip to upgrade the codebase")
+ else:
+ print ()
+ print ("You are running the current configuration version ",_version['current'])
+ except Exception as e:
+ print (e)
+ pass
+
+ elif 'export' in SYS_ARGS:
+ #
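+ # This branch exports the parsed data to a relational data-store (not csv); without --export-config the help message is shown
+ #
+ path = SYS_ARGS['export-config'] if 'export-config' in SYS_ARGS else ''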
+
+ X12_TYPE = SYS_ARGS['export'] if 'export' in SYS_ARGS else '835'
+ if not os.path.exists(path) or X12_TYPE not in ['835','837']:
+ print (HELP_MESSAGE)
+ else:
+ #
+ # Let's run the export function ..., This will push files into a data-store of choice Redshift, PostgreSQL, MySQL ...
+ #
+
+ # _store = {"type":"sql.SQLWriter","args":json.loads( (open(path) ).read())}
+ _store = json.loads( (open(path) ).read())
+
+ pipes = export.Factory.instance(type=X12_TYPE,write_store=_store,config = CONFIG_FILE) #"inspect":0,"cast":0}})
+ # pipes[0].run()
+ # print (pipes)
+
+
+ for thread in pipes:
+
+ if 'table' in SYS_ARGS and SYS_ARGS['table'] != thread.table :
+ continue
+ thread.start()
+ time.sleep(1)
+ thread.join()
+
+
+
+ else:
+
+ print(HELP_MESSAGE)
diff --git a/healthcareio/logger.py b/healthcareio/logger.py
new file mode 100644
index 0000000..b7ae696
--- /dev/null
+++ b/healthcareio/logger.py
@@ -0,0 +1,38 @@
+from datetime import datetime
+import transport
+import copy
+import json
+import pandas as pd
+class X12Logger :
+ def __init__(self,**_args) :
+ self._store = copy.deepcopy(_args['store'])
+ self._store['table'] = 'logs'
+ self._store['context'] = 'write'
+
+ def log(self,**_args):
+ _date = datetime.now()
+ _info = {'date':'-'.join([str(_date.month),str(_date.day),str(_date.year)])}
+ for key in ['module','action','data'] :
+ value = 'NA' if key not in _args else _args[key]
+ value = value if type(value) not in [dict,list] else json.dumps(value)
+ _info[key] = value
+
+ # print ([key, type(value) in [dict,list], type(value)])
+ #
+ # Storing the whole thing
+ try:
+ _xwriter = transport.factory.instance(**self._store)
+ # if not _xwriter.has(table=self._store['table']) :
+ # table = self._store['table']
+ # sql = f'''CREATE TABLE {table} (date date,module char(255), data JSON)'''
+ # print (sql)
+ # _xwriter.apply(sql)
+ _df = pd.DataFrame([_info])
+
+ _xwriter.write(_df)
+ if hasattr(_xwriter,'close') :
+ _xwriter.close()
+ except Exception as e:
+
+ print ([e])
+ pass
\ No newline at end of file
diff --git a/healthcareio/x12/utils.py.old b/healthcareio/x12/utils.py.old
new file mode 100644
index 0000000..5cbea07
--- /dev/null
+++ b/healthcareio/x12/utils.py.old
@@ -0,0 +1,144 @@
+import numpy as np
+import os
+
+"""
+This file contains utilities that will be used across the x12 framework/platform
+@TODO:
+ - Provisions with multiprocessing (locks/releases)
+"""
+class ContentHandler :
+ """
+ This class implements {x12} content handling
+ """
+ def split (self,_stream) :
+ if type(_stream) == str :
+ _xchar = '~\n' if '~\n' in _stream else ('~' if '~' in _stream else ('\n' if '\n' in _stream else None))
+
+ if _xchar :
+ _xchar = ''.join(_xchar)
+ _rows = _stream.split(_xchar)
+
+ return [row.strip().split('*') for row in _rows if row.strip()]
+ else:
+ return _stream.split('*')
+
+ def classify(self,_content):
+ """
+ This function is designed to split claim information from the rest of the information (envelope header)
+ :_content The file content (already split by row and separator)
+ """
+ _indexes = [1 if 'HL' in line else 0 for line in _content]
+ _indexes = [_index for _index,_value in enumerate(_indexes) if _value == 1]
+
+ #
+ # At this point we know how many claims are in the file (log this somewhere)
+ #
+ _beg = 0
+ _end = _indexes[0]
+ _header = _content[_beg:_end]
+ _block = []
+ for _index,_beg in enumerate(_indexes) :
+ if _index + 1 == len(_indexes) :
+ _end = len(_content)
+ else:
+ _end = _indexes[_index + 1]
+ _block.append(_content[_beg:_end])
+
+ return {'header':_header,'block':_block}
+
+ def merge (self,_x,_y):
+ """
+ This function will merge two objects _x, _y
+ """
+ _zcols = list(set(_x.keys()) & set(_y.keys())) #--common columns
+
+ if _zcols :
+ _out = dict(_x,**{})
+ for _key in _y.keys() :
+ if not _key in _zcols :
+
+ _out[_key] = _y[_key]
+ else:
+ if type(_out[_key]) == list :
+ _out[_key] += _y[_key]
+ elif type(_out[_key]) == dict:
+ _out[_key] = dict(_out[_key],**_y[_key])
+ else:
+ _out[_key] = _y[_key]
+
+ return _out
+ else:
+
+ return dict(_x,**_y)
+ def _inspect_row(self,**_args):
+ """
+ This function makes sure the indexes actually exist in the row
+ :row row to be parsed (already split)
+ :indexes list of indexes
+ :columns columns to be used in the creation of the object
+ """
+ _max = np.max(_args['indexes'])
+ _len = np.size(_args['row']) -1
+ return _max <= _len and np.size(_args['indexes']) == np.size(_args['columns'])
+
+ def _parse (self,**_args):
+ """
+ This function will parse an x12 element given
+ :row row of the x12 element
+ :columns attributes of the object to be returned
+ :indexes indexes of interest
+ """
+ pass
+ _row = _args['row']
+ _meta = _args['meta']
+ _columns = _args['columns']
+ _indexes = np.array(_args['indexes'])
+ if not self._inspect_row (**_args) :
+ #
+ # Minimizing parsing errors by padding the line (never by a negative amount)
+ _delta = int(max(0, 1 + np.max(_indexes) - np.size(_row)))
+ _row = _row + np.repeat('',_delta).tolist()
+ #
+ # @TODO: Log that the rows were padded
+ #
+ _row = np.array(_row)
+ return dict(zip(_columns,_row[_indexes].tolist()))
+
+ def _buildObject (self,**_args):
+ """
+ :meta data that is pulled from the decorator function
+ :object row parsed and stored as an object
+ :document existing document being parsed
+ """
+ _meta = _args['meta']
+ _document = _args['document']
+ _object = _args['object']
+ if 'field' not in _meta and 'container' not in _meta :
+ _document = self.merge(_document,_object)
+ elif 'field' in _meta :
+ field = _meta['field']
+ if field in _document :
+ _document[field] = self.merge(_document[field],_object)
+ else:
+ _document[field] = _object
+ elif 'container' in _meta :
+ _label = _meta['container']
+ if _label not in _document :
+ _document[_label] = []
+
+ _document[_label].append(_object)
+ return _document
+ def get_files(self,**_args):
+ folder = _args['folder']
+ files = []
+ if not os.path.exists(folder) :
+ return []
+ elif os.path.isdir(folder):
+
+ for root,_dir,f in os.walk(folder) :
+ if f :
+ files += [os.sep.join([root,name]) for name in f]
+ files = [path for path in files if os.path.isfile(path)]
+ else:
+ files = [folder]
+ return files
\ No newline at end of file
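
Review note on `ContentHandler.split` in `healthcareio/x12/utils.py.old`: the method picks a segment terminator (`~\n`, then `~`, then a bare newline) and splits each segment into elements on `*`. The sketch below restates that idea as a standalone function so the behavior is easy to check in isolation. The name `split_segments` and the sample string are hypothetical and not part of this change, and unlike the original, this version always returns a list of rows, even when no terminator is found.

```python
from typing import List


def split_segments(stream: str) -> List[List[str]]:
    """Split raw x12 text into segments, then each segment into elements.

    Illustrative only: mirrors the terminator choice made in the diff
    ('~' followed by a newline, then '~', then a bare newline).
    """
    for terminator in ("~\n", "~", "\n"):
        if terminator in stream:
            rows = stream.split(terminator)
            break
    else:
        # No terminator found: treat the whole input as a single segment
        rows = [stream]
    # Drop empty rows (e.g. the one after the final terminator)
    return [row.strip().split("*") for row in rows if row.strip()]


# Made-up sample, not real claim data
sample = "ISA*00*TEST~\nHL*1**20*1~\nNM1*85*2*CLINIC~\n"
segments = split_segments(sample)
for segment in segments:
    print(segment)
```

Empty elements are preserved as empty strings, so positional indexes into a segment stay aligned with the x12 element positions.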