documentation ...

This commit is contained in:
Steve Nyemba 2022-08-19 16:00:37 -05:00
parent 3a3946c7d8
commit 8bb495842a
1 changed files with 65 additions and 4 deletions

@ -25,7 +25,8 @@ Mostly data scientists that don't really care about the underlying database and
1. Familiarity with **pandas data-frames**
2. Connectivity **drivers** are included
3. Mining data from various sources
4. Useful for data migrations or ETL
# Usage
@ -35,7 +36,8 @@ Within the virtual environment perform the following :
pip install git+https://dev.the-phi.com/git/steve/data-transport.git
Once installed, **data-transport** can be used as a library in code or as a command line interface (CLI). As a CLI it is used for ETL and requires a configuration file.
## Data Transport as a Library (in code)
---
@ -112,12 +114,71 @@ df = reader.read(mongo=_command)
print (df.head())
reader.close()
```
**Read/Writing to Mongodb**
---
Scenario 1: Mongodb with security in place
1. Define an authentication file on disk
The attributes and their semantics are defined by mongodb; see the [mongodb documentation](https://mongodb.org/docs). In this example the file is located at _/transport/mongo.json_.
<div style="display:grid; grid-template-columns:60% auto; gap:4px">
<div>
<b>configuration file</b>
```
{
"username":"me","password":"changeme",
"mechanism":"SCRAM-SHA-1",
"authSource":"admin"
}
```
<b>Connecting to Mongodb </b>
```
import transport
PIPELINE = ... #-- do this yourself
MONGO_KEY = '/transport/mongo.json'
mreader = transport.factory.instance(provider=transport.providers.MONGODB,auth_file=MONGO_KEY,context='read',db='mydb',doc='logs')
_aggregateDF = mreader.read(mongo=PIPELINE) #--results of an aggregate pipeline
_collectionDF= mreader.read()
```
To enable writes, change the **context** attribute to **'write'**.
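The `PIPELINE` placeholder in the snippet above is left to the reader. As a rough sketch only, and assuming that `read(mongo=...)` accepts a runCommand-style document (this README says commands follow runCommand, but the exact shape the library expects is not confirmed here), an aggregate command could be built as a plain dictionary. The field names `level` and `service` are hypothetical; `logs` matches the `doc='logs'` collection used above.

```python
import json

# Assumption: the value passed as mongo= is a runCommand-style document.
# The collection name "logs" mirrors doc='logs'; the fields "level" and
# "service" are hypothetical and only serve the illustration.
PIPELINE = {
    "aggregate": "logs",
    "pipeline": [
        {"$match": {"level": "error"}},                         # keep error entries
        {"$group": {"_id": "$service", "count": {"$sum": 1}}},  # count per service
        {"$sort": {"count": -1}},                               # largest count first
    ],
    "cursor": {},  # the aggregate command requires a cursor document
}

# Printing the command as JSON is a quick way to check it is well formed.
print(json.dumps(PIPELINE, indent=2))
```

Keeping the command in a named variable makes it easy to inspect or log before it is handed to the reader.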
</div>
<div>
- The configuration file is in JSON format
- The commands passed to mongodb are the same as those you would pass to runCommand in mongodb
- The output is a pandas data-frame
- By default the transport reads; to enable write operations use **context='write'**
|parameters|description |
| --- | --- |
|db| Name of the database|
|port| Port number to connect to|
|doc| Name of the collection of documents|
|username|Username |
|password|Password|
|authSource|User database that holds the authentication info|
|mechanism|Mechanism used for authentication|
**NOTE**
Arguments like **db** or **doc** can be placed in the authentication file
</div>
</div>
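As a minimal sketch of the authentication file described above, the snippet below writes the same attributes to disk with Python's standard library and adds **db** and **doc**, which the note says may also be placed in the file. The values are placeholders, and a temporary directory stands in for _/transport_ so the sketch can run anywhere; neither is prescribed by data-transport.

```python
import json
import os
import tempfile

# Placeholder credentials: the keys mirror the configuration file above, plus
# "db" and "doc", which may also live in the authentication file.
auth = {
    "username": "me",
    "password": "changeme",
    "mechanism": "SCRAM-SHA-1",
    "authSource": "admin",
    "db": "mydb",
    "doc": "logs",
}

# Assumption: a temporary directory replaces /transport for this illustration.
auth_path = os.path.join(tempfile.mkdtemp(), "mongo.json")
with open(auth_path, "w") as f:
    json.dump(auth, f, indent=4)

# The file holds a password, so limit access to the owner (effective on POSIX).
os.chmod(auth_path, 0o600)

# Reload the file to confirm it is valid JSON before using its path as auth_file.
with open(auth_path) as f:
    loaded = json.load(f)
print(sorted(loaded))
```

The resulting path is what would be passed as `auth_file` when creating a reader or writer.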
**Limitations**
Reads and writes aren't encapsulated in the same object. This lets the calling code perform each action deliberately and helps minimize accidents associated with data wrangling.
```
import transport
import pandas as pd
writer = transport.factory.instance(provider=transport.providers.MONGODB,context='write',host='localhost',port='27018',db='example',doc='logs')
df = pd.DataFrame({"names":["steve","nico"],"age":[40,30]})
writer.write(df)