documentation ...

Commit 8bb495842a (parent 3a3946c7d8), README.md

1. Familiarity with **pandas data-frames**
2. Connectivity **drivers** are included
3. Mining data from various sources
4. Useful for data migrations or ETL

# Usage

pip install git+https://dev.the-phi.com/git/steve/data-transport.git

Once installed, **data-transport** can be used as a library in code or as a command line interface (CLI). As a CLI it is used for ETL and requires a configuration file.

## Data Transport as a Library (in code)
---

```
df = reader.read(mongo=_command)
print(df.head())
reader.close()
```
**Reading from and writing to Mongodb**
---

Scenario 1: Mongodb with authentication enabled

1. Define an authentication file on disk

The semantics of these attributes are defined by mongodb; see the [mongodb documentation](https://mongodb.org/docs). In this example the file is located at _/transport/mongo.json_.
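
You can write the authentication file by hand. If you would rather generate it from a script, here is a minimal sketch that uses only the standard library. The keys match the configuration file in this section and the credentials are placeholders. A temporary directory stands in for _/transport_ so the sketch runs without special permissions. Restricting the file to its owner (`0o600`) is a suggested precaution; nothing here indicates the library requires it.

```python
import json
import os
import tempfile

# Placeholder credentials; the keys mirror the configuration file in this section.
AUTH = {
    "username": "me",
    "password": "changeme",
    "mechanism": "SCRAM-SHA-1",
    "authSource": "admin",
}


def write_auth_file(path: str, auth: dict) -> None:
    """Write the authentication attributes to `path` as JSON, readable only by the owner."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(auth, f, indent=2)
    os.chmod(path, 0o600)  # suggested precaution for a file holding credentials (POSIX)


# A temporary directory stands in for /transport in this sketch.
auth_dir = tempfile.mkdtemp()
MONGO_KEY = os.path.join(auth_dir, "mongo.json")
write_auth_file(MONGO_KEY, AUTH)

with open(MONGO_KEY, encoding="utf-8") as f:
    print(json.load(f)["mechanism"])  # SCRAM-SHA-1
```

In real use, point `MONGO_KEY` at _/transport/mongo.json_ and pass it as `auth_file`.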
<div style="display:grid; grid-template-columns:60% auto; gap:4px">
<div>
<b>configuration file</b>

```
{
    "username":"me",
    "password":"changeme",
    "mechanism":"SCRAM-SHA-1",
    "authSource":"admin"
}
```
<b>Connecting to Mongodb</b>

```
import transport

PIPELINE = ...  #-- your own aggregate command, in the form you would pass to runCommand
MONGO_KEY = '/transport/mongo.json'

mreader = transport.factory.instance(provider=transport.providers.MONGODB, auth_file=MONGO_KEY, context='read', db='mydb', doc='logs')
_aggregateDF = mreader.read(mongo=PIPELINE)  #-- results of an aggregate pipeline
_collectionDF = mreader.read()  #-- documents of the collection named by doc
mreader.close()
```
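
You supply `PIPELINE` yourself. The notes in this section say commands follow MongoDB's runCommand syntax, so an aggregation can be written as an `aggregate` command document. That document holds the collection name (listed first), the pipeline stages, and the `cursor` option MongoDB requires for this command. The sketch below assumes `read(mongo=...)` accepts such a document. The helper `aggregate_command` and the `level`/`source` fields are hypothetical.

```python
import json


def aggregate_command(collection: str, stages: list) -> dict:
    """Build a runCommand-style `aggregate` document (hypothetical helper)."""
    # The command name must be the first key; dicts keep insertion order.
    # The aggregate command requires a cursor option; {} keeps the default batch size.
    return {"aggregate": collection, "pipeline": stages, "cursor": {}}


# Hypothetical pipeline: count error entries per source, most frequent first.
PIPELINE = aggregate_command(
    "logs",
    [
        {"$match": {"level": "error"}},
        {"$group": {"_id": "$source", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ],
)

print(json.dumps(PIPELINE, indent=2))
```

You would then pass the resulting dictionary as `mreader.read(mongo=PIPELINE)`.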

To enable writes, change the **context** attribute to **'write'**.
</div>
<div>

- The configuration file is in JSON format
- The commands passed to mongodb are the same as those you would pass to runCommand in mongodb
- The output is a pandas data-frame
- By default the transport reads; to enable write operations use **context='write'**

| parameters | description |
| --- | --- |
| db | Name of the database |
| host | Host name of the mongodb server |
| port | Port number to connect to |
| doc | Name of the collection of documents |
| username | Username |
| password | Password |
| authSource | Database that holds the user's authentication information |
| mechanism | Mechanism used for authentication |

**NOTE**

Arguments like **db** or **doc** can also be placed in the authentication file.
</div>
</div>
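
As the note says, **db** or **doc** can live in the authentication file instead of being passed as arguments. This section does not say how the library combines the two. The sketch below shows one plausible approach: load the JSON file and let explicitly passed arguments override it. The helper `resolve_settings`, its precedence rule, and its required-key check are assumptions for illustration. They are not transport's implementation.

```python
import json
import os
import tempfile


def resolve_settings(auth_file: str, **kwargs) -> dict:
    """Merge an authentication file with keyword arguments (hypothetical helper).

    Assumption: explicitly passed, non-None arguments take precedence over the file.
    """
    with open(auth_file, encoding="utf-8") as f:
        settings = json.load(f)
    settings.update({k: v for k, v in kwargs.items() if v is not None})
    missing = [key for key in ("db", "doc") if key not in settings]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    return settings


# An authentication file that also carries db and doc.
path = os.path.join(tempfile.mkdtemp(), "mongo.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"username": "me", "password": "changeme", "mechanism": "SCRAM-SHA-1",
               "authSource": "admin", "db": "mydb", "doc": "logs"}, f)

print(resolve_settings(path)["doc"])               # logs
print(resolve_settings(path, doc="audit")["doc"])  # audit
```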

**Limitations**

Reads and writes aren't encapsulated in the same object. This is deliberate: the calling code must explicitly choose the action it performs, which should help minimize accidents associated with data wrangling.
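
The separation described above can be illustrated with a toy factory over an in-memory store. The `context` argument selects a class that exposes only `read` or only `write`, so a reader cannot modify data by accident. Everything in this sketch is hypothetical and is not the transport library's code: `InMemoryReader`, `InMemoryWriter`, `instance`, the dict-backed store, and the lists of dicts standing in for data-frames.

```python
from typing import Dict, List

Store = Dict[str, List[dict]]  # collection name -> documents


class InMemoryReader:
    """Read-only handle: exposes read() but no write()."""

    def __init__(self, store: Store, doc: str) -> None:
        self._store, self._doc = store, doc

    def read(self) -> List[dict]:
        # Return copies so callers cannot mutate the store through the result.
        return [dict(row) for row in self._store.get(self._doc, [])]


class InMemoryWriter:
    """Write-only handle: exposes write() but no read()."""

    def __init__(self, store: Store, doc: str) -> None:
        self._store, self._doc = store, doc

    def write(self, rows: List[dict]) -> None:
        self._store.setdefault(self._doc, []).extend(dict(row) for row in rows)


def instance(store: Store, doc: str, context: str = "read"):
    """Hypothetical factory: the context decides which capability the caller gets."""
    if context == "read":
        return InMemoryReader(store, doc)
    if context == "write":
        return InMemoryWriter(store, doc)
    raise ValueError(f"unknown context: {context!r}")


store: Store = {}
writer = instance(store, "logs", context="write")
writer.write([{"names": "steve", "age": 40}, {"names": "nico", "age": 30}])

reader = instance(store, "logs")  # reading is the default, as in the transport notes
print(len(reader.read()))         # 2
print(hasattr(reader, "write"))   # False
```

A caller that only needs to read never holds an object with a `write` method, so a stray write fails loudly rather than silently changing data.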

```
import transport
import pandas as pd

#-- a separate object, created with context='write', performs the writes
writer = transport.factory.instance(provider=transport.providers.MONGODB, context='write', host='localhost', port='27018', db='example', doc='logs')

df = pd.DataFrame({"names":["steve","nico"],"age":[40,30]})
writer.write(df)  #-- writes the data-frame to the 'logs' collection of the 'example' database
```
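
Mongodb stores one document per row, while a data-frame is organized by column. Assume the writer maps each row of the data-frame to a document, which is what pandas' `DataFrame.to_dict(orient="records")` produces. In that case the data-frame written above would become two documents. The pure-Python sketch below shows that mapping without pandas. `to_records` is a hypothetical helper and is not part of transport.

```python
from typing import Dict, List


def to_records(columns: Dict[str, list]) -> List[dict]:
    """Turn column-oriented data into one dict per row (like orient="records")."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# The same data used to build the data-frame in the writing example.
data = {"names": ["steve", "nico"], "age": [40, 30]}
for document in to_records(data):
    print(document)
# {'names': 'steve', 'age': 40}
# {'names': 'nico', 'age': 30}
```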