documentation ...
parent
3a3946c7d8
commit
8bb495842a
69
README.md
|
@ -25,7 +25,8 @@ Mostly data scientists that don't really care about the underlying database and
|
||||||
|
|
||||||
1. Familiarity with **pandas data-frames**
|
1. Familiarity with **pandas data-frames**
|
||||||
2. Connectivity **drivers** are included
|
2. Connectivity **drivers** are included
|
||||||
3. Useful for data migrations or ETL
|
3. Mining data from various sources
|
||||||
|
4. Useful for data migrations or ETL
|
||||||
|
|
||||||
# Usage
|
# Usage
|
||||||
|
|
||||||
|
@ -35,7 +36,8 @@ Within the virtual environment perform the following :
|
||||||
|
|
||||||
pip install git+https://dev.the-phi.com/git/steve/data-transport.git
|
pip install git+https://dev.the-phi.com/git/steve/data-transport.git
|
||||||
|
|
||||||
Once installed **data-transport** can be used as a library in code or a command line interface (CLI)
|
Once installed, **data-transport** can be used as a library in code or as a command line interface (CLI). As a CLI it is used for ETL and requires a configuration file.
|
||||||
|
|
||||||
|
|
||||||
## Data Transport as a Library (in code)
|
## Data Transport as a Library (in code)
|
||||||
---
|
---
|
||||||
|
@ -112,12 +114,71 @@ df = reader.read(mongo=_command)
|
||||||
print (df.head())
|
print (df.head())
|
||||||
reader.close()
|
reader.close()
|
||||||
```
|
```
|
||||||
**Writing to Mongodb**
|
**Reading/Writing to MongoDB**
|
||||||
---
|
---
|
||||||
|
|
||||||
|
Scenario 1: MongoDB with security in place
|
||||||
|
|
||||||
|
1. Define an authentication file on disk
|
||||||
|
|
||||||
|
The semantics of the attributes are defined by MongoDB; please see the [mongodb documentation](https://mongodb.org/docs). In this example the file is located at _/transport/mongo.json_.
|
||||||
|
<div style="display:grid; grid-template-columns:60% auto; gap:4px">
|
||||||
|
<div>
|
||||||
|
<b>configuration file</b>
|
||||||
|
|
||||||
|
```
|
||||||
|
{
|
||||||
|
"username":"me","password":"changeme",
|
||||||
|
"mechanism":"SCRAM-SHA-1",
|
||||||
|
"authSource":"admin"
|
||||||
|
}
|
||||||
|
```
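The same file can also be generated from Python. This is a minimal sketch using only the standard library; the helper name `write_auth_file` and the temporary directory are illustrative assumptions (not part of data-transport), and the credentials are the placeholder values from the example.

```python
import json
import os
import tempfile

# Placeholder credentials copied from the example; replace with real values.
AUTH = {
    "username": "me",
    "password": "changeme",
    "mechanism": "SCRAM-SHA-1",
    "authSource": "admin",
}


def write_auth_file(path, auth):
    """Write the authentication attributes to `path` as JSON (hypothetical helper)."""
    with open(path, "w") as handle:
        json.dump(auth, handle, indent=4)
    # The file holds a password, so restrict it to its owner where the platform allows.
    os.chmod(path, 0o600)
    return path


# A temporary directory stands in for /transport so the sketch runs anywhere.
auth_path = write_auth_file(os.path.join(tempfile.mkdtemp(), "mongo.json"), AUTH)

# Read the file back to confirm it is valid JSON.
with open(auth_path) as handle:
    loaded = json.load(handle)
print(sorted(loaded))
```

The resulting path is what would be passed as `auth_file`.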
|
||||||
|
<b>Connecting to MongoDB</b>
|
||||||
|
|
||||||
|
```
|
||||||
|
import transport
|
||||||
|
PIPELINE = ... #-- do this yourself
|
||||||
|
MONGO_KEY = '/transport/mongo.json'
|
||||||
|
mreader = transport.factory.instance(provider=transport.providers.MONGODB,auth_file=MONGO_KEY,context='read',db='mydb',doc='logs')
|
||||||
|
_aggregateDF = mreader.read(mongo=PIPELINE) #--results of an aggregate pipeline
|
||||||
|
_collectionDF= mreader.read()
|
||||||
|
|
||||||
|
|
||||||
|
```
|
||||||
|
|
||||||
|
To enable writes, change the **context** attribute to **'write'**.
|
||||||
|
</div>
|
||||||
|
<div>
|
||||||
|
- The configuration file is in JSON format
|
||||||
|
- The commands passed to MongoDB are the same as those you would pass to runCommand in MongoDB
|
||||||
|
- The output is a pandas data-frame
|
||||||
|
- By default the transport reads; to enable write operations use **context='write'**
|
||||||
|
|
||||||
|
|parameters|description |
|
||||||
|
| --- | --- |
|
||||||
|
|db| Name of the database|
|
||||||
|
|port| Port number to connect to|
|
||||||
|
|doc| Name of the collection of documents|
|
||||||
|
|username|Username |
|
||||||
|
|password|Password|
|
||||||
|
|authSource|user database that has authentication info|
|
||||||
|
|mechanism|Mechanism used for authentication|
|
||||||
|
|
||||||
|
**NOTE**
|
||||||
|
|
||||||
|
Arguments like **db** or **doc** can be placed in the authentication file
|
||||||
|
</div>
|
||||||
|
</div>
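
As an illustration of a command in runCommand form, the sketch below builds an aggregation document in plain Python. The `"level"` field and the two stages are hypothetical; only the outer `aggregate` / `pipeline` / `cursor` shape follows MongoDB's aggregate command. Whether the reader accepts exactly this shape is an assumption based on the notes above.

```python
import json

COLLECTION = "logs"  # collection name taken from the example above

# Hypothetical stages: count documents per value of an assumed "level" field,
# then sort the groups by descending count.
stages = [
    {"$group": {"_id": "$level", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# Same shape as the document given to db.runCommand(...) for an aggregation.
command = {"aggregate": COLLECTION, "pipeline": stages, "cursor": {}}

print(json.dumps(command, indent=2))
```

Under that assumption, the command would be passed to the reader as `mreader.read(mongo=command)`.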
|
||||||
|
|
||||||
|
**Limitations**
|
||||||
|
|
||||||
|
Reads and writes are deliberately not encapsulated in the same object. This lets the calling code perform each action explicitly and, hopefully, minimizes accidents associated with data wrangling.
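
The separation of reading and writing can be sketched without a database. The two classes below are hypothetical in-memory stand-ins for objects created with `context='read'` and `context='write'`; they are not part of data-transport and only mirror the `read()` and `write()` calls used in this document.

```python
class InMemoryReader:
    """Hypothetical stand-in for an object created with context='read'."""

    def __init__(self, rows):
        self._rows = list(rows)

    def read(self):
        # Return a copy so callers cannot modify the source by accident.
        return list(self._rows)


class InMemoryWriter:
    """Hypothetical stand-in for an object created with context='write'."""

    def __init__(self):
        self.rows = []

    def write(self, rows):
        self.rows.extend(rows)


def copy(reader, writer):
    """Move data in one explicit direction: read from one object, write to the other."""
    rows = reader.read()
    writer.write(rows)
    return len(rows)


source = InMemoryReader([{"names": "steve", "age": 40}, {"names": "nico", "age": 30}])
target = InMemoryWriter()
copied = copy(source, target)
print(copied, target.rows)
```

Because the reader has no `write` method, code that only holds a reader cannot modify the source by mistake.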
|
||||||
|
|
||||||
|
|
||||||
```
|
```
|
||||||
import transport
|
import transport
|
||||||
import pandas as pd
|
import pandas as pd
|
||||||
writer = factory.instance(provider='mongodb',context='write',host='localhost',port='27018',db='example',doc='logs')
|
writer = transport.factory.instance(provider=transport.providers.MONGODB,context='write',host='localhost',port='27018',db='example',doc='logs')
|
||||||
|
|
||||||
df = pd.DataFrame({"names":["steve","nico"],"age":[40,30]})
|
df = pd.DataFrame({"names":["steve","nico"],"age":[40,30]})
|
||||||
writer.write(df)
|
writer.write(df)
|
||||||
|
|