documentation
parent
685c567661
commit
df47ed4cb2
36
README.md
|
@ -12,32 +12,33 @@ This package is designed to generate synthetic data from a dataset from an origi
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
After installing, the easiest way to get started is with pandas. The process is as follows:
|
After installing, the easiest way to get started is with pandas. The process is as follows:
|
||||||
1. Train the GAN on the original/raw dataset
|
|
||||||
|
**Train the GAN on the original/raw dataset**
|
||||||
|
|
||||||
|
|
||||||
import pandas as pd
|
import pandas as pd
|
||||||
import data.maker
|
import data.maker
|
||||||
|
|
||||||
df = pd.read_csv('sample.csv')
|
df = pd.read_csv('sample.csv')
|
||||||
column = 'gender'
|
column = 'gender'
|
||||||
id = 'id'
|
id = 'id'
|
||||||
context = 'demo'
|
context = 'demo'
|
||||||
data.maker.train(context=context,data=df,column=column,id=id,logs='logs')
|
data.maker.train(context=context,data=df,column=column,id=id,logs='logs')
|
||||||
|
|
||||||
For now, the trainer stores its output on disk in a structured folder that holds the trained models later used to generate the synthetic data.
|
For now, the trainer stores its output on disk in a structured folder that holds the trained models later used to generate the synthetic data.
|
||||||
|
|
||||||
|
|
||||||
2. Generate a candidate dataset from the learnt features
|
**Generate a candidate dataset from the learned features**
|
||||||
|
|
||||||
|
|
||||||
import pandas as pd
|
import pandas as pd
|
||||||
import data.maker
|
import data.maker
|
||||||
|
|
||||||
df = pd.read_csv('sample.csv')
|
df = pd.read_csv('sample.csv')
|
||||||
id = 'id'
|
id = 'id'
|
||||||
column = 'gender'
|
column = 'gender'
|
||||||
context = 'demo'
|
context = 'demo'
|
||||||
data.maker.generate(data=df,id=id,column=column,logs='logs')
|
data.maker.generate(data=df,id=id,column=column,logs='logs')
|
||||||
|
|
||||||
## Limitations
|
## Limitations
|
||||||
|
|
||||||
|
@ -46,11 +47,14 @@ GANS will generate data assuming the original data has all the value space neede
|
||||||
- No new data will be created
|
- No new data will be created
|
||||||
|
|
||||||
Assume we have a dataset with a gender attribute with values [M,F].
|
Assume we have a dataset with a gender attribute with values [M,F].
|
||||||
|
|
||||||
The synthetic data will not contain genders outside [M,F].
|
The synthetic data will not contain genders outside [M,F].
|
||||||
|
|
||||||
- Not advised on continuous values
|
- Not advised on continuous values
|
||||||
|
|
||||||
GANs work well on discrete values and are thus not advised for continuous values,
|
GANs work well on discrete values and are thus not advised for continuous values,
|
||||||
e.g. measurements (height, blood pressure, ...)
|
e.g. measurements (height, blood pressure, ...)
|
||||||
|
- For now, the package only operates on a single feature.
|
||||||
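The first two limitations above can be checked before training. The following is a minimal sketch, not part of this package: the helper names `value_space`, `outside_value_space` and `looks_continuous`, and the `max_unique_ratio` threshold, are hypothetical and chosen for illustration only. It works on plain Python lists, so a pandas column can be passed as `df[column].dropna().tolist()`.

```python
from numbers import Real


def value_space(values):
    """Return the set of distinct non-missing values in a column."""
    return {v for v in values if v is not None}


def outside_value_space(original, synthetic):
    """Return synthetic values that never occur in the original column."""
    return value_space(synthetic) - value_space(original)


def looks_continuous(values, max_unique_ratio=0.5):
    """Rough heuristic: a numeric column in which most values are distinct.

    max_unique_ratio is an illustrative threshold (an assumption),
    not a setting of this package.
    """
    values = [v for v in values if v is not None]
    if not values:
        return False
    # bool is a subclass of int, so exclude it explicitly
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    unique_ratio = len(set(values)) / len(values)
    return numeric and unique_ratio > max_unique_ratio


if __name__ == "__main__":
    gender = ["M", "F", "F", "M"]          # discrete: a reasonable candidate
    height = [1.62, 1.75, 1.80, 1.68]      # continuous: not advised
    print(sorted(value_space(gender)))
    print(looks_continuous(gender), looks_continuous(height))
```

A column flagged by `looks_continuous` is probably a poor candidate for the `column` argument, and an empty result from `outside_value_space` is what the first limitation predicts for generated data.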
|
|
||||||
## Credits :
|
## Credits :
|
||||||
|
|
||||||
|
|