
# deid-risk

This project estimates the re-identification risk of a given database in three steps:

1. Pull the metadata of the database and create a dataset via joins.
2. Generate a dataset from a random selection of features.
3. Compute the risk in SQL using `GROUP BY`.

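Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the code in `risk.py`: the helper `build_risk_query` and its parameters are hypothetical, and it assumes the merged table holds one row per patient. It samples quasi-identifier columns at random, groups on them to get the size of each equivalence class, and derives the risk in an outer query.

```python
import random


def build_risk_query(table, key, columns, n_features, seed=None):
    """Hypothetical helper: randomly select quasi-identifier columns (step 2)
    and build a GROUP BY query that computes marketer and prosecutor risk (step 3).

    Assumes one row per patient in `table`.
    """
    rng = random.Random(seed)
    # Never use the patient identifier itself as a quasi-identifier.
    candidates = [c for c in columns if c != key]
    selected = sorted(rng.sample(candidates, n_features))
    cols = ", ".join(selected)
    # Inner query: size of each equivalence class (rows sharing the same values).
    groups = f"SELECT {cols}, COUNT(*) AS group_size FROM {table} GROUP BY {cols}"
    # Outer query: marketer = classes / records, prosecutor = 1 / smallest class.
    sql = (
        "SELECT COUNT(*) / SUM(group_size) AS marketer, "
        "1 / MIN(group_size) AS prosecutor "
        f"FROM ({groups})"
    )
    return selected, sql


selected, sql = build_risk_query(
    "my_dataset.merged", "person_id", ["person_id", "age", "gender", "zip"], 2, seed=42
)
print(selected)
print(sql)
```

Re-running with a different `seed` (or a different `n_features`) draws another feature subset, so the risk can be estimated over several random selections.
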
## Python environment

The following are the dependencies needed to run the code:

- pandas
- numpy
- pandas-gbq
- google-cloud-bigquery

## Usage

**Generate the merged dataset**

    python risk.py create --i_dataset <in dataset|schema> --o_dataset <out dataset|schema> --table <name> --path <bigquery-key-file> --key <patient-id-field-name> [--file ]
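
The join in step 1 might look like the sketch below. It is an assumption about the approach, not the project's implementation: the helper `build_merge_query` is hypothetical, and the `tables` mapping stands in for the metadata pulled from the database (in BigQuery this could come, for example, from the dataset's `INFORMATION_SCHEMA.COLUMNS` view). Every table that contains the patient-id column is inner-joined on it; one-to-many relationships, which yield several rows per patient, are not handled.

```python
def build_merge_query(dataset, tables, key):
    """Hypothetical helper: join every table that contains the patient-id column.

    `tables` maps table name -> list of column names, i.e. the metadata pulled
    in step 1.
    """
    joinable = sorted(t for t, cols in tables.items() if key in cols)
    if not joinable:
        raise ValueError(f"no table contains the key column {key!r}")
    base = joinable[0]
    # Keep one copy of the key; prefix other columns with their table name
    # so that columns with the same name in different tables do not collide.
    select = [f"{base}.{key}"]
    for t in joinable:
        select += [f"{t}.{c} AS {t}_{c}" for c in tables[t] if c != key]
    from_clause = f"`{dataset}.{base}` AS {base}"
    for t in joinable[1:]:
        from_clause += f" INNER JOIN `{dataset}.{t}` AS {t} ON {t}.{key} = {base}.{key}"
    return f"SELECT {', '.join(select)} FROM {from_clause}"


metadata = {"person": ["pid", "age"], "visit": ["pid", "zip"], "drug": ["name"]}
print(build_merge_query("my_dataset", metadata, "pid"))
```

Tables without the key (here `drug`) are skipped; supporting them would require the referential-integrity handling listed under Limitations.
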


**Compute risk (marketer, prosecutor)**

    python risk.py compute --i_dataset <dataset> --table <name> --path <bigquery-key-file>  --key <patient-id-field-name> 
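
The two risk measures can be illustrated in plain Python. This sketch uses one commonly used formulation, which is an assumption since the README does not state the exact formulas in `risk.py`: records with identical quasi-identifier values form an equivalence class; marketer risk is the number of classes divided by the number of records (assuming the dataset covers the whole population), and maximum prosecutor risk is one over the size of the smallest class. The function `marketer_prosecutor_risk` is hypothetical.

```python
from collections import Counter


def marketer_prosecutor_risk(records, quasi_identifiers):
    """Estimate marketer and prosecutor re-identification risk.

    `records` is a list of dicts (assumed to be one per patient); records with
    identical values for every quasi-identifier form one equivalence class.
    """
    if not records:
        raise ValueError("records must not be empty")
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    # Marketer risk: expected fraction of records re-identified = classes / records.
    marketer = len(classes) / len(records)
    # Prosecutor risk (maximum): 1 / size of the smallest equivalence class.
    prosecutor = 1 / min(classes.values())
    return marketer, prosecutor


records = [
    {"age": 30, "sex": "F"},
    {"age": 30, "sex": "F"},
    {"age": 40, "sex": "M"},
]
print(marketer_prosecutor_risk(records, ["age", "sex"]))  # (0.6666666666666666, 1.0)
```

The single 40-year-old male is unique in the data, which drives the prosecutor risk to 1.0, while the marketer risk averages over all records.
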
## Limitations

- It currently works only against BigQuery.

@TODO:

- Write a transport layer (database interface).
- Support referential integrity, so that a single table can be selected and a dataset derived from it by following referential integrity.
- Add support for journalist risk.