# Re-Identification Risk

This framework computes the re-identification risk of a dataset by extending pandas: it works as a pandas **add-on**.

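Extending pandas this way is usually done with the pandas accessor API (`pd.api.extensions.register_dataframe_accessor`). The sketch below is a minimal illustration of that mechanism, not the package's code. It registers an accessor named `risk`, matching the `mydf.risk.evaluate` call in the usage example. The class `RiskAccessor` and its `evaluate(cols=...)` method are hypothetical. The method returns 1 divided by the size of the smallest group of identical records.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("risk")
class RiskAccessor:
    """Hypothetical accessor: exposes risk methods as df.risk.<method>."""

    def __init__(self, pandas_obj: pd.DataFrame):
        self._df = pandas_obj

    def evaluate(self, cols=None) -> float:
        # Use every column as a quasi-identifier unless a subset is given
        cols = list(cols) if cols is not None else list(self._df.columns)
        # Size of each equivalence class (records sharing the same values)
        class_sizes = self._df.groupby(cols).size()
        # The smallest class is the most exposed one
        return 1.0 / class_sizes.min()


df = pd.DataFrame({"age": [30, 30, 41, 41, 41], "zip": ["A", "A", "B", "B", "C"]})
print(df.risk.evaluate())  # prints 1.0: the class (41, "C") has a single record
```
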
The framework computes the following risk measures: marketer, prosecutor, journalist and Pitman risk.

There are two modes available:

**explore:**

Here the assumption is that we are not sure which attributes will be disclosed. The framework explores a variety of attribute combinations and associates risk measures with every random combination it comes up with.

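The explore idea can be sketched as follows. This is a minimal illustration under several assumptions:

- Each combination is scored with a simple prosecutor-style risk: 1 divided by the size of the smallest group of records that share the same values.
- The sampling scheme is also assumed: a uniform combination size, then uniformly chosen columns.
- `explore` and `prosecutor_risk` are hypothetical names, not the package's API.

```python
import random

import pandas as pd


def prosecutor_risk(df: pd.DataFrame, cols) -> float:
    """Hypothetical scorer: 1 / size of the smallest class on cols."""
    return 1.0 / df.groupby(list(cols)).size().min()


def explore(df: pd.DataFrame, n_trials: int = 10, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: score randomly drawn attribute combinations."""
    rng = random.Random(seed)
    columns = list(df.columns)
    results = {}
    for _ in range(n_trials):
        k = rng.randint(1, len(columns))            # combination size, 1..n_columns
        combo = tuple(sorted(rng.sample(columns, k)))
        if combo not in results:                    # skip repeated draws
            results[combo] = prosecutor_risk(df, combo)
    out = pd.DataFrame(
        {"attributes": list(results.keys()), "prosecutor_risk": list(results.values())}
    )
    # Riskiest combinations first
    return out.sort_values("prosecutor_risk", ascending=False).reset_index(drop=True)


df = pd.DataFrame({
    "age": [30, 30, 41, 41, 41],
    "zip": ["A", "A", "B", "B", "C"],
    "sex": ["F", "M", "F", "F", "M"],
})
print(explore(df, n_trials=20))
```
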
**evaluation:**

Here the assumption is that we know the sets of attributes to be used and are interested in computing the associated risk.

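For the evaluation mode, here is a minimal sketch that scores one fixed set of quasi-identifiers against a population. It assumes the definitions common in the health de-identification literature:

- Prosecutor risk uses the smallest equivalence class in the shared sample (`f_j`).
- Journalist risk uses the smallest population class (`F_j`) among the classes present in the sample.
- Marketer risk is the per-record average of `f_j / F_j`.

`evaluate_risk` is a hypothetical helper. The package's `evaluate` may compute these measures differently, and Pitman risk is not covered.

```python
import pandas as pd


def evaluate_risk(sample: pd.DataFrame, pop: pd.DataFrame, cols) -> dict:
    """Hypothetical helper: prosecutor, journalist and marketer risk for cols.

    Assumes every record of `sample` also occurs in `pop`.
    """
    cols = list(cols)
    f = sample.groupby(cols).size()   # f_j: equivalence-class sizes in the sample
    F = pop.groupby(cols).size()      # F_j: equivalence-class sizes in the population
    F = F.reindex(f.index)            # keep only the classes present in the sample
    n = len(sample)
    return {
        "prosecutor": float(1.0 / f.min()),
        "journalist": float(1.0 / F.min()),
        "marketer": float((f / F).sum() / n),
    }


# Toy population of 8 records and a 3-record sample drawn from it
pop = pd.DataFrame({
    "age": [30, 30, 30, 41, 41, 41, 41, 52],
    "zip": ["A", "A", "A", "B", "B", "B", "B", "C"],
})
sample = pop.iloc[[0, 1, 3]]
print(evaluate_risk(sample, pop, ["age", "zip"]))
```

When the sample is the whole population (`F_j = f_j`), marketer risk reduces to the number of equivalence classes divided by the number of records.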
### Four risk measures are computed:

- Marketer risk
- Prosecutor risk
- Journalist risk
- Pitman risk

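Below is one common formulation of the first three measures, as used in the health de-identification literature. The package's exact formulas may differ. Pitman risk, which relies on Pitman's sampling formula, is not shown.

Records are grouped into equivalence classes `j ∈ J` by their quasi-identifier values. `f_j` is the size of class `j` in the shared sample of `n` records, and `F_j` is its size in the population:

```latex
\begin{aligned}
R_{\text{prosecutor}} &= \frac{1}{\min_{j \in J} f_j} \\
R_{\text{journalist}} &= \frac{1}{\min_{j \in J} F_j} \\
R_{\text{marketer}}   &= \frac{1}{n} \sum_{j \in J} \frac{f_j}{F_j}
\end{aligned}
```
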
### Usage:

The framework depends on pandas and numpy (for now).

```python
import numpy as np
import pandas as pd
```

…

```python
# pop: 150 synthetic records; each column holds integer codes drawn from range(a),
# with a chosen at random between 1 and 9
pop = pd.DataFrame({
    "x": np.random.choice(np.random.randint(1, 10), 150),
    "y": np.random.choice(np.random.randint(1, 10), 150),
    "q": np.random.choice(np.random.randint(1, 10), 150),
})
# mydf: the dataset to be shared, evaluated against the population pop
mydf.risk.evaluate(pop=pop)
```

### References:

- [ehealthinformation.ca: 2009 de-identification white paper](http://www.ehealthinformation.ca/wp-content/uploads/2014/08/2009-De-identification-PA-whitepaper1.pdf)
- [scb.se: Applying Pitman's sampling formula to microdata disclosure risk assessment](https://www.scb.se/contentassets/ff271eeeca694f47ae99b942de61df83/applying-pitmans-sampling-formula-to-microdata-disclosure-risk-assessment.pdf)

@TODO:

- Evaluate how sparse attributes are (the ratio of non-null values to the number of rows)
- Find a smart way to drop attributes (based on the above, within the random policy search)

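The sparsity check in the first item amounts to a per-column non-null ratio. Here is a minimal sketch. `non_null_ratio` and `dense_attributes` are hypothetical helpers. The threshold rule (keep attributes whose ratio is at least `min_ratio`) is only one possible way to drop attributes before a random search.

```python
import numpy as np
import pandas as pd


def non_null_ratio(df: pd.DataFrame) -> pd.Series:
    """Fraction of rows holding a non-null value, per column."""
    return df.notna().mean()


def dense_attributes(df: pd.DataFrame, min_ratio: float = 0.5) -> list:
    """Hypothetical drop rule: keep columns whose non-null ratio >= min_ratio."""
    ratio = non_null_ratio(df)
    return ratio[ratio >= min_ratio].index.tolist()


df = pd.DataFrame({
    "age": [30, 41, np.nan, 52],
    "zip": ["A", None, None, None],
    "sex": ["F", "M", "F", "M"],
})
print(non_null_ratio(df))    # ratios: age 0.75, zip 0.25, sex 1.0
print(dense_attributes(df))  # ['age', 'sex']
```
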
Basic examples that illustrate usage of the framework are in the notebook folder. The example is derived from
