Development of an AI System for Classifying, Evaluating and Interpreting Privacy Policies and Terms and Conditions
Authors:
- Erika Raymundo
- Pavan Kumar Muppala
- Sahil Sahni
- Yash Pulse
Several studies reveal that privacy policies are often not read or understood because of their complexity and length. This opacity has contributed to data misuse, privacy violations, and loss of user trust. A tool that classifies these documents and presents them clearly can promote privacy and transparency while improving user understanding and privacy awareness. The project has the following objectives:
- Design and implement a machine learning model to classify privacy policy and terms-and-conditions documents (a minimal sketch follows this list).
- Define metrics for 'acceptable' and 'non-acceptable' categories based on human values and on legal, ethical, and user-centered criteria.
- Identify and highlight sections within these documents that are potentially problematic.
- Generate simplified explanations of the identified sections to improve user understanding.
- Evaluate the performance of the model and refine its accuracy.
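A minimal sketch of the classification, evaluation, and section-highlighting steps, assuming a labeled corpus stored as a CSV; the file path, the `text`/`label` column names, and the `non-acceptable` label value are hypothetical placeholders, not the project's actual data or pipeline.

```python
# Minimal classification sketch; file path, column names, and label values are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("data/processed/policies_labeled.csv")  # hypothetical path
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# TF-IDF features feeding a linear classifier; any document classifier
# (e.g., a fine-tuned transformer) could be substituted here.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2, stop_words="english")),
    ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
clf.fit(X_train, y_train)

# Evaluate the classifier on held-out documents.
print(classification_report(y_test, clf.predict(X_test)))

# Flag potentially problematic sections by scoring individual paragraphs
# of one document with the same classifier.
paragraphs = [p for p in X_test.iloc[0].split("\n") if p.strip()]
scores = clf.predict_proba(paragraphs)[:, list(clf.classes_).index("non-acceptable")]
for paragraph, score in sorted(zip(paragraphs, scores), key=lambda x: -x[1])[:3]:
    print(f"{score:.2f}  {paragraph[:80]}")
```

The paragraph-level scores could later feed the explanation step, where flagged passages are rewritten in plain language.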
Data: source and a high-level description (e.g., number of observations).
- Descriptive analysis
- Choices made
- Key relevant findings from the exploratory data analysis for Module 1; this analysis will be more involved in future modules (a minimal sketch follows this list)
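A minimal descriptive-analysis sketch, assuming the raw corpus is stored as a CSV; the path and the `text`/`label` column names are hypothetical.

```python
# Descriptive-analysis sketch; file path and column names are hypothetical.
import pandas as pd

df = pd.read_csv("data/raw/policies.csv")

# Corpus size, label balance, and per-document word counts.
print("documents:", len(df))
print(df["label"].value_counts())
print(df["text"].str.split().str.len().describe())
```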
Visualizations of the corpus and of model results are described here; a sketch of one such figure follows.
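A possible visualization, sketched under the same hypothetical file layout as above: a histogram of document lengths, saved into the reports/figures folder described in the repository structure below.

```python
# Visualization sketch; file path and column names are hypothetical.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("data/raw/policies.csv")
df["text"].str.split().str.len().plot(kind="hist", bins=50)
plt.xlabel("Words per document")
plt.ylabel("Number of documents")
plt.title("Document length distribution")
plt.tight_layout()
plt.savefig("reports/figures/doc_length_hist.png")
```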
Please review the narrative of our analysis in our Jupyter notebook or in our presentation.
For any additional questions, please contact the authors (email, email, email).
Repository structure and contents:
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- A default Sphinx project; see sphinx-doc.org for details
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks for exploration work.
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.py <- makes project pip installable (pip install -e .) so src can be imported
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module
│ │
│ ├── data <- Scripts to download or generate data
│ │ └── make_dataset.py
│ │
│ ├── features <- Scripts to turn raw data into features for modeling
│ │ └── build_features.py
│ │
│ ├── models <- Scripts to train models and then use trained models to make
│ │ │ predictions
│ │ ├── predict_model.py
│ │ └── train_model.py
│ │
│ └── visualization <- Scripts to create exploratory and results-oriented visualizations
│ └── visualize.py
│
└── tox.ini <- tox file with settings for running tox; see tox.readthedocs.io