Development of an AI System for Classifying, Evaluating and Interpreting Privacy Policies and Terms and Conditions
Authors:
- Erika Raymundo
- Pavan Kumar Muppala
- Sahil Sahni
- Yash Pulse
Several studies reveal that privacy policies are often not read or understood because of their complexity and length. This opacity has contributed to data misuse, privacy violations, and loss of user trust. A tool that classifies these documents and presents them clearly can promote privacy and transparency while improving user understanding and privacy awareness. The project has the following objectives:
- Design and implement a machine learning model to classify privacy policy and terms-and-conditions documents (a minimal sketch follows this list).
- Define metrics for 'acceptable' and 'non-acceptable' categories based on human values and on legal, ethical, and user-centered criteria.
- Identify and highlight sections within these documents that are potentially problematic.
- Generate simplified explanations of the identified sections to improve user understanding.
- Evaluate the performance of the model and refine its accuracy.
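A minimal sketch of the classification, evaluation, and section-highlighting steps, assuming a labeled corpus stored as a CSV; the file path, the `text`/`label` column names, and the `non-acceptable` label value are hypothetical placeholders, not the project's actual data or pipeline.

```python
# Minimal classification sketch; file path, column names, and label values are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("data/processed/policies_labeled.csv")  # hypothetical path
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# TF-IDF features feeding a linear classifier; any document classifier
# (e.g., a fine-tuned transformer) could be substituted here.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2, stop_words="english")),
    ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
clf.fit(X_train, y_train)

# Evaluate the classifier on held-out documents.
print(classification_report(y_test, clf.predict(X_test)))

# Flag potentially problematic sections by scoring individual paragraphs
# of one document with the same classifier.
paragraphs = [p for p in X_test.iloc[0].split("\n") if p.strip()]
scores = clf.predict_proba(paragraphs)[:, list(clf.classes_).index("non-acceptable")]
for paragraph, score in sorted(zip(paragraphs, scores), key=lambda x: -x[1])[:3]:
    print(f"{score:.2f}  {paragraph[:80]}")
```

The paragraph-level scores could later feed the explanation step, where flagged passages are rewritten in plain language.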
Data: source and a high-level description (e.g., number of observations).
- Descriptive analysis
- Choices made
- Key relevant findings from the exploratory data analysis for Module 1; this analysis will be more involved in future modules (a minimal sketch follows this list)
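A minimal descriptive-analysis sketch, assuming the raw corpus is stored as a CSV; the path and the `text`/`label` column names are hypothetical.

```python
# Descriptive-analysis sketch; file path and column names are hypothetical.
import pandas as pd

df = pd.read_csv("data/raw/policies.csv")

# Corpus size, label balance, and per-document word counts.
print("documents:", len(df))
print(df["label"].value_counts())
print(df["text"].str.split().str.len().describe())
```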
Visualizations of the corpus and of model results are described here; a sketch of one such figure follows.
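A possible visualization, sketched under the same hypothetical file layout as above: a histogram of document lengths, saved into the reports/figures folder described in the repository structure below.

```python
# Visualization sketch; file path and column names are hypothetical.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("data/raw/policies.csv")
df["text"].str.split().str.len().plot(kind="hist", bins=50)
plt.xlabel("Words per document")
plt.ylabel("Number of documents")
plt.title("Document length distribution")
plt.tight_layout()
plt.savefig("reports/figures/doc_length_hist.png")
```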
Please review the narrative of our analysis in our Jupyter notebook or in our presentation.
For any additional questions, please contact the authors (email, email, email).
Repository structure and contents:
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- A default Sphinx project; see sphinx-doc.org for details
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks for exploration work.
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.py <- makes project pip installable (pip install -e .) so src can be imported
├── src <- Source code for use in this project.
│ ├── __init__.py <- Makes src a Python module
│ │
│ ├── data <- Scripts to download or generate data
│ │ └── make_dataset.py
│ │
│ ├── features <- Scripts to turn raw data into features for modeling
│ │ └── build_features.py
│ │
│ ├── models <- Scripts to train models and then use trained models to make
│ │ │ predictions
│ │ ├── predict_model.py
│ │ └── train_model.py
│ │
│ └── visualization <- Scripts to create exploratory and results-oriented visualizations
│ └── visualize.py
│
└── tox.ini <- tox file with settings for running tox; see tox.readthedocs.io