CookieBlock Browser Extension

CookieBlock is a browser extension that automatically enforces your GDPR consent preferences for browser cookies. It classifies cookies on-the-fly into four distinct categories, and deletes those that the user did not consent to.

This helps enforce user privacy without having to rely on the website hosting the cookies.

Description

CookieBlock is an extension that lets users apply their cookie consent preferences to any website, regardless of whether the website has a cookie banner. The user specifies their consent options once, when the extension is first installed, and CookieBlock then tries to remove any cookies that do not align with the user's policy as they are being created.

This is intended to ensure that the privacy of the user is preserved. One can reject any of the following categories:

  • Functionality Cookies
  • Analytical/Statistics Cookies
  • Advertising/Tracking Cookies

Note that CookieBlock does not handle the cookie banner itself. To remove these annoying banners, we recommend using the Consent-O-Matic extension.

Download Links

CookieBlock is compatible with both Firefox and Chromium-based browsers, and it is available on the add-on stores for those browsers.

Feedback

If you would like to submit feedback, or report websites that break because of the add-on, you can open an issue on this GitHub page, or alternatively use this Google Forms document.

Build Instructions

Nothing outside of this repository is required to build CookieBlock. Simply pack the contents of the subfolder src into a zip file, and you can install it into your browser. More information here.

Alternatively, you can also install npm and use the web-ext command-line tool, with the command web-ext build.

Reproducing the Model Files

The model files are constructed in the following process:

  1. Run a webcrawler to collect browser cookies and categories from Consent Management Providers. The relevant code is found here.
  2. Extract from the resulting database the training cookies, in JSON format. The script for this is found here.
  3. Use the feature extractor in this repository to transform the cookies JSON into a sparse LibSVM matrix representation.
  4. Provide this LibSVM matrix, together with the associated class weights, as input to the XGBoost classifier implementation (xgb_train.py) found in this repository.
  5. Execute a secondary Python script (xgboost_small_dump.py) to transform the XGBoost model into a minified JSON tree structure. This script produces the four model files forest_class0.json to forest_class3.json.
  6. Copy these files into the folder ./src/ext_data/model/, replacing the existing forest class files. Make sure to preserve their names as is.
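
To illustrate the output of step 3: LibSVM is a sparse text format in which each row is a label followed by `index:value` pairs for the non-zero features. The sketch below shows only that format. The function name and the Map-based input are assumptions made for illustration and are not taken from the repository's feature extractor.

```javascript
// Hypothetical helper: format one sparse feature vector as a LibSVM line.
// `features` maps feature index -> numeric value; zero entries are omitted.
function toLibsvmLine(label, features) {
  const pairs = [...features.entries()]
    .filter(([, value]) => value !== 0)
    .sort(([a], [b]) => a - b) // indices in ascending order
    .map(([index, value]) => `${index}:${value}`);
  return [String(label), ...pairs].join(" ");
}

const example = new Map([[7, 1], [2, 0.5], [4, 0]]);
console.log(toLibsvmLine(3, example)); // "3 2:0.5 7:1"
```

Omitting zero-valued entries is what keeps the file small when most features are absent for a given cookie.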

How It Works

The policy enforcement process is a background script that executes every time a cookie event is raised in the browser. If this event indicates that a cookie was added or updated, the extension will store the cookie in a local history of cookie updates, and then perform a classification for that cookie.
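
A minimal sketch of this event handling is shown below. It assumes the `{ removed, cookie, cause }` change object that the WebExtensions `cookies.onChanged` event passes to listeners. The history structure, the key format, and the names `handleCookieEvent` and `classifyCookie` are hypothetical and may not match the extension's actual code.

```javascript
// Hypothetical in-memory history: cookie key -> list of observed updates.
const cookieHistory = new Map();

// Hypothetical key; the extension's real identifier scheme may differ.
function cookieKey(cookie) {
  return `${cookie.name};${cookie.domain};${cookie.path}`;
}

// Handles one change event. Returns true if the cookie was recorded
// and handed to the classifier, false if the event was ignored.
function handleCookieEvent(changeInfo, classify) {
  if (changeInfo.removed) {
    return false; // removals are ignored in this sketch
  }
  const key = cookieKey(changeInfo.cookie);
  const updates = cookieHistory.get(key) ?? [];
  updates.push({ value: changeInfo.cookie.value, seenAt: Date.now() });
  cookieHistory.set(key, updates);
  classify(changeInfo.cookie, updates);
  return true;
}

// In a background script, this could be registered roughly as:
// browser.cookies.onChanged.addListener((info) => handleCookieEvent(info, classifyCookie));
```

Keeping the handler separate from the listener registration makes it possible to exercise the logic outside a browser.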

The category for each cookie is predicted using a forest of decision trees trained via the XGBoost classifier, combined with a set of feature extraction steps. First, the cookie is turned into a numerical vector, which is then provided as input to the forest of trees. This produces a score for each class, and the class with the best score is assigned to the cookie.
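
The scoring step can be sketched as follows, assuming one forest per class, a per-class score equal to the sum of the leaf values reached in that forest, and a sparse feature vector in which absent features follow a dedicated "missing" branch. The node shape used here is hypothetical; the minified JSON produced by xgboost_small_dump.py may be laid out differently.

```javascript
// Hypothetical node shape: leaves are { leaf: number }; internal nodes are
// { feature, threshold, yes, no, missing } with child nodes nested inline.
function treeScore(node, features) {
  while (node.leaf === undefined) {
    const value = features[node.feature];
    if (value === undefined) {
      node = node.missing; // feature absent from the sparse vector
    } else {
      node = value < node.threshold ? node.yes : node.no;
    }
  }
  return node.leaf;
}

// Sum the leaf values of every tree in one class's forest.
function forestScore(forest, features) {
  return forest.reduce((sum, tree) => sum + treeScore(tree, features), 0);
}

// `forests` holds one forest per class; the predicted class is the argmax.
function predictClass(forests, features) {
  const scores = forests.map((forest) => forestScore(forest, features));
  return scores.indexOf(Math.max(...scores));
}

// Two single-tree forests that split on feature 0 at 0.5.
const stump = (leafYes, leafNo) => ({
  feature: 0, threshold: 0.5,
  yes: { leaf: leafYes }, no: { leaf: leafNo }, missing: { leaf: leafNo },
});
const forests = [[stump(1, -1)], [stump(-1, 2)]];
console.log(predictClass(forests, { 0: 0.2 })); // 0
console.log(predictClass(forests, { 0: 0.9 })); // 1
```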

Available cookie categories are:

  • Strictly Necessary
  • Functionality
  • Analytics
  • Advertising/Tracking

Granularity is intentionally kept low to make the decision as simple as possible for the user. Note that "strictly necessary" cookies cannot be rejected, as this is the class of cookies that is required to make the website work. Without them, essential services such as logins would stop working.
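
As a sketch of how a predicted class could be checked against the user's policy: the mapping of class indices to categories below simply follows the order of the list above and is an assumption, as are the policy object's shape and the function name.

```javascript
// Class indices are an assumption that follows the order of the list above.
const CATEGORIES = ["necessary", "functionality", "analytics", "advertising"];

// Hypothetical policy shape: one boolean per rejectable category.
// Strictly necessary cookies are always kept, whatever the policy says.
function shouldRemove(classIndex, policy) {
  const category = CATEGORIES[classIndex];
  if (category === undefined) {
    throw new RangeError(`unknown class index: ${classIndex}`);
  }
  if (category === "necessary") {
    return false;
  }
  return policy[category] !== true;
}

const policy = { functionality: true, analytics: false, advertising: false };
console.log(shouldRemove(1, policy)); // false: functionality is allowed
console.log(shouldRemove(3, policy)); // true: advertising was rejected
```

Treating any category that is not explicitly allowed as rejected keeps the default on the side of privacy in this sketch.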

The feature extractor can be found in the subfolder nodejs-feature-extractor/. This is used to extract the features for the training data set.

For the classifier training, see:

https://github.com/dibollinger/CookieBlock-Consent-Classifier

Known Issues

The classifier is not completely accurate. Some functions on certain sites may break because essential cookies get misclassified. This is hard to resolve without gathering more cookie data to train on, so the approach has its limits.

To mitigate these problems, we maintain a list of known cookie categories. This is a JSON file storing cookie labels for known cookie identifiers. If a cookie is contained in this file, the prediction is skipped and the known class is applied.
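
The lookup-before-predict logic might look like the sketch below. The table layout and key format are hypothetical, since the actual structure of known_cookies.json is not described here, and `categorize` is an illustrative name.

```javascript
// Hypothetical table shape: "name;domain" -> class index. The actual layout
// of known_cookies.json may differ.
const knownCookies = new Map([["XSRF-TOKEN;example.com", 0]]);

// Returns the known class if the cookie is listed, else runs the classifier.
function categorize(cookie, predict) {
  const key = `${cookie.name};${cookie.domain}`;
  if (knownCookies.has(key)) {
    return { classIndex: knownCookies.get(key), source: "known" };
  }
  return { classIndex: predict(cookie), source: "model" };
}
```

Because the classifier is only invoked on a miss, a listed cookie can never be affected by a misprediction.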

By reporting broken websites, you can help us keep an updated list of cookie exceptions. This makes the extension more usable for everyone while also keeping a high level of privacy.

Repository Contents

  • node-feature-extractor/: Contains the feature extractor implemented using npm. Used to extract features with the same JavaScript code as the extension.
    • /modules/: Contains code used to perform the feature extraction and prediction.
    • /outputs/: Output directory for the feature extraction.
    • /training_data/: Path for cookie data in JSON format, used for extracting features.
    • /validation_data/: Path for cookie features, extracted in LibSVM format, used for prediction and verifying model accuracy.
    • /cli.js: Command-line script used to run the feature extraction.
  • logo/: Contains the original CookieBlock logo files.
  • promotional/: Contains store page text, localizations and promo images used for those stores.
  • src/: Source code for the CookieBlock extension.
    • _locales/: Contains all human-readable strings displayed on the extension interface, with translations to different languages.
    • background/: JavaScript code and HTML for the extension background process. Currently only Manifest v2 compatible.
    • ext_data/: All external data required to perform the feature extraction and class label prediction.
      • model/: Extracted CART prediction tree forests, one for each class of cookies.
      • resources/: Resources used with the feature extraction.
      • default_config.json: Defines default storage values used in the extension.
      • features.json: Defines how the feature extraction operates, and which individual features are enabled.
      • known_cookies.json: Defines default categorizations for some known cookies. Used to grant exceptions to potential website breakages.
    • icons/: Browser extension icons.
    • modules/: Contains scripts that handle certain aspects of the feature extraction and prediction.
      • third_party/: Third party code libraries.
    • options/: Contains the options and first time setup page of the extension.
    • popup/: Contains code for the extension popup.
    • credits.txt: Links to the third-party libraries and credits to the respective authors.
    • LICENSE: License of the extension.

Credits

  • CookieBlock logo designed by Charmaine Coates.

  • Czech translation provided by Karel Kubicek.

  • Japanese translation provided by Shitennouji.

  • Spanish translation provided by @6ig6oy.

  • Automated localization in store pages performed using DeepL.

Libraries

CookieBlock includes code from the following libraries and projects:

Thesis

This repository was created as part of the master's thesis "Analyzing Cookies Compliance with the GDPR", which can be found at:

https://www.research-collection.ethz.ch/handle/20.500.11850/477333

as well as the paper "Automating Cookie Consent and GDPR Violation Detection", which can be found at:

https://karelkubicek.github.io/post/cookieblock.html

Thesis Supervision and co-authors:

  • Karel Kubicek
  • Dr. Carlos Cotrini
  • Prof. Dr. David Basin
  • Information Security Group at ETH Zürich

See also the following repositories for other components that were developed as part of the thesis:

License

Copyright © 2021-2022 Dino Bollinger, Department of Computer Science at ETH Zürich, Information Security Group

MIT License, see included LICENSE file

cookieblock's People

Contributors

bender250, darioackermann, dibollinger, elmar

cookieblock's Issues

Broken functionality - list of websites

I think it would be good to start reporting all the websites which are getting partially broken due to the default settings applied in CookieBlock.
In this case, we could perhaps create a white list?
Ideally, it would be great to report a broken functionality directly via the extension.

My findings:

  • linkedin.com: dark mode does not work
  • youtube.com: once the user logs in to the service, they are not being remembered (even if their avatar is still shown in the corner)

Feature request (from feedback form) - statistics in icon like uBlock

It would be interesting to have statistics shown directly on the page in use, like uBlock does. It would also let the user see what is going on with the current page.

Implementation insight: this would be easy for first-party cookies, but with third-party cookies the extension API does not tell us to which tab these cookies belong.

Couchsurfing.com issue with cookieblock

When CookieBlock is active without a site exception, it does not seem possible to make any changes or edits.
Web developer tools show the response HTTP 422 Unprocessable Content (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/422), and the page displays:

The change you wanted was rejected.
Maybe you tried to change something you didn't have access to.

Request headers cookies don't seem much different.

_safe_cookies__known_cookies	"XSRF-TOKEN|_hammertime_session|remember_user_token|ht_utm_campaign|ht_utm_source|ht_utm_medium|ht_utm_content|ht_utm_term"

Is there a way to identify how cookies are classified and which ones are removed?
I don't see a clear match with https://raw.githubusercontent.com/dibollinger/CookieBlock/main/src/ext_data/known_cookies.json

Tested on Firefox 115esr Linux with pages
Edit Profile: add country - https://www.couchsurfing.com/users/profile?id=xxx
Edit Message Template - https://www.couchsurfing.com/message_templates/xxx
Reproducing this likely requires an account, which needs both registration and payment.

Confirmed not related to Firefox Tracking protection or other browser extension like ublock or privacy badger.
Normal workaround: add an exception for site

Can't log in to any website in Chromium

The extension prevents a successful login to any website in the Chromium browser when Advertising/Tracking Cookies are not allowed.
I fill in the login information, and when I click OK I'm redirected to the home page of the website as if I had just logged out. The extension is set to Allow Functionality Cookies.
When I try the Firefox extension it works as expected with the same exact configuration that fails on Chromium (Functionality Cookies allowed and Advertising/Tracking Cookies not allowed).

Add spanish translation

Hello,
I was just wondering if you could add locales/es to your fantastic extension.

It would be interesting to have a website like weblate.org to make translations.

I hope you find the translation useful,
greetings

PS: Sorry, I was unable to open a pull request. The translation is attached as locales_ES.zip.

posteid.poste.it missing QR code used for login

Steps

The PosteID login page shows a username and password form on the left and a QR code on the right. The QR code is used to log in via smartphone without having to enter credentials and an OTP.

If Cookie Block is active for https://posteid.poste.it, the QR code is not displayed

workaround

whitelist https://posteid.poste.it

Safari extension

Not really an issue, more of a request - are there any plans to offer a Safari version? That would help for Apple devices (computers and mobile devices).

Microsoft Outlook - Something went wrong

Hi,

With the extension enabled, Microsoft Outlook not only asks for authentication every time you try to log in, but also presents a "Something went wrong" error page.

I would kindly appreciate it if this can be fixed.
Thank you

bancaprogetto login at user's area is broken

CookieBlock is setup with "Allow Functionality Cookies (recommended)"

Steps

  1. open bancaprogetto.it
  2. click "Area Privati"
  3. now the page begins with https://ihbnext.cedacri.it

actual behaviour

After I try to log in with my credentials, the bank says they are wrong (which they are not).

workaround

adding https://ihbnext.cedacri.it as a domain exception in CookieBlock
