adblocker's Introduction

Adblocker

Efficient · Minimal · JavaScript · TypeScript · uBlock Origin- and Easylist-compatible
Node.js · Puppeteer · Electron · WebExtension


Cliqz' adblocker is a JavaScript library for blocking ads, trackers, and annoyances with a strong focus on efficiency. It was designed with compatibility in mind and integrates seamlessly with the following environments: Node.js, Puppeteer, Electron, and WebExtension.

Getting Started

Cliqz' adblocker is the easiest and most efficient way to block ads and trackers in your project. Only a few lines of code are required to integrate smoothly with Puppeteer, Electron, a Chrome- and Firefox-compatible browser extension, or any environment supporting JavaScript (e.g. Node.js or React Native).

Here is how to do it in two steps for a Chrome- and Firefox-compatible WebExtension:

  1. Install: npm install --save @cliqz/adblocker-webextension
  2. Add the following in your background script:
import { WebExtensionBlocker } from '@cliqz/adblocker-webextension';

WebExtensionBlocker.fromPrebuiltAdsAndTracking().then((blocker) => {
  blocker.enableBlockingInBrowser(browser);
});

Congratulations, you are now blocking all ads and trackers! 🎉

Compatibility

The library supports 99% of all filters from the Easylist and uBlock Origin projects. Check the compatibility matrix on the wiki for more details.

Contributing

This project makes use of lerna and yarn workspaces under the hood. Quickly get started with:

  1. Fork and clone the repository,
  2. Enable corepack: corepack enable,
  3. Install dependencies: yarn install --immutable,
  4. Build: yarn build,
  5. Test: yarn test.

If you have any questions, feel free to open an issue or a pull request to get some help!

Who is using it?

This library is the building block technology used to power the adblockers from Ghostery and Cliqz on both desktop and mobile platforms. It is already running in production for millions of users and has been battle-tested to satisfy the following use-cases:

  • Mobile-friendly adblocker in react-native, WebExtension, or custom JavaScript context: Ghostery for iOS.
  • Ads and trackers blocker in Electron applications, Puppeteer headless browsers, Cliqz browser, Ghostery, and standalone.
  • Batch requests processing in Node.js, HTML fuzzy keyword matcher, and more.

The innovative algorithms and architecture designed and implemented in this project have been shown to be among the most efficient ways to implement ad-blockers and have been used in other projects to implement highly performant adblockers such as Brave.

Swag

Show the world you're using ghostery/adblocker with a "powered by Ghostery" badge. In Markdown:

[![powered by Ghostery](https://img.shields.io/badge/ghostery-powered-blue?logo=ghostery)](https://github.com/ghostery/adblocker)

Or HTML:

<a href="https://github.com/ghostery/adblocker/" target="_blank" rel="noopener noreferrer">
    <img alt="powered by Ghostery" src="https://img.shields.io/badge/ghostery-powered-blue?logo=ghostery">
</a>

License

Mozilla Public License 2.0

adblocker's Issues

Extremely slow request decision

Hi, I noticed that after an update (0.10, I guess) the engine.match method became extremely slow. Here's the code I'm using in my Electron app, and it literally hangs:

webRequest.onBeforeRequest(
  { urls: ['<all_urls>'] },
  async (details: Electron.OnBeforeRequestDetails, callback: any) => {
    if (engine && settings.isShieldToggled) {
      console.time('engine.match');
      const { match, redirect } = engine.match(
        Request.fromRawDetails({
          type: details.resourceType as any,
          url: details.url,
        }),
      );
      console.timeEnd('engine.match');

      if (match || redirect) {
        appWindow.webContents.send(`blocked-ad-${details.webContentsId}`);

        if (redirect) {
          callback({ redirectURL: redirect });
        } else {
          callback({ cancel: true });
        }

        return;
      }
    }

    callback({ cancel: false });
  },
);

Bring back some patterns which look like RegExps

We currently drop RegExp filters of the form /.../. Although there does not seem to be any example of such a filter in the lists so far, it would be nice to drop only the filters which actually contain RegExp-specific characters like *. This way, we would be able to retain /ads/ (for example).
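
One way to sketch the check described above, assuming a filter is available as a plain string. The function names and the exact set of "RegExp-specific" characters are assumptions made for illustration; the issue only names * as an example.

```typescript
// Characters treated as RegExp syntax in this sketch (an assumed set).
const REGEX_SPECIAL_CHARS = /[\\^$.*+?()[\]{}|]/;

// True if `filter` is written as /.../, the form that is currently dropped.
function looksLikeRegexFilter(filter: string): boolean {
  return (
    filter.length >= 2 &&
    filter[0] === '/' &&
    filter[filter.length - 1] === '/'
  );
}

// True only if the text between the slashes actually uses RegExp syntax.
// A filter such as /ads/ returns false and could be kept as a plain pattern.
function isActualRegexFilter(filter: string): boolean {
  if (!looksLikeRegexFilter(filter)) {
    return false;
  }
  return REGEX_SPECIAL_CHARS.test(filter.slice(1, -1));
}
```

With this check, only filters for which isActualRegexFilter returns true would need to be dropped.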

Add more dynamic optimizations

It would be nice to extend the set of optimizations performed at runtime by the adblocker to speed up matching. We could also consider optimizing individual filters; currently some of these optimizations are performed directly while parsing, but we could simplify (and speed up) that phase and rely on the optimizer instead.

Clean-up request's types

We still handle legacy codes from Firefox Bootstrap extensions, which could be removed. It's also a good occasion to make sure all types are properly handled (they should be).

"main_frame", "sub_frame", "stylesheet", "script", "image", "font", "object", "xmlhttprequest", "ping", "csp_report", "media", "websocket", "other"

Update adblockplus.js for thirdParty parameter

We removed the thirdParty parameter of matchesAny() recently (adblockplus/adblockpluscore@23cd2b9). After this commit, the adblockplus.js file here no longer works.

Here are the changes I made locally to fix the issue:

diff --git a/bench/comparison/blockers/adblockplus.js b/bench/comparison/blockers/adblockplus.js
index 2125c2f..3410e00 100644
--- a/bench/comparison/blockers/adblockplus.js
+++ b/bench/comparison/blockers/adblockplus.js
@@ -6,7 +6,7 @@ const { URL } = require('url');

 const { CombinedMatcher } = require('./adblockpluscore/lib/matcher.js');
 const { Filter, RegExpFilter } = require('./adblockpluscore/lib/filterClasses.js');
-const { parseURL, isThirdParty } = require('./adblockpluscore/lib/url.js');
+const { parseURL } = require('./adblockpluscore/lib/url.js');

 // Chrome can't distinguish between OBJECT_SUBREQUEST and OBJECT requests.
 RegExpFilter.typeMap.OBJECT_SUBREQUEST = RegExpFilter.typeMap.OBJECT;
@@ -75,12 +75,10 @@ module.exports = class AdBlockPlus {
   match(request) {
     const url = parseURL(request.url);
     const sourceURL = parseURL(request.frameUrl);
-    const thirdParty = isThirdParty(url, sourceURL.hostname);
     const filter = this.matcher.matchesAny(
-      url.href,
+      url,
       RegExpFilter.typeMap[resourceTypes.get(request.type) || 'OTHER'],
       sourceURL.hostname,
-      thirdParty,
       null,
       false,
     );

PS: The thirdParty parameter is no longer needed because the function calculates this on its own as needed based on the request URL and the hostname of the document making the request (Adblock Plus issue #7260).

Consider supporting globbing in regexps

I still don't think we should support full regexps in filter syntax, although a more limited form like globbing could allow more efficient filters in some cases.

||foo.bar/{scripts,ads,tracking}$xhr

This would allow encoding several possibilities in a single filter, with a clear syntax.
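
A minimal sketch of how such a filter could be expanded into plain filters at parse time. The function name is hypothetical, nested braces are not handled, and the sketch only makes sense for network filters (cosmetic filters can legitimately contain braces).

```typescript
// Expands the first {a,b,c} group, then recurses on each result so that
// several groups in one filter are all expanded. Nested braces are not handled.
function expandBraces(filter: string): string[] {
  const open = filter.indexOf('{');
  const close = open === -1 ? -1 : filter.indexOf('}', open);
  if (open === -1 || close === -1) {
    return [filter];
  }

  const prefix = filter.slice(0, open);
  const suffix = filter.slice(close + 1);
  const results: string[] = [];
  for (const option of filter.slice(open + 1, close).split(',')) {
    for (const expanded of expandBraces(prefix + option + suffix)) {
      results.push(expanded);
    }
  }
  return results;
}
```

Applied to the example above, this yields three filters: ||foo.bar/scripts$xhr, ||foo.bar/ads$xhr, and ||foo.bar/tracking$xhr. An engine could also keep the group as a single filter internally; expansion is just the simplest option to illustrate the syntax.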

Implement 'doctests' for filters

It would be nice to have a way to specify tests inline for filters, similar to how doctest would work in Python.

! This filter blocks ads on foo.com
! >>> https://foo.com/js
||foo.com$script

This would make it easy to both document and test filters. One thing which is not clear is the nicest way to specify the test cases (URL, source URL, type of request, etc.).
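
As a rough sketch, the inline examples could be collected with a small parser like the one below. It assumes the `! >>>` comment syntax from the example above and only records URLs; the interface and function names are hypothetical.

```typescript
interface FilterDoctest {
  filter: string; // the filter the examples belong to
  urls: string[]; // URLs the filter is expected to match
}

// Collects `! >>> <url>` comment lines and attaches them to the next filter.
function extractDoctests(list: string): FilterDoctest[] {
  const marker = '! >>>';
  const doctests: FilterDoctest[] = [];
  let pendingUrls: string[] = [];

  for (const rawLine of list.split('\n')) {
    const line = rawLine.trim();
    if (line.length === 0) {
      continue;
    }
    if (line.indexOf(marker) === 0) {
      pendingUrls.push(line.slice(marker.length).trim());
    } else if (line[0] !== '!') {
      // A filter line: attach the examples found in the comments above it.
      if (pendingUrls.length > 0) {
        doctests.push({ filter: line, urls: pendingUrls });
      }
      pendingUrls = [];
    }
  }
  return doctests;
}
```

A test runner could then parse each filter and assert that it matches every URL listed for it. Source URL and request type would need an extended syntax, which is the open question raised above.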

Fix regex hostname-matching logic

Some rare filters are not matched properly. For instance:

  • ||geo*.hltv.org^ should match https://geo2.hltv.org/rekl13.php
  • ||www*.swatchseries.to^$script should match https://www1.swatchseries.to/sw.js
  • ||imp*.tradedoubler.com^$third-party should match https://impde.tradedoubler.com/imp
  • ||www*.swatchseries.to^$script should match https://www1.swatchseries.to/public/js/bootstrap-modal.js

It seems that the meaning of * depends on the context where it appears. It also raises the question of what should be considered the end of the hostname in a hostname anchor; currently I'm guessing this should always be the next separator. I'm hoping that all filters are consistent and follow this implicit rule, but this will have to be investigated.
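
To make the expected behaviour concrete, here is a slow reference matcher that could serve as a test oracle for the cases above. It is a sketch under the guess stated above (the hostname part ends at the first ^ or /), not the library's matching logic; the function names are hypothetical, and options after $ are simply ignored.

```typescript
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Converts a hostname-anchored filter (||...) into a RegExp.
function hostnameAnchoredToRegex(filter: string): RegExp {
  if (filter.indexOf('||') !== 0) {
    throw new Error('expected a filter starting with ||');
  }
  // Drop the '||' prefix and any '$options' suffix.
  const optionsIndex = filter.lastIndexOf('$');
  const pattern = filter.slice(2, optionsIndex === -1 ? undefined : optionsIndex);

  let inHostname = true;
  let body = '';
  for (const char of pattern.split('')) {
    if (char === '^' || char === '/') {
      inHostname = false; // assumed end of the hostname part
    }
    if (char === '*') {
      // Inside the hostname, a wildcard must not run into the path.
      body += inHostname ? '[^/:?#]*' : '.*';
    } else if (char === '^') {
      // Separator: anything except letters, digits, or _ - . %, or end of URL.
      body += '(?:[^\\w.%-]|$)';
    } else {
      body += escapeRegex(char);
    }
  }
  // '||' anchors at the start of the hostname or of any of its labels.
  return new RegExp('^[a-z][a-z0-9+.-]*://(?:[^/?#]*\\.)?' + body, 'i');
}
```

For example, hostnameAnchoredToRegex('||geo*.hltv.org^') accepts https://geo2.hltv.org/rekl13.php but rejects https://example.com/geo2.hltv.org/, because the wildcard is confined to the hostname.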

Allow serialization of engine even after internal optimizations have been triggered

Currently it is not possible to serialize the engine after internal optimizations have been triggered, because they change the structure of the buckets. It would be nice to keep the list of original filters from before optimization to allow serialization at any point in time. (Note: keeping the ids of filters might be enough.)

Reduce serialized size further using adaptive coding of length

StaticDataView currently needs to store the size/length of some elements (e.g. the size of the string in pushASCII). We could make the representation more compact by using the strict minimum number of bytes to represent the size. We currently use either a 16-bit or a 32-bit number depending on the type of data stored, but they could all benefit from a smarter encoding. For example, we could use only 1 byte for length <= 127, then 2 or 4 bytes for higher values.
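
A minimal sketch of one such encoding, assuming a two-tier layout: 1 byte for lengths up to 127, otherwise 4 bytes with the high bit of the first byte set. The helper names and the exact layout are illustrative assumptions and are not part of StaticDataView.

```typescript
// Writes `length` into `buffer` at `offset` and returns the next free offset.
function writeLength(buffer: Uint8Array, offset: number, length: number): number {
  if (length < 0 || length > 0x7fffffff || Math.floor(length) !== length) {
    throw new RangeError('length must be an integer in [0, 2^31 - 1]');
  }
  if (length <= 127) {
    buffer[offset] = length; // high bit clear: 1-byte form
    return offset + 1;
  }
  // High bit set: 4-byte form, remaining 31 bits stored big-endian.
  buffer[offset] = 0x80 | (length >>> 24);
  buffer[offset + 1] = (length >>> 16) & 0xff;
  buffer[offset + 2] = (length >>> 8) & 0xff;
  buffer[offset + 3] = length & 0xff;
  return offset + 4;
}

// Reads a length written by `writeLength` and returns it with the next offset.
function readLength(
  buffer: Uint8Array,
  offset: number,
): { length: number; offset: number } {
  const first = buffer[offset];
  if ((first & 0x80) === 0) {
    return { length: first, offset: offset + 1 };
  }
  const length =
    ((first & 0x7f) << 24) |
    (buffer[offset + 1] << 16) |
    (buffer[offset + 2] << 8) |
    buffer[offset + 3];
  return { length, offset: offset + 4 };
}
```

Since most strings in filter lists are short, the common case costs a single byte instead of two or four. A 2-byte middle tier, as suggested above, would need a second marker bit.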

Create unified config

Currently, multiple entities share similar configs but each has its own instance. Let's have one shared instance instead.

Consider adding tldts in the bundle again

We currently do not depend directly on any library to parse URLs, but some public APIs require a parse function to be injected. We usually use tldts for this purpose. We should consider adding it again as a dependency for convenience, so that @cliqz/adblocker works out of the box. We could still provide the ability to bypass the use of tldts.

Offer commonjs distribution

For use cases like Ghostery, where the host project controls the build system, Adblocker should come in source code form, so that the build system can optimize dependency loading.
Currently the adblocker bundle comes with tldts embedded, which has quite a high loading cost. If we distribute CommonJS sources, build systems can bundle tldts (or other dependencies) only once (or not at all if it gets externalized).

[Question] How to use with Puppeteer

Hello,

First of all, thanks for creating this awesome project!

I'm trying to add adblocking capabilities to my Puppeteer code.

Basically, Puppeteer allows you to abort requests, so the only thing I need is to determine whether an ad request should be aborted.

As a reference, that's my current implementation for aborting tracking requests: https://github.com/Kikobeats/browserless/blob/master/packages/goto/src/index.js#L46

If I read docs correctly, I need to create a FiltersEngine instance and check for match property.

Something like this can replace the previous code:

if (abortTrackers) {
  const { match } = engine.match(req)
  if (match) {
    debug(`abort:tracker:${++reqCount.abort}`, resourceUrl)
    return req.abort()
  }
}

I created a FiltersEngine instance from a big rules file, built by concatenating the most popular rule lists (easylist, etc.).

I'm not sure if I'm using the wrong API method, but the point is that it never matches a rule, even when I try sites that I know should match, since I have the same rules in the browser 🤔

I suppose the problem is that req from Puppeteer is not the same as your Request object? Not sure.

Any idea about what is happening is welcome 😅

Optimize in-memory representation of cosmetic bucket

We do not need to keep the full instance of CosmeticFilter in the list of generic rules.

Edit: after running some benchmark on loading popular domains, it seems like a major part of the CPU time is spent in getCosmeticsFilters and createStylesheet. This should be optimized away.

Investigate ways to reduce memory usage

The memory usage could probably be reduced. Some ideas:

  • Use smaz.js to compress patterns. It should be possible to compress both patterns and urls and perform the matching on compressed typed arrays directly. This would benefit both memory usage and serialized engine size.
  • Improve optimizer to allow fusion of more filters, potentially in custom forms (e.g.: automata for plain patterns)
  • Find ways to reduce the size of NetworkFilter and CosmeticFilter objects

Optimize matching with faster uint32 access in data view

Currently, one of the bottlenecks in matching is getUint32() from StaticDataView. We could fix this by making sure that arrays of 32-bit numbers are aligned on 4 bytes, then using a Uint32Array directly to access those sections of the view.
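
The idea can be sketched as follows, assuming the serialized data lives in a Uint8Array. The helper names are hypothetical and this is not the StaticDataView implementation.

```typescript
// Rounds `offset` up to the next multiple of 4.
function alignTo4(offset: number): number {
  return (offset + 3) & ~3;
}

// Returns a Uint32Array that views `count` 32-bit numbers stored in `bytes`
// starting at `byteOffset`. No data is copied; both views share memory.
function getUint32ArrayView(
  bytes: Uint8Array,
  byteOffset: number,
  count: number,
): Uint32Array {
  const absoluteOffset = bytes.byteOffset + byteOffset;
  if (absoluteOffset % 4 !== 0) {
    throw new RangeError('Uint32Array sections must be 4-byte aligned');
  }
  return new Uint32Array(bytes.buffer, absoluteOffset, count);
}
```

When writing, the serializer would pad with alignTo4 before each array of 32-bit numbers. One caveat: a Uint32Array uses the platform's byte order, whereas DataView.getUint32 reads big-endian unless told otherwise, so a serialized engine would only be portable between platforms with the same endianness unless that is handled explicitly.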

Optimize punycode implementation

The punycode implementation is already pretty fast but unnecessarily generic. We could specialize it for our exact needs and probably make it a bit more efficient.

Benchmarking code does not work out of the box

The current version of the benchmarking code (in particular the code in bench/comparison) does not work without modifications to the Makefile.

I had to make the following changes to bench/comparison/Makefile to make it work:

  1. Change git:// URLs to https:// URLs
  2. Add a rule for the target adblockpluscore, which clones github.com/adblockplus/adblockpluscore and checks out a specific commit
  3. Replace requests.json with ../dataset/requests.json

Add helpers to create Request from puppeteer/electron

The library could expose helpers for matching/blocking requests in common environments like WebExtension, Puppeteer, Electron, etc. This could take the form of:

  1. new helpers or methods exposed as part of the public API (e.g.: makeRequestFromElectron, makeRequestFromPuppeteer, matchPuppeteerRequest, etc.)
  2. add more examples in the example folders to show different but common use-cases of the library
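
As an illustration of the first option, a Puppeteer helper might look like the sketch below. It relies only on the url(), resourceType(), and frame() methods of Puppeteer's HTTPRequest and on Frame's url() and parentFrame(), declared here structurally so the snippet stands alone. The helper names, the type mapping, and the sourceUrl field are assumptions.

```typescript
// Structural subset of Puppeteer's Frame and HTTPRequest used by this sketch.
interface PuppeteerLikeFrame {
  url(): string;
  parentFrame(): unknown;
}

interface PuppeteerLikeRequest {
  url(): string;
  resourceType(): string;
  frame(): PuppeteerLikeFrame | null;
}

// Raw details in the style passed to Request.fromRawDetails elsewhere on this
// page (`url`, `type`); `sourceUrl` is an assumed field name.
interface RawRequestDetails {
  url: string;
  type: string;
  sourceUrl: string;
}

// Puppeteer resource types that share their name with a WebRequest type.
const SAME_NAME_TYPES = [
  'stylesheet', 'script', 'image', 'font', 'media', 'websocket', 'ping', 'other',
];

// Maps a Puppeteer resource type to a WebRequest-style type.
function toWebRequestType(request: PuppeteerLikeRequest): string {
  const type = request.resourceType();
  if (type === 'document') {
    const frame = request.frame();
    const isSubFrame = frame !== null && frame.parentFrame() !== null;
    return isSubFrame ? 'sub_frame' : 'main_frame';
  }
  if (type === 'xhr' || type === 'fetch') {
    return 'xmlhttprequest';
  }
  return SAME_NAME_TYPES.indexOf(type) !== -1 ? type : 'other';
}

// Hypothetical helper: builds raw request details from a Puppeteer request.
function makeRawDetailsFromPuppeteer(request: PuppeteerLikeRequest): RawRequestDetails {
  const frame = request.frame();
  return {
    url: request.url(),
    type: toWebRequestType(request),
    sourceUrl: frame === null ? request.url() : frame.url(),
  };
}
```

The type mapping is the part most likely to trip users up: Puppeteer reports 'document', 'xhr', and 'fetch', while filter options are written against WebRequest-style types such as main_frame and xmlhttprequest.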

Node support

Hi,

I'm trying to get this to work with Puppeteer, but the fetch is failing so far.

There has been an error ReferenceError: fetch is not defined
    at fetchResource (../node_modules/@cliqz/adBlocker/dist/adblocker.cjs.js:3552:5

Is there a way to support this in Node?

Support $document option

Some filters specify a $document option, which means they should apply to cpt (content policy type) 6, i.e. block the loading of a document completely. Additionally, we might want to display a warning page in this case to explain to the user why the page was blocked and to offer to visit the site anyway.

Implement event logger

To help debugging, it would be nice to implement an event logger for the adblocker. This would optionally keep track of requests blocked/redirected, exceptions as well as cosmetics injected into the page.
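
A minimal sketch of what such a logger could look like. The event kinds, class name, and bounded in-memory buffer are assumptions chosen for illustration; nothing here reflects an existing API of the library.

```typescript
type AdblockerEventKind =
  | 'blocked'
  | 'redirected'
  | 'exception'
  | 'cosmetics-injected';

interface AdblockerEvent {
  kind: AdblockerEventKind;
  url: string;
  filter?: string; // raw filter responsible for the event, when known
  timestamp: number;
}

// Optional, bounded in-memory log. When disabled, `record` does nothing.
class AdblockerEventLog {
  private events: AdblockerEvent[] = [];
  private enabled: boolean;
  private maxEvents: number;

  constructor(enabled: boolean = true, maxEvents: number = 1000) {
    this.enabled = enabled;
    this.maxEvents = maxEvents;
  }

  record(kind: AdblockerEventKind, url: string, filter?: string): void {
    if (!this.enabled) {
      return;
    }
    this.events.push({ kind, url, filter, timestamp: Date.now() });
    if (this.events.length > this.maxEvents) {
      this.events.shift(); // drop the oldest event to bound memory usage
    }
  }

  // Returns a copy of all events, or only the events of the given kind.
  getEvents(kind?: AdblockerEventKind): AdblockerEvent[] {
    if (kind === undefined) {
      return this.events.slice();
    }
    return this.events.filter((event) => event.kind === kind);
  }
}
```

Keeping the logger optional and bounded matters because it would sit on the hot path of every request decision.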

Deserialize multiple cache buffers to one FiltersEngine instance

Hi, I have one cache file with easylist, easyprivacy etc. and I would like to also include optional, regional filters, but it seems there's no other way to do that without creating multiple FiltersEngine instances. Is there a way to do it like this?

const engine = new FiltersEngine();
engine.deserialize(buffer1);
engine.deserialize(buffer2);

I've also seen the update method, but why does it distinguish cosmetic filters from network filters when the FiltersEngine.parse method already does that (although it can be used only once per instance)?

Consider using tabs.insertCSS for cosmetics

The WebExtension API currently provides tabs.insertCSS to inject CSS into any tab. This could be used to replace manual injection from the content script.
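
A small sketch of how the argument to tabs.insertCSS could be built from a list of selectors. The helper names are hypothetical; code, cssOrigin, runAt, and frameId are documented fields of the details object, but browser support for cssOrigin should be checked before relying on it.

```typescript
interface InsertCSSDetails {
  code: string;
  cssOrigin: 'user' | 'author';
  runAt: 'document_start' | 'document_end' | 'document_idle';
  frameId?: number;
}

// Builds a stylesheet hiding every element matched by `selectors`.
function buildHideStylesheet(selectors: string[]): string {
  if (selectors.length === 0) {
    return '';
  }
  return selectors.join(',\n') + ' { display: none !important; }';
}

// Builds the details object for tabs.insertCSS. Using the 'user' origin is a
// design choice of this sketch: page styles cannot easily override it.
function buildInsertCSSDetails(selectors: string[], frameId?: number): InsertCSSDetails {
  const details: InsertCSSDetails = {
    code: buildHideStylesheet(selectors),
    cssOrigin: 'user',
    runAt: 'document_start',
  };
  if (frameId !== undefined) {
    details.frameId = frameId;
  }
  return details;
}

// In a background script, usage could look like:
//   browser.tabs.insertCSS(tabId, buildInsertCSSDetails(['.ad', '#banner']));
```

Injecting from the background script this way would avoid a round trip through the content script for generic cosmetics.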
