
trex's Introduction

Notice of discontinuation

August 2023: Dear reader, as you may have read on the Tracking Exposed website, we're going through a complete restructuring/rebranding/reframing. This repository, which hosts the TikTok and YouTube analytics toolkit, inherited code from 2016 and was becoming cumbersome to maintain. Over the last two years it also underwent a major, progressive refactoring from JavaScript to TypeScript, as part of a more robust release pipeline that has served our algorithm accountability efforts.

This code was originally forked from the repositories used to analyse Facebook (extension and backend), from which it inherits the AGPL-3 licence.

Other sites that have a more accurate deprecation notice are


Tracking Exposed toolkit

This monorepo will eventually include all packages needed and platforms supported by Tracking Exposed:

Requirements

  • node >=16
  • yarn >=3.2.3
  • node-canvas deps depending on your OS
  • docker

Monorepo structure

To start the services in production:

yarn pm2 start platforms/ecosystem.config.js
yarn pm2 status

How to build the extensions (also for extension reviewers):

  • tiktok: yarn; yarn tk:ext dist; ls platforms/tktrex/extension/dist/*.zip
  • youtube: yarn; yarn yt:ext dist; ls platforms/yttrex/extension/dist/*.zip
  • youchoose: yarn; yarn ycai dist; ls platforms/ycai/studio/build/extension/*.zip

To assist debugging

You can run yarn tsc-diagnostics and inspect the contents of the diagnostics/ directory.

Supported Platforms

The browser extension of tiktok.tracking.exposed, the TikTok algorithm analysis toolkit for researchers, power users, and algorithm analysts.

The browser extension of youtube.tracking.exposed, the YouTube algorithm analysis toolkit for researchers, power users, and algorithm analysts.

Initially sponsored by ALEX from the University of Amsterdam DATACTIVE research group. Maintained by the technical team of Tracking Exposed; more details on youtube.tracking.exposed.

A complete Puppeteer wrapper to orchestrate reproducible data collection with the YTTrEx extension, documented under the name Guardoni.

Maintained by the Technical and Research team of Tracking Exposed, more details on youtube.tracking.exposed.

The browser extension for YouChoose.AI and the studio dashboard studio.youchoose.ai

Sponsored by the European Commission Ledger project in 2021, and developed by the technical team of YouChoose AI, a project by Tracking Exposed. It is listed separately because we believe YouChoose should develop its own governance; reach out to us if you want to know more.

Note on supported platforms

As you can see on the Tracking Exposed website, a few other platforms are supported, in progress, or discontinued: for example Pornhub, Facebook, and Amazon. They have not been imported into this repository, but making this repository a shared resource and a monorepo is part of the refactor begun in 2021.

Packages

A portable data table written in React to display TRex data by pre-configured API.

Tests

Tests are powered by Jest and can be run all at once:

yarn test

or per workspace:

yarn yt:ext test

Run spec tests

To execute all the spec (unit testing) test files in the repo run:

yarn test spec --coverage

Run end-to-end tests

yarn pm2 start platforms/ecosystem.dev.config.js --env test
yarn test e2e
yarn pm2 stop all

Coverage output

To produce a coverage report, run:

yarn test --coverage

and the output will be produced at coverage/lcov-report/index.html

License

Affero GPL 3, as stated in the license file attached to this repository.

trex's People

Contributors

ascariandrea, cramdoulfa, dependabot[bot], djfm, howjmay, jaromil, kratacoa, nkint, rekoke, salvatoreromano1, semantic-release-bot, spaghettinucleari, vecna


trex's Issues

decide what to do about top-right advertising


  • should it be removed ?
  • should we inject our recommendation on top of it?
  • should we infer the presence of "anything" and delete whatever YouTube might put there?

Whatever we choose, should it be a switch in the popup?

[Injected] YT selectors query and cache

Currently the YouTube DOM selectors we use are hard-coded into the extension, so a new release is required whenever YouTube changes any of them.

To solve this problem we need to add an API request that returns the updated selectors, and re-fetch it every hour or so.
The selectors are then stored in local storage and passed from the background script to the content script.
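A minimal sketch of the hourly refresh described above. The storage key, the `KeyValueStore` interface, the `SelectorMap` shape, and the `fetchSelectors` callback are all hypothetical; the interface stands in for local storage so the logic can be tested outside a browser. If a fetch fails, the sketch falls back to stale cached selectors (an assumption, not something the issue specifies).

```typescript
// Hypothetical shape of the selector map returned by the server,
// e.g. { player: "#movie_player" }.
type SelectorMap = Record<string, string>;

interface CachedSelectors {
  selectors: SelectorMap;
  fetchedAt: number; // epoch milliseconds of the last successful fetch
}

// Minimal key-value interface so the sketch does not depend on browser APIs;
// in the extension it could wrap the background script's local storage.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

const STORAGE_KEY = "yt-selectors"; // hypothetical key name
const ONE_HOUR_MS = 60 * 60 * 1000; // refresh interval suggested in the issue

function readCache(store: KeyValueStore): CachedSelectors | undefined {
  const raw = store.get(STORAGE_KEY);
  if (raw === undefined) return undefined;
  try {
    return JSON.parse(raw) as CachedSelectors;
  } catch {
    return undefined; // corrupted entry: behave as if nothing was cached
  }
}

function isFresh(cache: CachedSelectors, now: number, ttlMs: number = ONE_HOUR_MS): boolean {
  return now - cache.fetchedAt < ttlMs;
}

// Returns cached selectors while they are fresh; otherwise re-fetches them.
// If the fetch fails, stale selectors are still better than none.
async function getSelectors(
  store: KeyValueStore,
  fetchSelectors: () => Promise<SelectorMap>,
  now: number = Date.now(),
): Promise<SelectorMap> {
  const cached = readCache(store);
  if (cached !== undefined && isFresh(cached, now)) return cached.selectors;
  try {
    const selectors = await fetchSelectors();
    store.set(STORAGE_KEY, JSON.stringify({ selectors, fetchedAt: now }));
    return selectors;
  } catch (err) {
    if (cached !== undefined) return cached.selectors;
    throw err;
  }
}
```

The background script would call `getSelectors` and forward the result to the content script; only the caching decision is shown here.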

Support Edge and Safari

Research on the method, issues and task to be done, to port the web-extension on the aforementioned platforms

OGP enforcing content

As discussed with @djfm and @ascariandrea, when the OGP API receives a new URL recommendation it MIGHT save empty (null) values in the database, but it MUST return valid values to comply with TypeScript type enforcement.

This opens up three cases:

  1. The URL has no title: what can we return? Should we accept it? (My suggestion: if the title does not exist, we should replace it with the bare URL.)
  2. The URL has no description: that's fine, it can be an empty string.
  3. The URL has no picture: we'll produce a default one, served from the youchoose.ai server.

Is this correct? Thanks!
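The three cases above can be sketched as a normalization step between the database (nullable fields) and the API response (non-null strings). The `RawOgp`/`Ogp` types, `normalizeOgp`, and the default image URL are hypothetical; the issue only says the default picture would be served from youchoose.ai, not at which path. Blank strings are treated like null here, which is an additional assumption.

```typescript
// Values as they might be stored in the database: any field may be null.
interface RawOgp {
  url: string;
  title: string | null;
  description: string | null;
  image: string | null;
}

// Values returned by the API: every field is a non-null string.
interface Ogp {
  url: string;
  title: string;
  description: string;
  image: string;
}

// Hypothetical placeholder; the real path on youchoose.ai is not specified.
const DEFAULT_IMAGE_URL = "https://youchoose.ai/default-preview.png";

// Returns the trimmed value, or undefined for null and whitespace-only strings.
function nonEmpty(value: string | null): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function normalizeOgp(raw: RawOgp): Ogp {
  return {
    url: raw.url,
    title: nonEmpty(raw.title) ?? raw.url, // case 1: fall back to the bare URL
    description: nonEmpty(raw.description) ?? "", // case 2: empty string is fine
    image: nonEmpty(raw.image) ?? DEFAULT_IMAGE_URL, // case 3: default picture
  };
}
```

Keeping the nulls in the database and normalizing only on the way out preserves the information that a field was actually missing.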

Check recommendation by url on creation

When the user adds a new recommendation the API doesn't check if there is already a recommendation with the given URL and adds it anyway.

We can change the behavior to check whether a recommendation with the given URL already exists in the database, and return it when found.

This will prevent duplicated recommendations, which produce disaggregated stats.
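A minimal find-or-create sketch of the proposed behavior, using an in-memory `Map` keyed by URL in place of the database. `RecommendationStore`, `findOrCreate`, and the sequential IDs are hypothetical; the only normalization applied to the URL is trimming whitespace.

```typescript
interface Recommendation {
  id: string;
  url: string;
  title: string;
}

// In-memory stand-in for the recommendations collection.
class RecommendationStore {
  private byUrl = new Map<string, Recommendation>();
  private nextId = 1;

  // Returns the existing recommendation for `url` when there is one,
  // otherwise creates and stores a new one.
  findOrCreate(url: string, title: string): { rec: Recommendation; created: boolean } {
    const key = url.trim();
    const existing = this.byUrl.get(key);
    if (existing) return { rec: existing, created: false };
    const rec: Recommendation = { id: String(this.nextId++), url: key, title };
    this.byUrl.set(key, rec);
    return { rec, created: true };
  }
}
```

In the real backend, a check-then-insert like this can race between concurrent requests; a unique index on the URL field (or an upsert) in the database would enforce the same rule atomically.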

research - htmls.id and metadata.id generation changed

We should still check whether the same ID exists with a different href.

-        const id = utils.hash({
+        const metadataId = utils.hash({
             publicKey: headers.publickey,
+            randomUUID: body.randomUUID,
+            href: body.href,
+        });
+        const id = utils.hash({
+            metadataId,
             size: _.size(body.element),
             contenthash: body.contenthash,
-            randomUUID: body.randomUUID,
+            href: body.href,
             i,
         });
-        const metadataId = utils.hash({
-            publicKey: headers.publickey,
-            randomUUID: body.randomUUID,
-        });
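The new ID scheme in the diff can be sketched as follows. The implementation of `utils.hash` is not shown, so the `hash` below (SHA-1 over a key-sorted JSON serialization) is a stand-in; `Submission` and `computeIds` are hypothetical names, and `element.length` assumes `body.element` is a string (for which `_.size` returns its length).

```typescript
import { createHash } from "crypto";

// Stand-in for utils.hash: SHA-1 over a JSON serialization with sorted keys,
// so property order does not change the result.
function hash(obj: Record<string, unknown>): string {
  const pairs = Object.keys(obj).sort().map((k) => [k, obj[k]]);
  return createHash("sha1").update(JSON.stringify(pairs)).digest("hex");
}

interface Submission {
  publicKey: string;
  randomUUID: string;
  href: string;
  element: string;
  contenthash: string;
}

// Mirrors the new scheme: metadataId now includes href, and each html id
// is derived from metadataId plus per-element data and the element index i.
function computeIds(s: Submission, i: number): { metadataId: string; id: string } {
  const metadataId = hash({ publicKey: s.publicKey, randomUUID: s.randomUUID, href: s.href });
  const id = hash({ metadataId, size: s.element.length, contenthash: s.contenthash, href: s.href, i });
  return { metadataId, id };
}
```

Since href now enters both hashes, two submissions that differ only in href get different IDs under this scheme (barring a hash collision), so the remaining "same ID, different href" check mostly concerns records created under the old scheme, where metadataId did not include href.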

implement new APIs and update old APIs for the new 'personal' page

This issue keeps track of the APIs to be tested and updated for the new personal page, @lc-d

  • add totalEvidences to the personal API; it is needed for paging
  • the fields observed in should contain only the fields: { videoId, id, title, author, when, relative }
  • the evidence removal method: DELETE /api/v1/personal/:publicKey/evidence/:id
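A sketch of the first two points: a page of evidences projected onto the six listed fields, plus `totalEvidences` so the client can compute the number of pages. `Evidence`, `PersonalPage`, and `personalPage` are hypothetical names; pages are assumed to be zero-based and the data is an in-memory array rather than a database query.

```typescript
// Stored evidence: the six public fields plus any other stored fields.
interface Evidence {
  videoId: string;
  id: string;
  title: string;
  author: string;
  when: string;
  relative: string;
  [extra: string]: unknown; // other stored fields that must not be exposed
}

type PublicEvidence = Pick<Evidence, "videoId" | "id" | "title" | "author" | "when" | "relative">;

interface PersonalPage {
  totalEvidences: number; // total count, needed by the client for paging
  evidences: PublicEvidence[];
}

// Returns page `page` (zero-based) of `pageSize` items, exposing only the listed fields.
function personalPage(all: Evidence[], page: number, pageSize: number): PersonalPage {
  const start = page * pageSize;
  const evidences = all
    .slice(start, start + pageSize)
    .map(({ videoId, id, title, author, when, relative }) => ({ videoId, id, title, author, when, relative }));
  return { totalEvidences: all.length, evidences };
}
```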

parser update and language coverage

 pubtimeAPI Special management, 'relative timing!' (zh-CH) 2019年4月24日 with clientTime 2021-01-22T19:11:16.000Z +0ms
  pubtimeAPI Relative time string missing? |2019年4月24日| +1ms
Deprecation warning: value provided is not in a recognized RFC2822 or ISO format. moment construction falls back to js Date(), which is not reliable across all browsers and versions. Non RFC2822/ISO date formats are discouraged and will be removed in an upcoming major release. Please refer to http://momentjs.com/guides/#/warnings/js-date/ for more info.
Arguments: 
[0] _isAMomentObject: true, _isUTC: false, _useUTC: false, _l: undefined, _i: invalid date, _f: undefined, _strict: undefined, _locale: [object Object]
Error
    at Function.createFromInputFallback (/home/oo/Dev/yttrex/backend/node_modules/moment/moment
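The log shows an absolute date written as "2019年4月24日" reaching moment as an unrecognized format, which triggers the js Date() fallback that moment itself warns is unreliable. A sketch of parsing this pattern explicitly before any moment call; `parseCjkDate` is a hypothetical helper, and returning midnight UTC is an assumption, since the page does not state a time zone.

```typescript
// Matches absolute dates written as YYYY年M月D日 (used in Chinese and Japanese locales).
const CJK_DATE = /^(\d{4})年(\d{1,2})月(\d{1,2})日$/;

// Returns a UTC Date for strings like "2019年4月24日",
// or undefined when the text does not match or is not a real calendar date.
function parseCjkDate(text: string): Date | undefined {
  const m = CJK_DATE.exec(text.trim());
  if (!m) return undefined;
  const year = Number(m[1]);
  const month = Number(m[2]) - 1; // Date.UTC months are zero-based
  const day = Number(m[3]);
  const date = new Date(Date.UTC(year, month, day));
  // Reject impossible dates such as 2019年2月31日, which Date.UTC would roll over.
  if (date.getUTCMonth() !== month || date.getUTCDate() !== day) return undefined;
  return date;
}
```

Strings that do not match (for example relative times) would continue down the existing relative-time path instead of being handed to moment's fallback.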

new API to be implemented

  • export of email (similar to questionnaire)
  • statistics of email and questionnaire
  • access to advertising by personal access token
  • access to advertising by content producer
  • statistics on advertising

info box closing

When the popup is open, could any interaction in the YouTube window close it?
I'm not sure whether it is better to bind every click in the page or whether there is a better approach.

Enhance and standardize extension pop-up

Image: poTREX extension pop-up

The image above shows the look of the poTREX web extension. It's a nice one, since it uses the pornhub.tracking.exposed color palette and has a Contact Us hyperlink. It would be great if we could render the yTREX pop-up in the same aesthetic style, using the yTREX color palette, and add a hyperlink to email the yTREX team when needed. Finally, it might be worth adding a hyperlink to an Ethics page, like this one

Image: yTREX extension pop-up

methodology bug

python3 bin/autowatcher.py config/prova.txt
Profile directory found! prova
Opening https://www.youtube.com/watch?v=qaM80BjvLuA

Video Status: 1 playing
Returning snaps/prova/prova-1-snap-1.png
Error in checkStatus Message: unknown command: session/cf810a5967bf8948f86160b0f3871db6/element/0.41869062236446575-1/screenshot

Opening https://www.youtube.com/watch?v=6McuxV0Krxw

After running python3 bin/autowatcher.py config/prova.txt
we get the following error:

" Video Status: 1 playing
Returning snaps/prova/prova-2-snap-1.png
Video Status: 1 playing
Returning snaps/prova/prova-2-snap-1.png
Error in checkStatus Message: unknown command: session/cf810a5967bf8948f86160b0f3871db6/element/0.42571308286020115-1/screenshot
Test completed: closing "

wetest1: fields last fix

The evidences field is redundant: recommendationOrder is more expressive and carries the same information.

The recommendedLengthText field is redundant; it was there only as a double check.

Improve Image component to avoid glitches

Currently the Image component uses a brute force strategy to blindly try 2 possible ways of loading an image and avoid failures due to content security policy.

If it tries the wrong way first then the image will very briefly appear as not found and errors will be seen in the console. This is especially visible and annoying in the drag and drop.

A more subtle way to proceed would be to send an OPTIONS request first and then choose the appropriate image loading strategy.

progress into version 1.8

Several different patches and developments are needed for version 1.8, which implements necessary and missing features for Guardoni, privacy settings, an overall step forward in scraping accuracy, and a small architectural upgrade by defining server-side objects in TypeScript.

In pull request #100, the three changes are to be tested and reviewed. Some of these belonged to previously separate branches, but since they are interconnected, they have now all been merged into this development branch. To be completed or reviewed:

  • client-side privacy settings: by default, the extension must send non-linkable information. This is the setting that will initially be used on YouChoose, and it should become a boolean selector, eventually controlled from the popup.
  • search and advertising scraping came from experimental research that works differently from the standard scraping. The extension should now also collect the X and Y offsets of the HTML elements in the DOM.
  • the extension has an internal cache optimization (previously a lot of useless data ended up in the "labels" collection, and the addressing IDs were still to be understood).
  • the parser and its logic have been named 'leafs'. A text circulated on our internal discussion platform better explains the backstory of this scraping approach. Leafs might be the first thing we want to document (perhaps with API v3), because the new design might speed up our ability to support new platforms if properly backed by tooling.
  • the new way to keep track of experiments in Guardoni: 1) upload a given CSV to the server and receive an experimentId; 2) use this experimentId to share the same settings; 3) mark your execution with experiment tags.
  • gradual migration to / coverage of TypeScript is in progress. A few APIs (youchoose.js and v3) decode objects to validate the data exchanged with the client. In this branch, Bluebird and nodemon have been removed as old dependencies.

Unlink channel should ask for some kind of confirmation + channel box

Unlinking by mistake would force the content creator to repeat the signup process; the interface should instead offer a chance for second thoughts.

TODO: add a modal dialog (not an alert()) with text "Are you sure you want to unlink your channel? If you unlink your channel you will have to authenticate again. Your recommendations will remain visible on the platform." and Yes/No buttons.

adopt puppeteer-core as a dependency

we need to switch the dependency from puppeteer to puppeteer-core because:

solution:

  • we need the default installation paths for macOS, Linux, and Windows hard-coded in guardoni.js
  • we need a quick check with the fs.existsSync API to detect whether any of those Chrome paths exists
  • this would replace the nconf.get('chrome') path
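The two steps above can be sketched as a per-platform list of default paths plus an existence check. The path list is an assumption covering common Google Chrome and Chromium install locations, not an exhaustive one; `DEFAULT_CHROME_PATHS` and `findChrome` are hypothetical names. The existence check is injectable so the lookup can be tested without Chrome installed.

```typescript
import { existsSync } from "fs";

// Common default install locations per process.platform value (assumed, not exhaustive).
const DEFAULT_CHROME_PATHS: Record<string, string[]> = {
  darwin: ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"],
  linux: [
    "/usr/bin/google-chrome",
    "/usr/bin/google-chrome-stable",
    "/usr/bin/chromium-browser",
    "/usr/bin/chromium",
  ],
  win32: [
    "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
    "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
  ],
};

// Returns the first existing Chrome executable for the platform, or undefined.
function findChrome(
  platform: string = process.platform,
  exists: (path: string) => boolean = existsSync,
): string | undefined {
  const candidates = DEFAULT_CHROME_PATHS[platform] ?? [];
  return candidates.find((p) => exists(p));
}
```

The returned path could then be passed as the `executablePath` launch option of puppeteer-core, with the configured `nconf.get('chrome')` value perhaps kept as an explicit override when set.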

beta2 todo list

  • removed all the pug
  • updated README
  • imported stats from fbtrex
  • imported mirror from fbtrex

guardoni update

  • create a new variable in the csv with the time used for each Guardoni script (3000, end, small....).
  • fix experimentnumber
  • create a new variable for all the csv in yttrex, with the ReccomendedVideoCharset.
  • recognize the ad before playing the video: play the ad, but don't keep it playing; stop it every few seconds.

With a view to MechanicalKurd:

  • 1) show thumbnail
  • 2) a way to add a qualitative category to the csv for each video

Disable autoplay

Automated profiles should automatically disable autoplay when watching videos.

Warning messages when local server is started on OSX

When running npm run watch in /yttrex/backend, the server starts but shows the following warnings:

(node:55537) Warning: Accessing non-existent property 'count' of module exports inside circular dependency
(Use `node --trace-warnings ...` to show where the warning was created)
(node:55537) Warning: Accessing non-existent property 'findOne' of module exports inside circular dependency
(node:55537) Warning: Accessing non-existent property 'remove' of module exports inside circular dependency
(node:55537) Warning: Accessing non-existent property 'updateOne' of module exports inside circular dependency

Lab view should offer more information

(In order of priority)

  • Videos that have been customized should be visible in the UX
  • As only the last 30 videos can be imported, it is crucial to allow content creators to add new videos by specifying the URL (the backend part is still missing; it should be considered a one-day task)
  • Indicate how many independent observations and/or observed advertising is available per video
  • Compare / Related API might be integrated once the right UX is found

discussion on "example" query list

In the configuration file, you can point to any "discussion" URL: a forum, an instant messaging channel, whatever works, as long as someone makes a pull request to the configuration file containing all the query-term associations for the different running experiments.

Currently the query list is made by innocent English animal names:

    "monkey",
    "sloth",
    "cappuccino monkey",
    "mountain goats"

bin/searches parser has a logic issue in treating invalid DB entries

Sun, 27 Sep 2020 10:20:52 GMT yttrex:label:OVERFLOW <NOT> 21 documents
Sun, 27 Sep 2020 10:20:52 GMT yttrex:searches Processed 21 entries, effective searches 0, total searches video entry 0
Sun, 27 Sep 2020 10:20:58 GMT yttrex:label:OVERFLOW <NOT> 21 documents
Sun, 27 Sep 2020 10:20:58 GMT yttrex:searches [+] 21 start a new cicle, 21 took: a few seconds and now process 21 htmls
Sun, 27 Sep 2020 10:20:58 GMT yttrex:searches Processed 21 entries, effective searches 0, total searches video entry 0

This causes a problem because the tool never quiets down: as the log shows, the same 21 entries are reprocessed every cycle with 0 effective searches.
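One possible fix, sketched under the assumption that the loop selects unprocessed entries on each cycle: mark every picked entry as processed, including those that yield no valid search, so invalid entries are not selected again. `HtmlEntry`, the `processed` flag, `SearchParser`, and `runCycle` are hypothetical; an in-memory array stands in for the htmls collection.

```typescript
// Hypothetical in-memory stand-in for an entry of the htmls collection.
interface HtmlEntry {
  id: string;
  html: string;
  processed: boolean;
}

// Stand-in parser: returns the number of searches found, 0 for unusable entries.
type SearchParser = (html: string) => number;

// Runs one cycle over the unprocessed entries and marks each of them as processed,
// whether or not it produced a valid search.
function runCycle(entries: HtmlEntry[], parse: SearchParser): { picked: number; effective: number } {
  const batch = entries.filter((e) => !e.processed);
  let effective = 0;
  for (const entry of batch) {
    if (parse(entry.html) > 0) effective++;
    entry.processed = true; // invalid entries are marked too, so they are not picked again
  }
  return { picked: batch.length, effective };
}
```

A real implementation might store an error reason instead of a plain flag, so invalid entries can be inspected later without being reprocessed.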

produce the .zip for version 1.8.99

  • removed existing "default-opt-in-extension" branch
  • fork from a stable 1.8.x
  • change versioning and extension/src/chrome/background/account.js
  • draft a release
  • update guardoni executable to match the new default extension
  • rebuild guardoni executable

Allow preview editing and define size limits

  • Currently we use the information fetched via the Open Graph protocol to define the recommendation, but it might not be the description a creator wants. We should discuss a simple click-to-edit interface allowing creators to edit the title/description.
  • Open Graph fields have no size limit, so even a 500-character description might be acquired. Such long texts end up breaking or degrading the watcher's experience, so we should have a character limit.
  • We should also decide what to do with the additional chars (show more? force the text cutting? reduce the font ?)
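A sketch of the "force the text cutting" option that still keeps the full text around, so a "show more" control remains possible. `truncate` and `preview` are hypothetical helpers; no actual character limit is decided in the issue, so the limit is a parameter, assumed to be at least 1.

```typescript
const ELLIPSIS = "\u2026"; // "…", a single UTF-16 code unit

// Shortens `text` to at most `limit` characters (ellipsis included), preferring
// to cut at the last space so words are not split. Assumes limit >= 1.
function truncate(text: string, limit: number): string {
  if (text.length <= limit) return text;
  const hard = text.slice(0, limit - ELLIPSIS.length);
  const lastSpace = hard.lastIndexOf(" ");
  const cut = lastSpace > 0 ? hard.slice(0, lastSpace) : hard;
  return cut.replace(/\s+$/, "") + ELLIPSIS;
}

// Keeps the full text next to the shortened one, so the UI can offer "show more".
function preview(text: string, limit: number): { short: string; full: string; truncated: boolean } {
  const short = truncate(text, limit);
  return { short, full: text, truncated: short !== text };
}
```

Lengths here are counted in UTF-16 code units, which is what `String.prototype.length` measures; text with emoji or other astral characters may be cut slightly earlier than a visual character count would suggest.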
