Comments (10)

jomo commented on May 23, 2024

There is a predefined set of scrambled text variations. SHZ seems to have two; other newspapers have four. I assume this merely depends on the number of caching backends used by the respective newspaper, where each backend holds one version of the scrambled text.

Apart from C1-Meter-Tracking-ID, the response body is entirely static (although which of the scrambled versions you get is random).
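
One way to count the variants is to fingerprint each saved response body after blanking the tracking ID. This is only a minimal sketch: it assumes the ID shows up as a quoted `"C1-Meter-Tracking-ID": "..."` pair somewhere in the body, which is a guess (where and how it is embedded is not stated here), so the regular expression and the function names are hypothetical. Fetching the pages is left out on purpose.

```python
import hashlib
import re
from collections import Counter

# Hypothetical pattern: assumes the ID is embedded as a quoted key/value pair.
TRACKING_ID = re.compile(r'("C1-Meter-Tracking-ID"\s*:\s*")[^"]*(")')


def fingerprint(body: str) -> str:
    """Hash a response body with the tracking ID value blanked out."""
    normalized = TRACKING_ID.sub(r"\1\2", body)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def count_variants(bodies):
    """Map each distinct (normalized) body fingerprint to how often it was seen."""
    return Counter(fingerprint(body) for body in bodies)
```

If the assumption holds, `len(count_variants(bodies))` over enough saved responses should settle at two for SHZ and four for the other sites.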

Some observations:

  • I compared two texts that start with words of equal length (Clueso, Obwohl)
    • Results were Ceuosl, Clsuoe, ueCols, usoelC for Clueso and hlwboO, hwbloO, lbohwO, owblhO for Obwohl
    • None of the scrambled versions use the same displacement
    • 👉 They might be using a random seed for every article 😕
  • I also compared equal words within one scrambled paragraph
    • Each word uses a different displacement
    • 👉 They might be using a random seed for every word 😕😕
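
The comparison above can be reproduced mechanically. A small sketch (the helper name `permutation` is made up for illustration): for each scrambled variant it records which original position every character came from, then checks whether the two words share any permutation. This only works because neither word repeats a character (`O` and `o` count as different).

```python
def permutation(original: str, scrambled: str) -> tuple[int, ...]:
    """For each scrambled position, return the index of that character in the original."""
    if sorted(original) != sorted(scrambled):
        raise ValueError(f"{scrambled!r} is not a rearrangement of {original!r}")
    if len(set(original)) != len(original):
        raise ValueError("repeated characters make the permutation ambiguous")
    return tuple(original.index(ch) for ch in scrambled)


samples = {
    "Clueso": ["Ceuosl", "Clsuoe", "ueCols", "usoelC"],
    "Obwohl": ["hlwboO", "hwbloO", "lbohwO", "owblhO"],
}

# One set of permutations per original word.
perms = {
    word: {permutation(word, variant) for variant in variants}
    for word, variants in samples.items()
}

shared = perms["Clueso"] & perms["Obwohl"]
print(shared)  # set(): no permutation is used for both words
```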

from paywallr.

sprechsucht commented on May 23, 2024

Free access for one month might be helpful:
https://t.co/4fEXAZ58iZ?amp=1

jomo commented on May 23, 2024

It seems like this kind of obfuscation is indeed coming from InterRed CMS, which is used by a bunch of German newspapers:

image

I'm listing their domains here so others can find this issue.
  • rp-online.de
  • ngz.de
  • aachener-zeitung.de / aachener-nachrichten.de
  • saarbruecker-zeitung.de
  • volksfreund.de
  • ga.de / general-anzeiger-bonn.de
  • duesseldorfer-anzeiger.de
  • wuppertaler-rundschau.de
  • dumeklemmer-ratingen
  • mein-krefeld.de
  • erft-kurier.de
  • lokal-anzeiger-erkrath.de
  • stadt-panorama.de
  • schaufenster-mettmann.de
  • meine-woche.de
  • stadt-kurier.de
  • Some of these use https://github.com/rp-online/park as a frontend.
  • The source code mentions paywallTypes such as celeraone and interred
    • interred will merely add <div id="paywall_stoerer"></div>
    • celeraone loads paywall-article.js, which has a showContent() function, but it just reloads the page after subscription :(
  • There are also some public slides from RP DIGITAL that may provide some insight into the website structure, such as the one below. They don't provide any details about paywalls, though.

image

  • rp-online.de does offer a free subscription when you agree to spam mails (this option only appears to be offered when using an ad blocker)

As @Philzen already pointed out, there does not seem to be any client-side deobfuscation. The very first GET response already includes the plain text when logged in. Makes you wonder why they include the scrambled text at all. Perhaps just as some kind of gimmick to make the layout of the blurred text align with that of the plain text?

I also noticed that Google does find paywalled articles when searching for parts of the plain text that is obfuscated for regular visitors. I've tried imitating the request headers of Googlebot and Googlebot-News, but always got the obfuscated text. Setting X-Forwarded-For to a Googlebot IP address didn't help, either :/

Philzen commented on May 23, 2024

The text scrambling seems to happen server-side. When visiting as a logged-in user, the HTML response does not contain any .blurred nodes (that CSS style is not even included), and the article's content is fully readable in the raw HTML.
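
To compare a logged-in and an anonymous response, counting those nodes in the saved HTML is enough. A rough sketch using only the standard library; the class name `blurred` comes from the observation above, while `BlurredCounter` and `count_blurred` are names made up for this example.

```python
from html.parser import HTMLParser


class BlurredCounter(HTMLParser):
    """Counts start tags whose class attribute includes 'blurred'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "blurred" in classes:
            self.count += 1


def count_blurred(html: str) -> int:
    parser = BlurredCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

A count of zero for the logged-in response and a non-zero count for the anonymous one would match what is described above.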

The mentioned script (here is a deobfuscated version btw) seems to only deal with displaying the paywall / registration options.

However, as the scrambled text still closely resembles the original words, there may be a reversible algorithm at work here. Do we have any indication of what software they're using?

Also, it bugs me why that scrambled original text is part of the raw response anyway... what could be the advantage of doing so (SEO)?

tobimori commented on May 23, 2024

Yup, also I figured out that the text gets rescrambled when opening the same article in a new tab.

As I said, they randomly shuffle the characters of each space-separated word.
It won't change with a simple reload, but with deleting cookies and starting a new session.
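
To make that concrete, here is a guess at what such a scrambler could look like. It is not their actual code (nobody here has seen the server side); it is just a sketch of "shuffle the characters of every space-separated word", with an optional seed standing in for whatever per-session state they might use.

```python
import random


def scramble(text: str, seed=None) -> str:
    """Shuffle the characters of each space-separated word independently."""
    rng = random.Random(seed)  # seed is a stand-in for unknown per-session state
    scrambled_words = []
    for word in text.split(" "):
        chars = list(word)
        rng.shuffle(chars)
        scrambled_words.append("".join(chars))
    return " ".join(scrambled_words)
```

With the same seed the output is stable, which would fit the text surviving a simple reload; a new seed gives a new scrambling. If it really is a plain random shuffle, the scrambled text alone does not carry the original order.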

Philzen commented on May 23, 2024

Damn, you're right.

"It won't change with a simple reload, but with deleting cookies and starting a new session."

Actually, I don't even need to delete cookies. It's enough to open the article in a new tab and open dev tools with "Deactivate Caching" enabled on the network tab, and it gets rescrambled. At other times, just opening a new tab gets it rescrambled.

I noticed that the c1_headers var (global on window object M-) ) changes with the different scramblings served, but not consistently. c1_headers.C1-Meter-User-Group-Action.cms_snippet.parameters sometimes contains an additional entry, which I have observed to be either Unregelmaessiger Nutzer ("irregular user"), Zufaelliger Nutzer ("random user") or none.

Not sure if this is interconnected or purely related to tracking.

tobimori commented on May 23, 2024

Seems to be related. C1 probably stands for CeleraOne, a Berlin-based company focused on creating paywalls. On their page, you can find a review by Nicolas L. Fromm, who is the CEO at Digital (?) of medienhaus:nord, the company behind shz and others.

Philzen commented on May 23, 2024

Just wanted to read an RP Online article, and when I looked at the HTML, the way the text is scrambled looked pretty familiar. The CSS classes are different, and the global window object used to init & track the user paywall is called cre, but the way the words are scrambled (clearly word by word, re-scrambled on a per-session basis) and the overall setup seem similar.

Also, when I go back to shz.de as a logged-in user, this is the cookie being sent with every request:

creid={some_integer_id}; ovs_shzsessionid={some_hex_id}; cresid={some_hex_id}; shzsessionid={some_hex_id}; epaper_shzsessionid={some_hex_id}; xdefcc={some_hex_id}; BIGipServermhn_pay_http=1063546890.20480.0000; communitycookie=CPT/6wm15cSxqAx/doGV9OJ6X4o4etVS0Rdi762ab/AWn5j3RMygo64BU26Ru9Wx; communitydata="GCvbzP3QZew4hg1TzTamng=="; communitydata2=522997||; _cmv=1

So, on seeing creid it looks very likely they use one and the same product.
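
For comparing sites it helps to pull just the cookie names out of such a header. A small sketch; `parse_cookie_header` is a made-up helper, and the header below is a shortened copy of the one above with the same placeholder values.

```python
def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a Cookie request header into a name -> value mapping."""
    cookies = {}
    for pair in header.split(";"):
        # Split on the first "=" only, since values may contain "=" themselves.
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


header = (
    "creid={some_integer_id}; ovs_shzsessionid={some_hex_id}; "
    'cresid={some_hex_id}; communitydata="GCvbzP3QZew4hg1TzTamng=="; _cmv=1'
)

cre_names = sorted(name for name in parse_cookie_header(header) if name.startswith("cre"))
print(cre_names)  # ['creid', 'cresid']
```

Running the same check against the cookies another site sets would show whether it also uses cre-prefixed names.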

tobimori commented on May 23, 2024

The RP Online CMS is definitely InterRed Online.

Philzen commented on May 23, 2024

So how do we get ahold of their scrambling algorithm? Applying for a demo account probably won't help, as we would most likely need to get a peek into the server-side code.
