Comments (18)

rosswintle commented on May 20, 2024

This is an interesting discussion and I hope you don't mind me both chipping in and following. @dannyvankooten - you look like you're WAY more into this than I am. But I'm currently working out this exact same thing on my (much smaller and probably less viable) Kownter project. In fact, one of the reasons I'm building it is to see what can be done without cookies.

Some kind of aggregate flow between pages, as @da2x suggests, is pretty much the thinking that I came to. I don't need to see individuals' paths if I can see the aggregate drop-off as people move through the user journey. In this post I attempted to explain this, saying: "we can still report the ratio of conversions against page views."

OK, so there are probably some advanced cases where you want to know more than that but if you want that then you probably actually want GA, right?

I've also been wondering if it makes sense, is useful, and is performant enough (server-side) to store a cookie, but to recycle it on each visit. And, if you do this, is it any more private than just setting a cookie and leaving it there? I'm not sure it is.

I am not actually convinced that a session ID is personally identifiable as there's no reverse-lookup. If you were storing a session ID alongside an IP address then that would be different. I know GDPR says something about web-tokens, but I think what you/we are doing here is way within the spirit of GDPR.

Interested to see how this progresses.

from fathom.

da2x commented on May 20, 2024

@rosswintle, put this on repeat and get back to working on Kownter. It’ll be a great alternative (more alternatives is great!) and you just need to stay motivated. Make it your own and it’ll probably turn out great. Nick some stylesheets and graphs from Fathom if you like the visuals and stuff your own data in them.

This isn’t legal advice and I’m not anyone’s lawyer. The following could very well be totally wrong: Fewer magical identifiers means more transparency. It also means people won’t contact the operator of the analytics service to ask for a copy of the data belonging to $magical_token or ask to have the data of $magical_token deleted. I specifically suggested cookie names that were named after the data they contain instead of a single magical cookie containing all the data. Individual cookies are easier for people to inspect. Opting out of this is as simple as disabling cookies, and their use is easy to document in a privacy policy.

The following is provided for context (GDPR sections mentioning identifiers as relevant to this discussion):

GDPR Recital 30:

Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

GDPR Article 4 Definitions: Point 1:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

dannyvankooten commented on May 20, 2024

Hey @da2x,

Thanks for the suggestion and the well-thought-out comment. This is great stuff!

We've been contemplating whether to use fingerprinting or a cookie and came to the same conclusion as you: a cookie is actually more privacy-friendly than fingerprinting, because users can delete the cookie in order to be forgotten.

Eventually we want to support "visitor paths" and I have yet to think of a way to do that without assigning a unique identifier to each user. This can and should definitely be anonymous though, imo. Do you have any ideas on how you'd tackle visitor paths without the use of identifiers?

da2x commented on May 20, 2024

@dannyvankooten, how is information about the visitor path more useful than just storing the referrer information for a given page? If you can tell me more about the specific requirements of this feature, I may be able to come up with another solution to get the same data.

Just use the Referer header. In aggregate, it’s enough data to build a visitor-path without profiling individual users. E.g. 6 % of visitors to /b had /a as referrer and 3 % of visitors to /b were referred to by /c, etc. You can’t tell the journey of one specific visitor, but you can determine that page /a is better at referring users than page /c. If you then look at which pages referred to /a you can build a path:

/b referrers:
- /ddg.co 60 %
- - /a 6 %
- - - /ddg.co 80 %
- - - /d 8 %
- - /c 3 %

(Speaking of aggregate data and referrer information: unique referrer data should be dropped after a week. E.g. if only 2 clicks came from example.com/webmail, then delete the link after a week. If 600 people came from reddit.com then that should be considered non-unique and you can keep storing it.)
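
The aggregate-referrer idea above can be sketched roughly as follows. This is only an illustration, not anything from Fathom: the `{ page, referrer }` hit shape, the function names, and the `minCount` threshold are all assumptions made for the example.

```javascript
// Hypothetical input: one { page, referrer } record per pageview,
// with no visitor identifier attached.
function referrerShares(hits, page) {
  const counts = new Map();
  let total = 0;
  for (const hit of hits) {
    if (hit.page !== page) continue;
    total += 1;
    const ref = hit.referrer || "(none)";
    counts.set(ref, (counts.get(ref) || 0) + 1);
  }
  const shares = [];
  for (const [referrer, count] of counts) {
    // Share of all pageviews of `page` that arrived from this referrer.
    shares.push({ referrer, count, percent: (count * 100) / total });
  }
  shares.sort((a, b) => b.count - a.count);
  return shares;
}

// Keep only referrers seen at least `minCount` times. Meant to be run on
// data older than a week; the threshold itself is an assumed parameter.
function dropRareReferrers(shares, minCount) {
  return shares.filter((s) => s.count >= minCount);
}
```

Calling `referrerShares(hits, "/b")` and then calling it again for each referrer it returns would produce the nested list shown above, one level at a time.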

dannyvankooten commented on May 20, 2024

Hey @rosswintle,

I definitely do not mind - quite the opposite. Thank you for chiming in! Kownter looks super interesting, I'm glad that there are more people thinking about solving analytics in a better way and providing more options besides "just use GA, even though you don't use 90% of what they're collecting".

I'm going to go through all of your posts as there's most likely a ton to learn - I'm only just getting back into this project and forgot about a lot of the decisions that went into this when I started 18 months ago.

> I am not actually convinced that a session ID is personally identifiable as there's no reverse-lookup.

Same for me, although there may be other off-site identifiers that will still give away this particular user (eg cross referencing timestamped actions in app?).

Anyway, at this stage I lack the intricate details to really add anything to this discussion, so I'll get right to reading your posts and experimenting to see if there's something we can do to tackle this. Dropping the unique visitor ID entirely is definitely worth striving for, I'd say!

rosswintle commented on May 20, 2024

Great! I think with the backing of @pjrvs and whatever design skills you have that are way better than mine, your project will be much less of a toy/experiment than mine. Plus, writing it in Go probably helps with the scaling a LOT (though I think the downside is that self-hosted deployment might be harder for some people).

Great that some of what I've done developing the early stages in the open might help. Always happy to talk about the experience! I'll leave you and @da2x to work out the specific issue listed here.

dannyvankooten commented on May 20, 2024

Quick progress update: Fathom now relies on storing all visitor-specific data on the client side so that the server only has to keep track of aggregated data. Not only did this allow for much simpler code and better privacy (as discussed), it'll scale a whole lot better too. So thanks @da2x and @rosswintle for lending your brains here. Super helpful!

Right now visitors are still assigned a random ID but it is only stored for a theoretical maximum of 30 minutes (the expiration time of a session). If a visitor visits multiple pages, all pageview hits except the last one are deleted within 5 minutes (the time between aggregation).

The visitor ID is only needed to give an indication of realtime visitors (that is: distinct visitors that did a pageview or performed another event within the last 5 minutes). I haven't yet been able to come up with a way to do that without storing some kind of short-lived identifier on the server, but let me know if you have any suggestions here please.
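
For illustration, counting realtime visitors from short-lived IDs might look like the sketch below. The hit shape (`visitorId`, `timestampMs`) and the function name are assumptions for the example and are not taken from the Fathom code.

```javascript
// Count distinct short-lived visitor IDs seen within the last `windowMs`.
// Hypothetical hit shape: { visitorId: string, timestampMs: number }.
function countRealtimeVisitors(hits, nowMs, windowMs = 5 * 60 * 1000) {
  const seen = new Set();
  for (const hit of hits) {
    const age = nowMs - hit.timestampMs;
    if (age >= 0 && age <= windowMs) {
      seen.add(hit.visitorId);
    }
  }
  return seen.size;
}
```

Because only the set of IDs inside the window matters, older hits can be discarded as soon as they fall outside it.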

Besides the visitor ID (a random string), no other identifiable data is stored anymore. 🎉 🍾

rosswintle commented on May 20, 2024

Could you, for the purposes of real-time unique visitors, just ignore any hits with an internal (same site) referring page?
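
A rough sketch of that check, assuming each hit carries the raw referrer URL and that the site's hostname is known (both names are made up for the example):

```javascript
// True when the referrer points at the same site, i.e. the visitor
// was already here and should not be counted as a new arrival.
function isInternalReferrer(referrer, siteHostname) {
  if (!referrer) return false;
  try {
    return new URL(referrer).hostname === siteHostname;
  } catch (err) {
    return false; // malformed referrer: treat as external
  }
}

// Count hits that arrived from outside the site (or with no referrer).
function countNewArrivals(hits, siteHostname) {
  return hits.filter((h) => !isInternalReferrer(h.referrer, siteHostname)).length;
}
```

As noted later in the thread, this counts new arrivals only; it cannot tell how many earlier arrivals are still around.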

(This was a super-quick thought I wanted to jot down. Will properly read and think another time.)

rosswintle commented on May 20, 2024

This also gives me the thought that you can set cookies to identify returning users without having an ID in the cookie.

You just set tracked-with-fathom = 1 and pick it up server side.
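
On the server side, picking up such a flag could be as small as the sketch below. The cookie name comes from the comment above; the helper names and the hand-rolled parser are assumptions for illustration only.

```javascript
// Minimal Cookie header parser: "a=1; b=2" -> { a: "1", b: "2" }.
function parseCookies(header) {
  const cookies = {};
  if (!header) return cookies;
  for (const part of header.split(";")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    const name = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (name) cookies[name] = value;
  }
  return cookies;
}

// A returning visitor is anyone who already carries the flag cookie.
// The cookie holds no ID, only the constant value "1".
function isReturningVisitor(cookieHeader) {
  return parseCookies(cookieHeader)["tracked-with-fathom"] === "1";
}
```

Every visitor carries the same cookie value, so the cookie by itself says nothing about which returning visitor sent the request.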

The PECR rules would still (currently) need the existence of the cookie to be disclosed to users. But it should keep you free of GDPR “personal data” rules.

Hmm. 🤔

da2x commented on May 20, 2024

> Quick progress update [...]

Greatly appreciated.

> Could you, for the purposes of real-time unique visitors, just ignore any hits with an internal (same site) referring page?

I was about to suggest the same. The Referer header holds this information. I don't see what a random ID adds here.

> This also gives me the thought that you can set cookies to identify returning users without having an ID in the cookie. You just set tracked-with-fathom = 1 and pick it up server side.

That serves the same purpose as the last-visit cookie I suggested earlier.

> The PECR rules would still (currently) need the existence of the cookie to be disclosed to users.

(This isn't legal advice, and I'm not a lawyer.) The ePrivacy Directive requirements vary greatly from country to country. Some require consent and opt-in, some require a prompt and opt-out, some just require information, and some require browser settings to be respected (cookies, DNT). The ePrivacy Regulation (late 2018/early 2019?) changes this mess to the last option (browser settings) plus detailed documentation (privacy policy).

I'm personally aligning all my thinking with the ePrivacy Regulation + GDPR. The ePrivacy Directive and GDPR are mutually exclusive as far as I understand it, so the current situation is unclear. However, aiming for transparency (purposeful cookie names and matching privacy policies) and aiming for data minimization and avoiding any kind of user-profiling should be the way to go to be compliant with the intent of European regulations.

dannyvankooten commented on May 20, 2024

rosswintle commented on May 20, 2024

That’s neat...I’d not thought of using cookie existence to track a return visit. I like that.

And yeah, you’re right. Without IDs you could track NEW live visitors, but not those that had been around for a while...this could be an explained limitation...or just use your temporary IDs!

Great discussion and thanks for the update.

da2x commented on May 20, 2024

Use the lastvisit cookie? If the last visit was within five minutes, count it as a pageview but not as a unique session/person; if it was more than five minutes ago (or the cookie is unset), it's a unique session/person.

Record the lastvisit offset in minutes. E.g. add a plus-one count in table active_visitors with columns datetime (minute precision), 0min, 1min, 2min, ... 15min. For any minute of the day, you'd be able to see how many active sessions/people there were versus total pageviews by comparing pageview counts to the active_visitors table. This table should be cleared regularly, of course, and data stored in aggregate in a more practical table.

This should get you the info you want from just a timestamp cookie.
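
A minimal sketch of that bookkeeping, with an in-memory Map standing in for the active_visitors table. The function names, the strict "more than five minutes" cut-off, and the choice to skip offsets above 15 minutes are assumptions made for the example.

```javascript
const MAX_OFFSET_MIN = 15; // columns 0min .. 15min

// Classify a hit from the lastvisit timestamp alone (null = cookie unset).
function classifyHit(lastVisitMs, nowMs, windowMin = 5) {
  if (lastVisitMs == null) return { uniqueSession: true, offsetMin: null };
  const elapsedMs = nowMs - lastVisitMs;
  return {
    uniqueSession: elapsedMs > windowMin * 60000,
    offsetMin: Math.floor(elapsedMs / 60000),
  };
}

// Add a plus-one count to the column matching the lastvisit offset.
// `table` maps a minute-precision key to an array of 16 counters.
function recordHit(table, lastVisitMs, nowMs) {
  const minuteKey = new Date(nowMs).toISOString().slice(0, 16); // "YYYY-MM-DDTHH:MM"
  let row = table.get(minuteKey);
  if (!row) {
    row = new Array(MAX_OFFSET_MIN + 1).fill(0);
    table.set(minuteKey, row);
  }
  const { uniqueSession, offsetMin } = classifyHit(lastVisitMs, nowMs);
  // Assumption: unset, negative, or >15 minute offsets are not bucketed.
  if (offsetMin !== null && offsetMin >= 0 && offsetMin <= MAX_OFFSET_MIN) {
    row[offsetMin] += 1;
  }
  return uniqueSession;
}
```

After each hit the client would overwrite its lastvisit cookie with the current time, so the server never needs more than the timestamp.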

ckluis commented on May 20, 2024

I'm 100% in favor of using no identifiers, but you could still store travel path

url: /          datetime: 1/1/2001 1:00
url: /contact   datetime: 1/1/2001 1:01

dannyvankooten commented on May 20, 2024

Quick update: current master no longer stores the anonymous session ID but moves knowledge of the previous pageview to the client instead, so that the client can tell us about it instead of us (the server) having to store something to get to that.

For a new visitor, the data sent to the Fathom backend will now look something like this. Given 3 pageviews:

GET /collect?id=abc&previous_id=&page=/about....
GET /collect?id=def&previous_id=abc&page=/about...
GET /collect?id=ghi&previous_id=def&page=/...

The "previous ID" is used to update the previous pageview (not a bounce, update time on page) but is not stored, so that each row in the pageviews table has nothing that leads back to the visitor generating that table entry.

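| id  | hostname        | page   | new_visitor | new_session | unique_pageview | bounce | duration | referrer | timestamp           |
|-----|-----------------|--------|-------------|-------------|-----------------|--------|----------|----------|---------------------|
| abc | http://site.com | /about | 0           | 0           | 1               | 0      | 0        |          | 2018-07-11 12:49:04 |
| def | http://site.com | /about | 0           | 0           | 0               | 0      | 120      |          | 2018-07-11 12:51:04 |
| ghi | http://site.com | /      | 0           | 0           | 1               | 1      | 120      |          | 2018-07-11 12:53:04 |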
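
To make the mechanism concrete, here is a rough server-side sketch of handling such a hit. It is not Fathom's actual code (the backend is written in Go): the `collect` function, the in-memory Map, the field names, and the way duration is derived are all assumptions for illustration.

```javascript
// `pageviews` is a Map from pageview id to its stored row.
// `hit` is the parsed request: { id, previousId, page }.
function collect(pageviews, hit, nowMs) {
  // Use previousId only to touch the earlier row; never store it.
  const previous = hit.previousId ? pageviews.get(hit.previousId) : undefined;
  if (previous) {
    previous.bounce = false; // the visitor moved on, so not a bounce
    previous.duration = Math.round((nowMs - previous.timestampMs) / 1000);
  }
  // The new row carries no link back to the previous pageview or visitor.
  pageviews.set(hit.id, {
    id: hit.id,
    page: hit.page,
    bounce: true, // assumed default until a follow-up pageview arrives
    duration: 0,
    timestampMs: nowMs,
  });
}
```

Since the stored rows keep no reference to each other, a visit cannot be reassembled from the table once the requests have been processed.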

rosswintle commented on May 20, 2024

That's very cool/smart.

I guess the ID is stored in the cookie for use in the next page view, right? But what does the ID actually represent and how is it generated? Is it some kind of hash of timestamp and device fingerprint?

rosswintle commented on May 20, 2024

(I'd read the code, but I'll be AFK very soon)

dannyvankooten commented on May 20, 2024

The relevant part is in assets/src/js/tracker.js. Mostly:

  const d = {
    id: util.randomString(20),
    pid: data.previousPageviewId || '',
    ....

So it's just a (truly) random string of 20 characters, leaving a small chance of collisions but given that the pageview table is cleared every minute I'd say the chances of that happening are nil. And even if it happens, it won't have any real consequences.
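
The body of `util.randomString` is not shown in this thread, so the following is only a guess at what such a helper might look like, together with a birthday-bound estimate of the collision chance. The alphabet, the use of `Math.random` (which is pseudo-random), and the `collisionProbability` helper are assumptions.

```javascript
// Hypothetical stand-in for util.randomString; the real helper may differ.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

function randomString(length) {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += ALPHABET.charAt(Math.floor(Math.random() * ALPHABET.length));
  }
  return out;
}

// Birthday-bound approximation: p ~ n(n-1) / (2 * alphabetSize^length),
// valid while the result is much smaller than 1.
function collisionProbability(rows, length, alphabetSize = ALPHABET.length) {
  const space = Math.pow(alphabetSize, length);
  return (rows * (rows - 1)) / (2 * space);
}
```

With a 62-character alphabet and 20 characters there are about 7 × 10^35 possible IDs, so even a million rows in the table at once would give a collision chance on the order of 10^-24 under these assumptions.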
