
Comments (30)

jonathanstark commented on May 18, 2024

Question: The privacy concern you raise on the open beacon page stems from the fact that the receiving application on the user's device is accessing the broadcast URL automatically in the background, correct?

If instead the user is presented with a list of URLs being broadcast in their proximity which they must then manually click to send the request, I don't think the same issue arises.

Apologies if I am missing something...

from physical-web.

[unknown user] commented on May 18, 2024

One possibility is that, instead of accessing the URL automatically, it must be opted-in before making a request to the server as @jonathanstark says. Known domains could be "trusted" to allow automatic GETs. The disadvantage is that all you have with the current spec before opting in is the URL. This scheme would then necessitate a way for the device to provide more data than just the URL prior to making the server request so that the user doesn't have to determine the URL's purpose from the URL itself.

azdle commented on May 18, 2024

You are completely correct, I was just typing out a correction to myself. I didn't see that section on my first read through.

Given that you're just going to show a URL, how do you expect the "safeurls.com/?id=12345" example to work? Maybe I'm misinterpreting that example, but as I understand it, that is a third party providing server-side hosting for the beacon information. Let's say I'm going to use this to check into my hotel or something like that, and they use your safeurls host: how do I know that "safeurls.com/?id=55352" means "Check Into the Hotel" without first making a request to that URL?

azdle commented on May 18, 2024

It does seem that your way would work if you were to use URLs like hilton.com/checkin and hilton.com/wifi, but with a method like that you're using up a lot of bytes to make them readable and then losing the information that you could have with a random ID, like which hotel they're trying to check into. Plus, I'm not sure most people are going to like looking at 18-character URLs and trying to figure out what each one means.
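For context on the byte budget: the UriBeacon/Eddystone-URL formats that grew out of this project compress common URL prefixes and suffixes down to a single byte, which is why short hostnames matter so much. A rough sketch of computing an encoded length (the code tables here are abbreviated and illustrative, not the full spec):

```python
# Abbreviated prefix/suffix tables modeled on the UriBeacon/Eddystone-URL spec.
PREFIXES = ["http://www.", "https://www.", "http://", "https://"]
SUFFIXES = [".com/", ".org/", ".edu/", ".net/", ".info/", ".biz/", ".gov/",
            ".com", ".org", ".edu", ".net", ".info", ".biz", ".gov"]

def encoded_length(url):
    """Approximate bytes the URL occupies after prefix/suffix compression."""
    n = 0
    # A matching scheme prefix collapses to one byte (longest match first).
    for p in sorted(PREFIXES, key=len, reverse=True):
        if url.startswith(p):
            url = url[len(p):]
            n += 1
            break
    # Each matching suffix also collapses to one byte; other chars cost one each.
    while url:
        for s in SUFFIXES:
            if url.startswith(s):
                url = url[len(s):]
                n += 1
                break
        else:
            url = url[1:]
            n += 1
    return n
```

So "http://hilton.com/checkin" costs 1 (prefix) + 6 ("hilton") + 1 (".com/") + 7 ("checkin") = 15 bytes, which already eats most of a beacon's tiny advertisement payload.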

[unknown user] commented on May 18, 2024

That's why you would need extra information provided by the device itself: enough to determine whether to click the link. This would also be useful for cases where the device provides data but doesn't need you to visit a URL.

scottjenson commented on May 18, 2024

Exactly, we are VERY careful to make sure the app does not contact any of the URLs ahead of time. Only when the user asks do we show a list of URLs. The meta data we pull (for the moment, TITLE, DESCRIPTION, and FAVICON) is all fetched by a proxy. This means the user can see the list of nearby URLs with the web page having no idea. We assume that by clicking on the URL the user is issuing some consent, and at that point the website could know where they are.
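The proxy's scrape of TITLE, DESCRIPTION, and FAVICON can be sketched with Python's stdlib HTML parser. This is an illustrative sketch, not the actual physical-web proxy code:

```python
from html.parser import HTMLParser

class MetaScraper(HTMLParser):
    """Collects <title>, <meta name="description">, and a favicon <link>."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.favicon = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name", "").lower() == "description":
            self.description = a.get("content", "")
        elif tag == "link" and "icon" in a.get("rel", "").lower():
            self.favicon = a.get("href", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def scrape(html):
    """Return (title, description, favicon) pulled from a page's head."""
    p = MetaScraper()
    p.feed(html)
    return p.title.strip(), p.description, p.favicon
```

A client (or proxy) would fetch each beacon URL, run something like `scrape` on the response, and show the human-readable results instead of raw URLs.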

azdle commented on May 18, 2024

But then this proxy knows everything, right? Or is there something that prevents the proxy from knowing what URL path is being requested?

scottjenson commented on May 18, 2024

Yes, the proxy does know everything (at least when you pull up a list of nearby beacons). The purpose of the Physical Web is to create two ecosystems: 1) an open community of beacons that are openly broadcasting, and 2) a community of clients that find/rank/sort/present them to users.

There is clearly some trust the user is placing in #2. However, that is exactly why this is an open project. We want there to be MANY clients out there finding/sorting all of those beacons. That type of choice provides competition and hopefully accountability.

azdle commented on May 18, 2024

Okay, I just wanted to make sure I understood.

Thanks for the info.

jonathanstark commented on May 18, 2024

Ah, yes. I forgot about the initial request to the URL to retrieve the metadata (e.g., title, description, favicon). Without that data, it'll be impossible for the user to know which URL to tap on, if any. And it seems like a bad idea to send that info from the beacon because of size limitations and difficulty the beacon owner would have of updating that data in the future.

NOTE: If by proxy, @scottjenson means a proxy server, I don't like that idea. Central point of failure, centralized target for hackers, central point of trust. Yuck, yuck, and more yuck :)

Backing up a step, are we overthinking this? The user's browser will not be making the request for the metadata; eventually it'll be requested by their OS, but until then, by a beacon discovery app (BDA).

If the HTTP request made by the BDA passes minimal, generic HTTP headers, and doesn't accept cookies, I can't think of any way to track the user.

What am I missing? Can someone describe an exploit scenario?

azdle commented on May 18, 2024

Well, there is still the device's IP address.

scottjenson commented on May 18, 2024

We see the service on the phone a bit like a search engine. You actually do need something in the cloud, initially just for caching but eventually for some serious ranking (e.g., a very simplistic idea would be using click counts to help you rank URLs by popularity).

For example, when we prototyped this for GoogleIO it was clear that we couldn't have 6000 attendees hitting the 40 web page beacons over and over. We did the proxy server for the simple reason of caching. It just wouldn't be practical to have every phone download/cache everything it sees.

Happy to discuss alternatives; just letting you know how we got here.

jonathanstark commented on May 18, 2024

@azdle won't the IP change every time the user hops networks?

@scottjenson Can you elaborate on the need for caching in the cloud? Why not cache on device based on the response headers? (i.e., the server says, "you don't have to re-request this resource for 60 seconds/minutes/hours/days")
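On-device caching based on response headers could look like this minimal sketch: a toy cache honoring `Cache-Control: max-age`. The injectable clock is just for testability, and a real client would handle many more directives:

```python
import time

def parse_max_age(cache_control):
    """Extract max-age seconds from a Cache-Control header value, or None."""
    for part in cache_control.split(","):
        k, _, v = part.strip().partition("=")
        if k.lower() == "max-age" and v.isdigit():
            return int(v)
    return None

class DeviceCache:
    """Tiny on-device metadata cache keyed by beacon URL."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # url -> (expiry_time, payload)

    def put(self, url, payload, cache_control):
        ttl = parse_max_age(cache_control) or 0
        self._store[url] = (self._clock() + ttl, payload)

    def get(self, url):
        """Return the cached payload if still fresh, else None."""
        entry = self._store.get(url)
        if entry and entry[0] > self._clock():
            return entry[1]
        return None
```

With this, a beacon page served with `Cache-Control: max-age=3600` would only be re-fetched once an hour per device, which is the trade Jonathan is proposing against a shared proxy cache.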

jonathanstark commented on May 18, 2024

@scottjenson Regarding the search rankings, the user's device should be able to rank based on signal strength, right? You're kinda scaring me with the "there needs to be a cloud service layer" party line.

scottjenson commented on May 18, 2024

You should know me by now, I'm an open web guy so my heart is in the right place ;-)

Caching: Of course you can cache everything on the phone, and if someone wants to write a client to do this, that is very much encouraged. However, keep in mind my GoogleIO scenario (which also applies to an airport or any congested place). In that case, if you open your phone and see 100 beacons, your phone has to hit 100 web pages to pull this information down. Multiply that by everyone in the room. It seems like a monumental waste of bandwidth and battery power.

The whole reason we're making these clients open source (including the proxy server!) is so that people can write their own. We are agnostic on this issue. In fact, we hope that others will be written to try other approaches. By having alternatives, we're also giving users a choice.

As to ranking, signal strength is very good but you can imagine a couple of scenarios where ranking is helpful:

  • At your house, the lighting system is always at the top because you use it the most (even though its signal strength is lower)
  • At a mall, the stores/signs/etc that are used the most are ranked higher

Of course, if you don't want that, then use a client that doesn't offer it. We're just trying to explore this space and experiment.
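The two ranking scenarios above could be sketched as a weighted score blending signal strength with personal usage. The weights and RSSI normalization here are purely illustrative assumptions, not anything from the Physical Web spec:

```python
def rank_beacons(beacons, click_counts, w_signal=1.0, w_usage=2.0):
    """Order beacons by a weighted blend of RSSI and personal usage.

    beacons: list of (url, rssi_dbm) pairs.
    click_counts: mapping of url -> how often this user tapped it.
    """
    def score(item):
        url, rssi = item
        # RSSI is negative dBm; closer to 0 means a stronger signal.
        signal = (rssi + 100) / 100.0           # rough 0..1 normalization
        usage = click_counts.get(url, 0)
        return w_signal * signal + w_usage * usage
    return sorted(beacons, key=score, reverse=True)
```

Under this scoring, a frequently used but weaker beacon (the home lighting system) outranks a stronger but never-used one, which is exactly the behavior signal strength alone cannot give you.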

Sneagan commented on May 18, 2024

There shouldn't be a "cloud layer" required by this spec. If people want their PWs to participate in some kind of ranking or data amalgamation system, then they can route through one, but that doesn't seem to me to be part of the purpose of this spec.

Sneagan commented on May 18, 2024

I like @scottjenson's response. An event could easily utilize some web sorting component and weight participating beacons differently. That should be permitted by the spec. I think we all agree that requiring something global would be bad.

azdle commented on May 18, 2024

@jonathanstark If you're on wifi, then yes, as you move from place to place your IP will change, but I'm under the impression that mobile devices' IPs don't change when jumping from tower to tower, so most of the time your IP won't be changing.
Also, I think @scottjenson is suggesting having a cloud layer that ranks beacons by popularity, so that "Starbucks: Click here to Order" is ranked higher than "Patrick's Phone" even if it's further away.

azdle commented on May 18, 2024

One thought I had for the caching issue was to have the device make a request to something like "example.com/.well-known/beacon" when it sees a beacon like "example.com/23fsfw33w", to get general info and ask the user whether they want to start looking at the individual beacons. This could also enable a single download of info about what every beacon means. Say you go to the Mall of America: when you arrive, you see your first beacon, "moa.io/R/624", and your device makes a request to "moa.io/.well-known/beacon", which tells you that you can download a list of every beacon. Then, when the user chooses to look at those beacons, the device downloads something that says anything in the "R" directory is a restroom, and maybe includes a list of every store and its ID with more info like hours and store URL. That way, as you walk around, you don't need to make any other requests to figure out what each beacon is. This would be a several-megabyte download, but I think that would be okay as long as you're not suddenly turning on beacons for the first time in the middle of a crowd of thousands.
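A minimal sketch of that hypothetical `/.well-known/beacon` lookup; the manifest format here is invented purely for illustration (nothing like it exists in the spec):

```python
import json

def resolve_from_manifest(beacon_url, manifest_json):
    """Resolve a terse beacon URL like 'moa.io/R/624' against a
    previously downloaded /.well-known/beacon manifest (hypothetical)."""
    manifest = json.loads(manifest_json)
    path = beacon_url.split("/", 1)[1]          # e.g. "R/624"
    directory, _, beacon_id = path.partition("/")
    kind = manifest.get("directories", {}).get(directory)
    if kind is None:
        return None                              # unknown directory: fall back to HTTP
    entry = manifest.get("entries", {}).get(beacon_id, {})
    return {"kind": kind, **entry}
```

The point is that once the manifest is cached, every subsequent beacon the phone sees resolves locally, with no per-beacon network request.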

scottjenson commented on May 18, 2024

This is sliding into my meta data conversation. We've been talking about a generic system that allows web pages to offer more information. One idea is to use the http://json-ld.org/ approach that many web pages use today to embed JSON into their pages for crawlers.

This would provide, at the top level, a MUCH more efficient way to get title, description, favicon, etc. However, it also provides an open way to deliver all sorts of interesting things, such as @azdle's suggestion that a beacon provide a list of 'sibling beacons'.
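Reading such embedded JSON-LD out of a page can be sketched with the stdlib; a minimal sketch, while production crawlers are considerably more robust:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the application/ld+json script blocks embedded in a page."""
    def __init__(self):
        super().__init__()
        self.blocks = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._capture = True

    def handle_data(self, data):
        # Script content arrives via handle_data; parse it as JSON.
        if self._capture:
            self.blocks.append(json.loads(data))

    def handle_endtag(self, tag):
        if tag == "script":
            self._capture = False
```

A client could pull title, description, and even a 'sibling beacons' list out of one such block without scraping the rest of the markup.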

jonathanstark commented on May 18, 2024

Thanks for the well reasoned reply @scottjenson :-)

> You should know me by now, I'm an open web guy so my heart is in the right place ;-)

True, true... Just keep in mind that there will be conspiracy theorists in the audience who don't know you personally :-)

> Caching: Of course, you can cache everything on the phone and if someone wants to write a client to do this, that is very much encouraged.

So why not leave it at that as far as the spec is concerned? i.e., "Caching in the user agent is recommended."

> if you open your phone and see 100 beacons, that means your phone has to hit 100 web pages to pull this information down.

Until PW is popular, this problem doesn't exist for me as user. As PW objects become more popular, my list will become unwieldy. At that point, I will select (or write) client software that allows me to filter based on personal criteria. Something as simple as "only look up the 20 closest beacons" might do the trick. It could still show the other 80 urls being broadcast and I could manually opt to look them up if I felt the need to.

> Multiply that by everyone in the room. It seems like a monumental waste of bandwidth and battery power.

From the user side, I don't see the bandwidth/battery issue. Those metadata requests are only going to happen once I take an action (as opposed to in the background while I'm walking around). Plenty of typical web pages make 50 requests for data. How is this any different bandwidth- or battery-wise?

Also, from the user perspective, a request is a request... whether my phone is accessing the live web server or a proxy server or a CDN, I'm making a network request that retrieves a payload of a certain size. I don't see how a cloud caching layer helps the user in any meaningful way (but I could be missing something).

Things are different on the beacon side, of course. If United Airlines puts 500 beacons in O'Hare that all point at the same web server, then yes... they're going to want to do all the stuff you normally do when you're expecting a ton of web requests: caching, load balancing, CDN, etc...

The techniques for handling lots of web requests are well known. I don't see the point of considering them in the spec for PW (other than to perhaps warn broadcasters that they should think about it).

> As to ranking, signal strength is very good but you can imagine a couple of scenarios where ranking is helpful:
>
>   • At your house, the lighting system is always at the top because you use it the most (even though its signal strength is lower)
>   • At a mall, the stores/signs/etc that are used the most are ranked higher

These sound like user prefs to me. Favorites, drag to reorder... not a spec consideration.

> Of course, if you don't want that, then use a client that doesn't offer it. We're just trying to explore this space and experiment.

Roger that. I realize that we're thinking things through here, and I applaud that. In this particular case however, we're speculating about how to solve a problem that doesn't yet exist (i.e., "We're going to need sorting, ranking, caching, etc... to solve the inevitable beacon overload!").

I'm happy to continue exploring the concepts but would recommend that we get more specific with the language about the parties in the transaction. E.g., users will benefit from beacon ranking because of X; publishers (or whatever beacon owners are called) will benefit from caching, etc.

My $0.02 ;)

scottjenson commented on May 18, 2024

@jonathanstark I guess we only disagree on magnitude. I completely agree with your point that at first we'll be lucky to see even one beacon but I'd like to be sure we don't fall over as soon as things get fun...

I also agree that you only pull data down when the user asks. But without caching, if there are 30 beacons nearby, you have to download all 30 sites to get the meta data. (That's a hack right now because we don't have a meta data scheme, but even if we had one it would only have partial use, so this scraping is a 'last ditch' thing that will be used quite a bit.)

This means there is actually a HUGE difference between using a cache and not. You'll be waiting minutes in a crowded location in order to even get the list to pick from.

As to the spec, I'm happy to say that 'caching is optional'. Our source code is there to show how you can cache. As I said before, if you want to give it a shot without a cache, go for it, but it seems very likely that it'll bog down pretty quickly.

jonathanstark commented on May 18, 2024

@scottjenson Okay, I'm starting to see where you're coming from. Thoughts...

> I'd like to be sure we don't fall over as soon as things get fun.

Roger that. Ditto :)

> This means there is actually a HUGE difference between using a cache and not. You'll be waiting minutes in a crowded location in order to even get the list to pick from.

Okay, that's a real problem, but rather than solve it directly, I think it would be more useful to penalize publishers for having slow pages. It seems perfectly reasonable to bake server response time into the spec as an incentive for publishers to have very fast pages.

I would NOT go so far as to prescribe how publishers make their pages fast, or what format they should use for metadata, but I think it's fair to penalize them if they are slow. This may be as simple as: URLs that respond first are promoted to the top of the list.

ASIDE: If this is truly a web thing, HTML meta tags should be the only option for beacon-list metadata. Publishers could make these beacon pages incredibly lean by having a virtually empty BODY tag on the page and a meta refresh in the HEAD to send the user to the actual content page if they eventually click on it. This is just one technique of many possible... I'm sure there's no reason to complicate things for PW by reinventing the web.

> if there are 30 beacons nearby, you have to download all 30 sites to get the meta data

Well, not exactly... you have to download 30 sites to show ALL the meta data. You only have to download one to show SOME metadata.

Picture this:

I pull out my phone and launch the beacon app. It instantly shows me a list of all the raw URLs that were sent from beacons in my vicinity. This will usually not give me enough info to know which one to tap, but it will tell me how active this spot is and whether or not I should wait around for a couple seconds to see what happens.

Assuming that I've not set any preferences and nothing is cached on my phone, the raw URLs are sorted by physical proximity (nearest to me at the top). The beacon app then sends out network requests to the raw URLs. As each server responds, its entry is converted from a raw URL to a human-readable "title, description, favicon" item and sorted ahead of the raw URLs.

Once the link is "enriched" in this way, the enriched items won't jump around and I can safely tap on one without fear of it jumping out from under my finger. The raw URL items will still be jumping around lower in the list but can't disrupt the ones that came back already.

This approach creates competition between publishers to have the fastest pages. I see no reason to solve this problem for them.
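That ordering rule (stable enriched entries on top, raw URLs below sorted by proximity) could be sketched like this; the field names are invented for illustration:

```python
def display_order(entries):
    """Stable display list: enriched entries first, in the order their
    metadata arrived, then raw URLs sorted nearest-first.

    Each entry is a dict: url, distance_m, enriched_at (None until the
    server's metadata has come back).
    """
    enriched = [e for e in entries if e["enriched_at"] is not None]
    raw = [e for e in entries if e["enriched_at"] is None]
    enriched.sort(key=lambda e: e["enriched_at"])   # first responder stays on top
    raw.sort(key=lambda e: e["distance_m"])          # nearest raw URL first
    return enriched + raw
```

Because an enriched entry's position depends only on when its metadata arrived, already-enriched items never jump around under the user's finger, while slow publishers languish at the bottom among the raw URLs.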

Sneagan commented on May 18, 2024

Is there a reason that these URLs would have significant response times? As I've understood it the URL will point to the metadata and the largest item there will be a single image (in the examples I've seen here). I understand that loading 10, 20, 80 sets of metadata will take longer than 1, but we're not talking about rich web pages for metadata retrieval are we?

lowesoftware commented on May 18, 2024

I think one of the amazing opportunities is that the network is open and can be peer to peer (caching or not)... but that is also one of the risks. Without a circle of trust or a trusted authority, as soon as any traction is gained it will be ripe for abuse. It's so cheap to have a beacon with a wide area of effect that says "Wells Fargo ATM" and run a phishing scheme without a third party to certify the service. Maybe an extreme example, but catch my drift?

If centralized repositories of trust will be used -- whether community governed or controlled by a benevolent entity like Google -- and we end up with central repositories of locations, URIs, metadata, and a measure of trust then the peer to peer broadcast almost doesn't matter anymore and we may as well just be geofenced and have the likes of PayPal or Square or Yelp or Google let us know that "something interesting to interact with is nearby, here is a list of links".

Generally, I like to trend away from the centralization to a richer peer to peer handshake (maybe centralization is an enhancement to discovery, but not a core part of it as others have suggested). It keeps my data between me and the objects I elect to use and also can provide physical security. Not only for the user, but for the vendor of the physical smart object.

To use the example on the site... if I can vend a candybar from my phone, I don't want one person to jab that button over and over -- or discover the URL from China and flood GET requests. I want to allow a user to be able to compare the Odwalla vs the Snickers bar on their phone, realize the healthy one will keep their health tracker happy, pick it, and then press a button on the vending machine to actually dispense it and finalize the transaction... or hit it with a mobile wallet or whatever.

Philosophically, I think the web page is a part of the user experience that needs to be additive to the user experience of the object. The byproduct of tightly joining the two can help provide secure and private exchanges with peer to peer communication and a physical interaction without a third party broker being required (though it may be a preferred option).

jonathanstark commented on May 18, 2024

@Sneagan Yep, there is. As currently described, the URL that is broadcast by the beacon is used for two different purposes:

  1. When the user displays the list of local beacons, the URL is used to retrieve the title, description, and favicon link from the meta tags in the head section on the target page
  2. When the user taps on the URL in the beacon list, the default Web browser on the device is invoked and navigates to the URL.

It will likely be common for the page hosted at the URL to be quite large in order to support use case #2. Since there is no way to retrieve just the head section of a page, requests sent on behalf of use case #1 would be downloading the entire (potentially large) page just to retrieve the meta values from the head.

That said, a meta refresh or something similar would solve this problem.
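Another mitigation a client could try is streaming the response and stopping as soon as the head section closes, so use case #1 never downloads a large body. A rough sketch over an iterable of chunks (assumes well-formed markup containing a literal `</head>`; a real client would close the HTTP connection early):

```python
def read_head(chunks):
    """Consume a page chunk-by-chunk and stop once </head> has arrived,
    returning only the head portion of the document."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        end = buf.lower().find("</head>")
        if end != -1:
            # Stop reading: everything we need for the beacon list is here.
            return buf[:end + len("</head>")]
    return buf
```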

jonathanstark commented on May 18, 2024

@lowesoftware you raise a valid point with your Wells Fargo example, but a similar exploit already exists with Wi-Fi (access point spoofing). The trust mechanism that makes this bendable in practice is SSL certificates, which will also work for PW.

jonathanstark commented on May 18, 2024

Bearable, not bendable ;)

scottjenson commented on May 18, 2024

This is a perfectly fine conversation but one which I would say should occur as an alternative client. As I said before, there are two worlds here, the world of the beacons and the world of clients. What is most important is that the beacons all broadcast URLs the same way. This allows the world of clients to experiment and try different caching alternatives (or to attempt doing away with them entirely)

However, it doesn't seem practical to assume that any webpage that plays in this space must use a new web format or a new technique (such as a header redirect). I respect the idea, but part of the power of the web is in how backward compatible it is.

Keep in mind that to date, most IoT systems require a single centralized gatekeeper. That is bad, clearly. While the PW can optionally use a cache, it can have an infinite number of them, so this is a far, far cry from the single-overlord scenario.

scottjenson commented on May 18, 2024

Closing for now, please feel free to open another issue if necessary
