Comments (16)

mozfreddyb commented on July 30, 2024

Hold on. You can easily support binary AST with SRI as it is!

Example:

<script src="https://example.com/example-framework.js"
        integrity="sha384-hash-of-normal-JS-file
                   sha384-hash-of-binary-ast-file"
        crossorigin="anonymous"></script>

The user agent will notice that there are multiple hashes with the same strength (i.e., sha384), so only one of them has to match. User agents that support binary AST will receive a file that matches the second hash. User agents without support will receive the JS file, which matches the first hash.

(This is a rephrasing of Example 7 in the SRI specification. I've quoted it for this example and rephrased it for clarity, but feel free to read the original source!)
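The matching rule described above can be sketched roughly as follows. This is a minimal illustration, not any browser's implementation: the helper names (`parse_integrity`, `matches_integrity`, `sri_token`) and the sample byte strings are hypothetical, and the strength ordering is an assumption that simply ranks sha512 above sha384 above sha256.

```python
import base64
import hashlib

# Assumed strength ranking used when several algorithms share one attribute.
STRENGTH = {"sha256": 1, "sha384": 2, "sha512": 3}


def sri_token(body, algo="sha384"):
    """Build one 'algo-base64digest' token for the given bytes."""
    digest = hashlib.new(algo, body).digest()
    return algo + "-" + base64.b64encode(digest).decode("ascii")


def parse_integrity(attr):
    """Split an integrity attribute into (algorithm, base64 digest) pairs."""
    pairs = []
    for token in attr.split():
        algo, _, digest = token.partition("-")
        digest = digest.split("?", 1)[0]  # ignore any trailing "?options"
        if algo in STRENGTH and digest:   # unknown algorithms are skipped
            pairs.append((algo, digest))
    return pairs


def matches_integrity(body, attr):
    """True if the body matches ANY hash using the strongest listed algorithm."""
    pairs = parse_integrity(attr)
    if not pairs:
        return True  # no usable metadata, so there is nothing to enforce
    strongest = max(pairs, key=lambda pair: STRENGTH[pair[0]])[0]
    return any(
        sri_token(body, algo) == algo + "-" + expected
        for algo, expected in pairs
        if algo == strongest
    )


# Hypothetical stand-ins for the two representations served under one URL.
plain_js = b"console.log('hello');\n"
binary_ast = b"\x00hypothetical-binast-bytes"
attr = sri_token(plain_js) + " " + sri_token(binary_ast)
```

With this attribute, either representation passes the check, while any third byte sequence fails it.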

from proposal-binary-ast.

otherdaniel commented on July 30, 2024

My context on 'binary AST' is a bit outdated, but my understanding is:

  • There is not (and cannot be) a 1:1 mapping between source and "binary AST". E.g., the binary AST drops source code comments, non-relevant whitespace (that is, whitespace outside of string/template literals), and maybe (some?) variable names. If so, you cannot reconstruct the original. If so, you also cannot compute the original's hash. (The FAQ makes a similar point.)

  • I'd know in theory how to build a hash that can survive this transformation (by normalizing the expendable parts in either representation, and then hashing), but that would effectively force a nearly-complete parsing step during hash calculation. I'm going to suggest that isn't happening.

  • I think that considering 'binary AST' to be a transfer encoding of .js is just not tenable. (Also for other reasons, like the "Early Error Semantics" chapter on the page.) I'd think a 'binary AST' is for all intents and purposes a separate resource, with its own hash sums (computed over its own byte sequence representation).

  • It's up to whoever includes that separate resource to also supply the appropriate SRI attributes. If both resources are served under the same URL, then all hash-sums should be in the integrity=... attribute.

I suspect this answer won't make Yoav very happy, but I'm having a really hard time imagining a solution where 'binary AST' could be served transparently and with integrity. 'Binary AST' just does a lot more than a mere content encoding could be expected to.

Yoric commented on July 30, 2024

If I understand correctly, the hash is independent from the content encoding, right? If so, that's going to be complicated.

kannanvijayan-zz commented on July 30, 2024

There are two options here: one is to hash the (normalized) source text. Another is to hash the "simple" encoding of BinAST, prior to any compression steps.

The latter seems more appropriate in this case.

yoavweiss commented on July 30, 2024

I don't think it's appropriate for a transfer-encoding to change the semantics of SRI (e.g. gzip and brotli don't)

kannanvijayan-zz commented on July 30, 2024

@yoavweiss I'm not sure I understand where there would be a need for semantic changes. Could you elaborate?

yoavweiss commented on July 30, 2024

Currently SRI hashes are hashes of the content before gzip/brotli are applied. If AST encoding is just a content encoding, the same principles should apply, and SRI hashes should be calculated before AST encoding and after AST decoding is applied.
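As a small illustration of that principle, the sketch below hashes a script before gzip is applied and again after decoding. The sample source and the `sri_sha384` helper are made up for the example; only the standard `gzip`, `hashlib`, and `base64` modules are used.

```python
import base64
import gzip
import hashlib


def sri_sha384(data):
    """Return an SRI-style 'sha384-...' token for the given bytes."""
    return "sha384-" + base64.b64encode(hashlib.sha384(data).digest()).decode("ascii")


# Hypothetical script body, as the page author sees it.
source = b"function add(a, b) { return a + b; }\n"

# What travels over the network when gzip content encoding is applied.
on_the_wire = gzip.compress(source)

# What the user agent hands to the integrity check after decoding.
decoded = gzip.decompress(on_the_wire)

integrity = sri_sha384(source)
decoded_matches = sri_sha384(decoded) == integrity   # hash of decoded body
wire_matches = sri_sha384(on_the_wire) == integrity  # hash of compressed bytes
```

The decoded body matches the published hash and the compressed bytes do not, which is why a lossless encoding such as gzip or brotli leaves SRI untouched. A lossy transformation cannot satisfy the same check, since the original bytes are no longer recoverable.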

kannanvijayan-zz commented on July 30, 2024

But can't we express this simply as a hash variant, which is already a supported concept in SRI?

More formally, a hash value H(F(x)), where F is some normalization function under some equivalence class we care about, can simply be restated as G(x) where G is treated as a slightly modified (but trivially of equivalent strength) hash function.

This really feels more like a nit than an actual issue of semantics.
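To make the H(F(x)) = G(x) restatement concrete, here is a toy sketch. The `normalize` function is only a stand-in for F: it collapses every run of whitespace, which is not a valid JavaScript normalization (it would also alter string literals) and is not anything the BinAST proposal defines. All names are hypothetical.

```python
import hashlib


def normalize(source):
    """Toy stand-in for F: collapse all whitespace runs into single spaces.

    NOT a real JS normalizer; it ignores string and template literals.
    """
    return " ".join(source.split())


def plain_hash(source):
    """H(x): an ordinary SHA-384 over the exact bytes."""
    return hashlib.sha384(source.encode("utf-8")).hexdigest()


def normalized_hash(source):
    """G(x) = H(F(x)): hash of the normalized form."""
    return plain_hash(normalize(source))


pretty = "let x = 1;\n\nlet y = 2;\n"
compact = "let x = 1; let y = 2;"

same_under_g = normalized_hash(pretty) == normalized_hash(compact)
same_under_h = plain_hash(pretty) == plain_hash(compact)
```

The two inputs agree under G and differ under H. Note that G can only be as trustworthy as F itself: every pair of inputs that F maps together is accepted as equivalent, whether or not that equivalence was intended.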

kannanvijayan-zz commented on July 30, 2024

@yoavweiss Hold on, I think I understand the problem a bit better now. I see where the issue is.

The problem is we're multiplexing the URL to serve both BinAST and plainjs files, but we only have one SRI to serve up.

kannanvijayan-zz commented on July 30, 2024

It seems there isn't a way to slice this salami without introducing a hash specifically for the BinAST code. This would require the referrer page to include two hashes. From a standards perspective this is not a major issue - an extra hint attribute that will be ignored by other browsers. Firefox, when requesting SRI-checked resources, would add the binast mimetype to the accept header when it detected the presence of the second hash, and verify using that.

The problem here is that it requires changes on the content provider end - the referrer page must be modified.

However - I'd assume that SRI hashes are generated by toolchains these days anyway (as you'd want to recompute them on changes to source). Is that the case? If so whatever process that is should be modifiable to also produce a BinAST hash and include it as well.
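A build step of that kind might look roughly like the sketch below. It is an illustration under assumptions, not an existing tool: `sri_token` and `script_tag` are hypothetical helpers, and the byte strings stand in for the real plain JS output and its BinAST encoding.

```python
import base64
import hashlib
from html import escape


def sri_token(data, algo="sha384"):
    """Build one 'algo-base64digest' token for the given bytes."""
    digest = hashlib.new(algo, data).digest()
    return algo + "-" + base64.b64encode(digest).decode("ascii")


def script_tag(url, representations):
    """Emit a script tag whose integrity attribute lists one hash per representation."""
    integrity = " ".join(sri_token(data) for data in representations)
    return '<script src="%s" integrity="%s" crossorigin="anonymous"></script>' % (
        escape(url, quote=True),
        integrity,
    )


# Hypothetical build outputs for the same logical script.
plain_js = b"console.log('hello');\n"
binary_ast = b"\x00hypothetical-binast-bytes"

tag = script_tag("https://example.com/example-framework.js", [plain_js, binary_ast])
```

Both hashes are recomputed from the build outputs on every change, so the page stays in sync with whichever representation the server chooses to send.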

@yoavweiss What do you think?

yoavweiss commented on July 30, 2024

> The problem here is that it requires changes on the content provider end - the referrer page must be modified.

Yeah, that adds a lot of complexity to the developer's flow and forces the page to know if some of its scripts will be binAST encoded, and if so, add two hashes instead of one.

> However - I'd assume that SRI hashes are generated by toolchains these days anyway (as you'd want to recompute them on changes to source)

If you have a script that blindly adds SRI hashes, that won't help you if/when the origin gets hacked (which is a major use-case for SRI).

Overall, this seems like a discussion that should happen with the SRI folks.

/cc @mikewest

mikewest commented on July 30, 2024

@mozfreddyb, @fmarier, @metromoxie, and @devd are the "SRI folks". :)

@otherdaniel might also have thoughts.

Also, https://tools.ietf.org/html/draft-thomson-http-mice-03 is relevant.

Yoric commented on July 30, 2024

> There is not (and cannot be) a 1:1 mapping between source and "binary AST". E.g., the binary AST drops source code comments, non-relevant whitespace (that is, whitespace outside of string/template literals), and maybe (some?) variable names. If so, you cannot reconstruct the original. If so, you also cannot compute the original's hash. (The FAQ makes a similar point.)

Variable names are maintained, and we have ideas for making source code comments stripping optional, but yes, that's the general idea.

> I'd know in theory how to build a hash that can survive this transformation (by normalizing the expendable parts in either representation, and then hashing), but that would effectively force a nearly-complete parsing step during hash calculation. I'm going to suggest that isn't happening.

Ah, well, I was about to suggest that.

Out of curiosity, when (and how often) is hash calculated?

kannanvijayan-zz commented on July 30, 2024

Talking around the office, a colleague observed that what we are trying to do here is in effect comparable to srcset tags on images. Conceptually: we want to identify an abstract resource which can be supplied by one or more different (but equivalent, under some criteria) representations of it.

In general I agree with @otherdaniel's assessments. I'm not sure I agree on the "it's not a content encoding" bit. We're running into this issue because we're using hashes to check resource integrity, and hashes are inherently tied to the representation of a particular piece of content.

They're convenient because representational equality subsumes all other equivalence class models axiomatically. As you noted, theoretically we could store delta(normalizedJS, originalJS) along with the BinAST representation and use that to satisfy the single SRI requirements. The reason we don't want to do this is purely performance and unnecessary complexity.

Philosophical waxing aside, though, I agree it seems we can't slide this through purely transparently on a mime-type basis and still keep SRI support.

otherdaniel commented on July 30, 2024

@Yoric Currently, in Chrome/Chromium, the hashes are checked once, after the network has delivered the last byte to the renderer, just before the resource is being used. There is a very annoying but hard to fix bug where sometimes that doesn't work and we reload and recheck the resource. The intent is to move this 'lower' into the browser process or network service, although I'm not sure if or when this is happening.
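The "check once, after the last byte" pattern can be sketched as below. This is only an illustration of the general idea and is not Chromium code: the `IntegrityCheckingSink` class and its method names are hypothetical.

```python
import base64
import hashlib


class IntegrityCheckingSink:
    """Buffer a response and verify its SHA-384 once, after the final chunk."""

    def __init__(self, expected_sha384_b64):
        self._expected = expected_sha384_b64
        self._hasher = hashlib.sha384()
        self._chunks = []

    def write(self, chunk):
        # Feed the hash incrementally while the network delivers data.
        self._hasher.update(chunk)
        self._chunks.append(chunk)

    def finish(self):
        # Compare only when the whole body has arrived; release it on success.
        actual = base64.b64encode(self._hasher.digest()).decode("ascii")
        if actual != self._expected:
            raise ValueError("integrity mismatch")
        return b"".join(self._chunks)


# Hypothetical body delivered in two chunks.
body = b"console.log('hello');\n"
expected = base64.b64encode(hashlib.sha384(body).digest()).decode("ascii")

sink = IntegrityCheckingSink(expected)
sink.write(body[:10])
sink.write(body[10:])
verified = sink.finish()
```

Nothing is handed to the consumer until `finish` succeeds, so the resource cannot be used before the whole body has been verified.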

@kannanvijayan Granted, one can see the "content encoding" thing either way.

One additional thought: Hashes apply universally to all resource types, have well-understood security properties, and are hard to misuse. I bet that once a js-equals-binast-hash is created, some clown will create a pair of .css files (or other resources) that are equivalent under that hash but have otherwise quite different properties. And while obviously a js-specific hash shouldn't be applied to non-js resources, similar things have happened elsewhere (e.g. MIME-type confusion attacks) and this might lead to similar problems.

@mozfreddyb Yes, that works as of today. I think the use case implied here is that 'binary AST' can be applied transparently by the web server or a CDN, just like those instances could decide to apply gzip without requiring the page author to change the page. I find that a super valid use case, and without a capability like that deployment will be a good bit harder. But so far I'm not seeing a good mechanism that would facilitate that.

--

Generally speaking, I expect a custom hash with any appreciable complexity is going to be a very hard sell, to both implementor and security communities.

kannanvijayan-zz commented on July 30, 2024

@mozfreddyb I did not realize the integrity attribute supported multiple hashes! Thanks for bringing that to our attention.

As @otherdaniel noted, it doesn't get us to full mimetype-only level transparency, but it's still a far step above another hint attribute on script tags. Good to know!
