
Comments (28)

jobertabma avatar jobertabma commented on July 30, 2024 7

What if YAML were used for the entire file, @oreoshake? To me, that gives the convenience of having a structured, readable, and consumable format. I'm not sure about using multiple formats or having the vendors use a tool to generate any of the files.

Just thinking / writing out loud, but I can see something like this working pretty well:

scope:
  allowed: gratipay.com
  disallowed: beta.gratipay.com
rewards:
  swag: true
  monetary:
    currency: USD, EUR, BTC
    methods: PayPal, bank transfer, Coinbase
  hall_of_fame:
    url: http://hackerone.com/gratipay/thanks

The format allows for comments, multi-line strings, and all scalar values (like JSON), which gives the right flexibility to the people using it while still being easy to consume by both humans and computers. JSON would obviously be similar, but non-formatted JSON is still hard for people to read, doesn't have a built-in mechanism for comments, and has odd character restrictions (\x notation, for example). Thoughts?
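
Something like this rough sketch could do the consuming (assuming Python 3 with the third-party PyYAML package installed; the file name security.yaml is just a placeholder):

import yaml  # third-party: pip install pyyaml

with open("security.yaml") as fh:
    policy = yaml.safe_load(fh)  # safe_load avoids instantiating arbitrary objects

print(policy["scope"]["allowed"])                # "gratipay.com"
print(policy["rewards"]["hall_of_fame"]["url"])  # "http://hackerone.com/gratipay/thanks"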

PS Using YAML might also introduce some cool security vulnerabilities in the parsers people will write for this. I'd say that's a win-win. :trollface:

from security-txt.

EdOverflow avatar EdOverflow commented on July 30, 2024 7

So after giving this a bit more thought and gathering ideas from other people in the industry, I have decided to stick with the original TXT format. My ultimate goal is widespread adoption by companies; therefore, I need to find the simplest format. In my opinion, neither JSON nor YAML is the simplest format to write security.txt in from a company's perspective.

I will leave this issue open, because this could change in the future and I would like to keep the discussion open.

from security-txt.

oreoshake avatar oreoshake commented on July 30, 2024 5

A simple script could also generate both formats and we could serve both formats at different URLs 😄

from security-txt.

OmgImAlexis avatar OmgImAlexis commented on July 30, 2024 4

I'd like to suggest that, instead of using JSON, YAML, etc., a field be added to the TXT file indicating where other formats can be found, be that JSON, YAML, etc.

from security-txt.

EdOverflow avatar EdOverflow commented on July 30, 2024 3

I see pros and cons for TXT and JSON. The reasons why I decided to go down the TXT route are:

  • Personally, I think if we want companies to adopt this system it should be as straightforward as possible.
  • I like the sound of JSON too, but in my opinion, being able to claim that it is as "simple" as robots.txt should on its own encourage companies to start using security.txt.
  • By using JSON, you lose the human side of things, which is important for generating the actual file. I do not want companies to have to rely on a security.txt generator script.

@oreoshake Let me know what you think about my reasoning and could you suggest some possible solutions/compromises?

from security-txt.

keymandll avatar keymandll commented on July 30, 2024 3

True. What I wanted to say is that it's easier (I think) to pass the data to a JSON library.
If there is an easy-to-use, secure, widely adopted library to parse this format, then it's fine, of course.

from security-txt.

EdOverflow avatar EdOverflow commented on July 30, 2024 2

Thank you for the clever idea, @OmgImAlexis. Interestingly, this might actually work with the current directives in the Internet draft: https://www.ietf.org/id/draft-foudil-securitytxt-00.txt. The Contact: directive allows one to specify an external security page/file. So you could link to a JSON, YAML, etc. file using that directive. I will have to give that a bit more thought, but that does appear to be a very good solution.
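
For illustration only, a hypothetical security.txt taking that approach might contain something like this (placeholder values, using only the Contact: directive mentioned above):

Contact: security@example.com
Contact: https://example.com/security.json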

from security-txt.

davidmays avatar davidmays commented on July 30, 2024 2

I have a completely different take. Why not use semantic HTML? Using formats like txt and yaml seems like a step backwards, or at least a step away from the web.

Pros:

  • It can be rendered cleanly in a browser, given a reasonable stylesheet (Human Readable)
  • It can be indexed (or not, according to robots.txt or built-in directives) by currently existing search engines, making this information more easily discoverable by security researchers
  • Things like links (e.g. to pgp keys, bug bounty programs, etc) all have a clear semantic meaning, and are supported in the browser
  • Given a DTD or schema, it can be easily validated by existing tools

Cons:

  • Trickier than a text file to author by hand

from security-txt.

ccoenen avatar ccoenen commented on July 30, 2024 2

I do not believe JSON would provide any real advantage here. Keep it nice for humans.

from security-txt.

ccoenen avatar ccoenen commented on July 30, 2024 2

The existing format does not hinder machine-readability in the slightest. Suggesting that it does amounts to saying that software has problems with HTTP headers (which use essentially the same format, except for comments).

from security-txt.

austinheap avatar austinheap commented on July 30, 2024 2

@invisiblethreat Not sure why one would need to write a parser or "end up in regex parsing hell" with the current format -- and that's certainly not what anyone would want. In Python2 you can use mimetools.Message(...), in Go you can use textproto.ReadMIMEHeader, and in PHP5 you can use http_parse_headers(...) (or http\Header::parse(...) in PHP7). I'm assuming similar things exist in all major languages.
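
For example, a rough Python 3 sketch along those lines (mimetools is Python 2 only, but the standard-library email parser handles the same RFC 822-style Key: value lines; the sample values are made up):

from email.parser import Parser

raw = "Contact: security@example.com\nContact: https://example.com/bugs\n"
msg = Parser().parsestr(raw, headersonly=True)  # header parsing only, no body

print(msg.get_all("Contact"))  # ['security@example.com', 'https://example.com/bugs']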

@davidmays One of the benefits of having a standard here is there isn't a need to use semantic tags to explain what things are as the spec inherently supplies those definitions.

Personally I think it's important to remember that the KVPs are structured -- just as much as they would be if they were thrown into JSON. The issue for me is: if it's not a text format using standard KVP entries, then why JSON and not YAML or XML or...? It feels like a slippery and unnecessary slope to get on.

The KISS principle def applies here IMHO 🙂

from security-txt.

ccoenen avatar ccoenen commented on July 30, 2024 1

Regarding "why not both", which came up a few times:

A simple script could also generate both formats and we could serve both formats at different URLs

Well, then you need to support reading all different formats, since you can't rely on every server outputting every format. In the worst case, somebody publishes security.yaml and you can only consume security.json - well that's awkward. I think this would be the worst possible outcome.

from security-txt.

keymandll avatar keymandll commented on July 30, 2024 1

So, the first thing that came to mind when I saw the proposal for JSON is how awesome it would be to have a standard that is not only human-parseable but also easy for software to deal with. This would allow things like Burp Suite Pro extensions (or native support) to identify and understand scope, etc.

(I've read in the previous comments that needing a generator should be avoided. The first time I had a look at the concept (a simple security.txt), I got the idea that I would likely create a shell script to generate security.txt anyway.)

from security-txt.

ccoenen avatar ccoenen commented on July 30, 2024 1

@invisiblethreat Thanks, I believe I have a pretty good grasp on the technology. And: Yes, I believe we have just witnessed a pretty substantial update of HTTP. Guess what: no JSON?

You may or may not believe it, but JSON is not the only way to structure data.

Also, the JSON standard does not even allow for comments (yes, thanks for pointing out that this is a feature; I know it is a feature), so I do not believe it's even worth discussing for this specific use case.

from security-txt.

EdOverflow avatar EdOverflow commented on July 30, 2024 1

Closing this ticket for now; I will reopen it when we decide to discuss the file format again. Thank you all for your help.

from security-txt.

sbehrens avatar sbehrens commented on July 30, 2024

+1 on JSON.

from security-txt.

oreoshake avatar oreoshake commented on July 30, 2024

@EdOverflow thanks for the thoughtful response.

I should preface this by saying I really don't think this is important. This is borderline bikeshedding to me. And I also want to emphasize that parsing the robots.txt format is not hard; it's just less standardized and less well supported than JSON.

Who is this for?

Humans

Humans can read and understand any general format and account for obvious errors. We can understand any permutation of basic values even if they don't match the supplied grammar.

Humans are very slow.

Computers

Computers cannot understand loosely formatted things well without a parser. I'm sure there are robots.txt parsers available for every major language, but JSON parsing is ubiquitous and often part of the language itself these days. Computers can't understand simple mistakes without accounting for them ahead of time, and that knowledge has to be shared across implementations.

Computers are very fast.

Humans using computers

I am no bounty researcher but if I'm looking for targets and people publish some structured data perhaps with validation, I can quickly scan many projects without reading a single page with my own eyes.

Humans using computers scale.

From the readme:

The main purpose of security.txt is to help make things easier for companies and security researchers when trying to secure platforms.

I think a json format makes things easier for researchers but not for companies. Personally, the balance feels ok.

What is this for?

To me, the point is to get the information in front of more people. Structured data seems easier to distribute and automate. Automation will help more people access it.

Responses

Personally, I think if we want companies to adopt this system it should be as straightforward as possible.

This comes back to the "straightforward for whom" point. Personally, I'd probably write yaml that generates JSON. Seems silly, but the conversions are trivial and the result is universal.
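
As a purely hypothetical illustration of that workflow (assuming Python 3 with PyYAML installed; the file names are made up):

import json
import yaml  # third-party: pip install pyyaml

with open("security.yaml") as src:
    data = yaml.safe_load(src)

with open("security.json", "w") as dst:
    json.dump(data, dst, indent=2)  # the generated JSON is what gets published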

I like the sound of JSON too, but in my opinion, being able to claim that it is as "simple" as robots.txt should on its own encourage companies to start using security.txt.

I don't think we should follow anything from robots.txt other than ride the coattails of its awareness.

By using JSON, you lose the human side of things, which is important for generating the actual file.

💯 true. While a flat array of attributes is less human readable, it's the most readable json format possible 🤷. I wouldn't say you "lose" the human side but it does become less readable.

I do not want companies to have to rely on a security.txt generator script.

This is absolutely a downside to using JSON. However, I feel that any company advertising disclosure has the technical competency to do this. You could argue that they could parse a format just as easily as generating a specific structure.

If you've made it to this point, I owe you a 🍺. Again, parsing robots.txt is not a P vs. NP problem.

from security-txt.

EdOverflow avatar EdOverflow commented on July 30, 2024

Thank you very much for the informative response.

To me, the point is to get the information in front of more people. Structured data seems easier to distribute and automate. Automation will help more people access it.

I fully agree with you.

I don't think we should follow anything from robots.txt other than ride the coattails of its awareness.

👍

This comes back to the "straightforward for whom" point. Personally, I'd probably write yaml that generates JSON. Seems silly, but the conversions are trivial and the result is universal.

Interesting. There was a Google group discussion about this: https://groups.google.com/d/msg/hackerstxt/LYVdH_v6KvQ/mC1Nfj3HCQAJ.

This is absolutely a downside to using JSON. However, I feel that any company advertising disclosure has the technical competency to do this. You could argue that they could parse a format just as easily as generating a specific structure.

Fair enough. I am probably underestimating their capabilities.

A simple script could also generate both formats and we could serve both formats at different URLs 😄

Someone actually suggested this to me after seeing this issue. The only thing here is that it might start getting a little confusing. Still I think we are on the right track. This would probably be the best solution. 🙂

from security-txt.

oreoshake avatar oreoshake commented on July 30, 2024

Interesting. There was a Google group discussion about this: https://groups.google.com/d/msg/hackerstxt/LYVdH_v6KvQ/mC1Nfj3HCQAJ.

Heh yeah, I had been working with Rob, Eduardo, and @devd on that one 😄 There was some consensus on using JSON in a private thread, but it was certainly not agreed upon.

The only thing here is that it might start getting a little confusing. Still I think we are on the right track.

👍

from security-txt.

oreoshake avatar oreoshake commented on July 30, 2024

To me, that gives the convenience of having a structured, readable, and consumable format.

@jobertabma what about TOML? :trollface:. 💯 agree that YAML is a great balance of structure and readability. I'm not sure I agree on the consumability. (I could be totally wrong, but) it seems like JSON support is in most major stdlibs, whereas YAML is not. You can parse JSON in a browser; you can't parse YAML without a lib.

My point was that any standard format is fine, but JSON strikes a nice balance to me. What that standard format becomes is not important.

PS Using YAML might also introduce some cool security vulnerabilities in the parsers people will write for this. I'd say that's a win-win. :trollface:

I won't even try to play the angle that YAML/ruby have a checkered past. You brought it up 😜 .

I'm not sure about using multiple formats or having the vendors use a tool to generate any of the files.

Yeah, it could get messy.

EDIT:

My view is skewed by my feeling that humans won't actually be reading this. Researcher robots and 3rd party sites will consume this data.

from security-txt.

invisiblethreat avatar invisiblethreat commented on July 30, 2024

@EdOverflow Thanks for taking on this initiative!

I think that an overabundance of importance is being given to the human-readable/writable factor here. These policies are essentially "write once (or twice)" and read hundreds/thousands/millions of times. Also, just about anyone who is going to write a policy is familiar with JSON due to its ubiquity. I feel that the asymmetry of writes to reads should be given much more weight.

99.999% (or more) of the time it's machines reading this file, and rolling parsers for loosely formatted data is a PITA. I'd happily write tools for this initiative that output to JSON or XML. This could also be tied to a site for validation, like SSL Labs.

Another thought: if robots.txt were being developed today, would it be plain text? I'd argue that it would be "well formatted", favouring ease of machine ingestion over ease of human reading and writing.

Please consider this comment as an offer of support to write the tooling necessary for this initiative. :)

from security-txt.

invisiblethreat avatar invisiblethreat commented on July 30, 2024

@ccoenen There's readability, and then there's usefulness. Also, if HTTP headers were being reimplemented today, do you think flat text would be used?

In the age of microservices, the ability to specify policy on a per-endpoint basis becomes increasingly useful. For example, the endpoints /account and /history can be two completely disparate systems, run by two entirely different teams, each with its own scope or directives. This is not possible with flat text, but any format that permits nesting allows for this granularity. It also allows for a single global policy rather than a per-directory configuration.

Nesting would also allow for a global policy that could include www.example.com and blog.example.com rather than having a policy for every subdomain that needs to be tracked independently.
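
To sketch what I mean (the key names here are invented for illustration and not part of any draft):

import json

policy = {
    "scope": {
        "www.example.com": {"contact": "security@example.com"},
        "blog.example.com": {"contact": "security@example.com"},
        "www.example.com/account": {"contact": "account-team@example.com"},
        "www.example.com/history": {"contact": "history-team@example.com"},
    }
}
print(json.dumps(policy, indent=2))  # one global policy covering every endpoint and subdomain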

from security-txt.

invisiblethreat avatar invisiblethreat commented on July 30, 2024

Yes, I believe we have just witnessed a pretty substantial update of HTTP. Guess what: no JSON?

Ignoring the fact that the headers are Huffman encoded, sure. Not exactly human readable.

I do not believe it's even worth discussing for this specific use case.

Well, I disagree.

from security-txt.

austinheap avatar austinheap commented on July 30, 2024

Most (if not all) of the standards around HTTP are based on a Key: value relationship. I struggle to find where JSON adds value here, but it's clear that it would create complexity.

from security-txt.

invisiblethreat avatar invisiblethreat commented on July 30, 2024
  • KVP is exactly what the proposal embraces.
  • You get granularity thanks to nesting, similar to the Location directive, without having to cascade changes and manage multiple files. This reduces the management burden to one location. YAML, TOML, and XML would all provide this as well.
  • When you would like to provide multiple values, you can use a list rather than multiple directives, instead of ending up in regex parsing hell. Processing a list of 1 or 1 million items follows the exact same pattern (see the sketch after this list).
  • Adoption. As much fun as writing parsers is, why do it again?
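
Here is that contrast as a rough, hypothetical sketch (placeholder values; the flat-text side is read with Python's standard-library email parser, as suggested earlier in the thread):

import json
from email.parser import Parser

# Flat-text style: repeat the directive once per value.
flat = "Contact: security@example.com\nContact: https://example.com/bugs\n"
print(Parser().parsestr(flat, headersonly=True).get_all("Contact"))

# Structured style: a single key holding a list; 1 or 1 million entries parse the same way.
structured = '{"contact": ["security@example.com", "https://example.com/bugs"]}'
print(json.loads(structured)["contact"])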

Again, I feel that machines will read this file if the standard is accepted at a ratio of 1000:1, if not more.

It feels as if this has already been decided, so I'll focus my efforts elsewhere.

from security-txt.

davidmays avatar davidmays commented on July 30, 2024

from security-txt.

andreasvirkus avatar andreasvirkus commented on July 30, 2024

I'd like to throw one other format into the ring: Markdown.

Benefits: everything that a .txt file already includes, plus common parsing libs, a (somewhat) strict set of rules for the format, and the ability to render it into HTML if necessary (and the benefits of that were brought up by @davidmays here).
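
As a rough sketch of the rendering side (assuming Python 3 with the third-party Python-Markdown package; the content is a placeholder):

import markdown  # third-party: pip install markdown

text = "# Security policy\n\nContact: [security@example.com](mailto:security@example.com)\n"
print(markdown.markdown(text))  # emits HTML, e.g. <h1>Security policy</h1> ...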

from security-txt.

austinheap avatar austinheap commented on July 30, 2024

@andreasvirkus I don't believe the file format is still an open item, but I'll defer to @EdOverflow to close this out.

from security-txt.
