Giter Club home page Giter Club logo

Comments (25)

jednano avatar jednano commented on August 23, 2024 1

I would say that the escaping of a backslash is always needed for consistency and less confusion.

Yes, Windows paths would need to be escaped. I wouldn't force people to use forward slashes, but it would be optional.

I agree that a clear definition would be valuable. I also think it warrants a separate project that ONLY does INI parsing; hence, my ini-parser project. Why? Because parsing an INI file isn't at all specific to EditorConfig.

from editorconfig-core-test.

ppalaga avatar ppalaga commented on August 23, 2024 1

I do not understand the voting question and this thread is too long to read. @xuhdev could you please add one or two examples to the question that explain the change in the spec?

from editorconfig-core-test.

rakus avatar rakus commented on August 23, 2024 1

@ppalaga I created a gist with some notes I took while thinking about this.

Feel free to ask, comment, criticize.

from editorconfig-core-test.

florianb avatar florianb commented on August 23, 2024 1

@florianb asking here, not to pollute the voting:

👍

  1. I'd like to vote "no" because i think the problem is we didn't explicitly specify yet how custom properties may be used with the editorconfig.

What do you mean with custom properties?

People ask reocurringly for the ability to add properties besides of these defined by editorconfig. And as far as i @rakus attempt understand it has its origin to do so for vim/vim#2286. We do not allow having backslashes, semicolons, octothorpes as values (or am i wrong?) so we wouldn't need to escape anything, right?

And we didn't specify explicitly the editorconfig-fileformat, ini files itself aren't specified and the behavior of the reference parser (ConfigParser) depends on modal parameters. But i don't want to stay in the way of @rakus implementation.
2. I'd also like to vote yes, since we become a little more explicit about the handling of values. But i also think the escaping of octothorpes, semicolons and backslashes is by far not enough to allow a secure and complete serialization of valid UTF-8 strings.

Would you please provide an example of an UTF-8 string that cannot be serialized securely and completely? Just striving to understand, no attempt to prove you are wrong from my side.

All fine. I am pretty sure i can't neither - string handling has so many security implications. I'd just like to point out that these decisions may affect the security of the clients and/or may limit the proper handling of utf-8 string-literals (, not knowing if editorconfig-values are even utf8).
I'd rather like to see leaning towards an explicit string-literal standard (which would in addition need a unambiguous grammar).

I'd enjoy to see if this could lead to a discussion how a ini-file is formally defined

You call for a grammar, right?

Right. Specifying an ini-grammar is long overdue.

from editorconfig-core-test.

rakus avatar rakus commented on August 23, 2024

Oops! There is a pull request to fix this already: #11

Note that there are the following tests:

  • semicolon_in_property
  • escaped_semicolon_in_property
  • octothorpe_in_property
  • escaped_octothorpe_in_property

The last two needs fixing and that's what the pull request does.

from editorconfig-core-test.

rakus avatar rakus commented on August 23, 2024

I wondered how this bug could stay unrecognized for so long. So I started to test.

The escaped_octothorpe_in_property is defined like this:

# test escaped octothorpes are included in property value
new_ec_test(escaped_octothorpe_in_property comments.in test12.c
    "^key=value ; not comment[ \t\n\r]*$")

and runs successful with editor-core-py. Suprise!

So I changed the test definition and added nonsense behind the semicolon:

# test escaped octothorpes are included in property value
new_ec_test(escaped_octothorpe_in_property comments.in test12.c
    "^key=value ; NONSENSE not comment[ \t\n\r]*$")

Again surprise: The tests are still run successful.

This seems to connected to the cmake list syntax. In Strings a semicolon is a list separator. See here or here. And the test only fails if both regex expressions do not match.
If you add nonsense before and after the semicolon, the test will fail.

Due to lacking cmake knowledge, I cant provide any solution.

from editorconfig-core-test.

xuhdev avatar xuhdev commented on August 23, 2024

Looks like a solution is here.

from editorconfig-core-test.

rakus avatar rakus commented on August 23, 2024

The $<SEMICOLON> doesn't seem to work in this situation. I tested again and this time I figured out that the correct way to put a semicolon in a regex is to prefix it with two backslashes:

wrong: "^key=value ; not comment[ \t\n\r]*$"
right:  "^key=value \\; not comment[ \t\n\r]*$"

I didn't get that yesterday, as the editorconfig-core-py has a bug: It doesn't unescape an escaped comment character.
For

; Escaped semicolon in value
[test6.c]
key=value \; not comment

It returns

value \; not comment

The backslash is not stripped out.
The same is true for editorconfig-core-c.
I didn't test editorconfig-core-js, but from a look at the source, I guess it behaves the same. Maybe @jedmao knows more.

This is not easy fixable in editorconfig-core-py and editorconfig-core-c, as both don't have any special backslash handling. And when you introduce a special backsplash handling, you might break other things.

from editorconfig-core-test.

xuhdev avatar xuhdev commented on August 23, 2024

I don't think we are supposed to escape comma's in the files, and we never talked about them. The reason is that, at least until today, we do not expect commas to show up as values. Are you concerned about this?

from editorconfig-core-test.

rakus avatar rakus commented on August 23, 2024

The point is, that that the following two values are always returned verbatim (including the backspace):

key1=value \; not comment
key2=value \# not comment

You can check this with the project editorconfig-core-py. Clone it, got to the submodule test and do a git checkout master (to get the most up-to-date tests including pull-request 11). If you then run the tests, multiple will fail.

On of the failing tests is escaped_octothorpe_in_property as editorconfig-core-py returns key=value \# not comment, but the test expects key=value # not comment.

As of today this is not a problem, as editorconfig property values are just keywords or numbers.

But the problem is, that there is no clear definition on how to handle a escaped comment char in the property value. How to handle the following?

key=value \# not comment
key=value \\# not comment, or is it?
key=value \\\# not comment
...

This doesn't seem to be a real world problem now, but it is a problem with the file format specification. AFAIK there is no real file format specification, but just the tests in this project.

BTW: The test escaped_octothorpe_in_property also fails in editorconfig-core-c and editorconfig-core-js (don't forget to cd to the test directory and do a git checkout master!)

PS: The easiest "solution" might be to change the text on the editorconfig.org home page (section File Format Details). Replace "Comments should go on their own lines." with "Comments must go on their own lines." But this might make existing .editorconfig files invalid.

from editorconfig-core-test.

xuhdev avatar xuhdev commented on August 23, 2024

I kind of sorted out what was going on there. Initially, both semicolons and octothorpes were supposed to be escapable. The relevant test case escaped_octothorpe_in_property were supposed to test on escaping octothorpes, but for some reasons, they were actually tested against semicolons. Now we are in the mess: Move forward or keep the inconsistency. I personally leaning toward keeping both semicolons and octothorpes escapable, as everything will still be backward compatible at this point of time, assuming people do not use escaped octothorpes, which are not supported by the current specification anyway. However, this is likely to be decided by another voting. Let's initiate votings and start addressing these issues...

For your PS: I don't understand, in this context, why replacing "should" with "must" makes any difference, because both mean "should" and "must."

And thank you for bring this up -- we may never notice this.

from editorconfig-core-test.

rakus avatar rakus commented on August 23, 2024

Note that the test semicolon_in_property and escaped_semicolon_in_property are also faulty. The semicolon defines a list of regexes.

So they need to be escaped.

In semicolon_in_property with two backslashes, in escaped_semicolon_in_property with five (my guess: two to escape the ;, another two to insert a \ and one to escape the \ for use in the regex).

from editorconfig-core-test.

jednano avatar jednano commented on August 23, 2024
key=value \\# is a comment

Because the first slash escapes the second slash, but the octothorpe that follows is not escaped.

from editorconfig-core-test.

jednano avatar jednano commented on August 23, 2024

I've also been aware of the bugs within the tests and the js core for a while. That's why I created https://github.com/jedmao/ini-parser but never quite had the time to close the loop.

from editorconfig-core-test.

rakus avatar rakus commented on August 23, 2024

@jedmao Would the escaping of a backslash always be needed ore only in combination with a comment character.
If I would implement a custom editorconfig extension, say to provide c_include_path, would Windows pathes have to be written with double backslashes? Like C:\\dira\\dirb?
The other option would be to say, that pathes always have to be written with forward slashes.

I know, that this is not important to the current set of properties, but a clear definition would be useful for those providing extensions to editorconfig.

from editorconfig-core-test.

dyadix avatar dyadix commented on August 23, 2024

Is all this about escaped characters in values? Is it documented anywhere? https://editorconfig.org/#file-format-details?

from editorconfig-core-test.

rakus avatar rakus commented on August 23, 2024

@jedmao As the documentation refers to Python ConfigParser Library (I know, the text only says "is compatible with the format"), two points:

  1. ConfigParser does only allow comments on own lines
  2. It does not support escaping.

from editorconfig-core-test.

xuhdev avatar xuhdev commented on August 23, 2024

Just one note, as stated in the file format details:

Only forward slashes (/, not backslashes) are used as path separators and octothorpes (#) or semicolons (;) are used for comments.

So IMO no worry about paths.

My summary of the issue right now is: Should we add/keep or remove/avoid tests for escaping backslashes, semicolons, and octothorpes? I personally leaning towards to adding/keeping, because it may be practically vital in the future and does not break any compatibility at this point of time. Please add some discussion and I'll put it onto a voting on Sunday or Monday...

from editorconfig-core-test.

jednano avatar jednano commented on August 23, 2024

The only confusion is to have tests that are irrelevant to the specification. That said, I definitely don't think we should be removing any existing ones. I don't think there's really a wrong answer here at this point.

from editorconfig-core-test.

ppalaga avatar ppalaga commented on August 23, 2024

@florianb asking here, not to pollute the voting:

  1. I'd like to vote "no" because i think the problem is we didn't explicitly specify yet how custom properties may be used with the editorconfig.

What do you mean with custom properties?

And we didn't specify explicitly the editorconfig-fileformat, ini files itself aren't specified and the behavior of the reference parser (ConfigParser) depends on modal parameters. But i don't want to stay in the way of @rakus implementation.
2. I'd also like to vote yes, since we become a little more explicit about the handling of values. But i also think the escaping of octothorpes, semicolons and backslashes is by far not enough to allow a secure and complete serialization of valid UTF-8 strings.

Would you please provide an example of an UTF-8 string that cannot be serialized securely and completely? Just striving to understand, no attempt to prove you are wrong from my side.

I'd enjoy to see if this could lead to a discussion how a ini-file is formally defined

You call for a grammar, right?

from editorconfig-core-test.

cxw42 avatar cxw42 commented on August 23, 2024

Voting link, for ease of reference: editorconfig/editorconfig-vote#5 . My two cents:

  • I support having a grammar or other formal definition of the format.
    • Edit I would also ask that the definition include semantics, not just syntax --- e.g., editorconfig/editorconfig#371 (semantics of numeric ranges).
  • I might be able to fix the cmake semicolons issue the same way I fixed the out-of-order multi-line results issue.
  • I support forward-slash-only paths for cross-platform use.
  • I support keeping the existing tests and escapes only for # and ; .
  • I suggest requiring that any value needing backslashes or other escapes be double-quoted, e.g., as in TOML. That would permit storing arbitrary Unicode string values.
  • I suggest requiring that keys use only A-Za-z_ . That way a parser will always be able to see what key a line is, regardless of how the format may change in the future.

from editorconfig-core-test.

cxw42 avatar cxw42 commented on August 23, 2024

Related - as part of a grammar, also specify whether /^\s+[#;].*$/ (leading whitespace before the comment marker) is a valid comment line. I believe it should be, but just discovered that my vimcore tree doesn't treat it as such. I am guessing that means the Python core doesn't either, since I converted the Python to Vimscript as closely as I could.

from editorconfig-core-test.

xuhdev avatar xuhdev commented on August 23, 2024

@editorconfig/board-member

Given that the specification is more clear now, we have a new question: Should we kick out the test for removing escaping # (i.e., the test named escaped_octothorpe_in_value), and explicitly add that this kind of behavior (# or ; in the middle of a line) is undefined in the current specification? Any way, we probably should have more discussion here.

escaped_octothorpe_in_value tests that in the following case, \# should be treated as #:

[xxxx]
my_property = my_value   \# no sharp comment

from editorconfig-core-test.

xuhdev avatar xuhdev commented on August 23, 2024

I opened another relevant voting: editorconfig/editorconfig-vote#6

from editorconfig-core-test.

xuhdev avatar xuhdev commented on August 23, 2024

Done in 2b983f5 and editorconfig/specification@71e10b3

from editorconfig-core-test.

Related Issues (19)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.