Giter Club home page Giter Club logo

Comments (26)

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
I had the same problem. Unfortunately the \xxx octal sequences will be encoded 
on my
client side (GWT / JS) as Latin-1 Characters, so completely wrong.
I used the unescapeBytes function on my server side to convert back to the 
original
characters. Then I write to an UTF-8 Stream. Now it works also on the client 
side. 

Original comment by [email protected] on 11 May 2010 at 6:35

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
I'm having the same problem. \xxx octal sequences aren't valid JSON and my json 
parser is (quite rightly) 
vomiting on them.

Original comment by [email protected] on 18 May 2010 at 3:45

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
At the moment byte arrays can look like this:
"\123\124\125"
... Thats an invalid JSON escape sequence. Your parser parses it, but no 
standards compliant parser will.

The code should be changed here:
http://code.google.com/p/protobuf-java-format/source/browse/trunk/protobuf-java-
format/src/java/com/google/protobuf/JsonFormat.java#1186

Original comment by [email protected] on 18 May 2010 at 4:08

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Based on the standard definition from http://www.json.org, the required change 
is to the JsonFormat.escapeBytes() method:

        byte b = input.byteAt(i);
...
        default:
                if (b >= 0x20) {
                        builder.append((char) b);
                } else {
                        builder.append('\\u');
                        int code = Character.digit(b, 16);
                        String hex = Integer.toHexString(code);
                        builder.append(hex);
                }
                break;

Original comment by eliran.bivas on 25 May 2010 at 1:41

  • Changed state: Accepted
  • Added labels: Usability

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
mistakenly wrote 

builder.append('\\u'); 

instead of 

builder.append("\u");

Original comment by eliran.bivas on 25 May 2010 at 2:31

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
I'm still getting some weirdness with the string encoding.

According to the JSON RFC, string literals can contain unicode characters 
(except for some ctrl chars) OR be escaped (\uXXXX).

So encoding the value "é" as a JSON string can be either:
value:"é"
OR
value:"\u00E9"

Currently (r34) JsonFormat will first encode the string to UTF8 (which in our 
case results in 2 bytes) then, it will encode each byte. When the byte is 
outside the ASCII plane, it is encoded using its Unicode sequence (\uXXXX). As 
such, the result is double-encoding of the value (first to UTF-8 then to 
Unicode-32) so we get garbage on the client:

value:"\u00c3\u00a9"
where:
\u00c3 is the UTF-8 control character
\u00a9 is the second byte in the sequence

See section 2.5 of http://www.ietf.org/rfc/rfc4627.txt

Original comment by [email protected] on 2 Aug 2010 at 2:31

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
I've made the necessary change to escapeBytes() method.

Please review the latest revision and tell me if it's working for you.

Original comment by eliran.bivas on 3 Aug 2010 at 9:04

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Encoding "é" now results in "\uffc3\uffa9". Reading this back with JsonFormat 
results in "sY".

I think the problem is that the escapeText() method converts the text string 
into UTF-8. This should not happen at all. The escapeText method should encode 
a CharSequence directly, not a ByteString (since JSON supports unicode).

I'll try to submit a patch for this. Thanks for the quick turnaround.

Original comment by [email protected] on 3 Aug 2010 at 2:55

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Allright, here's a patch that encodes String values into JSON String as 
described in the RFC.

* all control characters are encoded using \uXXXX except for the ones that 
allow the short version \X (newline, formfeed, etc.)
* single quote is also encoded as \' even if the JSON RFC doesn't specify that 
it can
* any other character in the 0x0000-0xFFFF range is printed as-is
* any character outside of the 0x0000-0xFFFF range is printed by encoding it's 
UTF-16 surrogate pair (\uXXXX\uXXXX).

The patch also adds a few test cases.

Original comment by [email protected] on 3 Aug 2010 at 6:54

Attachments:

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
The test for this is to check if JSON.parse (from http://www.JSON.org/json2.js) 
accepts it.  Even if this piece of code is more strict that the spec, this is 
the piece of code that most people use to parse JSON.

Reading towards the end, you can see how it validates:

1. replace /\\(?:["\\\/bfnt]|u[0-9a-fA-F]{4})/ with @.    
       - i.e. only \" \\ \/ \b \f \n \t and \uXXXX are allowed

2. Now, a string must match /"[^"\\\n]*"/ 

So, ... octal escapes are not accepted by this validator.  Neither are \a \r \v 
and \' which are generated by escapeBytes().

BTW, both the encoding of strings and byte sequences have this problem,



Original comment by [email protected] on 18 Aug 2010 at 4:07

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
The patch I submitted will not encode \a, and \v (as did escapeBytes()), but 
will encode \r (as per the RFC). It will not do any octal escaping either.

Quickly looking at the json2.js source, I think it does accept \r.

The only thing that is not compliant in the patch I submitted is the encoding 
of \' which seems to be accepted by browsers.

Now that I think about it, I think it was a bad idea to add to the patch. It 
can be easily left out when applying the patch.

kresten.krab: Did you test the patch I submitted?

Original comment by [email protected] on 18 Aug 2010 at 4:22

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Hi, I haven't had the chance to look on it but after reading your comment 
about, i think it might need a change.

sorry for the late replay.

Original comment by eliran.bivas on 19 Aug 2010 at 6:13

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
In v.1.1.1 escaping of special chars (e.g. german umlauts like ä, ö, ü, ß) 
is still wrong.

When I apply philippe's patch (see issue-11.patch above) my test work. So I 
don't know why this has not found its way into the trunk during the last 6 
months.

Here's another patch for the current JsonFormat.java in v.1.1.1 (rev 43) which 
is essentially phillippe's patch without the single quote escaping (can be 
omitted, read previous comment).

Maybe this will find its way into the next version.

Original comment by [email protected] on 4 Apr 2011 at 1:25

Attachments:

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
I've been using a patched version in production for a while now and have not 
had a single issue.

It was important to leave out the single-quote special treatment as mentioned. 
So you should be fine with your patched version.

As a side note, our patched version also fixes issue 17. It's available here 
(as a maven project)

http://maven.obiba.org/maven2/com/google/protobuf/protobuf-java-format/

Original comment by [email protected] on 4 Apr 2011 at 4:23

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Good to know. One more reason why your patch should be included in the next 
release...

Original comment by [email protected] on 4 Apr 2011 at 7:06

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
any of you like to merge the patch?

Original comment by eliran.bivas on 6 Apr 2011 at 7:01

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
When will a new version be released with these important fixes?

Original comment by t.britz%[email protected] on 21 Apr 2011 at 2:04

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Eliran: I'm willing to merge the patch. I can also merge the fix for Issue 17.

Original comment by [email protected] on 21 Apr 2011 at 2:21

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024

Original comment by eliran.bivas on 21 Apr 2011 at 2:32

  • Changed state: Started

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Issue 16 has been merged into this issue.

Original comment by eliran.bivas on 21 Apr 2011 at 2:33

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Issue 18 has been merged into this issue.

Original comment by eliran.bivas on 21 Apr 2011 at 2:35

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Issue 31 has been merged into this issue.

Original comment by [email protected] on 26 Apr 2011 at 5:19

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Patch was merged into r47. Note that the patch does not encode the single 
quote, as per the RFC (see comments above).

Original comment by [email protected] on 26 Apr 2011 at 8:19

  • Changed state: Fixed

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
Issue 32 has been merged into this issue.

Original comment by [email protected] on 3 May 2011 at 12:39

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
maybe someone might find this comment helpful, too: 
http://code.google.com/p/protobuf-java-format/issues/detail?id=10#c25

tephan

Original comment by [email protected] on 5 May 2011 at 5:03

from protobuf-java-format.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 16, 2024
maybe someone might find this comment helpful, too: 
http://code.google.com/p/protobuf-java-format/issues/detail?id=10#c25

tephan

Original comment by [email protected] on 5 May 2011 at 5:04

from protobuf-java-format.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.