Comments (26)
I had the same problem. Unfortunately the \xxx octal sequences will be encoded
on my
client side (GWT / JS) as Latin-1 Characters, so completely wrong.
I used the unescapeBytes function on my server side to convert back to the
original
characters. Then I write to an UTF-8 Stream. Now it works also on the client
side.
Original comment by [email protected]
on 11 May 2010 at 6:35
from protobuf-java-format.
I'm having the same problem. \xxx octal sequences aren't valid JSON and my json
parser is (quite rightly)
vomiting on them.
Original comment by [email protected]
on 18 May 2010 at 3:45
from protobuf-java-format.
At the moment byte arrays can look like this:
"\123\124\125"
... Thats an invalid JSON escape sequence. Your parser parses it, but no
standards compliant parser will.
The code should be changed here:
http://code.google.com/p/protobuf-java-format/source/browse/trunk/protobuf-java-
format/src/java/com/google/protobuf/JsonFormat.java#1186
Original comment by [email protected]
on 18 May 2010 at 4:08
from protobuf-java-format.
Based on the standard definition from http://www.json.org, the required change
is to the JsonFormat.escapeBytes() method:
byte b = input.byteAt(i);
...
default:
if (b >= 0x20) {
builder.append((char) b);
} else {
builder.append('\\u');
int code = Character.digit(b, 16);
String hex = Integer.toHexString(code);
builder.append(hex);
}
break;
Original comment by eliran.bivas
on 25 May 2010 at 1:41
- Changed state: Accepted
- Added labels: Usability
from protobuf-java-format.
mistakenly wrote
builder.append('\\u');
instead of
builder.append("\u");
Original comment by eliran.bivas
on 25 May 2010 at 2:31
from protobuf-java-format.
I'm still getting some weirdness with the string encoding.
According to the JSON RFC, string literals can contain unicode characters
(except for some ctrl chars) OR be escaped (\uXXXX).
So encoding the value "é" as a JSON string can be either:
value:"é"
OR
value:"\u00E9"
Currently (r34) JsonFormat will first encode the string to UTF8 (which in our
case results in 2 bytes) then, it will encode each byte. When the byte is
outside the ASCII plane, it is encoded using its Unicode sequence (\uXXXX). As
such, the result is double-encoding of the value (first to UTF-8 then to
Unicode-32) so we get garbage on the client:
value:"\u00c3\u00a9"
where:
\u00c3 is the UTF-8 control character
\u00a9 is the second byte in the sequence
See section 2.5 of http://www.ietf.org/rfc/rfc4627.txt
Original comment by [email protected]
on 2 Aug 2010 at 2:31
from protobuf-java-format.
I've made the necessary change to escapeBytes() method.
Please review the latest revision and tell me if it's working for you.
Original comment by eliran.bivas
on 3 Aug 2010 at 9:04
from protobuf-java-format.
Encoding "é" now results in "\uffc3\uffa9". Reading this back with JsonFormat
results in "sY".
I think the problem is that the escapeText() method converts the text string
into UTF-8. This should not happen at all. The escapeText method should encode
a CharSequence directly, not a ByteString (since JSON supports unicode).
I'll try to submit a patch for this. Thanks for the quick turnaround.
Original comment by [email protected]
on 3 Aug 2010 at 2:55
from protobuf-java-format.
Allright, here's a patch that encodes String values into JSON String as
described in the RFC.
* all control characters are encoded using \uXXXX except for the ones that
allow the short version \X (newline, formfeed, etc.)
* single quote is also encoded as \' even if the JSON RFC doesn't specify that
it can
* any other character in the 0x0000-0xFFFF range is printed as-is
* any character outside of the 0x0000-0xFFFF range is printed by encoding it's
UTF-16 surrogate pair (\uXXXX\uXXXX).
The patch also adds a few test cases.
Original comment by [email protected]
on 3 Aug 2010 at 6:54
Attachments:
from protobuf-java-format.
The test for this is to check if JSON.parse (from http://www.JSON.org/json2.js)
accepts it. Even if this piece of code is more strict that the spec, this is
the piece of code that most people use to parse JSON.
Reading towards the end, you can see how it validates:
1. replace /\\(?:["\\\/bfnt]|u[0-9a-fA-F]{4})/ with @.
- i.e. only \" \\ \/ \b \f \n \t and \uXXXX are allowed
2. Now, a string must match /"[^"\\\n]*"/
So, ... octal escapes are not accepted by this validator. Neither are \a \r \v
and \' which are generated by escapeBytes().
BTW, both the encoding of strings and byte sequences have this problem,
Original comment by [email protected]
on 18 Aug 2010 at 4:07
from protobuf-java-format.
The patch I submitted will not encode \a, and \v (as did escapeBytes()), but
will encode \r (as per the RFC). It will not do any octal escaping either.
Quickly looking at the json2.js source, I think it does accept \r.
The only thing that is not compliant in the patch I submitted is the encoding
of \' which seems to be accepted by browsers.
Now that I think about it, I think it was a bad idea to add to the patch. It
can be easily left out when applying the patch.
kresten.krab: Did you test the patch I submitted?
Original comment by [email protected]
on 18 Aug 2010 at 4:22
from protobuf-java-format.
Hi, I haven't had the chance to look on it but after reading your comment
about, i think it might need a change.
sorry for the late replay.
Original comment by eliran.bivas
on 19 Aug 2010 at 6:13
from protobuf-java-format.
In v.1.1.1 escaping of special chars (e.g. german umlauts like ä, ö, ü, ß)
is still wrong.
When I apply philippe's patch (see issue-11.patch above) my test work. So I
don't know why this has not found its way into the trunk during the last 6
months.
Here's another patch for the current JsonFormat.java in v.1.1.1 (rev 43) which
is essentially phillippe's patch without the single quote escaping (can be
omitted, read previous comment).
Maybe this will find its way into the next version.
Original comment by [email protected]
on 4 Apr 2011 at 1:25
Attachments:
from protobuf-java-format.
I've been using a patched version in production for a while now and have not
had a single issue.
It was important to leave out the single-quote special treatment as mentioned.
So you should be fine with your patched version.
As a side note, our patched version also fixes issue 17. It's available here
(as a maven project)
http://maven.obiba.org/maven2/com/google/protobuf/protobuf-java-format/
Original comment by [email protected]
on 4 Apr 2011 at 4:23
from protobuf-java-format.
Good to know. One more reason why your patch should be included in the next
release...
Original comment by [email protected]
on 4 Apr 2011 at 7:06
from protobuf-java-format.
any of you like to merge the patch?
Original comment by eliran.bivas
on 6 Apr 2011 at 7:01
from protobuf-java-format.
When will a new version be released with these important fixes?
Original comment by t.britz%[email protected]
on 21 Apr 2011 at 2:04
from protobuf-java-format.
Eliran: I'm willing to merge the patch. I can also merge the fix for Issue 17.
Original comment by [email protected]
on 21 Apr 2011 at 2:21
from protobuf-java-format.
Original comment by eliran.bivas
on 21 Apr 2011 at 2:32
- Changed state: Started
from protobuf-java-format.
Issue 16 has been merged into this issue.
Original comment by eliran.bivas
on 21 Apr 2011 at 2:33
from protobuf-java-format.
Issue 18 has been merged into this issue.
Original comment by eliran.bivas
on 21 Apr 2011 at 2:35
from protobuf-java-format.
Issue 31 has been merged into this issue.
Original comment by [email protected]
on 26 Apr 2011 at 5:19
from protobuf-java-format.
Patch was merged into r47. Note that the patch does not encode the single
quote, as per the RFC (see comments above).
Original comment by [email protected]
on 26 Apr 2011 at 8:19
- Changed state: Fixed
from protobuf-java-format.
Issue 32 has been merged into this issue.
Original comment by [email protected]
on 3 May 2011 at 12:39
from protobuf-java-format.
maybe someone might find this comment helpful, too:
http://code.google.com/p/protobuf-java-format/issues/detail?id=10#c25
tephan
Original comment by [email protected]
on 5 May 2011 at 5:03
from protobuf-java-format.
maybe someone might find this comment helpful, too:
http://code.google.com/p/protobuf-java-format/issues/detail?id=10#c25
tephan
Original comment by [email protected]
on 5 May 2011 at 5:04
from protobuf-java-format.
Related Issues (20)
- SmileFormat print is closing the outputstream HOT 2
- Target Java 1.5 instead of 1.6
- Common Interface for all formats HOT 5
- Code review request HOT 1
- Tokenizer bug in XmlFormat.merge HOT 1
- Escaped slash is not correctly handled HOT 3
- It just can not work for my test
- Fail for partial build
- Option to omit top-level element wrapping
- JsonFormat.merge does not parse doubles correctly for undefined field HOT 1
- invalid escapes \a and \v in output of byte array HOT 4
- Serializing a builder that has a default value, but doesn't override it, creates json without that value. HOT 1
- com.googlecode.protobuf.format.JsonFormat#escapeBytes handles '\a' incorrectly. '\a' is ivalid value for json. HOT 2
- Feature Request: Some level of Avro Support
- Unknown fields violate JSON syntax HOT 1
- (valid) empty object throws a ParseException HOT 2
- JSON parser does not accept camel-case fieldnames.
- Error while parsing Json string when a non-primitive field is null
- question about UTF8
- JSON formatting breaks when a proto Message contains UnknownFields
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from protobuf-java-format.