
Comments (8)

wongoo avatar wongoo commented on June 25, 2024 4

@pantianying I found that Java uses two 16-bit characters (a UTF-16 surrogate pair) to represent the emoji "🤣", while Go uses a single rune. So the length of the emoji is 2 in Java but 1 in Go.

The hessian protocol says that:

The length is the number of 16-bit characters.

So this is a bug in the Go hessian2 implementation. I will try to fix it.
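
To make the length mismatch concrete, here is a minimal Go sketch that counts a string both ways. The helper name `hessianLen` is hypothetical (it is not part of dubbo-go-hessian2); it only illustrates counting 16-bit code units with the standard `unicode/utf16` package.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// hessianLen is a hypothetical helper: it returns the number of 16-bit
// (UTF-16) code units in s, which is what Java's String.length() reports
// and what the Hessian spec calls the string length.
func hessianLen(s string) int {
	return len(utf16.Encode([]rune(s)))
}

func main() {
	s := "🤣"
	fmt.Println(utf8.RuneCountInString(s)) // 1: Go counts one rune
	fmt.Println(hessianLen(s))             // 2: Java counts two 16-bit chars
}
```

For strings made only of characters in the Basic Multilingual Plane, both counts agree; they differ only when a character needs a surrogate pair.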

from dubbo-go-hessian2.

fangyincheng avatar fangyincheng commented on June 25, 2024 2

image


wongoo avatar wongoo commented on June 25, 2024

@pantianying pls provide a unit test so that we can follow up


pantianying avatar pantianying commented on June 25, 2024

In reply to wongoo ("pls provide a unit test so that we can follow up"), here is the unit test:

https://github.com/pantianying/dubbo-go-hessian2/blob/51e52527f9a1e2e2edb41cc873c1cc591fa96de8/string_test.go#L144


pantianying avatar pantianying commented on June 25, 2024

PS:

  1. When Java returns a string containing an emoji, the Go side decodes it as garbled text.

  2. If Java returns an emoji inside a complex map structure, the whole serialization fails.

  3. When Go encodes an emoji, Java cannot read it.


wongoo avatar wongoo commented on June 25, 2024

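New findings:

  1. Each 16-bit char is encoded on its own, in UTF-8 format, in com.caucho.hessian.io.Hessian2Output.printString(). A surrogate pair is therefore written as two 3-byte sequences rather than one 4-byte sequence: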
  /**
   * Prints a string to the stream, encoded as UTF-8
   *
   * @param v the string to print.
   */
  public void printString(char []v, int strOffset, int length)
    throws IOException
  {
    int offset = _offset;
    byte []buffer = _buffer;

    for (int i = 0; i < length; i++) {
      if (SIZE <= offset + 16) {
        _offset = offset;
        flushBuffer();
        offset = _offset;
      }

      char ch = v[i + strOffset];

      if (ch < 0x80)
        buffer[offset++] = (byte) (ch);
      else if (ch < 0x800) {
        buffer[offset++] = (byte) (0xc0 + ((ch >> 6) & 0x1f));
        buffer[offset++] = (byte) (0x80 + (ch & 0x3f));
      }
      else {
        buffer[offset++] = (byte) (0xe0 + ((ch >> 12) & 0xf));
        buffer[offset++] = (byte) (0x80 + ((ch >> 6) & 0x3f));
        buffer[offset++] = (byte) (0x80 + (ch & 0x3f));
      }
    }

    _offset = offset;
  }

  2. A UTF-8 character is decoded in com.caucho.hessian.io.Hessian2Input.parseUTF8Char(). It accepts only 1-, 2- and 3-byte sequences, so a standard 4-byte UTF-8 emoji is rejected:
  private int parseUTF8Char()
    throws IOException
  {
    int ch = _offset < _length ? (_buffer[_offset++] & 0xff) : read();

    if (ch < 0x80)
      return ch;
    else if ((ch & 0xe0) == 0xc0) {
      int ch1 = read();
      int v = ((ch & 0x1f) << 6) + (ch1 & 0x3f);

      return v;
    }
    else if ((ch & 0xf0) == 0xe0) {
      int ch1 = read();
      int ch2 = read();
      int v = ((ch & 0x0f) << 12) + ((ch1 & 0x3f) << 6) + (ch2 & 0x3f);

      return v;
    }
    else
      throw error("bad utf-8 encoding at " + codeName(ch));
  }
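
One way the Go side could mirror the two Java methods above is sketched below. This is an illustration under stated assumptions, not the actual fix in dubbo-go-hessian2: the function names `encodeUnit`, `encodeJavaStyle` and `decodeJavaStyle` are hypothetical. The idea is to split each rune into UTF-16 code units first, encode every unit with the same 1-, 2- or 3-byte layout as printString(), and reverse that when decoding.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// encodeUnit appends one 16-bit code unit using the same byte layout
// as the Java printString loop (hypothetical helper).
func encodeUnit(dst []byte, ch uint16) []byte {
	switch {
	case ch < 0x80:
		return append(dst, byte(ch))
	case ch < 0x800:
		return append(dst, byte(0xc0+((ch>>6)&0x1f)), byte(0x80+(ch&0x3f)))
	default:
		return append(dst,
			byte(0xe0+((ch>>12)&0xf)),
			byte(0x80+((ch>>6)&0x3f)),
			byte(0x80+(ch&0x3f)))
	}
}

// encodeJavaStyle returns the encoded bytes and the Hessian string
// length, i.e. the number of 16-bit code units (hypothetical helper).
func encodeJavaStyle(s string) ([]byte, int) {
	units := utf16.Encode([]rune(s))
	var out []byte
	for _, u := range units {
		out = encodeUnit(out, u)
	}
	return out, len(units)
}

// decodeJavaStyle reads 1-, 2- and 3-byte sequences like parseUTF8Char,
// then joins surrogate pairs back into runes (hypothetical helper).
func decodeJavaStyle(b []byte) (string, error) {
	var units []uint16
	for i := 0; i < len(b); {
		c := b[i]
		switch {
		case c < 0x80:
			units = append(units, uint16(c))
			i++
		case c&0xe0 == 0xc0 && i+1 < len(b):
			units = append(units, uint16(c&0x1f)<<6|uint16(b[i+1]&0x3f))
			i += 2
		case c&0xf0 == 0xe0 && i+2 < len(b):
			units = append(units,
				uint16(c&0x0f)<<12|uint16(b[i+1]&0x3f)<<6|uint16(b[i+2]&0x3f))
			i += 3
		default:
			return "", fmt.Errorf("bad utf-8 encoding at byte %d", i)
		}
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	b, n := encodeJavaStyle("🤣")
	fmt.Printf("% x (length %d)\n", b, n) // ed a0 be ed b4 a3 (length 2)
	s, err := decodeJavaStyle(b)
	fmt.Println(s, err) // 🤣 <nil>
}
```

Note that the six bytes produced for the emoji are not valid standard UTF-8 (which would be the four bytes f0 9f a4 a3), so Go's built-in string conversion cannot be used directly on data written by the Java side.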


wongoo avatar wongoo commented on June 25, 2024

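Java's char is a single 16-bit UTF-16 code unit, so this Hessian code effectively works at the UCS-2 level and handles each half of a surrogate pair separately, while Go's rune is a full 32-bit code point (UCS-4).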


wongoo avatar wongoo commented on June 25, 2024

I'm working on it, and it may be resolved as early as tomorrow.

