Comments (6)

dgm commented on June 23, 2024

The difference from translate_text is that the translate_document API returns a base64-encoded string, which, when processed by https://github.com/aws/aws-sdk-ruby/blob/9a4278dbe51fd1a7125973772c021dd02d328226/gems/aws-sdk-core/lib/aws-sdk-core/json/parser.rb#L69C11-L69C52, comes out tagged as ASCII-8BIT. I don't know whether the generic implementation in the AWS core library can assume that all Blob shapes are UTF-8, so it probably cannot be fixed there. I would prefer a method override in the Aws::Translate::Types::TranslatedDocument class that forces the encoding, but that class appears to be auto-generated from the API JSON definitions, so I'm at a loss as to how to fix it.

Ideally, I think the API definitions should include some specification of, or assumptions about, the character encodings. Maybe an encoding is assumed for string types, but Blobs could conceivably hold either strings or binary data, so in addition to the content type it would be nice if the API response also specified the character encoding. But I am not an expert in this matter. :)
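To make the symptom concrete, here is a minimal sketch that reproduces it without calling AWS. The sample text is made up, and `unpack1('m')` stands in for the base64 decoding step (it is what `Base64.decode64` does internally); the rest is plain Ruby.

```ruby
# Made-up stand-in for the base64 payload of a translated text document.
original = 'héllo wörld'
encoded  = [original].pack('m0') # base64-encode, no line breaks

# Decoding base64 always yields a binary-tagged string in Ruby.
decoded = encoded.unpack1('m')
puts decoded.encoding # ASCII-8BIT (shown as BINARY on newer Rubies)

# force_encoding relabels the same bytes; it does not transcode them.
text = decoded.dup.force_encoding(Encoding::UTF_8)
puts text.encoding
puts text.valid_encoding?
```

The bytes are identical before and after `force_encoding`; only the encoding tag changes, which is why relabeling is safe only when the bytes are known to be text in that encoding.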

from aws-sdk-ruby.

mullermp commented on June 23, 2024

I think this can possibly be fixed with a plugin/customization in aws-sdk-translate service for specifically this operation and api member. I can look into this on Monday.

alextwoods commented on June 23, 2024

The TranslateDocument API "supports text, HTML, or Word documents as the input document." The output is documented as "The document format matches the source document format." So I think in cases such as a Word doc we would not want to apply an encoding to this string (and instead your application would need to interpret it as binary data).

Possibly we could add a custom plugin that looks at the type and encoding of the input document and applies the same encoding to the response (e.g., if the input document is a String with UTF-8 encoding, then we can ensure the output document is also a String with UTF-8 encoding).
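A rough sketch of that idea in plain Ruby follows. The helper name `match_input_encoding` is hypothetical and not part of the SDK; it assumes both the input and output document contents are available as Ruby strings, and it leaves the output untouched when the input is binary or when the bytes are not valid in the input's encoding.

```ruby
# Hypothetical helper, not part of aws-sdk-ruby: tag the output with the
# input's encoding only when that is plausibly correct.
def match_input_encoding(input, output)
  # Non-string input (e.g. an IO object): nothing to infer, leave as-is.
  return output unless input.is_a?(String)
  # Binary input (e.g. a Word document): keep the output binary too.
  return output if input.encoding == Encoding::BINARY

  candidate = output.dup.force_encoding(input.encoding)
  # Fall back to the untouched output if the bytes are not valid text.
  candidate.valid_encoding? ? candidate : output
end

utf8_in    = 'hello'                 # UTF-8 source text
binary_out = "h\xC3\xA9llo".b        # bytes as decoded from base64
puts match_input_encoding(utf8_in, binary_out).encoding
puts match_input_encoding('docx bytes'.b, binary_out).encoding
```

Checking `valid_encoding?` is a cheap guard, not a real detector: some binary payloads happen to be valid UTF-8, so the document's content type would still be the more reliable signal.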

dgm commented on June 23, 2024

Is there a document that explains the high-level architecture of the aws-sdk-ruby build? I see code for plugins etc., but it all appears to be auto-generated, and I can't find any documentation on how to work within the system...

alextwoods commented on June 23, 2024

We don't have good documentation on how to add plugins. But if you want to add a plugin in your own code, you can do something like:

require 'aws-sdk-translate'

class FixTranslateDocumentEncoding < Seahorse::Client::Plugin
  class Handler < Seahorse::Client::Handler
    def call(context)
      # Detect the encoding.
      # TODO: actually detect it and ensure it doesn't break for non-string inputs.
      encoding = 'UTF-8'
      # Call the rest of the stack: this builds the request, signs it,
      # sends it, and parses the output.
      resp = @handler.call(context)
      # Modify the response before returning it up the stack.
      resp.translated_document.content = resp.translated_document.content.force_encoding(encoding)
      resp
    end
  end

  def add_handlers(handlers, _config)
    # The :initialize step places this handler early in the call stack.
    handlers.add(Handler, step: :initialize, operations: [:translate_document])
  end
end

# Add the plugin to the client class.
Aws::Translate::Client.add_plugin(FixTranslateDocumentEncoding)

This would apply to all instances of Aws::Translate::Client.
