
Comments (4)

fhueske commented on June 20, 2024

Hi @themodernlife, thanks for reporting this issue. I read your blog post and I'm happy that you are trying to run Scalding on Flink. This is very exciting and a good validation for Cascading-Flink.

Cascading on MR or Tez uses its own implementation for joins. This implementation is generic and capable of handling all supported join types (inner, left outer, right outer, full outer, and custom joins) by handing the result of a full outer join to the Joiner class. For example, an InnerJoin discards all results with an empty left or right side.

Cascading-Flink leverages Flink's internal join implementations, which are only available for inner and left outer joins. Therefore, Cascading-Flink needs to check the Joiner type to make sure that the Joiner is called with all tuple pairs that will be in the join result. Hence the restriction to InnerJoin and LeftJoin.
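To illustrate the kind of type check described above, here is a minimal sketch in Java. It is not the actual Cascading-Flink code: the `Joiner`, `InnerJoin`, `LeftJoin`, and `OuterJoin` types are stand-ins declared locally so the example compiles on its own, and `NativeJoin` and `JoinerCheck` are hypothetical names.

```java
// Stand-ins for Cascading's joiner classes, declared here only so the
// sketch is self-contained. They are NOT the real cascading.pipe.joiner types.
interface Joiner {}
class InnerJoin implements Joiner {}
class LeftJoin implements Joiner {}
class OuterJoin implements Joiner {}

// Hypothetical enum naming the two native join kinds mentioned above.
enum NativeJoin { INNER, LEFT_OUTER }

class JoinerCheck {
    // Maps a joiner to a native join kind, rejecting every other joiner type.
    static NativeJoin translate(Joiner joiner) {
        if (joiner instanceof InnerJoin) {
            return NativeJoin.INNER;
        }
        if (joiner instanceof LeftJoin) {
            return NativeJoin.LEFT_OUTER;
        }
        throw new UnsupportedOperationException(
            "Unsupported joiner type: " + joiner.getClass().getName());
    }
}
```

With a check of this shape, a joiner that merely wraps an `InnerJoin` is rejected as well, because its own class is neither `InnerJoin` nor `LeftJoin`.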

I think this issue can be solved by adding a Scalding dependency to Cascading-Flink and checking the type of the wrapped joiner if a WrappedJoiner is found.

Please report any other problem you find when using Cascading-Flink, with Scalding or without.
Thanks again, Fabian

from cascading-flink.

themodernlife commented on June 20, 2024

@fhueske can I suggest a different approach? Depending on Scalding (and its associated dependencies) might be a no-go for Cascading Java users.

What if instead you just added a configuration parameter, something like flink.joiner.strict (true by default), which throws an exception if the join types don't match when true, but skips the type check entirely when false?

I think this way Scalding users could say "trust me, I know what I'm doing".
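Roughly what I have in mind, as a sketch only. The property name `flink.joiner.strict` is just the suggestion above, not an existing option, and the joiner classes here are local stand-ins rather than the real Cascading or Scalding types. `StrictJoinerCheck` is a made-up name.

```java
import java.util.Properties;

// Local stand-ins so the sketch compiles on its own; not the real library classes.
interface Joiner {}
class InnerJoin implements Joiner {}
class LeftJoin implements Joiner {}
class WrappedJoiner implements Joiner {}

class StrictJoinerCheck {
    // Proposed property name (an assumption, not an existing setting).
    static final String STRICT_KEY = "flink.joiner.strict";

    static boolean isSupported(Joiner joiner) {
        return joiner instanceof InnerJoin || joiner instanceof LeftJoin;
    }

    // Throws for unsupported joiners unless strict mode is switched off.
    static void validate(Joiner joiner, Properties config) {
        boolean strict = Boolean.parseBoolean(config.getProperty(STRICT_KEY, "true"));
        if (strict && !isSupported(joiner)) {
            throw new IllegalArgumentException(
                "Unsupported joiner type: " + joiner.getClass().getName());
        }
    }
}
```

Setting `flink.joiner.strict` to `false` in the properties would let a `WrappedJoiner` through without any check. Note that this only covers validation and says nothing about which native join the connector should pick.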


fhueske commented on June 20, 2024

Hi @themodernlife, sorry for the late reply.
The Flink connector needs to know the type of the wrapped joiner in order to make the right choice when translating the join into one of Flink's native joins. Hence, I'm not sure a configuration switch would do the job here.

I just pushed a fix that extracts the wrapped joiner via Java reflection without adding a Scalding dependency. I'm not experienced with Scalding, but I tested it with a Java mock-up. Please let me know if that solves your problem. Also, let me know if you face any other issues.
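For readers who want to see the general idea, here is a minimal sketch of unwrapping via reflection. It is not the code that was pushed: the class name match on `WrappedJoiner` and the field name `joiner` are assumptions, the wrapper below is a plain Java mock-up, and `JoinerUnwrapper` is a hypothetical helper.

```java
import java.lang.reflect.Field;

// Local stand-ins so the sketch compiles on its own; not the real library classes.
interface Joiner {}
class InnerJoin implements Joiner {}

// Java mock-up of a wrapper; the private field name "joiner" is an assumption.
class WrappedJoiner implements Joiner {
    private final Joiner joiner;
    WrappedJoiner(Joiner joiner) { this.joiner = joiner; }
}

class JoinerUnwrapper {
    // Follows wrappers (matched by simple class name, so no compile-time
    // dependency on the wrapper is needed) until a non-wrapper joiner is found.
    static Joiner unwrap(Joiner candidate) {
        Joiner current = candidate;
        while ("WrappedJoiner".equals(current.getClass().getSimpleName())) {
            try {
                Field field = current.getClass().getDeclaredField("joiner");
                field.setAccessible(true);
                Object inner = field.get(current);
                if (!(inner instanceof Joiner)) {
                    return current; // nothing usable inside, keep the wrapper
                }
                current = (Joiner) inner;
            } catch (ReflectiveOperationException e) {
                return current; // field missing or inaccessible, keep the wrapper
            }
        }
        return current;
    }
}
```

The unwrapped joiner can then go through the usual type check, so a wrapper around an `InnerJoin` is treated like an `InnerJoin`.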

Thanks, Fabian


fhueske commented on June 20, 2024

I ran a Scalding job with a WrappedJoiner and the reflection approach works.
I'll close this issue. Please reopen it if the problem is not resolved.

