Comments (4)
Hi @themodernlife, thanks for reporting this issue. I read your blog post and I'm happy that you are trying to run Scalding on Flink. This is very exciting and a good validation for Cascading-Flink.
Cascading on MR or Tez uses its own implementation for joins. This implementation is generic and capable of handling all supported join types (inner, left outer, right outer, full outer, and custom joins) by handing the result of a full outer join to the Joiner class. For example, an InnerJoin discards all results with an empty left or right side.
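To illustrate the idea, here is a minimal sketch, not Cascading's actual code. It builds the full outer candidate pairs for one join key and lets a joiner-like filter decide which pairs survive. `GenericJoinSketch`, `PairFilter`, `INNER`, and `LEFT` are hypothetical names made up for this example, and tuples are simplified to strings, with `null` standing for an empty side.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class GenericJoinSketch {
    // Hypothetical stand-in for a joiner: decides which candidate pairs are kept.
    interface PairFilter {
        boolean keep(String left, String right);
    }

    // Inner join: discard every pair with an empty (null) left or right side.
    static final PairFilter INNER = (l, r) -> l != null && r != null;
    // Left join: keep every pair whose left side is present.
    static final PairFilter LEFT = (l, r) -> l != null;

    // Produces the full outer candidates for a single key, then filters them.
    static List<String> join(List<String> lefts, List<String> rights, PairFilter filter) {
        List<String> result = new ArrayList<>();
        if (lefts.isEmpty() && rights.isEmpty()) {
            return result; // no tuples for this key on either side
        }
        // An empty side is padded with a single null so the other side still yields pairs.
        List<String> ls = lefts.isEmpty() ? Collections.<String>singletonList(null) : lefts;
        List<String> rs = rights.isEmpty() ? Collections.<String>singletonList(null) : rights;
        for (String l : ls) {
            for (String r : rs) {
                if (filter.keep(l, r)) {
                    result.add(l + "|" + r);
                }
            }
        }
        return result;
    }
}
```

With a left value `a` and no right values, `INNER` yields nothing while `LEFT` yields the single padded pair.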
Cascading-Flink leverages Flink's internal join implementations, which are only available for inner and left joins. Therefore, Cascading-Flink needs to check the Joiner type to make sure that the Joiner is called with all tuple pairs that will be in the join result. Hence the restriction to InnerJoin and LeftJoin.
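A rough sketch of such a type check could look like the following. It is not the connector's real code: the `Joiner`, `InnerJoin`, `LeftJoin`, and `OuterJoin` types here are local stand-ins declared only for this example, and `NativeJoin` and `translate` are hypothetical names.

```java
final class JoinTranslationSketch {
    // Local stand-ins for joiner types; these are NOT the real Cascading classes.
    interface Joiner {}
    static final class InnerJoin implements Joiner {}
    static final class LeftJoin implements Joiner {}
    static final class OuterJoin implements Joiner {}

    // Hypothetical label for the native join a joiner would be mapped to.
    enum NativeJoin { INNER, LEFT_OUTER }

    // Chooses a native join from the joiner type and rejects everything else.
    static NativeJoin translate(Joiner joiner) {
        if (joiner instanceof InnerJoin) {
            return NativeJoin.INNER;
        }
        if (joiner instanceof LeftJoin) {
            return NativeJoin.LEFT_OUTER;
        }
        throw new UnsupportedOperationException(
            "Unsupported joiner type: " + joiner.getClass().getSimpleName());
    }
}
```

A joiner that hides its real type behind a wrapper class would fall through to the exception in this sketch, which matches the restriction described above.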
I think this issue can be solved by adding a Scalding dependency to Cascading-Flink and checking the type of the wrapped joiner if a WrappedJoiner is found.
Please report any other problem you find when using Cascading-Flink, with Scalding or without.
Thanks again, Fabian
from cascading-flink.
@fhueske can I suggest a different approach? Depending on Scalding (and associated dependencies) might be a no go for Cascading Java users.
What if instead you just added a configuration parameter, something like flink.joiner.strict (true by default), which throws an exception if the join types don't match, but just ignores any type checking when false?
I think this way Scalding users could say "trust me, I know what I'm doing".
Hi @themodernlife, sorry for the late reply.
The Flink connector needs to know the type for the wrapped joiner in order to make the right choice when translating the join into Flink's native joins. Hence, I'm not sure if a configuration switch would do the job here.
I just pushed a fix to extract the wrapped joiner via Java reflection without adding a Scalding dependency. I'm not experienced with Scalding, but I tested it with a Java mock-up. Please let me know if that solves your problem. Also, let me know if you face any other issues.
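The general technique can be sketched as follows. This is an illustration under assumptions, not the code from the fix: the `WrappedJoiner` class here is a local mock-up, the layout of its private field is assumed, and `unwrap` is a hypothetical helper name. The wrapper is recognized by its simple class name, so no compile-time dependency on the wrapping library is needed.

```java
import java.lang.reflect.Field;

final class UnwrapJoinerSketch {
    // Local stand-ins; NOT the real Cascading or Scalding classes.
    interface Joiner {}
    static final class InnerJoin implements Joiner {}

    // Mock-up wrapper. That it keeps the delegate in a private field is an assumption.
    static final class WrappedJoiner implements Joiner {
        private final Joiner joiner;
        WrappedJoiner(Joiner joiner) {
            this.joiner = joiner;
        }
    }

    // Returns the wrapped joiner if the candidate looks like a wrapper, else the candidate itself.
    static Joiner unwrap(Joiner candidate) {
        // Match by class name so the wrapper class is not referenced at compile time.
        if (!candidate.getClass().getSimpleName().equals("WrappedJoiner")) {
            return candidate;
        }
        // Look for the first declared field that can hold a Joiner and read it reflectively.
        for (Field field : candidate.getClass().getDeclaredFields()) {
            if (Joiner.class.isAssignableFrom(field.getType())) {
                try {
                    field.setAccessible(true);
                    return (Joiner) field.get(candidate);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot read wrapped joiner", e);
                }
            }
        }
        return candidate;
    }
}
```

Once the inner joiner has been extracted, the usual type check can run against it instead of against the wrapper.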
Thanks, Fabian
I ran a Scalding job with a WrappedJoiner and the reflection approach is working.
I'll close this issue. Please reopen it if the problem is not resolved.