Comments (7)

mingmwang commented on May 18, 2024

I can work on this improvement.

from arrow-ballista.

mingmwang commented on May 18, 2024

Another improvement I can think of: if the shuffle data is colocated on the same host as the shuffle reader, we should allow the reader to read from disk directly (a LocalShuffle Reader) instead of reading through a remote RPC.
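
To make the idea concrete, here is a minimal sketch of the host check such a reader might perform. It is not Ballista's implementation: `PartitionLocation`, `choose_read_mode`, and the host names are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionLocation:
    """Hypothetical record of where one shuffle output file lives."""
    host: str  # executor host that wrote the shuffle file
    path: str  # path of the file on that host


def choose_read_mode(location: PartitionLocation, reader_host: str) -> str:
    """Return "local" when the file sits on the reader's own host, else "remote"."""
    return "local" if location.host == reader_host else "remote"
```

With this shape, a reader would open `location.path` directly in the "local" case and fall back to the existing RPC fetch otherwise.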

metesynnada commented on May 18, 2024

@mingmwang Do you need help? I can help with the implementation.

mingmwang commented on May 18, 2024

@metesynnada @andygrove

Sorry, I haven't had a chance to look into this. Regarding the issue, I'm not sure we want to fix it this way. In my opinion, even when reading empty partitions, ShuffleReaderExec should not return an error or cause any data quality issue.
We will do some testing and see what the specific error is.
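
As a rough illustration of the expected behavior (not the actual ShuffleReaderExec code), a reader over several partitions can simply yield nothing for an empty partition rather than raising. The function name `read_partitions` and the list-of-rows representation are assumptions made for this sketch.

```python
from typing import Dict, Iterable, Iterator, List


def read_partitions(partitions: Iterable[List[Dict]]) -> Iterator[Dict]:
    """Yield every row from every partition; empty partitions contribute no rows."""
    for rows in partitions:
        # An empty list is a valid partition: the loop body yields nothing for it.
        yield from rows
```

Reading only empty partitions then produces an empty result instead of an error.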

@yahoNanJing

mingmwang commented on May 18, 2024

Letting the CompletedTask return the partition stats is quite heavy. Imagine we have 1000 map tasks and 1000 reduce tasks (partition = 1000): the number of stats entries becomes 1M.
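
The 1M figure follows from each map task reporting one stats entry per reduce partition. A small sketch of that arithmetic, with `stats_entries` as a hypothetical helper name:

```python
def stats_entries(map_tasks: int, reduce_partitions: int) -> int:
    """Each map task reports one stats entry per reduce partition it writes."""
    return map_tasks * reduce_partitions


print(stats_entries(1000, 1000))  # 1000000
```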

mingmwang commented on May 18, 2024

@thinkharderdev

Ted-Jiang commented on May 18, 2024

I updated TPC-H q3 to test this, intending to use o_shippriority = 'none' to reproduce it:

select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
        c_mktsegment = 'BUILDING'
  and c_custkey = o_custkey
  and l_orderkey = o_orderkey
  and o_orderdate < date '1995-03-15'
  and o_shippriority = 'none'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate;
0 rows in set. Query took 11.347 seconds.

@andygrove how did you produce this error?
