
Comments (4)

jlowe commented on August 17, 2024

> you just need to grep the json files

IMO this should not be the default answer. Users don't expect jobs that produce incorrect output to silently "succeed." Having to specify an extra option and then manually grep afterwards is not user-friendly at all. We control the benchmark script, so we should be able to track when queries fail and have the overall Spark application return an error if any queries failed. Ideally it should have these running modes:

  • Default mode: queries are run as they are today. If any query fails (as opposed to a task attempt that fails but succeeds on retry), the overall Spark application returns an error. If the driver needs to write a json summary file and grep through it itself, then that's what it should do. Queries that fail should never lead to a successful Spark application by default.
  • Hardcore benchmark run mode: max task attempts is configured to one. If any task attempt fails, the benchmark performance run is essentially invalid and the Spark application fails immediately.
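A minimal sketch of how the two modes could map per-query statuses to a process exit code. The status strings are the ones mentioned in this thread; the function name, the `strict` flag, and the shape of the input are assumptions for illustration, not the benchmark script's actual code. The `strict` branch only approximates the hardcore mode, which would also set max task attempts to one and stop at the first failure.

```python
# Status strings as named in this thread.
FAILED = "Failed"
TASK_FAILURES = "CompletedWithTaskFailures"


def exit_code(statuses, strict=False):
    """Return 0 for a valid run and 1 otherwise.

    statuses: dict mapping a query name to its status string (assumed shape).
    strict:   hypothetical flag; when True, task failures also invalidate the run.
    """
    bad = {FAILED, TASK_FAILURES} if strict else {FAILED}
    return 1 if any(status in bad for status in statuses.values()) else 0
```

The driver would pass the result to `sys.exit()` once all queries have finished, so a failed query can no longer end in a successful Spark application.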

from spark-rapids-benchmarks.

GaryShen2008 commented on August 17, 2024

By using --json_summary_folder, the power run saves the status of each query into a json file.
As our CI job does, you just need to grep the json files and check the query status, which is one of Completed, CompletedWithTaskFailures, or Failed.
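For reference, a rough sketch of that check done in Python instead of grep. It assumes each summary file is a JSON object and that the status sits under a top-level key, here called `queryStatus`, holding either a string or a list of strings. The key name and file layout are assumptions, so adjust them to what the files actually contain.

```python
import json
from pathlib import Path


def collect_statuses(folder, key="queryStatus"):
    """Map each summary file name to the list of statuses it records.

    The key name is an assumption about the summary layout.
    """
    statuses = {}
    for path in sorted(Path(folder).glob("*.json")):
        with path.open() as f:
            value = json.load(f).get(key, [])
        # Accept either a single string or a list of strings.
        statuses[path.name] = [value] if isinstance(value, str) else list(value)
    return statuses


def failed_queries(statuses):
    """Return the file names whose recorded statuses include Failed."""
    return [name for name, values in statuses.items() if "Failed" in values]
```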

Close this issue.


GaryShen2008 commented on August 17, 2024

I see.
Seems I misunderstood the requirement here. I thought it was just for our own benchmark runs, for which we have already added the check step.

For our benchmark, we do not set spark.task.maxFailures to 1, so it stays at the default of 4.
We still need to report the result when some queries end up Failed or CompletedWithTaskFailures.

We can update the script so that, by default, it fails the Spark job when any query failed but still continues to finish all the queries.
And for our benchmark runs, we'll add a special parameter to disable this behavior so that our CI job is not blocked.
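Roughly, the proposed behavior could look like the sketch below: run every query, remember which ones failed, and only decide the exit code at the end. `run_query` and `allow_failure` are hypothetical names standing in for the real query runner and the opt-out parameter, which has not been named yet.

```python
def run_all(queries, run_query, allow_failure=False):
    """Run every query, then return (exit_code, failed_names).

    queries:       dict mapping a query name to its query text.
    run_query:     callable that raises an exception when a query fails.
    allow_failure: hypothetical opt-out switch, e.g. for a CI job that
                   inspects the per-query results itself.
    """
    failed = []
    for name, text in queries.items():
        try:
            run_query(text)
        except Exception as exc:
            # Keep going so every query still gets a result.
            print(f"{name} failed: {exc}")
            failed.append(name)
    code = 1 if failed and not allow_failure else 0
    return code, failed
```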


jlowe commented on August 17, 2024

The benchmark scripts are used by more than just CI. We point users to the benchmark scripts in many of our public presentations, encouraging them to run the benchmarks themselves. Therefore the benchmark scripts need to be as user-friendly as possible.

The script definitely needs to fail if any queries fail, because the benchmark run is clearly invalid at that point. I'm OK if it doesn't fail when there are task failures. Ideally it should complain very loudly when that happens, because that's not a clean benchmark run and will result in lower performance numbers being reported than should theoretically be attainable. Task attempt failures could also indicate OOM situations or other errors that shouldn't be there but were masked by an executor relaunch, and thus we would want to investigate.
