Comments (5)

davidrabinowitz commented on June 9, 2024

Can you please elaborate on your requirements?

from spark-bigquery-connector.

kristopherkane commented on June 9, 2024

We are diagnosing variable job runtimes and are looking at a Spark job that does a large read from BigQuery. It is difficult to tell how long the BigQuery read alone takes, because a stage containing the read might also include something like a broadcast join, so the plan view in the Spark History UI doesn't always represent just the BQ portion.

Correction on the connector version, it is 0.30.

The Spark driver logs the following by default, and I was looking for other options.

24/03/27 03:31:44 INFO DirectBigQueryRelation: |Querying table xyz.123, parameters sent from Spark:|requiredColumns=[<column>],|filters=[]
24/03/27 03:31:46 INFO ReadSessionCreator: Read session:{"readSessionName":"projects/xyz","readSessionCreationStartTime":"2024-03-27T03:31:44.470062Z","readSessionCreationEndTime":"2024-03-27T03:31:46.047985Z","readSessionPrepDuration":740,"readSessionCreationDuration":837,"readSessionDuration":1577}
24/03/27 03:31:46 INFO ReadSessionCreator: Requested 20000 max partitions, but only received 2 from the BigQuery Storage API for session xyz.123. Notice that the number of streams in actual may be lower than the requested number, depending on the amount parallelism that is reasonable for the table and the maximum amount of parallelism allowed by the system.
24/03/27 03:31:46 INFO BigQueryRDDFactory: Created read session for table 'xyz.123': xyz.123

I don't think readSessionDuration represents the actual time in BQ retrieval. Looks like there has been a lot of work around this recently.
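
To check what readSessionDuration covers, the JSON payload in the ReadSessionCreator line can be parsed and compared against its own timestamps. The following is a minimal Python sketch, not part of the connector: the helper names (parse_read_session, parse_utc, wall_clock_ms) are made up for illustration, and treating the duration fields as milliseconds is an inference from the timestamps, not something the log states.

```python
import json
import re
from datetime import datetime

# Sample line copied from the driver log quoted in this thread.
LOG_LINE = (
    '24/03/27 03:31:46 INFO ReadSessionCreator: Read session:'
    '{"readSessionName":"projects/xyz",'
    '"readSessionCreationStartTime":"2024-03-27T03:31:44.470062Z",'
    '"readSessionCreationEndTime":"2024-03-27T03:31:46.047985Z",'
    '"readSessionPrepDuration":740,'
    '"readSessionCreationDuration":837,'
    '"readSessionDuration":1577}'
)


def parse_read_session(line):
    """Return the JSON payload of a 'Read session:' log line, or None."""
    # The payload is a flat JSON object, so a non-greedy match up to the
    # first closing brace is enough here.
    match = re.search(r"Read session:\s*(\{.*?\})", line)
    if match is None:
        return None
    return json.loads(match.group(1))


def parse_utc(timestamp):
    """Parse timestamps shaped like 2024-03-27T03:31:44.470062Z."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def wall_clock_ms(session):
    """Elapsed time between the creation start and end timestamps."""
    start = parse_utc(session["readSessionCreationStartTime"])
    end = parse_utc(session["readSessionCreationEndTime"])
    # Assumption: the duration fields are in milliseconds.
    return (end - start).total_seconds() * 1000.0


session = parse_read_session(LOG_LINE)
prep = session["readSessionPrepDuration"]
create = session["readSessionCreationDuration"]
total = session["readSessionDuration"]

print("prep + creation:", prep + create)
print("readSessionDuration:", total)
print("start-to-end wall clock (ms):", wall_clock_ms(session))
```

For the line above, the prep and creation durations add up to readSessionDuration, and that total matches the gap between the creation start and end timestamps. This is consistent with the field covering only read session setup, not the time executors spend pulling rows from the streams.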

davidrabinowitz commented on June 9, 2024

Are you using filters? Can you please upgrade to version 0.37.0? Also, switching to the latest flavor of the connector (spark-3.x-bigquery) may help.

kristopherkane commented on June 9, 2024

Some queries use filters, perhaps most?

A BQ upgrade is on the horizon; I think there is a breaking decimal change sometime after .30 that I haven't looked closely at yet.

Just to be clear, there's nothing on .30 that I can change to DEBUG in a logging config for more activity timing?

kristopherkane commented on June 9, 2024

https://github.com/GoogleCloudDataproc/spark-bigquery-connector?tab=readme-ov-file#connector-metrics-and-how-to-view-them

Looks pretty good if we can get there.
