Comments (11)
@nevillelyh - Can you give an estimate of how deep the nested records go?
from hadoop-connectors.
4-5 layers, and some of them are Array or Map.
This is also happening with Avro files exported via the BigQuery console. Should I file a bug with the BQ folks?
Hi, I'm hitting this issue again with the Dataflow Java SDK, now that it exports BigQuery data as Avro.
Here are some example files that trigger the error:
gs://neville-steel-eu/avro-test/
It might be related to this:
http://stackoverflow.com/questions/24130615/circular-references-not-handled-in-avro
Can anyone look into it?
@nevillelyh - Can you provide a full object name in your avro-test bucket that has public read access? I don't have permission to list objects within that bucket.
While not a stack overflow, I've managed to produce bad Avro files (parse errors that shouldn't be parse errors). I'm currently filing a bug with BQ, but your test case would also be nice to have in case they are separate bugs.
@AngusDavis and I are collaborating internally and I can share with him what he needs -- pretty sure these are the same underlying cause at the end of the day.
@AngusDavis I've made the Avro file public: gs://neville-steel-eu/avro-debug/000000000000
@dhalperi yes that's my guess too. Thanks for looking into this!
@nevillelyh - Thanks, got it.
Hi Neville, I believe the underlying bug in the BigQuery Avro file generator has been fixed. Thanks for the report and the reproduction -- this was crucial to successful resolution!
@dhalperi, @nevillelyh - Thanks both. Marking this issue closed.