Comments (1)
Cobrix tries to retain the original COBOL schema when reading a copybook and a data file. That's why all the root-level fields are struct types. You can use .option("schema_retention_policy", "collapse_root") to collapse the root level of the schema so that its child fields become top-level columns. If you need a completely flat schema, you can use the SparkUtils.flattenSchema() function.
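Here is a minimal sketch of both approaches. The file paths and the object name are placeholders, and the import path for SparkUtils is the one I recall from the Cobrix README, so double-check it against the version you are using. It needs Spark and the spark-cobol package on the classpath (for example via spark-submit).

```scala
import org.apache.spark.sql.SparkSession
import za.co.absa.cobrix.spark.cobol.utils.SparkUtils

object SchemaShapes {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cobrix-schema-shapes").getOrCreate()

    // Placeholder paths: replace with your own copybook and data locations.
    val df = spark.read
      .format("cobol")
      .option("copybook", "/path/to/copybook.cpy")
      .option("schema_retention_policy", "collapse_root") // drop the root-level struct
      .load("/path/to/data")

    // Nested groups are still structs, but the root record is collapsed.
    df.printSchema()

    // Completely flat schema: every nested field becomes a top-level column.
    val flat = SparkUtils.flattenSchema(df)
    flat.printSchema()

    spark.stop()
  }
}
```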
If you want to retain the structured nature of the data, a better way to inspect it is to export it to JSON, for example: df.toJSON.take(10).foreach(println)
The error message is likely related to the fact that the record length does not evenly divide the size of the binary file. Internally, Cobrix uses Spark's binaryRecords() method, which requires the total size of the file(s) to be evenly divisible by the record size. The record size is available from the logs (the layout positions part).
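A quick way to check the divisibility condition before involving Spark at all is sketched below. This is just an illustration, not anything Cobrix provides: the object and function names are made up, and it assumes the data file is on the local file system (for HDFS or S3 you would get the size through the Hadoop FileSystem API instead).

```scala
import java.nio.file.{Files, Paths}

object RecordSizeCheck {
  /** Bytes left over after the last complete record; 0 means the sizes line up. */
  def leftoverBytes(fileSize: Long, recordSize: Int): Long = {
    require(recordSize > 0, "recordSize must be positive")
    fileSize % recordSize
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical arguments: <path to data file> <record size taken from the Cobrix logs>
    val fileSize = Files.size(Paths.get(args(0)))
    val recordSize = args(1).toInt
    val leftover = leftoverBytes(fileSize, recordSize)

    if (leftover == 0L)
      println(s"OK: $fileSize bytes = ${fileSize / recordSize} records of $recordSize bytes")
    else
      println(s"Mismatch: $fileSize bytes leaves $leftover extra bytes after ${fileSize / recordSize} full records")
  }
}
```

A non-zero remainder is a hint that the copybook's record length is off, or that the file is not a plain fixed-length file.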
Usually this error means that the copybook doesn't match the data, for instance because the copybook has missing fields. You can verify this by carefully comparing the first several records reported by Spark against the values shown on the mainframe.
from cobrix.