Comments (3)
Hi, could you send an example of such a file together with its copybook? It seems odd that the file is both pipe-separated and has a fixed record length.
from cobrix.
Sample below. Here every record has a length of 228.
Index|Customer Id|First Name|Last Name|Company|City|Country|Phone 1|Phone 2|Email|Subscription Date|Website
1|DD37Cf93aecA6Dc|Sheryl|Baxter|Rasmussen Group|East Leonard|Chile|229.077.5154|397.884.0519x718|[email protected]|2020-08-24|http://www.stephenson.com/
2|1Ef7b82A4CAAD10|Preston|Lozano|Vega-Gentry|East Jimmychester|Djibouti|5153435776|686-620-1820x944|[email protected]|2021-04-23|http://www.hobbs.com/
3|6F94879bDAfE5a6|Roy|Berry|Murillo-Perry|Isabelborough|Antigua and Barbuda|+1-539-402-0259|(496)978-3969x58947|[email protected]|2020-03-25|http://www.lawrence.com/
The file format looks like a pipe-delimited CSV. You can use spark-csv to convert it into a DataFrame: https://spark.apache.org/docs/latest/sql-data-sources-csv.html
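// `spark` is the active SparkSession (predefined in spark-shell).
val df = spark.read
  .format("csv")
  .option("header", "true")       // first line holds the column names
  .option("delimiter", "|")       // fields are pipe-separated
  .option("inferSchema", "true")  // let Spark infer column types
  .load("/path/to/file/or/folder")

df.show()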
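If schema inference picks unexpected types, a variant with an explicit schema may be easier to control. The following is only a sketch under assumptions: the column names come from the header line of the sample, but the column types, the `PipeDelimitedRead` object name, the `local[*]` master, and the idea that records are right-padded with spaces up to the fixed length are all guesses that the thread does not confirm.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, rtrim}
import org.apache.spark.sql.types.{DateType, IntegerType, StringType, StructField, StructType}

// Hypothetical object name, used for illustration only.
object PipeDelimitedRead {

  // Column names and order follow the sample header; the types are assumptions.
  val customerSchema: StructType = StructType(Seq(
    StructField("Index", IntegerType),
    StructField("Customer Id", StringType),
    StructField("First Name", StringType),
    StructField("Last Name", StringType),
    StructField("Company", StringType),
    StructField("City", StringType),
    StructField("Country", StringType),
    StructField("Phone 1", StringType),
    StructField("Phone 2", StringType),
    StructField("Email", StringType),
    StructField("Subscription Date", DateType),
    StructField("Website", StringType)
  ))

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying the sketch outside a cluster.
    val spark = SparkSession.builder()
      .appName("pipe-delimited-read")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      .format("csv")
      .option("header", "true")
      .option("delimiter", "|")
      .option("dateFormat", "yyyy-MM-dd")  // matches dates such as 2020-08-24
      .schema(customerSchema)              // explicit schema instead of inferSchema
      .load("/path/to/file/or/folder")
      // Assumption: trailing spaces pad each record to its fixed length,
      // so they would end up in the last column and are trimmed here.
      .withColumn("Website", rtrim(col("Website")))

    df.printSchema()
    df.show(truncate = false)

    spark.stop()
  }
}
```

Keeping the phone columns as strings avoids values such as 5153435776 being read as numbers while others in the same column contain punctuation.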