Comments (4)
Interesting idea. This could be helpful for diagnosing decoding issues.
Although we cannot make Spark show the original bytes in hex, we can add additional fields to the output dataframe. For instance, if a schema has ID, FIRST-NAME and LAST-NAME and the debug option is turned on, the schema will contain additional ID_DEBUG, FIRST-NAME_DEBUG and LAST-NAME_DEBUG fields containing HEX values of the original data before decoding.
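To make the proposal concrete, here is a minimal sketch of the idea in plain Python. It is not Cobrix code: the field widths, the `decode_with_debug` helper, and the choice of the `cp037` EBCDIC code page are all assumptions made for illustration only.

```python
# Illustration only, not the Cobrix implementation.
# Assumed fixed-width layout: (field name, length in bytes).
FIELDS = [("ID", 4), ("FIRST-NAME", 5), ("LAST-NAME", 5)]


def decode_with_debug(record: bytes, fields=FIELDS, codepage="cp037"):
    """Decode each field and add a <NAME>_DEBUG field holding the raw bytes as hex."""
    row = {}
    offset = 0
    for name, length in fields:
        raw = record[offset:offset + length]
        # Decoded value, as it would normally appear in the output.
        row[name] = raw.decode(codepage).rstrip()
        # Hex of the original bytes, taken before any decoding.
        row[name + "_DEBUG"] = raw.hex().upper()
        offset += length
    return row


# Build a sample EBCDIC record so the example is self-contained.
record = "0001JOHN SMITH".encode("cp037")
row = decode_with_debug(record)
print(row["ID"], row["ID_DEBUG"])  # 0001 F0F0F0F1
```

With both values side by side, a wrongly decoded field can be compared directly against its source bytes.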
Please clarify a couple of things about your use case:
- Do you want to debug a particular column or all columns in the schema?
- The HEX values should correspond to the original data before conversion to ASCII/Unicode, right?
from cobrix.
I was thinking about all the columns, and yes, the HEX values are the original data before conversion. To be honest, I was also trying to modify the source code to add that functionality, for the FixedLengthNested option only for now. I can share the code with you if you want; it may require some standardisation. Thanks for your interest. Please let me know your email so that I can send the code for your review.
Great, thanks for the answers! I think this is a helpful feature and we are going to implement it.
You can send your code as a pull request, but it is not necessary. The feature seems pretty straightforward.
@bprasen, this very helpful feature is finally implemented and is part of 2.0.5, released today.