Comments (5)

yruslan commented on September 26, 2024

Hi @D3v3sh5ingh, what's your high level offset layout?

For example:
0 - 19 Headers (to be ignored)
20 - 23 BDW
24 - 27 RDW
28 - 99 Payload
100 - 193 RDW
...
32000 Payload
32093 Footer (to be ignored)

from cobrix.

D3v3sh5ingh commented on September 26, 2024

Hi @yruslan
My high level layout looks like below:
BDW { RDW 45 bytes , RDW 1000 bytes, RDW 1000 bytes , RDW 1000 bytes ....}
BDW { RDW 1000 bytes .....}
......
BDW { RDW 1000 bytes...., RDW 45 bytes}

The 45-byte header and trailer are inside the BDW blocks, as shown above.
We want to remove this 45-byte header and trailer from the file.

yruslan commented on September 26, 2024

file_start_offset and file_end_offset work at the file level, e.g. in cases like:
HEADER {45 bytes} BDW { RDW 1000 bytes, RDW 1000 bytes, RDW 1000 bytes , RDW 1000 bytes ....}

Since your 45-byte header and trailer are part of the record payload, you can't remove them using these options. What you can do is add the header as a redefined segment in your copybook, and then filter it out after you get the dataframe.

The copybook will look like this:

01   RECORD.
   05  HEADER.
        10 CONTENT PIC X(45).
   05 PAYLOAD REDEFINES HEADER.
   ... your payload goes at level 10 here

D3v3sh5ingh commented on September 26, 2024

Hi,
This is a sample output for my file. The 45 bytes that I want to skip are only at the start and at the end of the file, not in each record.
If I don't use file_start_offset and file_end_offset, I get this dataframe as output, but with two extra records (header and trailer).
But if I use these options with 45 bytes, I get an error (length of BDW block is too big).

[Image: IMG-20231130-WA0007]

yruslan commented on September 26, 2024

Options 'file_start_offset' and 'file_end_offset' only drop bytes from the beginning or at the end of files, not from the payload. This is the expected behavior.
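
To illustrate why a 45-byte start offset fails on this layout, here is a minimal plain-Python sketch (not Cobrix code). It assumes big-endian 2-byte lengths followed by two zero bytes, with each length counting its own 4-byte descriptor; the helper names are hypothetical. Skipping 45 bytes lands the reader inside the header record's payload, so payload bytes get read as a BDW, which may be the kind of misalignment behind the "length of BDW block is too big" error.

```python
import struct

def make_record(payload: bytes) -> bytes:
    # RDW (assumed layout): 2-byte big-endian length including the RDW, then 2 zero bytes.
    return struct.pack(">HH", len(payload) + 4, 0) + payload

def make_block(payloads) -> bytes:
    # BDW (assumed layout): 2-byte big-endian length including the BDW, then 2 zero bytes.
    body = b"".join(make_record(p) for p in payloads)
    return struct.pack(">HH", len(body) + 4, 0) + body

def read_bdw_length(data: bytes, offset: int) -> int:
    # Interpret the 4 bytes at `offset` as a BDW and return its length field.
    length, _ = struct.unpack_from(">HH", data, offset)
    return length

header = b"H" * 45                       # the special 45-byte record
rows = [b"D" * 1000 for _ in range(3)]   # ordinary 1000-byte records
block = make_block([header] + rows)

aligned = read_bdw_length(block, 0)      # real BDW: matches the block size
shifted = read_bdw_length(block, 45)     # header payload bytes misread as a BDW

print(aligned, len(block))
print(shifted, shifted > len(block))
```

With these assumptions the shifted read yields a length far larger than the block itself, because the start offset only moves the read position and knows nothing about record boundaries.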

There are no options that allow dropping bytes from inside records, so possible solutions are:

  • If you need to keep these special 45-byte records, you can use the modified copybook solution above.
  • (probably your case) If you want to ignore these special 45-byte records, just remove these records in post-processing, e.g. df.filter(col("COL1").isNotNull)
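
The second option can be pictured with a plain-Python analogue of that filter (a sketch under the same assumed framing as above: big-endian lengths that include the 4-byte descriptor; `frame` and `iter_records` are hypothetical helpers, and a single block is used for brevity). The reader parses every record, including the 45-byte ones, and the unwanted records are dropped afterwards.

```python
import struct

def frame(payload: bytes) -> bytes:
    # Prefix a payload with a 4-byte descriptor whose length includes the descriptor.
    return struct.pack(">HH", len(payload) + 4, 0) + payload

def iter_records(block: bytes):
    # Yield each record payload from one BDW-framed block.
    block_len, _ = struct.unpack_from(">HH", block, 0)
    pos = 4  # skip the BDW
    while pos < block_len:
        rec_len, _ = struct.unpack_from(">HH", block, pos)
        if rec_len < 4:
            raise ValueError(f"invalid RDW length {rec_len} at offset {pos}")
        yield block[pos + 4 : pos + rec_len]
        pos += rec_len

header = b"H" * 45
trailer = b"T" * 45
rows = [b"A" * 1000, b"B" * 1000]

# Header and trailer sit inside the block as ordinary records.
block = frame(b"".join(frame(p) for p in [header, *rows, trailer]))

all_records = list(iter_records(block))
# Post-processing step: drop the special 45-byte records after parsing.
kept = [r for r in all_records if len(r) != 45]

print(len(all_records), len(kept))
```

In Spark the same step is a filter on the resulting dataframe, on whichever column distinguishes the header and trailer rows in your data.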
