
File start/end offset issue for VB file about cobrix HOT 5 OPEN

D3v3sh5ingh commented on September 26, 2024

File start/end offset issue for VB file

from cobrix.

Comments (5)

yruslan commented on September 26, 2024

Hi @D3v3sh5ingh, what's your high level offset layout?

For example:
0 - 19 Headers (to be ignored)
20 - 23 BDW
24 - 27 RDW
28 - 99 Payload
100 - 193 RDW
...
32000 Payload
32093 Footer (to be ignored)


D3v3sh5ingh commented on September 26, 2024

Hi @yruslan
My high-level layout looks like this:
BDW { RDW 45 bytes , RDW 1000 bytes, RDW 1000 bytes , RDW 1000 bytes ....}
BDW { RDW 1000 bytes .....}
......
BDW { RDW 1000 bytes...., RDW 45 bytes}

The 45-byte header and trailer are inside the BDW blocks, as shown above.
We want to remove this 45-byte header and trailer from the file.
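
To make the layout above concrete, here is a minimal Python sketch that builds a tiny synthetic file with the same shape and walks it block by block. It assumes the common IBM convention in which both BDW and RDW are 4 bytes, with a 2-byte big-endian length that includes the descriptor itself, followed by 2 reserved bytes. Real files may use a different byte order or length adjustment, and the helper names (make_record, make_block, read_records) are hypothetical, not part of Cobrix.

```python
import struct

def make_record(payload: bytes) -> bytes:
    # RDW: 2-byte big-endian length (counting the 4-byte RDW) + 2 reserved bytes.
    return struct.pack(">HH", len(payload) + 4, 0) + payload

def make_block(payloads) -> bytes:
    body = b"".join(make_record(p) for p in payloads)
    # BDW: 2-byte big-endian length (counting the 4-byte BDW) + 2 reserved bytes.
    return struct.pack(">HH", len(body) + 4, 0) + body

def read_records(data: bytes):
    """Walk every block and return the payload of every record in order."""
    records = []
    pos = 0
    while pos < len(data):
        block_len, _ = struct.unpack_from(">HH", data, pos)
        block_end = pos + block_len
        pos += 4  # step over the BDW
        while pos < block_end:
            rec_len, _ = struct.unpack_from(">HH", data, pos)
            records.append(data[pos + 4 : pos + rec_len])
            pos += rec_len
    return records

# Synthetic stand-ins: a 45-byte header, four 1000-byte records, a 45-byte trailer.
header = b"H" * 45
trailer = b"T" * 45
data_records = [bytes([65 + i]) * 1000 for i in range(4)]

# The header is the first record of the first block,
# the trailer is the last record of the last block.
file_bytes = (
    make_block([header] + data_records[:2])
    + make_block(data_records[2:] + [trailer])
)

records = read_records(file_bytes)
lengths = [len(r) for r in records]
print(lengths)  # [45, 1000, 1000, 1000, 1000, 45]
```

In this shape the header and trailer are ordinary records that happen to be 45 bytes long, each wrapped in its own RDW inside a block.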


yruslan commented on September 26, 2024

file_start_offset and file_end_offset work at the file level, e.g. for cases like:
HEADER {45 bytes} BDW { RDW 1000 bytes, RDW 1000 bytes, RDW 1000 bytes , RDW 1000 bytes ....}

Since your 45-byte header is part of a record payload, you can't remove it using these options. What you can do instead is add the header as a redefine segment in your copybook, and then filter it out after you get the dataframe.

The copybook would look like this:

01  RECORD.
    05  HEADER.
        10  CONTENT  PIC X(45).
    05  PAYLOAD REDEFINES HEADER.
    ... your payload goes at level 10 here


D3v3sh5ingh commented on September 26, 2024

Hi,
This is a sample output for my file. The 45 bytes that I want to skip are only at the start and at the end of the file, not in each record.
If I don't use file_start_offset and file_end_offset, I get the above dataframe as output, but with two extra records (header and trailer).
But if I use these options with 45 bytes, I get an error (length of BDW block is too big).


yruslan commented on September 26, 2024

Options 'file_start_offset' and 'file_end_offset' only drop bytes from the beginning or at the end of files, not from the payload. This is the expected behavior.
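
A small Python sketch can illustrate why skipping 45 bytes breaks parsing when the header sits inside the first block. It assumes 4-byte BDW/RDW descriptors whose first 2 bytes are a big-endian length that includes the descriptor itself; the exact check that produces the error inside Cobrix may differ, and with_length_prefix is a hypothetical helper.

```python
import struct

def with_length_prefix(body: bytes) -> bytes:
    # 2-byte big-endian length that counts the 4-byte descriptor itself,
    # followed by 2 reserved bytes (assumed convention for both BDW and RDW).
    return struct.pack(">HH", len(body) + 4, 0) + body

header = b"\x40" * 45     # 45 EBCDIC spaces standing in for the header record
payload = b"\xC1" * 1000  # one 1000-byte data record (EBCDIC 'A')

# One block: BDW { RDW header, RDW payload }
file_bytes = with_length_prefix(
    with_length_prefix(header) + with_length_prefix(payload)
)

# At offset 0 the reader finds a real BDW whose length matches the block.
good_len = struct.unpack_from(">H", file_bytes, 0)[0]

# After skipping 45 bytes the reader lands in the middle of the header payload
# (bytes 8..52) and interprets two payload bytes as a block length.
bad_len = struct.unpack_from(">H", file_bytes, 45)[0]

print(good_len)  # 1057
print(bad_len)   # 16448
```

The misread length is far larger than the file itself, which is consistent with a "length of BDW block is too big" style of error.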

There are no options that allow dropping bytes from inside records, so possible solutions are:

1. If you need to keep these special 45-byte records, you can use the modified copybook solution above.
2. (Probably your case) If you want to ignore these special 45-byte records, just remove them in post-processing, e.g. df.filter(col("COL1").isNotNull)
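
The post-processing idea can be sketched in plain Python without a Spark session. The rows below are made-up stand-ins for parsed records, and the column names COL1 and COL2 are placeholders. It is assumed here that the header and trailer records end up with a null COL1, which depends on the actual copybook.

```python
# Hypothetical parsed output: header and trailer rows have no value in COL1.
rows = [
    {"COL1": None, "COL2": None},     # 45-byte header record
    {"COL1": "A001", "COL2": "foo"},
    {"COL1": "A002", "COL2": "bar"},
    {"COL1": None, "COL2": None},     # 45-byte trailer record
]

# Keep only rows where COL1 is present. In PySpark the analogous call would be
# df.filter(col("COL1").isNotNull()).
data_rows = [row for row in rows if row["COL1"] is not None]

for row in data_rows:
    print(row)
```

If COL1 can legitimately be null in data records, a different marker (for example a field that only the header and trailer populate) would be a safer filter condition.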

