
Exception in thread "main" za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 16: Unable to parse the value of LEVEL. Numeric value expected, but 'CUSTOM-CHANGE-FLAGS-CNT' encountered (CLOSED)

absaoss commented on May 27, 2024
Exception in thread "main" za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException: Syntax error in the copybook at line 16: Unable to parse the value of LEVEL. Numeric value expected, but 'CUSTOM-CHANGE-FLAGS-CNT' encountered

Comments (42)

yruslan commented on May 27, 2024

The first 5 characters of each line are considered comments and are ignored. At line 16 you have 3 <tab> characters instead of 5 spaces. That's the reason for the error.
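For illustration, a well-formed copybook line (the level and picture here are made up; the field name is the one from the error message) would be:

     10 CUSTOM-CHANGE-FLAGS-CNT    PIC 9(4).

starting with 5 spaces. If those spaces are replaced with tabs, the parser consumes the tabs plus the level number as the comment area and then encounters the field name where it expects a numeric level, which produces exactly this SyntaxErrorException.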

geethab123 commented on May 27, 2024

Thank you. Can you please answer my other questions as well? I am working on this and am stuck; I have just started working with these files. Any tips on how the copybook files need to be cleaned would help me a lot.

Thanks a lot
Appreciate your cooperation

geethab123 commented on May 27, 2024

When I parsed another copybook the schema was parsed fine, but while parsing the data file I got the following error. Please let me know how I can resolve this issue with the data file:
Exception in thread "main" java.lang.IllegalArgumentException: There are some files in /user/abc_binary that are NOT DIVISIBLE by the RECORD SIZE calculated from the copybook (3835 bytes per record). Check the logs for the names of the files.
at za.co.absa.cobrix.spark.cobol.source.scanners.CobolScanners$.buildScanForFixedLength(CobolScanners.scala:87)
at za.co.absa.cobrix.spark.cobol.source.CobolRelation.buildScan(CobolRelation.scala:85)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:348)

yruslan commented on May 27, 2024

This error happens when you try to load a fixed record length file where the record length does not divide the file size. In your case, the file size should be evenly divisible by 3835.

This can happen

  • when a copybook does not completely match the data file
  • when the file is a multisegment variable record length file

In case the file is a multisegment variable record length one, you need to add .option("is_record_sequence", "true"). The parser will then expect a 4-byte RDW header before each record. Fields for that header should not be present in the copybook itself.
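For example, a minimal read with this option could look like the sketch below (the paths are placeholders, not from this thread):

  val df = spark.read
    .format("cobol")
    .option("copybook", "/path/to/copybook.cob")
    .option("is_record_sequence", "true") // each record is preceded by a 4-byte RDW header
    .load("/path/to/data")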

geethab123 commented on May 27, 2024

Hi,

Thanks a lot for your suggestion.
Please let me know how I can check the log files.
I added .option("is_record_sequence", "true"), and with that jar I tried to process a different file. I got the below error:

ERROR FileUtils$: File hdfs://xyz/abc IS NOT divisible by 17163.
Exception in thread "main" java.lang.IllegalArgumentException: There are some files in /user/vabc/binaryfile that are NOT DIVISIBLE by the RECORD SIZE calculated from the copybook (17163 bytes per record). Check the logs for the names of the files.
at za.co.absa.cobrix.spark.cobol.source.scanners.CobolScanners$.buildScanForFixedLength(CobolScanners.scala:87)
at za.co.absa.cobrix.spark.cobol.source.CobolRelation.buildScan(CobolRelation.scala:85)

yruslan commented on May 27, 2024

Could you please send the snippet of code you use for reading the file, e.g. the line that starts with spark.read(...)?

geethab123 commented on May 27, 2024

Hi, below is the Scala code I am using for parsing the mainframe copybook and data file. Please suggest what changes I need to make in the code, the copybook, or the binary file to parse them correctly.

Thanks a lot for checking my issues and helping me to parse the mainframe file.

package com.cobrix

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object cobrixtest extends Serializable {
  def main(args: Array[String]): Unit = {
    val sparkConf: SparkConf = new SparkConf().setAppName("cobrixtest")
    val v_data = args(0)
    val v_copybook = args(1)
    println(v_copybook)

    val spark: SparkSession = SparkSession.builder.config(sparkConf).enableHiveSupport().getOrCreate()

    val cobolDataframe = spark
      .read
      .format("cobol")
      .option("generate_record_id", false)                // adds the file id and record id columns
      .option("is_record_sequence", "true")               // use 4-byte record (RDW) headers to extract records
      .option("schema_retention_policy", "collapse_root") // removes the root record header
      .option("copybook", v_copybook)
      .load(v_data)

    cobolDataframe.printSchema()
    cobolDataframe.show(300, false)
  }
}

yruslan commented on May 27, 2024

Interesting. This error should not happen on variable record length files. Which version of Cobrix are you using?

geethab123 commented on May 27, 2024

I am using Cobrix version 0.4.2 libraries, with Scala 2.11.8 and Spark 2.1.1. This is what I have available; I can use a lower version of Scala if needed.
Please let me know which version of Cobrix should be used, and also which versions of Scala and Spark.

yruslan commented on May 27, 2024

From my perspective everything looks good: the program and the versions of Cobrix, Spark and Scala. The strange thing is that the error message you are getting occurs only when reading fixed record length files, while .option("is_record_sequence", "true") should turn on the variable record length reader, which doesn't throw that particular error.

Is it possible to get an example data file and copybook that cause this, so we can reproduce the error on our side?

geethab123 commented on May 27, 2024

I have placed the jar file on the worker node, and the copybook and binary files are in HDFS. Is this correct? Please confirm.
I am getting the errors below. To address them I added the line
000550 05 FILLER PIC X(04). AMTR010
to the copybook, but the same error keeps coming.

java.lang.IllegalStateException: RDW headers should never be zero (0,100,0,0). Found zero size record at 0.
at za.co.absa.cobrix.cobol.parser.decoders.BinaryUtils$.extractRdwRecordSize(BinaryUtils.scala:305)
at za.co.absa.cobrix.spark.cobol.reader.index.IndexGenerator$.getNextRecordSize(IndexGenerator.scala:136)
at za.co.absa.cobrix.spark.cobol.reader.index.IndexGenerator$.sparseIndexGenerator(IndexGenerator.scala:58)

geethab123 commented on May 27, 2024

Can you please let me know how to fix the above issue? I am stuck here.

yruslan commented on May 27, 2024

There should not be an entry in the copybook for the 4-byte RDW header, so please remove the FILLER.

But judging from the values of the RDW header (0, 100, 0, 0), it is possible that your RDW headers are big-endian. To load files that have a big-endian RDW, use this option: .option("is_rdw_big_endian", "true")
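Putting the two reader options discussed in this thread together, the read would look something like this sketch (paths are placeholders):

  val df = spark.read
    .format("cobol")
    .option("copybook", "/path/to/copybook.cob")
    .option("is_record_sequence", "true")  // records are preceded by 4-byte RDW headers
    .option("is_rdw_big_endian", "true")   // interpret the RDW as big-endian
    .load("/path/to/data")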

geethab123 commented on May 27, 2024

Thank you so much. I tried what you recommended:
I removed the FILLER from the copybook,
added .option("is_rdw_big_endian", "true"), and ran it.
The same error appears again.
Are there any options left for me to try for parsing my data file?
If I am able to parse these files, it will help me a lot.

yruslan commented on May 27, 2024

I would like to help, but unfortunately mainframe data files vary a lot. In order to parse a mainframe file we need to understand how records are laid out in the data, which headers the data file has, be sure that the copybook properly matches the data file, etc.

We have tried different combinations of options and I'm out of suggestions that can simply be tried and checked. If you have a small example of a similar file and the corresponding copybook, we could look at it and try to figure out what is needed to parse it properly.

geethab123 commented on May 27, 2024

Thank you so much for your time. Due to my company's policies I cannot share the data, and I am not a mainframe person, so I cannot generate sample data myself.

geethab123 commented on May 27, 2024

How can I run the unit tests in za.co.absa.cobrix.spark.cobol with the data files in the data folder? Can they run locally, or do we need to move all the files to HDFS and run them there? If HDFS is needed, how do we run the tests? Please also help me understand how to check the log files.

yruslan commented on May 27, 2024

All unit tests can be run using mvn test or mvn clean test in the project's root directory. This runs everything in local mode; there is no need to copy files to HDFS.

geethab123 commented on May 27, 2024

Hi, I am getting the below error when packaging in the Maven lifecycle. Please let me know how to resolve this:

01:26:47.163 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
01:26:47.523 ERROR org.apache.hadoop.util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:116)
at org.apache.hadoop.security.Groups.<init>(Groups.java:93)
at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2427)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2427)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2427)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:295)

yruslan commented on May 27, 2024

This is a known issue of running Spark on Windows.
https://stackoverflow.com/a/39525952/1038282
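For reference, the usual workaround from that answer is to download winutils.exe into some directory's bin folder and point Hadoop at it before the Spark session is created; a minimal sketch (the C:\hadoop location is just an example):

  // Assumes winutils.exe has been placed at C:\hadoop\bin\winutils.exe
  System.setProperty("hadoop.home.dir", "C:\\hadoop")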

geethab123 commented on May 27, 2024

Thank you for your reply.

geethab123 commented on May 27, 2024

I sent the files to your email. Could you do me the favor of checking them and letting me know what the issue is, or what I should do to get the files parsing?

yruslan commented on May 27, 2024

Received the files, will take a look. It will take some time, and I will likely get back to you with more questions. So far the copybook seems quite complex and the record structure of the file is not obvious. No guarantees I'll be able to figure it out.

What would also be very helpful is if you could get the first couple of records of this file in a parsed format, like CSV. It would make it easier for me to figure out where one record ends and the next begins.

geethab123 commented on May 27, 2024

Hi,
I have a quick question. Does Cobrix support nested OCCURS? If yes, how many levels does it support?

yruslan commented on May 27, 2024

Yes, this should be supported with an arbitrary number of levels. We haven't specifically tested this scenario, but the code is generic enough to cover the case.
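For illustration, a hypothetical copybook with two levels of nested OCCURS would look like this:

  01 TRANSACTION-RECORD.
     05 GROUP-A OCCURS 3 TIMES.
        10 GROUP-B OCCURS 2 TIMES.
           15 FIELD-X PIC 9(2).

Each OCCURS group should come out in the Spark schema as an array (of structs, when the group has child fields), nested accordingly.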

geethab123 commented on May 27, 2024

When I try to parse a fixed-width file I am getting the below error. Can you please help me fix this?

/* 495328 */
/* 495329 */     mutableRow.update(0, value);
/* 495330 */   }
/* 495331 */
/* 495332 */   return mutableRow;
/* 495333 */ }
/* 495334 */ }

org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection has grown past JVM limit of 0xFFFF
at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
at org.codehaus.janino.util.ClassFile.addConstantFieldrefInfo(ClassFile.java:342)
at org.codehaus.janino.UnitCompiler.writeConstantFieldrefInfo(UnitCompiler.java:11109)

yruslan commented on May 27, 2024

It looks like you are hitting a JVM limit at Spark's code generation stage. Try creating a Spark session with whole-stage codegen turned off. Something like this:

  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("Example")
    .config("spark.sql.codegen.wholeStage", value = false)
    .getOrCreate()

geethab123 commented on May 27, 2024

Thanks a lot, your suggestion worked: fixed-width files are parsing now. But I have another quick question. For a fixed-width file, if I have FILLER fields at the end of the copybook, the data file cannot be parsed; the data file has blanks at the end. Is there a fix for this? Please suggest how to overcome it.

yruslan commented on May 27, 2024

It is great to hear that files are parsing now, at least partially. But I'm sorry, I'm not completely following what the issue is. Could you please describe it using a simplified example?

geethab123 commented on May 27, 2024

Below is an example of a fixed-width file having fillers at the end.
Our copybook ends like this:
15 CUSTOM-STATUS PIC X(01).
15 FILLER PIC X(15).
15 FILLER PIC X(773).

and the data file has blanks at the end.
I am unable to parse this file as a fixed-width file; the blanks at the end are not handled and I get the NOT DIVISIBLE ..... error.

If the copybook does not have the FILLER at the end, I am able to parse the file.

yruslan commented on May 27, 2024

Do I understand it right that the file has 773 bytes at the end that should be ignored?

There is a feature planned to be introduced: file headers and footers. Using this new feature you will be able to specify how many bytes to ignore at the beginning and at the end of a file. I will let you know when the feature is available.
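As a rough sketch of how that could look once released (the option name below is an assumption, not a released API at the time of this thread):

  val df = spark.read
    .format("cobol")
    .option("copybook", "/path/to/copybook.cob")
    .option("file_end_offset", "773") // assumed option: skip the last 773 bytes of each file
    .load("/path/to/data")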

geethab123 commented on May 27, 2024

Hi Ruslan,

I am trying to parse a copybook which has 4000 columns, together with its binary file.
I am again getting thousands of lines of generated code on the screen and the JVM error,
even after adding .config("spark.sql.codegen.wholeStage", value = false).
My Spark version is 2.1.1.
For me the only remaining solution is to work with RDDs. Is there any other solution?
I read online that Spark 2.3 has a fix for this.
Please let me know what my options are.
Also, I have another question: does Cobrix work with Spark 2.3 and Spark 2.4?

yruslan commented on May 27, 2024

Yes, newer versions of Spark handle wide dataframes (with thousands of columns) much better.
Yes, you can use Spark 2.3 and Spark 2.4, as long as you use the build for Scala 2.11 (not Scala 2.12).
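For reference, the Scala version is selected via the artifact suffix; an sbt sketch (substitute the Cobrix version you actually use for the placeholder):

  libraryDependencies += "za.co.absa.cobrix" % "spark-cobol_2.11" % "<version>"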

geethab123 commented on May 27, 2024

Thank you. But is there any way to make this work with Spark 2.1?

yruslan commented on May 27, 2024

Not sure; it depends on your exact use case. Handling wide dataframes definitely got better in 2.3, but the codegen error you got seems odd, so it's really hard to tell.

yruslan commented on May 27, 2024

The issue with the 773 trailing spaces is related to #87.

geethab123 commented on May 27, 2024

We renamed the FILLER to something else and it worked. Thanks for your time.
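For reference, the rename could have looked something like this (the replacement names are hypothetical):

  15 CUSTOM-STATUS PIC X(01).
  15 TRAILING-PAD-1 PIC X(15).
  15 TRAILING-PAD-2 PIC X(773).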

geethab123 commented on May 27, 2024

Also, the huge scroll of generated code for a copybook with a large number of columns was solved when I moved to Spark 2.4. But I was looking for something that works in 2.1.1.

geethab123 commented on May 27, 2024

I have packed-decimal fields in the copybook with a picture like S9(X)V9(8). After parsing, the values for these kinds of fields come out as 0E-8 instead of 0. Is there any way we can fix this? Please advise.

yruslan commented on May 27, 2024

Short answer: 0 and 0E-8 are the same value. They are just displayed differently on screen depending on the tool you use.

A picture of S9(10)V9(8), for example, converts to a Spark decimal(18,8) value, which is a fixed-point decimal type. I presume that for Spark methods like df.show() the scientific format is chosen so it is clear to the viewer that the column has a decimal type.

What is your output format (Parquet, JSON, CSV, etc.)? Is the scientific notation present in the files themselves?
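A quick way to convince yourself the two renderings are the same value, in plain Scala without Spark:

  // 0E-8 is unscaled value 0 with scale 8; compareTo ignores scale
  val a = new java.math.BigDecimal("0E-8")
  println(a.compareTo(java.math.BigDecimal.ZERO) == 0) // true
  println(a.toPlainString)                             // prints 0.00000000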

geethab123 commented on May 27, 2024

Thank you. I will check the files.
