
Comments (5)

JyotiRSharma avatar JyotiRSharma commented on May 21, 2024

But if I run Main.scala in IntelliJ on my base Windows machine, it executes with no problem.

22/01/11 12:28:53 INFO Main$: Logging configured!
22/01/11 12:28:55 INFO Main$: Data Validator
22/01/11 12:28:55 INFO ConfigParser$: Parsing `D:\Spark\old\test_config.yaml`
22/01/11 12:28:55 INFO ConfigParser$: Attempting to load `D:\Spark\old\test_config.yaml` from file system
22/01/11 12:29:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/11 12:29:10 INFO ValidatorConfig: substituteVariables()
22/01/11 12:29:10 INFO Main$: Checking Cli Outputs htmlReport: None jsonReport: None
22/01/11 12:29:10 INFO Main$: filename: None append: false
22/01/11 12:29:10 INFO Main$: filename: None append: true
22/01/11 12:29:10 INFO ValidatorParquetFile: Reading parquet file: D:\Spark\old\DemoTime\userdata1.parquet
22/01/11 12:29:29 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
22/01/11 12:29:33 INFO Main$: Running sparkChecks
22/01/11 12:29:33 INFO ValidatorConfig: Running Quick Checks...
22/01/11 12:29:33 INFO ValidatorParquetFile: Reading parquet file: D:\Spark\old\DemoTime\userdata1.parquet
22/01/11 12:29:38 INFO ValidatorTable: Results: [1000,68]
22/01/11 12:29:38 INFO ValidatorTable: Total Rows Processed: 1000
22/01/11 12:29:38 ERROR RowBased: Quick check for NullCheck on salary failed, 68 errors in 1000 rows errorCountThreshold: 0
22/01/11 12:29:38 INFO ValidatorTable: keyColumns: registration_dttm, id
22/01/11 12:29:40 INFO ValidatorConfig: Running Costly Checks...
22/01/11 12:29:40 INFO ValidatorParquetFile: Reading parquet file: D:\Spark\old\DemoTime\userdata1.parquet
22/01/11 12:29:40 ERROR Main$: data-validator failed!
DATA_VALIDATOR_STATUS=FAIL

Process finished with exit code -1

Note: I hardcoded the config file path in ConfigParser.scala:

  def parseFile(filename: String, cliMap: Map[String, String]): Either[Error, ValidatorConfig] = {
    val filename = "D:\\Spark\\old\\test_config.yaml"
    logger.info(s"Parsing `$filename`")

I also hardcoded Spark to run locally in Main.scala:

  def runChecks(mainConfig: CmdLineOptions, origConfig: ValidatorConfig): (Boolean, Boolean) = {
    val varSub = new VarSubstitution
    implicit val spark = SparkSession.builder.appName("data-validator").master("local").enableHiveSupport().getOrCreate()

Environment Details of base machine:
OS: Windows 10 x64
Java version:

C:\Users\appde>java -version
java version "1.8.0_291"
Java(TM) SE Runtime Environment (build 1.8.0_291-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode)

sbt version:

D:\Spark\old\data-validator-master>sbt version
[info] welcome to sbt 1.5.7 (Oracle Corporation Java 1.8.0_281)

from data-validator.

colindean avatar colindean commented on May 21, 2024

What happens when you set the threshold explicitly? For example:

numKeyCols: 2
numErrorsToReport: 742

tables:
  - parquetFile: /home/jyoti/Spark/userdata1.parquet
    checks:
      - type: nullCheck
        column: salary
        threshold: 0
        # or, as a string:
        # threshold: "0"

It should be optional, though. We've almost always specified it.


colindean avatar colindean commented on May 21, 2024

Actually, I found it: /opt/spark/spark-3.1.2-bin-hadoop3.2.

DV doesn't support Spark 3 yet, so all bets are off.

But try something: apply this patch to change the literals:

From 0739e46d2c7ec01d908274aa3a83edd7263fc73a Mon Sep 17 00:00:00 2001
From: Colin Dean <[email protected]>
Date: Tue, 11 Jan 2022 11:09:50 -0500
Subject: [PATCH] Use a long literal when creating a Spark SQL literal

---
 .../com/target/data_validator/validator/ValidatorBase.scala   | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/main/scala/com/target/data_validator/validator/ValidatorBase.scala b/src/main/scala/com/target/data_validator/validator/ValidatorBase.scala
index d815a78..ef8d1eb 100644
--- a/src/main/scala/com/target/data_validator/validator/ValidatorBase.scala
+++ b/src/main/scala/com/target/data_validator/validator/ValidatorBase.scala
@@ -136,8 +136,8 @@ object ValidatorBase extends LazyLogging {
   private val backtick = "`"
   val I0: Literal = Literal.create(0, IntegerType)
   val D0: Literal = Literal.create(0.0, DoubleType)
-  val L0: Literal = Literal.create(0, LongType)
-  val L1: Literal = Literal.create(1, LongType)
+  val L0: Literal = Literal.create(0L, LongType)
+  val L1: Literal = Literal.create(1L, LongType)
 
   def isValueColumn(v: String): Boolean = v.startsWith(backtick)
 
-- 
2.34.1
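
To illustrate what the patch changes, here is a minimal plain-Scala sketch with no Spark dependency. It only shows that `0` and `0L` box to different runtime classes. The assumption, not confirmed in this thread, is that Spark 3 checks a literal's runtime value against its declared type, so an `Int` passed with `LongType` could be rejected. `acceptsAsLong` and `boxedClassName` are hypothetical helpers for illustration, not Spark APIs.

```scala
// Plain Scala, no Spark required. The helpers below are hypothetical
// stand-ins for a runtime type check; they are not part of Spark.
object LiteralBoxingSketch {
  // Returns true only when the boxed runtime value is a 64-bit integer.
  def acceptsAsLong(value: Any): Boolean = value match {
    case _: Long => true  // matches a boxed java.lang.Long
    case _       => false // a boxed java.lang.Integer lands here
  }

  // Name of the class the value is boxed to when passed as Any.
  def boxedClassName(value: Any): String = value.getClass.getName

  def main(args: Array[String]): Unit = {
    println(boxedClassName(0))  // java.lang.Integer
    println(boxedClassName(0L)) // java.lang.Long
    println(acceptsAsLong(0))   // false
    println(acceptsAsLong(0L))  // true
  }
}
```

If that assumption holds, writing `0L` and `1L` keeps the value's runtime class consistent with `LongType`, which is what the patch above does.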


colindean avatar colindean commented on May 21, 2024

We may support Spark 3 after #84.


JyotiRSharma avatar JyotiRSharma commented on May 21, 2024

Thanks Colin, I will check it out... 😃

