Comments (5)
But if I run Main.scala in IntelliJ on my base Windows machine, it executes with no problem.
22/01/11 12:28:53 INFO Main$: Logging configured!
22/01/11 12:28:55 INFO Main$: Data Validator
22/01/11 12:28:55 INFO ConfigParser$: Parsing `D:\Spark\old\test_config.yaml`
22/01/11 12:28:55 INFO ConfigParser$: Attempting to load `D:\Spark\old\test_config.yaml` from file system
22/01/11 12:29:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/01/11 12:29:10 INFO ValidatorConfig: substituteVariables()
22/01/11 12:29:10 INFO Main$: Checking Cli Outputs htmlReport: None jsonReport: None
22/01/11 12:29:10 INFO Main$: filename: None append: false
22/01/11 12:29:10 INFO Main$: filename: None append: true
22/01/11 12:29:10 INFO ValidatorParquetFile: Reading parquet file: D:\Spark\old\DemoTime\userdata1.parquet
22/01/11 12:29:29 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
22/01/11 12:29:33 INFO Main$: Running sparkChecks
22/01/11 12:29:33 INFO ValidatorConfig: Running Quick Checks...
22/01/11 12:29:33 INFO ValidatorParquetFile: Reading parquet file: D:\Spark\old\DemoTime\userdata1.parquet
22/01/11 12:29:38 INFO ValidatorTable: Results: [1000,68]
22/01/11 12:29:38 INFO ValidatorTable: Total Rows Processed: 1000
22/01/11 12:29:38 ERROR RowBased: Quick check for NullCheck on salary failed, 68 errors in 1000 rows errorCountThreshold: 0
22/01/11 12:29:38 INFO ValidatorTable: keyColumns: registration_dttm, id
22/01/11 12:29:40 INFO ValidatorConfig: Running Costly Checks...
22/01/11 12:29:40 INFO ValidatorParquetFile: Reading parquet file: D:\Spark\old\DemoTime\userdata1.parquet
22/01/11 12:29:40 ERROR Main$: data-validator failed!
DATA_VALIDATOR_STATUS=FAIL
Process finished with exit code -1
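For context, the ERROR line in the log above is the quick-check decision: the NullCheck fails because the observed error count exceeds the configured threshold (68 > 0). A minimal sketch of that decision, using a hypothetical helper name rather than data-validator's actual API:

```scala
// Hedged sketch of the quick-check pass/fail decision implied by the log:
// a check fails when the observed error count exceeds the configured threshold.
// `quickCheckFails` is an illustrative name, not data-validator's real method.
def quickCheckFails(errorCount: Long, errorCountThreshold: Long): Boolean =
  errorCount > errorCountThreshold

// The log's NullCheck on `salary`: 68 errors in 1000 rows, threshold 0 -> FAIL
println(quickCheckFails(68L, 0L)) // true
```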
Note: I hardcoded the config file in ConfigParser.scala:
def parseFile(filename: String, cliMap: Map[String, String]): Either[Error, ValidatorConfig] = {
  val filename = "D:\\Spark\\old\\test_config.yaml" // shadows the `filename` parameter
  logger.info(s"Parsing `$filename`")
I also hardcoded Spark to run locally in Main.scala, like:
def runChecks(mainConfig: CmdLineOptions, origConfig: ValidatorConfig): (Boolean, Boolean) = {
  val varSub = new VarSubstitution
  implicit val spark = SparkSession.builder
    .appName("data-validator")
    .master("local")
    .enableHiveSupport()
    .getOrCreate()
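Rather than hardcoding the master, one option is to make it overridable and fall back to local mode only when nothing is set. A minimal sketch, assuming a `SPARK_MASTER` environment variable; both that variable name and `resolveMaster` are illustrative, not part of data-validator:

```scala
// Hedged sketch: resolve the Spark master from an environment map,
// defaulting to local mode only when no override is present.
def resolveMaster(env: Map[String, String], default: String = "local[*]"): String =
  env.getOrElse("SPARK_MASTER", default)

// Usage would mirror the hardcoded builder chain above:
// val spark = SparkSession.builder
//   .appName("data-validator")
//   .master(resolveMaster(sys.env))
//   .enableHiveSupport()
//   .getOrCreate()
```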
Environment Details of base machine:
OS: Windows 10 x64
Java version:
C:\Users\appde>java -version
java version "1.8.0_291"
Java(TM) SE Runtime Environment (build 1.8.0_291-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.291-b10, mixed mode)
sbt version:
D:\Spark\old\data-validator-master>sbt version
[info] welcome to sbt 1.5.7 (Oracle Corporation Java 1.8.0_281)
from data-validator.
What happens when you set the threshold, e.g.
numKeyCols: 2
numErrorsToReport: 742
tables:
  - parquetFile: /home/jyoti/Spark/userdata1.parquet
    checks:
      - type: nullCheck
        column: salary
        threshold: 0
        # or
        threshold: "0"
It should be optional, though. We've almost always specified it.
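If I recall the semantics correctly, the threshold can be interpreted as an absolute error count or as a percentage of total rows (e.g. "10%"). A hedged sketch of that interpretation; `maxAllowedErrors` is an illustrative helper, not data-validator's actual code:

```scala
// Hedged sketch of threshold interpretation: a plain number is an absolute
// error count, while a trailing "%" makes it a fraction of total rows.
def maxAllowedErrors(threshold: String, rowCount: Long): Long =
  if (threshold.endsWith("%"))
    (threshold.dropRight(1).toDouble / 100.0 * rowCount).toLong
  else
    threshold.toDouble.toLong

println(maxAllowedErrors("0", 1000L))   // 0   -> the 68 null salaries fail
println(maxAllowedErrors("10%", 1000L)) // 100 -> 68 errors would pass
```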
Actually, I found it: /opt/spark/spark-3.1.2-bin-hadoop3.2
DV doesn't support Spark 3 yet, so all bets are off.
But try something: apply this patch to change the literals:
From 0739e46d2c7ec01d908274aa3a83edd7263fc73a Mon Sep 17 00:00:00 2001
From: Colin Dean <[email protected]>
Date: Tue, 11 Jan 2022 11:09:50 -0500
Subject: [PATCH] Use a long literal when creating a Spark SQL literal
---
.../com/target/data_validator/validator/ValidatorBase.scala | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/main/scala/com/target/data_validator/validator/ValidatorBase.scala b/src/main/scala/com/target/data_validator/validator/ValidatorBase.scala
index d815a78..ef8d1eb 100644
--- a/src/main/scala/com/target/data_validator/validator/ValidatorBase.scala
+++ b/src/main/scala/com/target/data_validator/validator/ValidatorBase.scala
@@ -136,8 +136,8 @@ object ValidatorBase extends LazyLogging {
private val backtick = "`"
val I0: Literal = Literal.create(0, IntegerType)
val D0: Literal = Literal.create(0.0, DoubleType)
- val L0: Literal = Literal.create(0, LongType)
- val L1: Literal = Literal.create(1, LongType)
+ val L0: Literal = Literal.create(0L, LongType)
+ val L1: Literal = Literal.create(1L, LongType)
def isValueColumn(v: String): Boolean = v.startsWith(backtick)
--
2.34.1
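The patch matters because `Literal.create(0, LongType)` passes a Scala `Int` while declaring `LongType`, and newer Spark versions validate literals more strictly, so the value's runtime class and the declared type can disagree. A toy, Spark-free illustration of the `Int`/`Long` distinction; `matchesDeclaredType` merely mimics the idea and is not Spark's actual validation code:

```scala
// Hedged illustration of the patch above: `0` is an Int and `0L` a Long,
// so a runtime class check against a declared "LongType" passes only for
// the latter. This mimics stricter literal validation; it is not Spark code.
def matchesDeclaredType(value: Any, declared: String): Boolean =
  (value, declared) match {
    case (_: Int, "IntegerType")   => true
    case (_: Long, "LongType")     => true
    case (_: Double, "DoubleType") => true
    case _                         => false
  }

println(matchesDeclaredType(0, "LongType"))  // false -> the pre-patch literals
println(matchesDeclaredType(0L, "LongType")) // true  -> the patched literals
```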
We may support Spark 3 after #84.
Thanks, Colin, I will check it out... 😃