Comments (8)
You can use the .computeDataUpTo method on OpWorkflow (instead of .train) for this.
from transmogrifai.
Thanks for replying quickly!
So I wrote
val df = new OpWorkflow()
.setInputDataset(passengersData)
.setResultFeatures(pred)
.computeDataUpTo(checkedFeatures)
But since I now get a DataFrame instead of a model (as in the previous code), I can access neither the feature importances nor the statistics that I would otherwise get using
model.modelInsights(pred)
or
val metadata = fittedWorkflow.getOriginStageOf(checkedFeatures).getMetadata()
val summaryData = SanityCheckerSummary.fromMetadata(metadata.getSummaryMetadata())
Is there a way to get that information (selected features, correlation statistics, etc.) without training a model?
from transmogrifai.
I'm not sure this is possible. Going through ModelInsights is by far the easiest solution.
To reduce training time, you can override and shrink the grid of models and hyperparameters that gets trained, like this:
val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.1)).build())
BinaryClassificationModelSelector.withCrossValidation(modelsAndParameters = models)
from transmogrifai.
// Automated feature validation and selection
val checkedFeatures = survived.sanityCheck(featureVector, removeBadFeatures = true)
// Setting up a TransmogrifAI workflow and training the model
val model = new OpWorkflow().setInputDataset(passengersData).setResultFeatures(checkedFeatures).train()
println("Model summary:\n" + model.modelInsights(checkedFeatures))
This should do it. Basically, the feature you pass in as the result feature is the last one the workflow computes, and you can get insights up to whatever level of the DAG is computed.
from transmogrifai.
Basically, the workflow computes the DAG necessary to produce the feature passed in as a result feature. If you change the result feature to be the output of the sanity checker, it will only do those computations.
from transmogrifai.
Model insights will work without a model run (the model part will just be empty). If you then want to add in a model using the already fitted sanity checker, you can do it like this:
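A minimal sketch of how that could look, assuming OpWorkflow.withModelStages is the intended way to reuse the fitted stages. Here survived, checkedFeatures and passengersData are the values from the snippets above, model is the OpWorkflowModel trained with checkedFeatures as the result feature, and the names prediction, workflowWithModel and fullModel are only illustrative:

```scala
import com.salesforce.op._
import com.salesforce.op.stages.impl.classification.BinaryClassificationModelSelector

// Add a model selector on top of the already sanity-checked feature vector.
// `survived` and `checkedFeatures` must be the same feature objects used to train `model`.
val prediction = BinaryClassificationModelSelector
  .withCrossValidation()
  .setInput(survived, checkedFeatures)
  .getOutput()

// Reuse the fitted stages (including the sanity checker) from `model`;
// only the stages that are not already fitted get trained.
val workflowWithModel = new OpWorkflow()
  .setInputDataset(passengersData)
  .setResultFeatures(prediction)
  .withModelStages(model)

val fullModel = workflowWithModel.train()
println("Model summary:\n" + fullModel.modelInsights(prediction))
```

Fitted stages are matched by stage UID, so this only reuses the sanity checker if the new workflow is built from the very same feature definitions rather than from re-created ones.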
from transmogrifai.
Thank you. Unfortunately, I've run into another issue.
I had the same problem as the one described in issue #540
(I'm running on Cloudera with this Spark version: 2.4.0-cdh6.3.4).
So I tried to override the jackson-module-scala dependency with the last one that implements EitherModule (2.7.3).
My build.sbt file:
name := "test-transmogrif"
version := "1.0"
scalaVersion := "2.11.12"
resolvers += Resolver.jcenterRepo
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.0" % "provided",
"org.apache.spark" %% "spark-mllib" % "2.4.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
"com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.7.0",
"com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.7.3"
)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
My running script:
val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.1)).build())
val prediction = BinaryClassificationModelSelector.withTrainValidationSplit(modelsAndParameters = models).setInput(target, checkedFeatures).getOutput()
val workflow = new OpWorkflow().setInputDataset(dataReader).setResultFeatures(prediction)
val model = workflow.train()
My project/plugins.sbt file:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
But I got this error at the workflow.train() step when I ran spark-submit on the assembled jar (built with sbt assembly):
Exception in thread "main" java.lang.IncompatibleClassChangeError: Class com.fasterxml.jackson.module.scala.OpDefaultScalaModule$ does not implement the requested interface com.fasterxml.jackson.module.scala.modifiers.SeqTypeModifierModule
at com.fasterxml.jackson.module.scala.modifiers.SeqTypeModifierModule$class.$init$(SeqTypeModifierModule.scala:10)
at com.fasterxml.jackson.module.scala.OpDefaultScalaModule.<init>(OpDefaultScalaModule.scala:28)
at com.fasterxml.jackson.module.scala.OpDefaultScalaModule$.<init>(OpDefaultScalaModule.scala:58)
at com.fasterxml.jackson.module.scala.OpDefaultScalaModule$.<clinit>(OpDefaultScalaModule.scala)
at com.salesforce.op.utils.json.JsonUtils$.configureMapper(JsonUtils.scala:159)
at com.salesforce.op.utils.json.JsonUtils$.com$salesforce$op$utils$json$JsonUtils$$jsonMapper(JsonUtils.scala:133)
at com.salesforce.op.utils.json.JsonUtils$.toJsonString(JsonUtils.scala:97)
at com.salesforce.op.utils.json.JsonLike$class.toJson(JsonUtils.scala:179)
at com.salesforce.op.evaluators.BinaryClassificationMetrics.toJson(OpBinaryClassificationEvaluator.scala:179)
at com.salesforce.op.utils.json.JsonLike$class.toString(JsonUtils.scala:186)
at com.salesforce.op.evaluators.BinaryClassificationMetrics.toString(OpBinaryClassificationEvaluator.scala:179)
at com.salesforce.op.evaluators.OpBinaryClassificationEvaluator.evaluateAll(OpBinaryClassificationEvaluator.scala:120)
at com.salesforce.op.evaluators.OpBinaryClassificationEvaluator.evaluateAll(OpBinaryClassificationEvaluator.scala:56)
at com.salesforce.op.stages.impl.selector.HasEval$$anonfun$1.apply(ModelSelectorNames.scala:94)
at com.salesforce.op.stages.impl.selector.HasEval$$anonfun$1.apply(ModelSelectorNames.scala:91)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296)
at com.salesforce.op.stages.impl.selector.HasEval$class.evaluate(ModelSelectorNames.scala:91)
at com.salesforce.op.stages.impl.selector.ModelSelector.evaluate(ModelSelector.scala:71)
at com.salesforce.op.stages.impl.selector.ModelSelector.fit(ModelSelector.scala:166)
at com.salesforce.op.stages.impl.selector.ModelSelector.fit(ModelSelector.scala:71)
at com.salesforce.op.utils.stages.FitStagesUtil$$anonfun$20.apply(FitStagesUtil.scala:264)
at com.salesforce.op.utils.stages.FitStagesUtil$$anonfun$20.apply(FitStagesUtil.scala:263)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at com.salesforce.op.utils.stages.FitStagesUtil$.com$salesforce$op$utils$stages$FitStagesUtil$$fitAndTransformLayer(FitStagesUtil.scala:263)
...
from transmogrifai.
TransmogrifAI is built against Spark 2.4.5, so the best way to deal with this is to explicitly exclude the dependency you don't want to pull in.
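As a rough sketch of what such an exclusion looks like in build.sbt: the thread does not say which artifact to exclude, so dropping the transitive jackson-module-scala from transmogrifai-core is only an assumption here, and the same pattern applies to whichever dependency brings in the unwanted version.

```scala
// build.sbt (sketch): exclude a transitive dependency so that only one
// version of it ends up in the assembled jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.4.0" % "provided",
  // Assumption: the conflicting artifact is the Jackson Scala module pulled in
  // transitively by transmogrifai-core. Swap in the artifact that actually conflicts.
  ("com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.7.0")
    .exclude("com.fasterxml.jackson.module", "jackson-module-scala_2.11")
)
```

Running sbt evicted before assembling shows which versions were dropped in favour of others, which helps confirm that a single Jackson version is left on the classpath.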
from transmogrifai.