Comments (8)

nicodv avatar nicodv commented on June 7, 2024 1

You can use the .computeDataUpTo method on OpWorkflow (instead of .train) for this.

from transmogrifai.

krzischp avatar krzischp commented on June 7, 2024

Thanks for replying quickly!
So I wrote

val df = new OpWorkflow()
  .setInputDataset(passengersData)
  .setResultFeatures(pred)
  .computeDataUpTo(checkedFeatures)

But this returns a DataFrame instead of a model (as in the previous code), so I can no longer access the feature importances or the statistics that I would otherwise get using

model.modelInsights(pred)

or

val metadata = fittedWorkflow.getOriginStageOf(checkedFeatures).getMetadata()
val summaryData = SanityCheckerSummary.fromMetadata(metadata.getSummaryMetadata())

Is there a way to get this information (selected features, correlation statistics, etc.) without training a model?


nicodv avatar nicodv commented on June 7, 2024

I'm not sure this is possible. Going through ModelInsights is by far the easiest solution.

To reduce training time, you can override the default grid of models and hyperparameters with a smaller one, like this:

val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.1)).build())
BinaryClassificationModelSelector.withCrossValidation(modelsAndParameters = models)


leahmcguire avatar leahmcguire commented on June 7, 2024

// Automated feature validation and selection
val checkedFeatures = survived.sanityCheck(featureVector, removeBadFeatures = true)

// Setting up a TransmogrifAI workflow and training the model
val model = new OpWorkflow().setInputDataset(passengersData).setResultFeatures(checkedFeatures).train()

println("Model summary:\n" + model.modelInsights(checkedFeatures))

This should do it. The feature you pass in as the result feature is the last one the workflow computes, and you can get insights up to whatever level the DAG computes.


leahmcguire avatar leahmcguire commented on June 7, 2024

Basically, the workflow computes only the DAG needed to produce the feature passed in as the result feature. If you change the result feature to be the output of the SanityChecker, it will only do those computations.


leahmcguire avatar leahmcguire commented on June 7, 2024

Model Insights will work without a model run (the model part will just be empty). If you then want to add a model that reuses the already fitted sanity checker, you can do it like this:

https://github.com/salesforce/TransmogrifAI/blob/master/core/src/test/scala/com/salesforce/op/OpWorkflowTest.scala#L336
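
As a rough sketch of what reusing the fitted sanity checker could look like, the snippet below uses OpWorkflow.withModelStages, which replaces estimators in a new workflow with the matching fitted stages from an existing OpWorkflowModel. That this is the approach the linked test demonstrates is an assumption, not something confirmed in this thread. The names survived, checkedFeatures and passengersData are taken from the earlier comments and are assumed to be defined already, so this is a fragment rather than a complete program.

```scala
import com.salesforce.op._
import com.salesforce.op.stages.impl.classification.{BinaryClassificationModelSelector, OpLogisticRegression}
import org.apache.spark.ml.tuning.ParamGridBuilder

// Assumed to exist from earlier in the thread: survived, checkedFeatures, passengersData.

// 1. Fit the workflow only up to the sanity checker (no model is trained here).
val checkedModel = new OpWorkflow()
  .setInputDataset(passengersData)
  .setResultFeatures(checkedFeatures)
  .train()

// 2. Define a prediction feature on top of the same checkedFeatures.
val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.1)).build())
val prediction = BinaryClassificationModelSelector
  .withCrossValidation(modelsAndParameters = models)
  .setInput(survived, checkedFeatures)
  .getOutput()

// 3. Build the full workflow, reusing the stages already fitted in step 1.
//    Only estimators not present in checkedModel should be fitted by train().
val fullModel = new OpWorkflow()
  .setInputDataset(passengersData)
  .setResultFeatures(prediction)
  .withModelStages(checkedModel)
  .train()
```

Stages are matched by UID, so the second workflow has to be built from the same feature definitions (the same checkedFeatures value) as the first one.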


krzischp avatar krzischp commented on June 7, 2024

Thank you. Unfortunately, I've run into another issue.

I hit the same problem as the one described in issue #540 (I'm running on Cloudera with this Spark version: 2.4.0-cdh6.3.4), so I tried to override the jackson-module-scala dependency with the last version that implements EitherModule (2.7.3):

My build.sbt file

name := "test-transmogrif"

version := "1.0"

scalaVersion := "2.11.12"

resolvers += Resolver.jcenterRepo

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
  "com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.7.0",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.7.3"
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}

My running script:

val lr = new OpLogisticRegression()
val models = Seq(lr -> new ParamGridBuilder().addGrid(lr.regParam, Array(0.1)).build())
val prediction = BinaryClassificationModelSelector.withTrainValidationSplit(modelsAndParameters = models).setInput(target, checkedFeatures).getOutput()
val workflow = new OpWorkflow().setInputDataset(dataReader).setResultFeatures(prediction)
val model = workflow.train()

My project/plugins.sbt file

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

But I got this error at the workflow.train() step when I ran spark-submit on the assembled jar (built with sbt assembly):

Exception in thread "main" java.lang.IncompatibleClassChangeError: Class com.fasterxml.jackson.module.scala.OpDefaultScalaModule$ does not implement the requested interface com.fasterxml.jackson.module.scala.modifiers.SeqTypeModifierModule
        at com.fasterxml.jackson.module.scala.modifiers.SeqTypeModifierModule$class.$init$(SeqTypeModifierModule.scala:10)
        at com.fasterxml.jackson.module.scala.OpDefaultScalaModule.<init>(OpDefaultScalaModule.scala:28)
        at com.fasterxml.jackson.module.scala.OpDefaultScalaModule$.<init>(OpDefaultScalaModule.scala:58)
        at com.fasterxml.jackson.module.scala.OpDefaultScalaModule$.<clinit>(OpDefaultScalaModule.scala)
        at com.salesforce.op.utils.json.JsonUtils$.configureMapper(JsonUtils.scala:159)
        at com.salesforce.op.utils.json.JsonUtils$.com$salesforce$op$utils$json$JsonUtils$$jsonMapper(JsonUtils.scala:133)
        at com.salesforce.op.utils.json.JsonUtils$.toJsonString(JsonUtils.scala:97)
        at com.salesforce.op.utils.json.JsonLike$class.toJson(JsonUtils.scala:179)
        at com.salesforce.op.evaluators.BinaryClassificationMetrics.toJson(OpBinaryClassificationEvaluator.scala:179)
        at com.salesforce.op.utils.json.JsonLike$class.toString(JsonUtils.scala:186)
        at com.salesforce.op.evaluators.BinaryClassificationMetrics.toString(OpBinaryClassificationEvaluator.scala:179)
        at com.salesforce.op.evaluators.OpBinaryClassificationEvaluator.evaluateAll(OpBinaryClassificationEvaluator.scala:120)
        at com.salesforce.op.evaluators.OpBinaryClassificationEvaluator.evaluateAll(OpBinaryClassificationEvaluator.scala:56)
        at com.salesforce.op.stages.impl.selector.HasEval$$anonfun$1.apply(ModelSelectorNames.scala:94)
        at com.salesforce.op.stages.impl.selector.HasEval$$anonfun$1.apply(ModelSelectorNames.scala:91)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.immutable.List.map(List.scala:296)
        at com.salesforce.op.stages.impl.selector.HasEval$class.evaluate(ModelSelectorNames.scala:91)
        at com.salesforce.op.stages.impl.selector.ModelSelector.evaluate(ModelSelector.scala:71)
        at com.salesforce.op.stages.impl.selector.ModelSelector.fit(ModelSelector.scala:166)
        at com.salesforce.op.stages.impl.selector.ModelSelector.fit(ModelSelector.scala:71)
        at com.salesforce.op.utils.stages.FitStagesUtil$$anonfun$20.apply(FitStagesUtil.scala:264)
        at com.salesforce.op.utils.stages.FitStagesUtil$$anonfun$20.apply(FitStagesUtil.scala:263)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at com.salesforce.op.utils.stages.FitStagesUtil$.com$salesforce$op$utils$stages$FitStagesUtil$$fitAndTransformLayer(FitStagesUtil.scala:263)
...


leahmcguire avatar leahmcguire commented on June 7, 2024

TransmogrifAI is built on Spark 2.4.5. The best way to deal with this is to explicitly exclude the dependency you don't want to pull in.
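
For illustration only, an exclusion in build.sbt could look like the sketch below. It drops every artifact from the com.fasterxml.jackson.module organization that transmogrifai-core pulls in transitively, so that the version provided by the cluster is used instead. Which dependency actually needs to be excluded is an assumption here. It depends on the Jackson version shipped with the Spark distribution on the cluster, so check the resolved dependency graph before copying this.

```scala
// build.sbt (sketch): the choice of what to exclude is an assumption, adjust it to your cluster.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.4.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
  // Keep TransmogrifAI, but do not bundle its transitive Jackson Scala module into the fat jar.
  ("com.salesforce.transmogrifai" %% "transmogrifai-core" % "0.7.0")
    .excludeAll(ExclusionRule(organization = "com.fasterxml.jackson.module"))
)
```

With an exclusion like this in place, the explicit jackson-module-scala 2.7.3 entry from the earlier build.sbt would normally be removed as well. Otherwise two different versions can still end up competing on the classpath.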
