high-performance-spark-examples's People

Contributors

holdenk, jiminhsieh, maddatascience, mahmoudhanafy, rachelwarren

high-performance-spark-examples's Issues

Project failing to build

Could be my own unfamiliarity with Scala, but I'm currently unable to build this project as-is using SBT. I'm getting the stack trace pasted below from IntelliJ (I get something similar when running 'sbt compile' from the CLI).

SBT 'high-performance-spark-examples' project refresh failed
    Error:Error:Error while importing SBT project:
    ...
[info] Resolving org.scala-sbt#apply-macro;0.13.13 ...
[info] Resolving org.spire-math#json4s-support_2.10;0.6.0 ...
[info] Resolving org.codehaus.plexus#plexus-component-annotations;1.5.5 ...
[info] Resolving javax.annotation#jsr250-api;1.0 ...
[info] Resolving com.thoughtworks.paranamer#paranamer;2.6 ...
[info] Resolving com.typesafe#config;1.2.0 ...
[info] Resolving org.scala-sbt#test-agent;0.13.13 ...
[info] Resolving org.scala-sbt#classfile;0.13.13 ...
[info] Resolving org.scala-sbt#completion;0.13.13 ...
[info] Resolving org.scala-sbt#test-interface;1.0 ...
[info] Resolving com.jcraft#jsch;0.1.50 ...
[info] Resolving org.scala-lang#scala-compiler;2.10.6 ...
[info] Resolving org.scala-sbt#interface;0.13.13 ...
[info] Resolving javax.inject#javax.inject;1 ...
[info] Resolving org.scala-sbt#logging;0.13.13 ...
[trace] Stack trace suppressed: run 'last *:ssExtractDependencies' for the full output.
[trace] Stack trace suppressed: run 'last *:update' for the full output.
[error] (*:ssExtractDependencies) sbt.ResolveException: download failed: org.mortbay.jetty#jetty;6.1.26!jetty.zip
[error] (*:update) sbt.ResolveException: download failed: org.mortbay.jetty#jetty;6.1.26!jetty.zip
[error] Total time: 19 s, completed Jul 10, 2017 10:47:35 AM
See complete log in file:/Users/adam/Library/Logs/IdeaIC2017.1/sbt.last.log

QuantileOnlyArtisanalTest test("Secondary Sort") error

val r = SecondarySort.groupByKeyAndSortBySecondaryKey(data, 3)
val rSorted = r.collect().sortWith(lt = (a, b) => a._1.toDouble > b._1.toDouble)
assert(r.collect().zipWithIndex.forall {
  case (((key, list), index)) => rSorted(index)._1.equals(key)
})

r is not actually ordered by key, so comparing it position by position against rSorted is not a valid check.
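One possible way to make such a check independent of the order in which keys come back from collect() is to verify the set of keys and the ordering inside each group, instead of comparing positions. The following is a minimal sketch using a plain local collection as a stand-in for r.collect(). The sample data, the element types, and the helper name groupsSortedBySecondaryKey are assumptions for illustration and are not taken from the repository's test.

```scala
object SecondarySortCheckSketch {
  // Stand-in for r.collect(): (key, list of (secondaryKey, value)) pairs,
  // deliberately listed in an arbitrary key order.
  val collected: Array[(String, List[(Int, String)])] = Array(
    ("2.0", List((1, "a"), (3, "b"))),
    ("1.0", List((2, "c"), (5, "d"), (9, "e"))),
    ("3.0", List((4, "f")))
  )

  // Hypothetical helper: true when, inside every group, the secondary keys
  // are in non-decreasing order. The order of the groups does not matter.
  def groupsSortedBySecondaryKey[K, S: Ordering, V](
      groups: Seq[(K, List[(S, V)])]): Boolean = {
    val ord = implicitly[Ordering[S]]
    groups.forall { case (_, values) =>
      val keys = values.map(_._1)
      keys.zip(keys.drop(1)).forall { case (a, b) => ord.lteq(a, b) }
    }
  }

  def main(args: Array[String]): Unit = {
    // Each group is sorted internally, regardless of group order.
    assert(groupsSortedBySecondaryKey(collected.toSeq))
    // The expected keys are all present, compared as a set.
    assert(collected.map(_._1).toSet == Set("1.0", "2.0", "3.0"))
  }
}
```

If a global key order is also part of the intended contract, that would need to be asserted separately, for example after an explicit sort of the collected keys.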

port to SBT 1.*

It's difficult to work in IntelliJ with sbt 0.13 (the "dump project structure from sbt" step takes a long time). Plugins like sbt-spark-package do not seem to work with sbt 1.*.

The feedback on a code bug at /goldilocks/GoldilocksSecondarySort.scala

Where do the values in head :: rest come from in the following function?

import scala.collection.mutable.ArrayBuffer

def groupSorted[K, S, V](it: Iterator[((K, S), V)]): Iterator[(K, List[(S, V)])] = {
  val res = List[(K, ArrayBuffer[(S, V)])]()
  it.foldLeft(res)(
    (list, next) => list match {
      // Empty accumulator: start the first group.
      case Nil =>
        val ((firstKey, secondKey), value) = next
        List((firstKey, ArrayBuffer((secondKey, value))))
      // head is the most recently started group; rest is not used.
      case head :: rest =>
        val (curKey, valueBuf) = head
        val ((firstKey, secondKey), value) = next
        if (!firstKey.equals(curKey)) {
          // New key: prepend a new group to the accumulator.
          (firstKey, ArrayBuffer((secondKey, value))) :: list
        } else {
          // Same key: append to the current group's buffer.
          valueBuf.append((secondKey, value))
          list
        }
    }
  ).map { case (key, buf) => (key, buf.toList) }.iterator
}
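For context on the question, head :: rest is not a variable declared elsewhere: it is a pattern in the match on list, the accumulator that foldLeft passes to the function on every step. When the list is non-empty, the pattern binds head to its first element and rest to the remaining elements. A minimal standalone sketch of the same pattern follows; the object and method names are made up for illustration.

```scala
object ConsPatternSketch {
  // Hypothetical helper: reports which case of the match a list falls into.
  def describe(list: List[Int]): String = list match {
    case Nil          => "empty"
    // Binds the first element to head and the remaining list to rest.
    case head :: rest => s"head=$head, rest=$rest"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Nil))           // prints "empty"
    println(describe(List(3, 2, 1))) // prints "head=3, rest=List(2, 1)"
  }
}
```

Because groupSorted prepends each new group, head in that function is always the group for the most recently seen first key.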

Failed to find a default value for inputCol

There seems to be a problem with https://github.com/high-performance-spark/high-performance-spark-examples/blob/master/src/main/scala/com/high-performance-spark-examples/ml/CustomPipeline.scala#L125. For example, if we try testing it thus

    val indexer = new SimpleIndexer()
    indexer.setInputCol("inputColumn")
    indexer.setOutputCol("categoryIndex")
    val model = indexer.fit(ds)
    val predicted = model.transform(ds)

the indexer has inputCol set, but the model (that is, the object returned by indexer.fit) does not. So when we call model.transform, it complains that it "Failed to find a default value for inputCol".

Following https://stackoverflow.com/questions/40847625/spark-custom-estimator-access-to-paramt, @conorbmurphy and I were able to "solve" the problem by hard coding the column names within the class with

setDefault(inputCol, "inputColumn")
setDefault(outputCol, "categoryIndex")

but that's hardly the right solution. What should happen is that the paramMap is copied into the SimpleIndexerModel when it's created in the fit method, but we can't figure out how to do that since paramMap is protected.
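One approach that may apply here: Spark ML's Params trait has a protected copyValues method, which can be called from inside an estimator's own fit to copy its set params onto the model it returns. It only copies params that the target also declares, so the estimator and the model need to share the same param definitions. The sketch below assumes Spark 2.x or later. The names IndexerParamsSketch, IndexerSketch, and IndexerModelSketch are hypothetical and are not the repository's SimpleIndexer code.

```scala
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.{Param, ParamMap, Params}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Hypothetical shared trait: estimator and model declare the same params,
// because copyValues only copies params that the target also has.
trait IndexerParamsSketch extends Params {
  final val inputCol = new Param[String](this, "inputCol", "input column name")
  final val outputCol = new Param[String](this, "outputCol", "output column name")
  def setInputCol(value: String): this.type = set(inputCol, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)
}

class IndexerSketch(override val uid: String)
    extends Estimator[IndexerModelSketch] with IndexerParamsSketch {
  def this() = this(Identifiable.randomUID("indexerSketch"))

  override def fit(dataset: Dataset[_]): IndexerModelSketch = {
    val words = dataset.select(col($(inputCol)).cast("string"))
      .distinct().collect().map(_.getString(0))
    val model = new IndexerModelSketch(uid, words).setParent(this)
    // Copy inputCol/outputCol (and any other shared params) onto the model.
    copyValues(model)
  }

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), IntegerType, nullable = false))

  override def copy(extra: ParamMap): IndexerSketch = defaultCopy(extra)
}

class IndexerModelSketch(override val uid: String, words: Array[String])
    extends Model[IndexerModelSketch] with IndexerParamsSketch {
  private val index: Map[String, Int] = words.zipWithIndex.toMap

  override def transform(dataset: Dataset[_]): DataFrame = {
    val localIndex = index // local copy so the UDF does not capture `this`
    val lookup = udf((w: String) => localIndex.getOrElse(w, -1))
    dataset.withColumn($(outputCol), lookup(col($(inputCol)).cast("string")))
  }

  override def transformSchema(schema: StructType): StructType =
    schema.add(StructField($(outputCol), IntegerType, nullable = false))

  override def copy(extra: ParamMap): IndexerModelSketch =
    copyValues(new IndexerModelSketch(uid, words), extra).setParent(parent)
}
```

With this shape, calling setInputCol and setOutputCol on the estimator before fit should be enough for model.transform to find both column names, without hard-coding defaults via setDefault.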

example 6-8 in SecondarySort.scala error

The object CoPartitioningLessons corresponds to Example 6-8 and Example 6-9 in the book. Both functions use two different Partitioners to show co-location and co-partitioning.
