links

Just a bunch of useful links

Scala

Scala Design Patterns - great stuff, how you do (or don't) traditional Java / OOP patterns in Scala
The Human Side of Scala - great post on styling Scala for readability
Sneaking Scala Through the Back Door - how to promote Scala in an organization
Effective Scala - Twitter's guide to writing good Scala code
Between Zero & Hero - tips and tricks for the intermediate Scala developer
Type of Types - an unfinished tutorial on the Scala type system
Monads are not Metaphors - a great explanation of monads
Important compiler flags
Recursive Types - signatures like class Foo[T <: Foo[T]], useful for inheritance and proper return types. Though if you hit this, there are probably better ways of solving the problem, e.g. via composition.
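As a rough illustration of the recursive type signature mentioned in the last entry, here is a minimal sketch. The names Shape, Circle, withScale and doubled are made up for this example and do not come from the linked post; the point is only that a method defined once in the parent can return the concrete subtype.

```scala
// Hypothetical names (Shape, Circle) for illustration only.
// T is constrained to be a subtype of Shape[T]; the self-type makes sure
// a class can only plug in itself as T.
trait Shape[T <: Shape[T]] { self: T =>
  def scale: Double
  def withScale(s: Double): T

  // Defined once here, yet the result is the concrete subtype T, not Shape.
  def doubled: T = withScale(scale * 2)
}

final case class Circle(scale: Double) extends Shape[Circle] {
  def withScale(s: Double): Circle = copy(scale = s)
}

object RecursiveTypesDemo {
  def main(args: Array[String]): Unit = {
    // The static type is Circle, so no cast is needed.
    val c: Circle = Circle(1.5).doubled
    println(c)
  }
}
```

The same effect can often be had with a type class or plain composition, which is the alternative the entry hints at.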

Serialization

Simple Binary Encoding - supposedly 20-50x faster than Google Protobuf !!
Comparison of Cap'n Proto, SBE, FlatBuffers from the Cap'n Proto people
Jawn - @d6's new fast JSON parser, parses to multiple ASTs including rojoma-json, spray-json, argonaut
Extracting case class param names using Macros
Fast-Serialization - a drop in replacement for Java Serialization but much faster

Concurrency, Actors

CKite - Raft Scala implementation, Finagle, MapDB etc.
Wake - A Java event-driven framework from Microsoft (!)
Dirigiste - dynamic scalable / smarter Threadpools
Scala-gopher - a #golang-style CSP / channels implementation for Scala. Other niceties: defer()
Retry for futures. Also SafeFuture, CancellableFuture, etc. - very useful
Execute Futures serially - in nonblocking fashion
Scala.Rx - "Reactive variables" - smart variables that update themselves automatically when the values they depend on change
Monifu - a nice set of wrappers around j.u.c.Atomic, as well as super-lightweight cancellable tasks and futures utilities. Accompanying blog post.
Kamon - great-looking actor monitoring, apparently via bytecode weaving, so no code changes are required.
Actor Provisioning pattern - if you have a long, failure-prone initialization procedure for an actor, this trait splits out the work to, say, another actor and dispatcher
Akka cluster ordered provisioning and shutdown
Running an Akka cluster with Docker Containers
Ask, Tell, and Per-Request Actors - why one company moved from Ask/Futures to per-request
Dos and Don'ts of deploying Akka in Production - an excellent read, full of advice even for non-Akka JVM apps
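The "Retry for futures" and "Execute Futures serially" entries above describe two small patterns that can be sketched with nothing but scala.concurrent. The code below is not taken from the linked projects; the object and method names (FutureSketches, retry, serially) are assumptions made for this example.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object FutureSketches {
  // Retry: `op` is by-name, so each attempt creates a fresh Future.
  // A failed attempt is retried until `retries` runs out.
  // Note: an exception thrown synchronously by `op` itself is not retried.
  def retry[A](retries: Int)(op: => Future[A])(implicit ec: ExecutionContext): Future[A] =
    op.recoverWith { case NonFatal(_) if retries > 0 => retry(retries - 1)(op) }

  // Serial execution: f(item) is only called once the previous future has
  // completed, and no thread is blocked while waiting.
  def serially[A, B](items: Seq[A])(f: A => Future[B])(implicit ec: ExecutionContext): Future[Vector[B]] =
    items.foldLeft(Future.successful(Vector.empty[B])) { (acc, item) =>
      acc.flatMap(done => f(item).map(done :+ _))
    }
}
```

By contrast, Future.traverse starts all the futures eagerly, which is why the fold is used when ordering or load limits matter.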

Async Database Libs

Asyncpools - Akka-based async connection pool for Slick. Akka 2.2 / Scala 2.10.
Postgresql-Async - Netty-based async drivers for PostgreSQL and MySQL

Caching

Cacheable - a clever memoization / caching library (with Guava, Redis, Memcached or EHCache backends) using Scala 2.10 macros to remember function parameters

Big Data Processing

Great list of Big Data Projects
Debasish G's list of streaming papers and algorithms - esp stuff on CountMinSketch and HyperLogLog
Summingbird - For any dataset that can be aggregated using a monoid, promises to unify Storm, Hadoop, and in the future, Akka and Spark with a single DSL. Also has a neat library of monoids built in.
Making Zookeeper Resilient, an excellent blog post from Pinterest
Fast SQL Query Parser in Scala - based on the Scala-LMS project, compiles a query down to C!
Probability Monad - super useful for stats or random data generation
stringmetric - Approximate string matching and phonetic algorithms
Factorie - a Scala library for Natural Language Processing
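The Summingbird entry above rests on the idea that anything aggregated through a monoid can be combined in any grouping, which is what lets batch and streaming partial results be merged. A minimal hand-rolled sketch of that idea follows; the Monoid trait and the names in MonoidSketch are written for this example and are not Summingbird's or Algebird's API.

```scala
// Hand-rolled for illustration; not the Algebird/Summingbird API.
trait Monoid[A] {
  def zero: A                 // identity element
  def plus(x: A, y: A): A     // associative combine
}

object MonoidSketch {
  implicit val intSum: Monoid[Int] = new Monoid[Int] {
    val zero: Int = 0
    def plus(x: Int, y: Int): Int = x + y
  }

  // Maps merge key by key using the value's monoid, e.g. word counts.
  implicit def mapMonoid[K, V](implicit v: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      val zero: Map[K, V] = Map.empty
      def plus(x: Map[K, V], y: Map[K, V]): Map[K, V] =
        y.foldLeft(x) { case (acc, (k, value)) =>
          acc.updated(k, v.plus(acc.getOrElse(k, v.zero), value))
        }
    }

  // Aggregate any collection whose element type has a monoid.
  def sum[A](xs: Iterable[A])(implicit m: Monoid[A]): A =
    xs.foldLeft(m.zero)(m.plus)
}
```

For example, a count map computed by a batch job and one computed from a live stream can be merged with sum(List(batchCounts, streamCounts)), where both variable names are placeholders.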

Spark

Jaws - Spark SQL REST server, includes query cancellation, logs, load balancing. Based originally on my own spark-jobserver
Supplemental Spark Projects - lots of other interesting projects, including IPython notebooks, dataframe stuff, stream + historical data processing, and more.

Infrastructure

Elastic Mesos - create Mesos clusters on AWS with ZK, HDFS

Geospatial and Graph

GeoTrellis - distributed raster processing, adding Vector/geom support, Akka Cluster and Spark implementations!
Spatial framework for Hadoop - PostGIS-like operators / UDFs for Hive. We want this for Spark!
trails - parser combinators for graph traversal. Supports Tinker/Blueprints/Neo4j APIs.
scala-graph - in-memory graph API based on scala collections. Work in progress.

Collections, Numeric Processing, Fast Loops

Breeze, Spire, and Saddle - Scala numeric libraries
- spire-ops - a set of macros for no-overhead implicit operator enrichment
ScalaXY - collection of macros for performant for loops, extension methods etc
Squants - The Scala API for Quantities, Units of Measure and Dimensional Analysis
FastTuple - a dynamic (runtime-defined) C-style struct library, with support for off-heap storage. Would work really well for in-memory queries.
- and the excellent blog covers all of the on- and off-heap access and allocation patterns on the JVM very thoroughly.
Unboxing, Runtime Specialization - a cool post on how to do really fast aggregations using unboxed integers
product-collections - useful library for working with collections of tuples
SuperFastHash - also see Murmur3

Big Data Storage

Phantom - Scala DSL for Cassandra, supports CQL3 collections, CQL generation from data models, async API based on Datastax driver
Athena - Asynchronous Cassandra client built on Akka-IO
CCM - easily build local Cassandra clusters for testing!
Stubbed Cassandra - super useful for testing C* apps
Pithos - an S3-API-compatible object store for Cassandra
Sirius - Akka-based in-memory fast key-value store for JVM objects, with Paxos consistency, persistence/txn logs, HA recovery
Storehaus - Twitter's key-value wrapper around Redis, MySQL, and other stores. Has neat merge() functionality for aggregating values, lists, etc.
MapDB - Not a database, but rather a database engine with tunable consistency / ACIDness; support for off-heap memory; fast performance; indexing and other features.
HPaste - a nice Scala client for HBase
OctopusDB paper - interesting idea of using a WAL of RDF triples as the primary storage, with secondary views of row or column orientation

Web / REST / General

Scalaj-http - really simple REST API, although the latest spray-client has been vastly simplified as well.
REPL as a service - would be kick ass if integrated into Spark
IScala - Scala backend for IPython. Looks promising. There is also Scala Notebook but it's more of a research project.
Scaposer - i18n / .po file library
Adding Reflection to Scala Macros - example of using reflection in an annotation macro to add automatic ByteBuffer serialization to case classes :)
Scaldi - A lightweight dependency injection library, with Akka integration
How to use Typesafe Config across multiple environments
lamma.io - the easiest date generation library
Pimpathon - a set of useful pimp-my-library extensions
Scala-rainbow - super simple terminal color output, easier than Console.XXX

Build, Tooling

Run Scala scripts with dependencies - i.e. you don't need a project file
sbt-assembly 0.10.2 supports adding a shell script to your jar to make it executable! No more "java ...." to start your Scala program, and no more ps ax | grep java | grep ....
Other useful SBT plugins - sbt-sonatype, sbt-pom-reader, sbt-sound, plugins page
SCoverage - statement coverage tool, much more useful than line-based or branch-based tools. Has SBT plugin. Blog post on why it's an improvement.
sbt-jmh - Plugin for running SBT projects with the JMH profiling tool
SBT updates - Tool for discovering updated versions of SBT dependencies
Thyme and Parsley - microbenchmarking and profiling tools, seem useful
ScalaStyle - Scala style checker / linter
scala-abide - an official linter from Typesafe
utest - a tiny test framework
lions share - a neat JVM heap and GC analysis tool, with charts and SBT integration.

SBuild seems like a promising replacement for SBT. Still Scala, but much, much simpler, more like a Scala version of Make, with Maven dependency and ScalaTest support.

JVM Other

Quick dumping your JVM heap using GDB -- too bad it doesn't work on OSX.
jHiccup -- "Hiccup" or GC pause analysis tool
Bintray - friendlier alternative to Sonatype OSS / Maven central. Also see bintray-sbt plugin.

Monitoring

Grafana and Graphene - great replacement UIs for the clunky default Graphite UI

Databases

Indexing and OLAP

Adaptive Radix Trees - cache friendly indexing for in-memory databases
Quotient Cubes - semantic grouping and rollup algorithm for OLAP cubes. Ruby implementation.
Top K queries and cubes
Scalable In-memory Aggregation - column-oriented, in memory with bitmap indexing and memoization

ML and Data Science

LearnDS - A set of IPython notebooks for learning data science

Distributed Systems

Raft Visualization - great 5-min visualization of the distributed consensus protocol

Sublime Text

I love Sublime and use it for everything, even Scala! Going to put my Sublime stuff in a separate page.

Best Practices and Design

Semver - Semantic versioning, how to deal with dev workflows and corner cases -- a must read
Pragmatic RESTful API Design - really good stuff
Blameless Post-Mortems - why they are crucial to good culture
GitHub Flow - how github.com does continuous deploys, uses pull requests for an automated, process-free development workflow. Some gems include naming branches descriptively and using github.com to browse the work currently in progress by looking at active branches.
Pull Requests and other good GitHub Practices

Other Random Stuff

A list of great docs
JQ - JSON processor for the shell. Super useful with RESTful servers.
Underscore-CLI - a Node.js-based command line JSON parser
MacroPy - Scala-like macros, case classes, pattern matching, parser combinators for Python (!!)
Scala 2.11 vs Swift - Apple's new iOS language is often compared to Scala.
Rust By Example - also the guide on their site is pretty good.
Real World OCaml
Gherkin - a Lisp implemented in bash !!
Nimrod - a neat, compile-straight-to-binary, static systems language with beautiful Python-like syntax, union types, generics, macros, first-class functions. What Go should have been.
Bret Victor - A set of excellent essays and talks from a great visual designer
