
fetch's Introduction

Fetch

A library for Simple & Efficient data access in Scala and Scala.js


Installation

Add the following dependency to your project's build file.

For Scala 2.12.x through 3.x:

"com.47deg" %% "fetch" % "3.1.2"

Or, if using Scala.js (1.8.x):

"com.47deg" %%% "fetch" % "3.1.2"

Remote data

Fetch is a library for making access to data both simple and efficient. Fetch is especially useful when querying data that has a latency cost, such as databases or web services.

Define your data sources

To tell Fetch how to get the data you want, you must implement the DataSource typeclass. Data sources have fetch and batch methods that define how to fetch such a piece of data.

Data Sources take two type parameters:

  1. Identity is a type that has enough information to fetch the data
  2. Result is the type of data we want to fetch

import cats.data.NonEmptyList
import cats.effect.Concurrent

trait DataSource[F[_], Identity, Result]{
  def data: Data[Identity, Result]
  def CF: Concurrent[F]
  def fetch(id: Identity): F[Option[Result]]
  def batch(ids: NonEmptyList[Identity]): F[Map[Identity, Result]]
}

Having a Concurrent instance for F available in the fetch and batch methods lets us decide whether the fetch runs synchronously or asynchronously, and lets us use all the goodies available in cats and cats-effect.

We'll implement a dummy data source that can convert integers to strings. For convenience, we define a fetchString function that lifts identities (Int in our dummy data source) to a Fetch.

import cats._
import cats.data.NonEmptyList
import cats.effect._
import cats.implicits._

import fetch._

def latency[F[_] : Sync](millis: Long): F[Unit] =
  Sync[F].delay(Thread.sleep(millis))

object ToString extends Data[Int, String] {
  def name = "To String"

  def source[F[_] : Async]: DataSource[F, Int, String] = new DataSource[F, Int, String]{
    override def data = ToString

    override def CF = Concurrent[F]

    override def fetch(id: Int): F[Option[String]] = for {
      _ <- CF.delay(println(s"--> [${Thread.currentThread.getId}] One ToString $id"))
      _ <- latency(100)
      _ <- CF.delay(println(s"<-- [${Thread.currentThread.getId}] One ToString $id"))
    } yield Option(id.toString)

    override def batch(ids: NonEmptyList[Int]): F[Map[Int, String]] = for {
      _ <- CF.delay(println(s"--> [${Thread.currentThread.getId}] Batch ToString $ids"))
      _ <- latency(100)
      _ <- CF.delay(println(s"<-- [${Thread.currentThread.getId}] Batch ToString $ids"))
    } yield ids.toList.map(i => (i, i.toString)).toMap
  }
}

def fetchString[F[_] : Async](n: Int): Fetch[F, String] =
  Fetch(n, ToString.source)

Creating a runtime

Since we'll use IO from the cats-effect library to execute our fetches, we'll need an IORuntime for executing our IO instances.

import cats.effect.unsafe.implicits.global //Gives us an IORuntime in places it is normally not provided

Normally, in your applications, this is provided by IOApp, and you should not need this import except in limited scenarios such as test environments that do not have Cats Effect integration. For more information, and in particular on why you would usually not want to construct an IORuntime yourself, see this post by Daniel Spiewak.
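
In an application this usually means extending IOApp. Here is a minimal sketch (the Main object and its body are illustrative, not part of Fetch or its docs) where the runtime is supplied automatically and no unsafe import is needed:

import cats.effect.{IO, IOApp}

object Main extends IOApp.Simple {
  // IOApp provides the IORuntime, so we can run the fetch and print the
  // result without any unsafe* calls.
  def run: IO[Unit] =
    Fetch.run[IO](fetchString[IO](1)).flatMap(result => IO.println(result))
}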

Creating and running a fetch

Now that we can convert Int values to Fetch[F, String], let's try creating a fetch.

def fetchOne[F[_] : Async]: Fetch[F, String] =
  fetchString(1)

Let's run it and wait for the fetch to complete. We'll use IO#unsafeRunTimed for testing purposes; it runs an IO[A] to an Option[A], returning None if it didn't complete in time:

import scala.concurrent.duration._

Fetch.run[IO](fetchOne).unsafeRunTimed(5.seconds)
// --> [173] One ToString 1
// <-- [173] One ToString 1
// res0: Option[String] = Some(value = "1")

As you can see in the previous example, the ToString data source is queried once to get the value of 1.

Batching

Multiple fetches to the same data source are automatically batched. To illustrate this, we are going to compose three independent fetch results as a tuple.

def fetchThree[F[_] : Async]: Fetch[F, (String, String, String)] =
  (fetchString(1), fetchString(2), fetchString(3)).tupled

When executing the above fetch, note how the three identities get batched, and the data source is only queried once.

Fetch.run[IO](fetchThree).unsafeRunTimed(5.seconds)
// --> [172] Batch ToString NonEmptyList(1, 2, 3)
// <-- [172] Batch ToString NonEmptyList(1, 2, 3)
// res1: Option[(String, String, String)] = Some(value = ("1", "2", "3"))

Note that the DataSource#batch method is not mandatory. It will be implemented in terms of DataSource#fetch if you don't provide an implementation.

object UnbatchedToString extends Data[Int, String] {
  def name = "Unbatched to string"

  def source[F[_]: Async] = new DataSource[F, Int, String] {
    override def data = UnbatchedToString

    override def CF = Concurrent[F]

    override def fetch(id: Int): F[Option[String]] = 
      CF.delay(println(s"--> [${Thread.currentThread.getId}] One UnbatchedToString $id")) >>
      latency(100) >>
      CF.delay(println(s"<-- [${Thread.currentThread.getId}] One UnbatchedToString $id")) >>
      CF.pure(Option(id.toString))
  }
}

def unbatchedString[F[_]: Async](n: Int): Fetch[F, String] =
  Fetch(n, UnbatchedToString.source)

Let's create a tuple of unbatched string requests.

def fetchUnbatchedThree[F[_] : Async]: Fetch[F, (String, String, String)] =
  (unbatchedString(1), unbatchedString(2), unbatchedString(3)).tupled

When executing the above fetch, note how the three identities get requested in parallel. You can override batch to execute queries sequentially if you need to.

Fetch.run[IO](fetchUnbatchedThree).unsafeRunTimed(5.seconds)
// --> [172] One UnbatchedToString 1
// --> [173] One UnbatchedToString 2
// <-- [172] One UnbatchedToString 1
// --> [172] One UnbatchedToString 3
// <-- [173] One UnbatchedToString 2
// <-- [172] One UnbatchedToString 3
// res2: Option[(String, String, String)] = Some(value = ("1", "2", "3"))
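
As noted above, batch can also be overridden to run the queries sequentially. Here is a minimal sketch (not one of the library's own examples; it reuses the latency helper and the API shown above) that implements batch in terms of fetch using traverse, which sequences the requests one after another:

object SequentialToString extends Data[Int, String] {
  def name = "Sequential to string"

  def source[F[_] : Async]: DataSource[F, Int, String] = new DataSource[F, Int, String] {
    override def data = SequentialToString

    override def CF = Concurrent[F]

    override def fetch(id: Int): F[Option[String]] =
      latency(100) >> CF.pure(Option(id.toString))

    // traverse over a List sequences its effects, so the identities are
    // queried one at a time instead of in a single batched call.
    override def batch(ids: NonEmptyList[Int]): F[Map[Int, String]] =
      ids.toList
        .traverse(id => fetch(id).map(res => res.map(id -> _)))
        .map(_.flatten.toMap)
  }
}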

Parallelism

If we combine two independent fetches from different data sources, the fetches can be run in parallel. First, let's add a data source that fetches a string's size.

object Length extends Data[String, Int] {
  def name = "Length"

  def source[F[_] : Async] = new DataSource[F, String, Int] {
    override def data = Length

    override def CF = Concurrent[F]

    override def fetch(id: String): F[Option[Int]] = for {
      _ <- CF.delay(println(s"--> [${Thread.currentThread.getId}] One Length $id"))
      _ <- latency(100)
      _ <- CF.delay(println(s"<-- [${Thread.currentThread.getId}] One Length $id"))
    } yield Option(id.size)

    override def batch(ids: NonEmptyList[String]): F[Map[String, Int]] = for {
      _ <- CF.delay(println(s"--> [${Thread.currentThread.getId}] Batch Length $ids"))
      _ <- latency(100)
      _ <- CF.delay(println(s"<-- [${Thread.currentThread.getId}] Batch Length $ids"))
    } yield ids.toList.map(i => (i, i.size)).toMap
  }
}

def fetchLength[F[_] : Async](s: String): Fetch[F, Int] =
  Fetch(s, Length.source)

And now we can easily receive data from the two sources in a single fetch.

def fetchMulti[F[_] : Async]: Fetch[F, (String, Int)] =
  (fetchString(1), fetchLength("one")).tupled

Note how the two independent data fetches run in parallel, minimizing the latency cost of querying the two data sources.

Fetch.run[IO](fetchMulti).unsafeRunTimed(5.seconds)
// --> [173] One Length one
// --> [172] One ToString 1
// <-- [172] One ToString 1
// <-- [173] One Length one
// res3: Option[(String, Int)] = Some(value = ("1", 3))

Deduplication & Caching

The Fetch library supports deduplication and optional caching. By default, fetches that are chained together will share the same cache backend, providing some deduplication.

When an identity is fetched more than once within the same Fetch, for example in a batch of fetches or when you flatMap one fetch into another, subsequent requests for that identity are served from the cache. Let's create a fetch that asks for the same identity twice, using flatMap (in a for-comprehension) to chain the requests together:

def fetchTwice[F[_] : Async]: Fetch[F, (String, String)] = for {
  one <- fetchString(1)
  two <- fetchString(1)
} yield (one, two)

While running it, notice that the data source is only queried once. The next time the identity is requested, it's served from the internal cache.

val runFetchTwice = Fetch.run[IO](fetchTwice)
runFetchTwice.unsafeRunTimed(5.seconds)
// --> [173] One ToString 1
// <-- [173] One ToString 1
// res4: Option[(String, String)] = Some(value = ("1", "1"))

However, if we run the same IO once more, the data is fetched again, since the cache is not shared between runs:

runFetchTwice.unsafeRunTimed(5.seconds)
// --> [172] One ToString 1
// <-- [172] One ToString 1
// res5: Option[(String, String)] = Some(value = ("1", "1"))

If you want to share the cache between multiple individual fetches, use Fetch.runCache or Fetch.runAll to get the cache back for later reuse. Here is an example where we fetch four separate times and explicitly share the cache to keep the deduplication functionality:

//We get the cache from the first run and pass it to all subsequent fetches
val runFetchFourTimesSharedCache = for {
  (cache, one) <- Fetch.runCache[IO](fetchString(1))
  two <- Fetch.run[IO](fetchString(1), cache)
  three <- Fetch.run[IO](fetchString(1), cache)
  four <- Fetch.run[IO](fetchString(1), cache)
} yield (one, two, three, four)
runFetchFourTimesSharedCache.unsafeRunTimed(5.seconds)
// --> [173] One ToString 1
// <-- [173] One ToString 1
// res6: Option[(String, String, String, String)] = Some(
//   value = ("1", "1", "1", "1")
// )

As you can see above, the cache will now work between calls and can be used to deduplicate requests over a period of time. Note that this does not support any kind of automatic cache invalidation, so you will need to keep track of which values you want to re-fetch if you plan on sharing the cache.
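
As a small illustration (a sketch, not a library feature), the simplest way to refresh a value you suspect is stale is to run that fetch again without passing the shared cache, so the data source is queried anew:

// Running without the shared cache queries the data source again,
// which refreshes the value; the old cache still holds the stale entry.
val refreshedOne = Fetch.run[IO](fetchString(1))
refreshedOne.unsafeRunTimed(5.seconds)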

For more in-depth information, take a look at our documentation.

Copyright

Fetch is designed and developed by 47 Degrees

Copyright (C) 2016-2023 47 Degrees. http://47deg.com

fetch's People

Contributors

47degdev, 47erbot, alejandrohdezma, antoniomateogomez, benfradet, calvellido, cb372, daenyth, davesmith00047, diesalbla, fedefernandez, franciscodr, gatorcse, github-actions[bot], gitter-badger, israelperezglez, jkmcclellan, jordiolivares, juanpedromoreno, kubukoz, lambdista, maureenelsberry, paulpdaniels, pepegar, peterneyens, purrgrammer, raulraja, scala-steward, sloshy, williamho

fetch's Issues

Provide examples of usage with other libraries in the documentation

It'd be interesting to write a few tutorials about using Fetch with libraries for reading data (from databases, HTTP services and beyond) that fit well with Fetch. A few come to mind, feel free to add more:

  • Doobie for DB access (transacting to Eval makes it really easy to integrate)
  • GitHub4s for accessing the GitHub API

Don't cast when we can use the type system better

Due to my lack of familiarity with Scala's type system, I've used casts in a few places. I'm not sure which ones we can avoid, but it'd be desirable to use the type system better instead of casting and losing type safety to a certain degree.

Scalaz integration

  • FetchMonadError[Task] and other implicits
  • Applicative and Monad instances for Fetch
  • Applicative instance for Task with a non-sequential ap

StackOverflowError with a deep stack

A user has privately reported the following exception when running fetch over a large workflow.
The issue seems to be in the inspection step.

 java.lang.StackOverflowError
                at scala.Tuple2.productElement(Tuple2.scala:20)
                at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:64)
                at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:211)
                at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:168)
                at scala.Tuple2.hashCode(Tuple2.scala:20)
                at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:206)
                at scala.collection.immutable.HashMap.elemHashCode(HashMap.scala:80)
                at scala.collection.immutable.HashMap.computeHash(HashMap.scala:89)
                at scala.collection.immutable.HashMap.get(HashMap.scala:54)
                at fetch.InMemoryCache.get(cache.scala:41)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:380)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:377)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:51)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:385)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:377)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:51)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:52)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:386)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:377)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:51)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:52)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:386)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:377)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:51)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:52)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:386)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:377)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:51)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:52)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:386)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:377)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:51)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:52)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:386)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:377)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:51)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:52)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:386)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:377)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:51)
                at cats.free.FreeTopExt$.modify(freeinspect.scala:52)
                at fetch.FetchInterpreters$$anon$4.apply(interpreters.scala:386)

Update to latest cats and monix

Now that cats 0.7.2 and the final monix 2.0.0 are released, please consider upgrading the dependencies and publishing a stable version. I tried it myself but got stuck on all the tailRecM stuff in the new cats Monad.

More flexible error handling

Currently, for running a fetch to a target monad M[_], we need an instance of MonadError[M, Throwable]. Should we leave the error type open to user-defined types instead of hard-coding Throwable?

Improve and document the environment information gathering

We need to store accurate information about the execution plan of a fetch: which operations were performed (fetch one, fetch many, concurrent fetch), whether the data was served from the cache or not, the rounds of execution (steps in the computation that read data) performed, and so on. We do this to some extent now, but it is not very useful in its current state. We should document what you can do with the environment and improve the docs about diagnosing fetch failures, which improved reporting will make easier. This will also ease the implementation of #11.

Create PDF documentation

PDF documentation would be useful when there's no internet connection.

I've created some scripts which work well enough for cats, dogs and fetch, as shown below:
https://github.com/frgomes/debian-bin/blob/master/bash_30pdf.sh
https://github.com/frgomes/debian-bin/blob/master/bash_30httrack.sh
https://github.com/frgomes/debian-bin/blob/master/bash_31makepdf_cats.sh
https://github.com/frgomes/debian-bin/blob/master/bash_31makepdf_dogs.sh
https://github.com/frgomes/debian-bin/blob/master/bash_31makepdf_fetch.sh

I'm not suggesting you guys adopt these scripts.
However, anyone willing to do that may eventually find some pointers here.

Pass state to the data sources in a type-safe way

Currently, DataSource#fetch looks like this:

trait DataSource[Identity, Result] {
  def fetch(ids: NonEmptyList[Identity]): Query[Map[Identity, Result]]
}

The fetch method of a data source just receives a non-empty list of identities. If we want to inject some state into our data sources (for example an HTTP client for the data sources that make HTTP calls, a connection pool for the data sources that query a database, and so on), we must use the mechanisms that Scala gives us (implicits et al.) since it's not directly supported by the library.

It may make sense to support passing state to the data sources in a type-safe way and provide the concrete values when running a Fetch, much like Haxl does. I'm not sure how it'd look yet, but when running a fetch we'd have to provide an additional value with the injected state. Can the type system make sure that we are providing the state for every data source used inside a fetch? Should this be supported by the library?
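
For reference, here is a minimal sketch of the implicits/constructor-passing workaround described above, written against the current 3.x API shown earlier in this README; HttpClient, UserNameData and fetchTwoUserNames are hypothetical names, not part of Fetch:

// A hypothetical dependency we want available inside a data source.
trait HttpClient[F[_]] {
  def get(url: String): F[String]
}

// The workaround: pass the dependency when constructing the Data/DataSource
// pair, instead of the library threading it through for us.
class UserNameData[F[_] : Async](client: HttpClient[F]) extends Data[Int, String] { self =>
  def name = "User names over HTTP"

  def source: DataSource[F, Int, String] = new DataSource[F, Int, String] {
    override def data = self
    override def CF = Concurrent[F]
    override def fetch(id: Int): F[Option[String]] =
      client.get(s"/users/$id").map(Option(_))
  }
}

def fetchTwoUserNames[F[_] : Async](client: HttpClient[F]): Fetch[F, (String, String)] = {
  // Reuse a single Data instance so all requests refer to the same source.
  val userNames = new UserNameData[F](client)
  (Fetch(1, userNames.source), Fetch(2, userNames.source)).tupled
}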

Make README.md a mirror of the documentation

The current README just points to the docs with a link, but users already on GitHub couldn't care less about jumping through extra hoops. Also, the file won't be indexed by Scaladex or by crawlers that use the README as the de facto documentation standard. Ideally, after tut processing, the file copied inside the Jekyll site would also replace the README.md at the root and would auto-generate an index, placed at the top of the file, to jump to sections.

Make sure that the split batches are run concurrently

Right now, when a data source configures a maximum batch size and a batched request to that data source is split into multiple batches, the queries are run sequentially. We want to perform all the batch requests to the data source at the same time, not sequentially.

Acknowledgments

I'm not sure I'm correct, but this lib seems to be inspired by similar projects like Clump and Stitch. Could you please add an "Acknowledgments" section to the documentation mentioning them?

BTW, the generic approach for any monad and the typeclasses for sources are very nice! 👍

Thanks! :)

Support data sources that can only be queried asynchronously

Since data sources currently assume you can return a computation that synchronously gives you the result, the usefulness of the library is very limited in Scala.js. It would be desirable to support both synchronous and asynchronous data sources, maybe using monix-eval's Task type instead of Eval in the data sources' fetch methods? Task is more general than Eval, although we'd lose the ability to run fetches synchronously.

Docs out of sync

The current published documentation is out of sync and does not match the API as it stands in the latest release. It also points to an old cats version where several combinators and imports seem to have changed.
