scala-graphql-caliban-workshop

references

preface

goals of this workshop
- introduction into GraphQL
  - motivation
  - schema
  - query, mutation, subscription
  - data loaders
workshop plan
- task1: implement query for returning all customers (with a size limit)
- task2: implement subscription for deleted customers
- task3: implement getting orders in batch

introduction

BFF - backend for frontend API
- different clients need different sets of data
  - Web, Iphone, Android, Tv
- instead of the frontend application aggregating data by calling multiple sources - create a BFF layer
  - layer does the following:
    - receive request from the client application
    - call multiple backend services
    - format and response according to what is needed by the client
    - respond to the client
- pros
  - simplify the frontend logic
  - avoid over-fetching or under-fetching
  - reduce the number of network calls from the client perspective
real world data representation
- best way: graph-like data structure
  - data model is usually a graph of objects with relations between them
- why think of data in terms of resources (in URLs) or tables?

graphql

GraphQL = Graph Query Language
is a new API standard that provides
- more efficient,
- powerful
- flexible
- alternative to REST
can be written in any programming language
has two major parts
- structure = strongly typed schema
  - schema = graph of fields that have types
    - all possible data objects that can be read (or updated)
  - client uses the schema to know what are the capabilities of API
- behavior = resolver functions
  - each field in a GraphQL schema is backed by a resolver function
  - resolver function defines what data to fetch for its field
  - resolver function represents the instructions on how and where to access raw data
takes the custom endpoint idea to an extreme
- the whole server = single smart endpoint
- multiple round-trip problem
  - client has to communicate with the server multiple times to gather all data
  - API server knows how to answer questions about a single resource
  - solution: GraphQL
5 key characteristics
1. hierarchical: queries = hierarchies of data definitions
2. view-centric: built to satisfy frontend requirements
3. strongly-typed: typed context (schema) + queries are executed within this context
4. introspective: type system (schema) itself is queryable
5. version-free: tools for the continuous evolution
vs REST
- don't need any documentation like swagger
- REST = over-fetching (to many fields) and under-fetching (many calls to get data you need)
  - client cannot specify which fields to select
- REST: each endpoint represents a resource
  - multiple network requests
- REST: problem with versioning
  - usually means new endpoints
  - GraphQL: add new fields and types without removing the old ones
    - clients can continue to use older features
      - can also incrementally update their code to use new features
    - important for mobile clients
      - you cannot control the version of the API they are using
    - GraphQL offers a way to deprecate (and hide)
- REST: turn into a mix of regular REST endpoints plus custom endpoints crafted for performance reasons
summary
- pros
  - decouples clients from servers
    - allows both of them to evolve and scale independently
  - efficiency
  - simplify client logic: client communicate with the GraphQL service
    - GraphQL service communicates with the different services
- cons
  - malicious queries
    - complex queries
      - solution: analyze AST complexity
    - large result size
      - solution: limit depth
    - long execution time
      - solution: limit the execution time
      - solution: stream the results
  - n+1 problem
    - solution: data loader
  - caching is no longer simple
    - network layer - unsuitable as there is a common URL for all operations
    - solution: granularity (per field)
  - hard to return simple Map

schema

something like swagger: graphql/graphiql
- https://api.spacex.land/graphql/

example

type Starship {
  id: ID!
  name: String! // non nullable
  appearsIn: [Episode!]! // list of objects
  length(unit: LengthUnit = METER): Float // argument
}

good practices
- usually make the types of fields non-null
- however, make all root fields nullable
  - in this case, nullability means that something went wrong but we’re allowing it to show other fields
scalar
- don't have any sub-fields
- predefined: ID, Boolean, Int, String

operations

three types of operations
- queries (READ operations)
- mutations (WRITE-then-READ operations)
  - queries that have side effects
- subscriptions
  - stream of responses
  - used for real-time data monitoring
  - require the use of a data-transport channel that supports continuous pushing of data
    - usually done with WebSockets

query

example

{
  hero {
    name
    friends {
      name
    }
  }
}

steps
1. validate the request against its schema
2. traverse the tree of fields and invoke the resolver functions
fragment
- example
```
{
  leftComparison: hero(episode: EMPIRE) {
    ...comparisonFields // spread that fragment
  }
  rightComparison: hero(episode: JEDI) {
    ...comparisonFields // spread that fragment
  }
}

fragment comparisonFields on Character {
  name
  appearsIn
  friends {
    name
  }
}
```
- are the composition units of the language
- are the reusable piece of any GraphQL operation
- split operations into smaller parts
- data required by an application = sum of the data required by individual components
  - makes a fragment the perfect match for a component
  - represent the data requirements for a single component and then compose them

mutation

is always a WRITE operation followed by a READ operation
vs queries
- queries will be done in parallel
- mutations sequentially
  - if an API consumer sends two mutation fields, the first is guaranteed to finish before the second begins

example

mutation CreateReviewForEpisode($ep: Episode!, $review: ReviewInput!) {
  createReview(episode: $ep, review: $review) {
    stars
    commentary
  }
}

{ // variables
  "ep": "JEDI",
  "review": {
    "stars": 5,
    "commentary": "This is a great movie!"
  }
}

subscription

client should NOT use subscriptions to stay up to date with backend
- use poll intermittently with queries
- re-execute queries on demand when a user performs a relevant action (such as clicking a button)
use subscriptions for
- small, incremental changes to large objects
  - polling for a large object is expensive
    - especially when most of the object's fields rarely change
  - fetch the object's initial state with a query
    - server can proactively push updates to individual fields
- low-latency, real-time updates
  - example: a chat application

data loaders

problem

querying for authors and books
authors "has many" books

we would like to achieve two SQL calls

SELECT *
FROM authors;
-- pretend this returns 3 authors
SELECT *
FROM books
WHERE author_id in (1, 2, 3); -- an array of the author's ids

query

{
  query { // 1 call
    authors {
      name
      books { // each book resolver would only get it’s own parent author: N calls
        title
      }
    }
  }
}

in REST: ORM will help

in graphQl: each resolver function really only knows about its own parent object

ORM won’t have the luxury of a list of author IDs anymore

result: N+1 calls

SELECT *
FROM authors;

SELECT *
FROM books
WHERE author_id in (1);

SELECT *
FROM books
WHERE author_id in (2);

SELECT *
FROM books
WHERE author_id in (3);

solutions
- batching
  - delay asking the database until we will have all appropriate IDs
- caching
  - no application-level caching shared among requests
  - rather simple memoization in the context of a single request
- example library: DataLoader (JavaScript utility library)

client cache

responses from REST are easy to cache
- dictionary nature
  - specific URL gives certain data
  - use the URL itself as the cache key

in graphQl: graph cache

no URL-like primitive (globally unique identifier for a given object)

best practice: expose such an identifier for clients to use

{
  starship(id:"3003") { // id field provides a globally unique key
    id
    name
  }
  droid(id:"2001") {
    id
    name
    friends {
      id
      name
    }
  }
}

pagination

two scenarios
- UX concern: too many items to display
  - mental overload for the user to see them all at once
- performance concern: too many items to load
  - it would overload our backend, the connection, or the client to load all of the items at once
types of pagination from the UX perspective
- numbered pages
  - example: book, Google search
  - expect it to be consistent over some period of time
  - sql
```
SELECT * FROM posts ORDER BY created_at LIMIT 10 OFFSET 20; // page 3, with a page size of 10
SELECT COUNT(*) FROM posts; // total number of entries or pages in the results
```
  - drawbacks
    - only for mostly static content
    - however, usually items are added and removed while the user is navigating
      - leads to skipping items
      - or displaying the same item twice
        
        new item was added at the top of the list
        
        skip and limit approach to show the item at the boundary between pages twice
- sequential pages like Reddit
  - aren’t numbered
  - content changes so rapidly - no point in page numbers at all
  - specify the place in the list we want to begin, and how many items we want to fetch
    - it doesn’t matter how many items were added to the top of the list
    - we have a constant pointer to the specific spot where we left off
      - pointer is called a cursor
      - cursor is a piece of data
        
        generally some kind of ID
        
        represents a location in a paginated list
  - example
```
SELECT * FROM posts
WHERE created_at < $after
ORDER BY created_at LIMIT $page_size;

https://www.reddit.com/?count=25&after=t3_49i88b
```
  - good practice: encoded cursor with some metadata or a timestamp (instead of a row ID)
    - resilient to row deletion
    - we don’t want the query to fail if a specific item is removed
- infinite scroll like Twitter
  - illusion of one very long page
- modern apps today use either the second or third approach
  - app’s content is constantly changing
  - doesn’t make sense to create the illusion of numbered pages

relay cursor connections

generic specification for how a GraphQL server should expose paginated data
generalized concepts we were talking about above
- friends(first:2 after:$opaqueCursor) // vs friends(first:2 after:$friendId)
  - cursors are opaque and their format should not be relied upon
    - suggestion: base64 encoding
  - additional flexibility for pagination model changes
    - user just uses opaque cursors

example

request

{
  user {
    id
    name
    friends(first: 10, after: "opaqueCursor") {
      edges { // each edge has a reference to the user object of the friend, and a cursor
        cursor // every item in the paginated list has its own cursor
        node {
          id
          name
        }
      }
      pageInfo {
        hasNextPage
      }
    }
  }
}

notice that if we want to, we can ask for 10 friends starting from the middle of the list we last fetched

response

{
  "data": {
    "products": {
      "pageInfo": {
        "hasNextPage": true,
        "hasPreviousPage": false
      },
      "edges": [
        {
          "cursor": "eyJsYXN0X2lkIjoxMDA3OTc4ODg3NiwibGFzdF92YWx1ZSI6IjEwMDc5Nzg4ODc2In0=",
          "node": {
            "id": "1",
            "name": "Michal"
          }
        },
        {
          "cursor": "eyJsYXN0X2lkIjoxMDA3OTc5MzQyMCwibGFzdF92YWx1ZSI6IjEwMDc5NzkzNDIwIn0=",
          "node": {
            "id": "2",
            "name": "Marcin"
          }
        },
        {
          "cursor": "eyJsYXN0X2lkIjoxMDA3OTc5NDM4MCwibGFzdF92YWx1ZSI6IjEwMDc5Nzk0MzgwIn0=",
          "node": {
            "id": "3",
            "name": "Anna"
          }
        }
      ]
    }
  },
  ...
}

glossary
- connection - paginated field on an object
  - example: friends field on a user
- edge - metadata about one object in the paginated list
  - includes a cursor to allow pagination starting from that object
- node - actual object
- pageInfo - info about more pages of data to fetch

security

critical threat: resource-exhaustion attacks (DOS attacks) with overly complex queries
- are not specific to GraphQL
- example: query for deeply nested relationships
  - (user –> friends –> friends –> friends …)
- example use field aliases to ask for the same field many times
```
{
  empireHero: hero(episode: EMPIRE) {
    name
  }
  jediHero: hero(episode: JEDI) {
    name
  }
}
```
solution
- cost analysis on the query
- enforce limits on the amount of data
- timeouts

caliban

features
- minimize boilerplate
- purely functional (strongly typed, explicit errors)
- user friendly
- schema / resolver separation
vs sangria
- sangria: lots of boilerplate (macros to the rescue)
- sangria: future based (effects are better)
- sangria: schema and resolved tied together
schema is derived automatically from the case classes
- mangolia - used for create schema for traits and case classes
  - provide a schema for custom types
```
implicit val nesSchema: Schema[NonEmptyString] = Schema.stringSchema.contramap(_.value)
```
    - many useful types: https://github.com/niqdev/caliban-extras
```
val api = graphQL(resolver)
println(api.render) // prints derived schema
```

schema deriving examples

case classes

case class Pug(name: String, nicknames: List[String], pictureUrl: Option[String])

is transformed into

type: Pug {
    name: String,
    nicknames: [String!]!,
    pictureUrl: String // optionality -> option
}

enums

sealed trait Color
case object FAWN extends Color

is transformed into

type: enum Color { FAWN }

arguments

case class PugName(name: String)
case class Queries(pug: PugName => Option[Pug])

is transformed into

type: Queries { pug(name: String!): Pug }

n+1 problem
- solution: ZQuery
  - parallelize queries
  - cache identical queries
  - batch queries if batching function provided

builtin wrappers

val api = graphQL(...) @@
  maxDdepth(30) @@
  maxFields(200) @@
  timeout(10 seconds) @@
  printSlowQueries(1 second)

mtumilowicz / scala-graphql-caliban-workshop Goto Github PK

scala-graphql-caliban-workshop's Introduction

scala-graphql-caliban-workshop

preface

introduction

graphql

schema

operations

query

mutation

subscription

data loaders

client cache

pagination

relay cursor connections

security

caliban

scala-graphql-caliban-workshop's People

Contributors

Watchers

Forkers

Recommend Projects

Recommend Topics

Recommend Org