Bootsnap

Beta-quality. See the last section of this README.

Bootsnap is a library that plugs into a number of Ruby and (optionally) ActiveSupport and YAML methods to optimize and cache expensive computations. See the How Does This Work section for more information.

Quick Performance Overview

  • Discourse reports a boot time reduction of approximately 50%, from roughly 6 to 3 seconds on one machine;
  • One of our smaller internal apps also sees a reduction of 50%, from 3.6 to 1.8 seconds;
  • The core Shopify platform -- a rather large monolithic application -- boots about 75% faster, dropping from around 25s to 6.5s.

Usage

Add bootsnap to your Gemfile:

gem 'bootsnap'

Next, add this to your boot setup immediately after require 'bundler/setup' (i.e. as early as possible: the sooner this is loaded, the sooner it can start optimizing things):

require 'bootsnap'
Bootsnap.setup(
  cache_dir:            'tmp/cache', # Path to your cache
  development_mode:     ENV['MY_ENV'] == 'development',
  load_path_cache:      true,        # Should we optimize the LOAD_PATH with a cache?
  autoload_paths_cache: true,        # Should we optimize ActiveSupport autoloads with cache?
  disable_trace:        false,       # Sets `RubyVM::InstructionSequence.compile_option = { trace_instruction: false }`
  compile_cache_iseq:   true,        # Should compile Ruby code into ISeq cache?
  compile_cache_yaml:   true         # Should compile YAML into a cache?
)

Protip: You can replace require 'bootsnap' with BootLib::Require.from_gem('bootsnap', 'bootsnap') using this trick. This will help optimize boot time further if you have an extremely large $LOAD_PATH.

How does this work?

Bootsnap is a library that plugs into a number of Ruby and (optionally) ActiveSupport and YAML methods. These methods are modified to cache results of expensive computations, and can be grouped into two broad categories:

  • Path Pre-Scanning
    • Kernel#require and Kernel#load are modified to eliminate $LOAD_PATH scans.
    • ActiveSupport::Dependencies.{autoloadable_module?,load_missing_constant,depend_on} are overridden to eliminate scans of ActiveSupport::Dependencies.autoload_paths.
  • Compilation caching
    • RubyVM::InstructionSequence.load_iseq is implemented to cache the result of ruby bytecode compilation.
    • YAML.load_file is modified to cache the result of loading a YAML object in MessagePack format (or Marshal, if the message uses types unsupported by MessagePack).

Path Pre-Scanning

(This work is a minor evolution of bootscale).

Upon initialization of bootsnap or modification of the path (e.g. $LOAD_PATH), Bootsnap::LoadPathCache will fetch a list of requirable entries from a cache, or, if necessary, perform a full scan and cache the result.

Later, when we run (e.g.) require 'foo', ruby would iterate through every item on our $LOAD_PATH ['x', 'y', ...], looking for x/foo.rb, y/foo.rb, and so on. Bootsnap instead looks at all the cached requirables for each $LOAD_PATH entry and substitutes the full expanded path of the match ruby would have eventually chosen.

If you look at the syscalls generated by this behaviour, the net effect is that what would previously look like this:

open  x/foo.rb # (fail)
# (imagine this with 500 $LOAD_PATH entries instead of two)
open  y/foo.rb # (success)
close y/foo.rb
open  y/foo.rb
...

becomes this:

open y/foo.rb
...

Exactly the same strategy is employed for methods that traverse ActiveSupport::Dependencies.autoload_paths if the autoload_paths_cache option is given to Bootsnap.setup.
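To make the idea concrete, here is a minimal sketch of a path index: each directory is scanned once, and later lookups are answered from a hash instead of probing the filesystem. LoadPathIndex and demo_load_path_index are hypothetical names used only for illustration; this is not bootsnap's actual implementation, and it handles only .rb files.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: maps requirable names to absolute paths,
# built from a single scan of each load-path directory.
class LoadPathIndex
  def initialize(load_path)
    @index = {}
    load_path.each do |dir|
      Dir.glob('**/*.rb', base: dir).sort.each do |relative|
        feature = relative.sub(/\.rb\z/, '')
        # The first directory on the load path wins, as with a plain require.
        @index[feature] ||= File.join(dir, relative)
      end
    end
  end

  # Returns the absolute path for a feature, or nil if no directory has it.
  def resolve(feature)
    @index[feature.sub(/\.rb\z/, '')]
  end
end

# Builds a throwaway x/ and y/ layout and checks that 'foo' resolves to y/foo.rb.
def demo_load_path_index
  Dir.mktmpdir do |root|
    x = File.join(root, 'x')
    y = File.join(root, 'y')
    FileUtils.mkdir_p([x, y])
    File.write(File.join(y, 'foo.rb'), "FOO = 1\n")
    index = LoadPathIndex.new([x, y])
    index.resolve('foo') == File.join(y, 'foo.rb')
  end
end
```

Once the index is built, resolving a name is a single hash lookup, so the cost no longer grows with the number of load-path entries.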

The following diagram flowcharts the overrides that make the *_path_cache features work.

[Diagram: flowchart explaining Bootsnap]

Bootsnap classifies path entries into two categories: stable and volatile. Volatile entries are scanned each time the application boots, and their caches are only valid for 30 seconds. Stable entries do not expire -- once their contents have been scanned, they are assumed never to change.

The only directories considered "stable" are things under the Ruby install prefix (RbConfig::CONFIG['prefix'], e.g. /usr/local/ruby or ~/.rubies/x.y.z), and things under the Gem.path (e.g. ~/.gem/ruby/x.y.z). Everything else is considered "volatile".
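The classification rule can be sketched in a few lines. The names below (STABLE_PREFIXES, stable_path?, cache_fresh?) are hypothetical, and the 30-second window is taken from the description above; treat this as an illustration of the rule rather than bootsnap's own code.

```ruby
require 'rbconfig'
require 'rubygems'

# Hypothetical constants mirroring the rule described above.
STABLE_PREFIXES = ([RbConfig::CONFIG['prefix']] + Gem.path)
                  .map { |p| File.expand_path(p) }.freeze
VOLATILE_TTL_SECONDS = 30

# A directory is stable if it is, or lives under, one of the stable prefixes.
def stable_path?(dir)
  expanded = File.expand_path(dir)
  STABLE_PREFIXES.any? do |prefix|
    expanded == prefix || expanded.start_with?(prefix + File::SEPARATOR)
  end
end

# scanned_at and now are Time objects. Stable entries never expire;
# volatile entries are trusted only within the TTL window.
def cache_fresh?(dir, scanned_at, now = Time.now)
  stable_path?(dir) || (now - scanned_at) <= VOLATILE_TTL_SECONDS
end
```

For example, a directory inside your application would be volatile, so a scan that is a minute old would be discarded, while a scan of the Ruby standard library directory would be kept indefinitely.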

In addition to the Bootsnap::LoadPathCache::Cache source, this diagram may help clarify how entry resolution works:

[Diagram: how path searching works]

It's also important to note how expensive LoadErrors can be. If ruby invokes require 'something', but that file isn't on $LOAD_PATH, it takes 2 * $LOAD_PATH.length filesystem accesses to determine that. Bootsnap caches this result too, raising a LoadError without touching the filesystem at all.
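The arithmetic is easy to see in a small model. The sketch below counts filesystem probes for a failed lookup and then remembers the miss. All names are hypothetical, and it simplifies in one respect: here the first miss still pays for the probes, whereas bootsnap answers from its already-scanned cache.

```ruby
# Hypothetical model of the cost of a failed require.
# Two candidate extensions: Ruby source, then a native extension (macOS naming).
EXTENSIONS = ['.rb', '.bundle'].freeze

# Naive lookup: probe every load-path directory once per extension.
# Returns [found_path_or_nil, number_of_probes].
def naive_lookup(feature, load_path)
  probes = 0
  EXTENSIONS.each do |ext|
    load_path.each do |dir|
      probes += 1
      candidate = File.join(dir, feature + ext)
      return [candidate, probes] if File.file?(candidate)
    end
  end
  [nil, probes]
end

# Remembers misses so that a repeated failure costs no probes at all.
class NegativeCache
  def initialize(load_path)
    @load_path = load_path
    @missing = {}
  end

  def lookup(feature)
    return [nil, 0] if @missing[feature]

    path, probes = naive_lookup(feature, @load_path)
    @missing[feature] = true if path.nil?
    [path, probes]
  end
end
```

With three load-path entries, a miss costs 2 * 3 = 6 probes in the naive case and none once the miss has been recorded.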

Compilation Caching

(A simpler implementation of this concept can be found in yomikomu).

Ruby has a complex grammar, and parsing it is not a particularly cheap operation. Since 1.9, Ruby has translated Ruby source to an internal bytecode format, which is then executed by the Ruby VM. Since 2.3, Ruby has exposed an API that allows caching that bytecode. This allows us to bypass the relatively expensive compilation step on subsequent loads of the same file.
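As a rough illustration of that API, the following sketch compiles a snippet, serializes the bytecode to a binary string, and loads it back without parsing the source again. It is specific to MRI, and bytecode_round_trip is a hypothetical helper name; a real cache would persist the binary string somewhere and check that it is still valid before using it.

```ruby
# Compile source to bytecode, serialize it, then load and run the
# serialized form. The load step does not parse the source again.
def bytecode_round_trip(source)
  iseq   = RubyVM::InstructionSequence.compile(source)
  binary = iseq.to_binary # String holding the serialized bytecode
  RubyVM::InstructionSequence.load_from_binary(binary).eval
end
```

Calling bytecode_round_trip("6 * 7") evaluates the restored bytecode and returns 42.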

We also noticed that we spend a lot of time loading YAML documents during our application boot, and that MessagePack and Marshal are much faster at deserialization than YAML, even with a fast implementation. We use the same strategy of compilation caching for YAML documents, with the equivalent of Ruby's "bytecode" format being a MessagePack document (or, in the case of YAML documents with types unsupported by MessagePack, a Marshal stream).
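The same idea can be sketched for YAML using only the standard library. This example uses Marshal alone (MessagePack would need an extra gem) and stores the cache in a sidecar file next to the source, which is an assumption made for the sake of a self-contained example, not how bootsnap stores its results. load_yaml_cached and demo_yaml_cache are hypothetical names.

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical cache: keeps the parsed document as a Marshal stream,
# tagged with the source file's modification time.
def load_yaml_cached(path)
  cache_path = path + '.cache' # assumed location, for illustration only
  mtime = File.mtime(path)
  stamp = [mtime.to_i, mtime.nsec]

  if File.exist?(cache_path)
    cached_stamp, data = Marshal.load(File.binread(cache_path))
    return data if cached_stamp == stamp # cache hit: no YAML parsing
  end

  data = YAML.safe_load(File.read(path))
  File.binwrite(cache_path, Marshal.dump([stamp, data]))
  data
end

# Loads the same file twice; the second call is served from the cache.
def demo_yaml_cache
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'config.yml')
    File.write(path, "name: bootsnap\nretries: 3\n")
    [load_yaml_cached(path), load_yaml_cached(path)]
  end
end
```

Only cache files that your own process wrote should be passed to Marshal.load, since it is not safe on untrusted input.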

These compilation results are stored using xattrs on the source files. This is likely to change in the future, as it has some limitations (notably precluding Linux support except where the user feels like changing mount flags). However, this is a very performant implementation.

Whereas before, the sequence of syscalls generated to require a file would look like:

open    /c/foo.rb -> m
fstat64 m
close   m
open    /c/foo.rb -> o
fstat64 o
fstat64 o
read    o
read    o
...
close   o

With bootsnap, we get:

open      /c/foo.rb -> n
fstat64   n
fgetxattr n
fgetxattr n
close     n

Bootsnap writes two xattrs attached to each file read:

  • user.aotcc.value, the binary compilation result; and
  • user.aotcc.key, a cache key to determine whether user.aotcc.value is still valid.

The key includes several fields:

  • version, hardcoded in bootsnap. Essentially a schema version;
  • compile_option, which changes when RubyVM::InstructionSequence.compile_option does;
  • data_size, the number of bytes in user.aotcc.value, which we need in order to read it into a buffer using fgetxattr(2);
  • ruby_revision, the version of Ruby this was compiled with; and
  • mtime, the last-modification timestamp of the source file when it was compiled.

If the key is valid, the result is loaded from the value. Otherwise, it is regenerated and clobbers the current cache.
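The validity check can be modelled with a plain Struct. The field names follow the list above, but everything else here is an assumption for illustration: the real key is a packed binary value, SCHEMA_VERSION is a made-up number, and compile_option is represented by its inspect string.

```ruby
# Hypothetical in-memory model of the key fields listed above.
CacheKey = Struct.new(:version, :compile_option, :data_size, :ruby_revision, :mtime)

SCHEMA_VERSION = 1 # assumed value, for illustration only

# Builds the key that a fresh compilation of the file would produce right now.
def build_key(source_mtime, data_size)
  CacheKey.new(
    SCHEMA_VERSION,
    RubyVM::InstructionSequence.compile_option.inspect,
    data_size,
    RUBY_REVISION,
    source_mtime
  )
end

# A stored key is valid only if the schema version, compile options,
# Ruby revision and source mtime all still match. data_size is copied
# from the stored key because it describes the value, not the environment.
def key_valid?(stored, source_mtime)
  stored == build_key(source_mtime, stored.data_size)
end
```

Any mismatch, such as an edited source file or a schema bump, makes key_valid? return false, which corresponds to the regenerate-and-clobber path.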

This diagram may help illustrate how it works:

[Diagram: compilation caching]

Putting it all together

Imagine we have this file structure:

/
├── a
├── b
└── c
    └── foo.rb

And this $LOAD_PATH:

["/a", "/b", "/c"]

When we call require 'foo' without bootsnap, Ruby would generate this sequence of syscalls:

open    /a/foo.rb -> -1
open    /b/foo.rb -> -1
open    /c/foo.rb -> n
close   n
open    /c/foo.rb -> m
fstat64 m
close   m
open    /c/foo.rb -> o
fstat64 o
fstat64 o
read    o
read    o
...
close   o

With bootsnap, we get:

open      /c/foo.rb -> n
fstat64   n
fgetxattr n
fgetxattr n
close     n

If we call require 'nope' without bootsnap, we get:

open    /a/nope.rb -> -1
open    /b/nope.rb -> -1
open    /c/nope.rb -> -1
open    /a/nope.bundle -> -1
open    /b/nope.bundle -> -1
open    /c/nope.bundle -> -1

...and if we call require 'nope' with bootsnap, we get...

# (nothing!)

Trustworthiness

We use the *_path_cache features in production and haven't experienced any issues in a long time.

The compile_cache_* features work well for us in development on macOS, but probably don't work on Linux at all.

disable_trace should be completely safe, but we don't really use it because some people like to use tools that make use of trace instructions.

feature                 where we're using it
load_path_cache         everywhere
autoload_paths_cache    everywhere
disable_trace           nowhere, but it's safe unless you need tracing
compile_cache_iseq      development, unlikely to work on Linux
compile_cache_yaml      development, unlikely to work on Linux
