Hi. 👋

I'm a Software Engineer with Datadog, based in Munich, Germany. 🥨 Ex-Skyscanner.

Find me on these networks

Stack Overflow LinkedIn Twitter Meetup

Buy me a coffee

If you found any of my work useful, you might want to consider buying me a coffee.

Buy Me A Coffee

Blog posts

I occasionally write engineering blog posts. Sometimes my colleagues blog about our work, too.

Roll up to speed up: Improving OpenTSDB query performance

How we improved query performance for Skyscanner's OpenTSDB cluster and enabled queries that were previously impossible to serve, by reducing the resolution of historic data.

👉 Roll up to speed up: Improving OpenTSDB query performance
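
To make "reducing the resolution of historic data" concrete, here is a minimal sketch of a roll-up. It is not taken from the blog post: the function name `roll_up`, the hourly interval, and the choice of stored aggregates (sum, count, min, max) are assumptions for illustration only.

```python
from collections import defaultdict


def roll_up(points, interval_seconds=3600):
    """Collapse (timestamp, value) points into one summary per time bucket.

    Hypothetical helper: the interval and the aggregates are assumptions,
    not the configuration described in the blog post.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[bucket_start].append(value)

    return [
        (
            start,
            {
                "sum": sum(values),
                "count": len(values),
                "min": min(values),
                "max": max(values),
            },
        )
        for start, values in sorted(buckets.items())
    ]


# Three raw points, two of them in the first hour.
raw_points = [(0, 1.0), (60, 3.0), (3600, 5.0)]
rolled = roll_up(raw_points)
```

Keeping both the sum and the count lets a query still compute an average over any longer range, while reading far fewer rows than the raw data would require.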

The problem that wasn’t there — and the Bosun alerts that were

By Annette Wilson

Annette blogged about phantom alerts that our alerting solution Bosun would fire every so often, paging on-call engineers, but that turned out to be false every time. The condition that triggered the alert would recover on the next evaluation, only split seconds later. Subsequent investigation, including resubmitting the exact same query, wouldn't show any sign of a problem, let alone the alert condition being met.

It had been annoying us for two years, but it also happened infrequently enough that any investigation efforts were regularly abandoned without meaningful results until years later. It was mysterious and interesting enough to still blog about it, though. Also, we really wanted to sleep comfortably again without the threat of being woken up by a false alert. The blog post describes the problem and, in an addendum, how I finally found the root cause.

TL;DR: the root cause, if you don't like exciting stories

Our initial suspicion of a bug in Bosun turned out to be incorrect. When our time series database OpenTSDB serves a query, it uses 8 scanners to fetch the required data from HBase asynchronously and then merges their results before returning the response to the client.

The scanners write their results to a map. The data structure used to generate the keys for these results, however, wasn't thread-safe, and in a rare race condition it could return the same key for two scanners, which meant that one overwrote the other's results. Bosun then received incomplete data and the alert went into an unknown state, paging the on-call engineer.

The unspectacular fix can be seen in OpenTSDB/opentsdb#1754.
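
For readers who want to see the failure mode in code, the sketch below illustrates the idea in Python. It is not the code from the pull request and not OpenTSDB's implementation: `LockedKeyAllocator`, `run_scanners` and `merge` are hypothetical names, and guarding key generation with a lock is just one assumed way to make it thread-safe.

```python
import threading


class LockedKeyAllocator:
    """Hands out unique integer keys; the lock makes read-and-increment atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_key = 0

    def next_key(self):
        with self._lock:
            key = self._next_key
            self._next_key += 1
            return key


def run_scanners(num_scanners, allocator, rows_per_scanner=3):
    """Run hypothetical scanners in parallel, each storing its rows under its own key."""
    results = {}
    results_lock = threading.Lock()

    def scan(scanner_id):
        rows = [f"scanner-{scanner_id}-row-{i}" for i in range(rows_per_scanner)]
        key = allocator.next_key()
        with results_lock:
            results[key] = rows

    threads = [threading.Thread(target=scan, args=(i,)) for i in range(num_scanners)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def merge(results):
    """Flatten the per-scanner results into one sorted list of rows."""
    return sorted(row for rows in results.values() for row in rows)


# What a key collision does: the second write silently replaces the first.
collided = {}
collided[0] = ["rows from scanner A"]
collided[0] = ["rows from scanner B"]

results = run_scanners(8, LockedKeyAllocator())
```

With unique keys, all 8 scanners' rows survive the merge. With a colliding key, as in `collided`, one scanner's rows disappear without any error, which matches the incomplete data Bosun received.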

👉 The problem that wasn’t there — and the Bosun alerts that were

Björn Marschollek's Projects

backstage

Backstage is an open platform for building developer portals

bosun

Time Series Alerting Framework

byimpf

Script to check for and book vaccination appointments in Bavarian vaccination centres

flickr-collage

A simple command line interface for Flickr. It builds a collage of top-rated images for 10 search terms

mvg-info

Small tool to fetch disruptions to Munich's public transport services

opentsdb

A scalable, distributed Time Series Database.

osmaps-radius

An OpenStreetMap version of http://obeattie.github.io/gmaps-radius

pdjs

JavaScript wrapper for the PagerDuty API

spacefold

Use Pub/Sub pattern inside your React applications easily
