Giter Club home page Giter Club logo

fastapi-observability's Introduction

FastAPI with Observability

Observe the FastAPI application with three pillars of observability on Grafana:

  1. Traces with Tempo and OpenTelemetry Python SDK
  2. Metrics with Prometheus and Prometheus Python Client
  3. Logs with Loki

Observability Architecture

Table of contents

Quick Start

  1. Install Loki Docker Driver

    docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
  2. Build application image and start all services with docker-compose

    docker-compose build
    docker-compose up -d
  3. Send requests with siege and curl to the FastAPI app

    bash request-script.sh
    bash trace.sh

    Or you can use Locust to send requests:

    # install locust first with `pip install locust` if you don't have it
    locust -f locustfile.py --headless --users 10 --spawn-rate 1 -H http://localhost:8000

    Or you can send requests with k6:

    k6 run --vus 1 --duration 300s k6-script.js
  4. Check predefined dashboard FastAPI Observability on Grafana http://localhost:3000/ login with admin:admin

    Dashboard screenshot:

    FastAPI Monitoring Dashboard

    The dashboard is also available on Grafana Dashboards.

Explore with Grafana

Grafana provides a great solution, which could observe specific actions in service between traces, metrics, and logs through trace ID and exemplar.

Observability Correlations

Image Source: Grafana

Metrics to Traces

Get Trace ID from an exemplar in metrics, then query in Tempo.

Query: histogram_quantile(.99,sum(rate(fastapi_requests_duration_seconds_bucket{app_name="app-a", path!="/metrics"}[1m])) by(path, le))

Metrics to Traces

Traces to Logs

Get Trace ID and tags (here is compose.service) defined in Tempo data source from span, then query with Loki.

Traces to Logs

Logs to Traces

Get Trace ID from log (regex defined in Loki data source), then query in Tempo.

Logs to Traces

Detail

FastAPI Application

For a more complex scenario, we use three FastAPI applications with the same code in this demo. There is a cross-service action in /chain endpoint, which provides a good example of how to use OpenTelemetry SDK and how Grafana presents trace information.

Traces and Logs

We use OpenTelemetry Python SDK to send trace info with gRCP to Tempo. Each request span contains other child spans when using OpenTelemetry instrumentation. The reason is that instrumentation will catch each internal asgi interaction (opentelemetry-python-contrib issue #831). If you want to get rid of the internal spans, there is a workaround in the same issue #831 by using a new OpenTelemetry middleware with two overridden methods for span processing.

We use OpenTelemetry Logging Instrumentation to override the logger format with another format with trace id and span id.

# fastapi_app/utils.py

def setting_otlp(app: ASGIApp, app_name: str, endpoint: str, log_correlation: bool = True) -> None:
    # Setting OpenTelemetry
    # set the service name to show in traces
    resource = Resource.create(attributes={
        "service.name": app_name, # for Tempo to distinguish source
        "compose_service": app_name # as a query criteria for Trace to logs
    })

    # set the tracer provider
    tracer = TracerProvider(resource=resource)
    trace.set_tracer_provider(tracer)

    tracer.add_span_processor(BatchSpanProcessor(
        OTLPSpanExporter(endpoint=endpoint)))

    if log_correlation:
        LoggingInstrumentor().instrument(set_logging_format=True)

    FastAPIInstrumentor.instrument_app(app, tracer_provider=tracer)

The following image shows the span info sent to Tempo and queried on Grafana. Trace span info provided by FastAPIInstrumentor with trace ID (17785b4c3d530b832fb28ede767c672c), span id(d410eb45cc61f442), service name(app-a), custom attributes(service.name=app-a, compose_service=app-a) and so on.

Span Information

Log format with trace id and span id, which is overridden by `LoggingInstrumentor``

%(asctime)s %(levelname)s [%(name)s] [%(filename)s:%(lineno)d] [trace_id=%(otelTraceID)s span_id=%(otelSpanID)s resource.service.name=%(otelServiceName)s] - %(message)s

The following image is what the logs look like.

Log With Trace ID And Span ID

Span Inject

If you want other services to use the same Trace ID, you have to use inject function to add current span information to the header. Because OpenTelemetry FastAPI instrumentation only takes care of the asgi app's request and response, it does not affect any other modules or actions like sending HTTP requests to other servers or function calls.

# fastapi_app/main.py

from opentelemetry.propagate import inject

@app.get("/chain")
async def chain(response: Response):

    headers = {}
    inject(headers)  # inject trace info to header

    async with httpx.AsyncClient() as client:
        await client.get(f"http://localhost:8000/", headers=headers,)
    async with httpx.AsyncClient() as client:
        await client.get(f"http://{TARGET_ONE_HOST}:8000/io_task", headers=headers,)
    async with httpx.AsyncClient() as client:
        await client.get(f"http://{TARGET_TWO_HOST}:8000/cpu_task", headers=headers,)

    return {"path": "/chain"}

Metrics

Use Prometheus Python Client to generate OpenTelemetry format metric with exemplars and expose on /metrics for Prometheus.

In order to add an exemplar to metrics, we retrieve the trace id from the current span for the exemplar and add the trace id dict to the Histogram or Counter metrics.

# fastapi_app/utils.py

from opentelemetry import trace
from prometheus_client import Histogram

REQUESTS_PROCESSING_TIME = Histogram(
    "fastapi_requests_duration_seconds",
    "Histogram of requests processing time by path (in seconds)",
    ["method", "path", "app_name"],
)

# retrieve trace id for exemplar
span = trace.get_current_span()
trace_id = trace.format_trace_id(
      span.get_span_context().trace_id)

REQUESTS_PROCESSING_TIME.labels(method=method, path=path, app_name=self.app_name).observe(
      after_time - before_time, exemplar={'TraceID': trace_id}
)

Because exemplars is a new datatype proposed in OpenMetrics, /metrics have to use CONTENT_TYPE_LATEST and generate_latest from prometheus_client.openmetrics.exposition module instead of prometheus_client module. Otherwise using the wrong generate_latest the exemplars dict behind Counter and Histogram will never show up, and using the wrong CONTENT_TYPE_LATEST will cause Prometheus scraping to fail.

# fastapi_app/utils.py

from prometheus_client import REGISTRY
from prometheus_client.openmetrics.exposition import CONTENT_TYPE_LATEST, generate_latest

def metrics(request: Request) -> Response:
    return Response(generate_latest(REGISTRY), headers={"Content-Type": CONTENT_TYPE_LATEST})

Metrics with exemplars

Metrics With Exemplars

Auto Instrumentation

In previous sections, we used OpenTelemetry Python SDK to add trace info to spans, logs, and metrics. However, we can use OpenTelemetry Auto Instrumentation to automatically instrument FastAPI application with OpenTelemetry Python SDK.

For using automatic instrumentation, we need to install at least three packages:

  1. opentelemetry-distro: OpenTelemetry API and SDK
  2. opentelemetry-exporter: Exporter for traces and metrics which have to correspond to the exporter setting of the agent, we use opentelemetry-exporter-otlp here
  3. opentelemetry-instrumentation: Instrumentation for libraries and frameworks, we use opentelemetry-instrumentation-fastapi here
    1. Or you can install opentelemetry-bootstrap with pip install opentelemetry-bootstrap -a install, which will install the instrumentation packages automatically based on your installed libraries and frameworks.

There are two ways to set the configuration of OpenTelemetry Instrumentation for Python:

  1. Environment variables
  2. CLI arguments

In this project, we have a FastAPI application app-c without using any OpenTelemetry SDK in code to show how to use automatic instrumentation. And we use environment variables to set the automatic instrumentation configuration:

# docker-compose.yml
app-c:
  build: ./fastapi_app_plain/
  environment:
    APP_NAME: "app-c"
    OTEL_PYTHON_LOG_CORRELATION: "true"
    OTEL_TRACES_EXPORTER: "otlp"
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://tempo:4317"
    OTEL_METRICS_EXPORTER: "none"
    OTEL_SERVICE_NAME: "app-c"
    OTEL_RESOURCE_ATTRIBUTES: "compose_service=app-c"
  command: "opentelemetry-instrument python main.py"

Here is a sample command with CLI arguments:

opentelemetry-instrument \
    --traces_exporter otlp \
    --metrics_exporter none \
    --service_name fastapi \
    --exporter_otlp_endpoint http://otel-collector:4317 \
    python main.py

With automatic instrumentation, we can get the same trace info as the previous section. The logs will be overridden with trace id and span id when enabling OTEL_PYTHON_LOG_CORRELATION environment variable, and there is more detail on the document. However, we cannot add exemplars to metrics with automatic instrumentation, so if you want to use exemplars, you have to use manual instrumentation with OpenTelemetry Python SDK.

Prometheus - Metrics

Collects metrics from applications.

Prometheus Config

Define all FastAPI applications metrics scrape jobs in etc/prometheus/prometheus.yml.

...
scrape_configs:
  - job_name: 'app-a'
    scrape_interval: 5s
    static_configs:
      - targets: ['app-a:8000']
  - job_name: 'app-b'
    scrape_interval: 5s
    static_configs:
      - targets: ['app-b:8000']
  - job_name: 'app-c'
    scrape_interval: 5s
    static_configs:
      - targets: ['app-c:8000']

Grafana Data Source

Add an Exemplars which uses the value of TraceID label to create a Tempo link.

Grafana data source setting example:

Data Source of Prometheus: Exemplars

Grafana data sources config example:

name: Prometheus
type: prometheus
typeName: Prometheus
access: proxy
url: http://prometheus:9090
password: ''
user: ''
database: ''
basicAuth: false
isDefault: true
jsonData:
exemplarTraceIdDestinations:
   - datasourceUid: tempo
      name: TraceID
httpMethod: POST
readOnly: false
editable: true

Tempo - Traces

Receives spans from applications.

Grafana Data Source

Trace to logs setting:

  1. Data source: target log source
  2. Tags: key of tags or process level attributes from the trace, which will be log query criteria if the key exists in the trace
  3. Map tag names: Convert existing key of tags or process level attributes from trace to another key, then used as log query criteria. Use this feature when the values of the trace tag and log label are identical but the keys are different.

Grafana data source setting example:

Data Source of Tempo: Trace to logs

Grafana data sources config example:

name: Tempo
type: tempo
typeName: Tempo
access: proxy
url: http://tempo
password: ''
user: ''
database: ''
basicAuth: false
isDefault: false
jsonData:
nodeGraph:
   enabled: true
tracesToLogs:
   datasourceUid: loki
   filterBySpanID: false
   filterByTraceID: true
   mapTagNamesEnabled: false
   tags:
      - compose_service
readOnly: false
editable: true

Loki - Logs

Collect logs with Loki Docker Driver from all services.

Loki Docker Driver

  1. Use YAML anchor and alias feature to set logging options for each service.
  2. Set Loki Docker Driver options
    1. loki-url: loki service endpoint
    2. loki-pipeline-stages: processes multiline log from FastAPI application with multiline and regex stages (reference)
x-logging: &default-logging # anchor(&): 'default-logging' for defines a chunk of configuration
  driver: loki
  options:
    loki-url: 'http://localhost:3100/api/prom/push'
    loki-pipeline-stages: |
      - multiline:
          firstline: '^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}'
          max_wait_time: 3s
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},d{3}) (?P<message>(?s:.*))$$'
# Use $$ (double-dollar sign) when your configuration needs a literal dollar sign.

version: "3.4"

services:
   foo:
      image: foo
      logging: *default-logging # alias(*): refer to 'default-logging' chunk 

Grafana Data Source

Add a TraceID derived field to extract the trace id and create a Tempo link from the trace id.

Grafana data source setting example:

Data Source of Loki: Derived fields

Grafana data source config example:

name: Loki
type: loki
typeName: Loki
access: proxy
url: http://loki:3100
password: ''
user: ''
database: ''
basicAuth: false
isDefault: false
jsonData:
derivedFields:
   - datasourceUid: tempo
      matcherRegex: (?:trace_id)=(\w+)
      name: TraceID
      url: $${__value.raw}
      # Use $$ (double-dollar sign) when your configuration needs a literal dollar sign.
readOnly: false
editable: true

Grafana

  1. Add Prometheus, Tempo, and Loki to the data source with config file etc/grafana/datasource.yml.
  2. Load predefined dashboard with etc/dashboards.yaml and etc/dashboards/fastapi-observability.json.
# grafana in docker-compose.yaml
grafana:
   image: grafana/grafana:10.1.0
   volumes:
      - ./etc/grafana/:/etc/grafana/provisioning/datasources # data sources
      - ./etc/dashboards.yaml:/etc/grafana/provisioning/dashboards/dashboards.yaml # dashboard setting
      - ./etc/dashboards:/etc/grafana/dashboards # dashboard json files directory

Reference

  1. FastAPI Traces Demo
  2. Waber - A Uber-like (Car-Hailing APP) cloud-native application with OpenTelemetry
  3. Intro to exemplars, which enable Grafana Tempo’s distributed tracing at massive scale
  4. Trace discovery in Grafana Tempo using Prometheus exemplars, Loki 2.0 queries, and more
  5. The New Stack (TNS) observability app
  6. Don’t Repeat Yourself with Anchors, Aliases and Extensions in Docker Compose Files
  7. How can I escape a $ dollar sign in a docker compose file?
  8. Tempo Trace to logs tags discussion
  9. Starlette Prometheus
  10. Tempo Example

fastapi-observability's People

Contributors

blueswen avatar choncba avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.