jaegertracing / jaeger

CNCF Jaeger, a Distributed Tracing Platform

Home Page: https://www.jaegertracing.io/

License: Apache License 2.0

Jaeger - a Distributed Tracing System

graph TD
    SDK["OpenTelemetry SDK"] --> |HTTP or gRPC| COLLECTOR
    COLLECTOR["Jaeger Collector"] --> STORE[Storage]
    COLLECTOR --> |gRPC| PLUGIN[Storage Plugin]
    COLLECTOR --> |gRPC/sampling| SDK
    PLUGIN --> STORE
    QUERY[Jaeger Query Service] --> STORE
    QUERY --> |gRPC| PLUGIN
    UI[Jaeger UI] --> |HTTP| QUERY
    subgraph Application Host
        subgraph User Application
            SDK
        end
    end

Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing platform created by Uber Technologies and donated to the Cloud Native Computing Foundation. It can be used for monitoring microservices-based distributed systems:

Distributed context propagation
Distributed transaction monitoring
Root cause analysis
Service dependency analysis
Performance / latency optimization

Get Involved

Jaeger is an open source project with open governance. We welcome contributions from the community, and we would love your help to improve and extend the project. Here are some ideas for how to get involved. Many of them do not even require any coding.

Features

High Scalability

Jaeger backend is designed to have no single points of failure and to scale with the business needs. For example, any given Jaeger installation at Uber is typically processing several billions of spans per day.

Relationship with OpenTelemetry

The Jaeger and OpenTelemetry projects have different goals. OpenTelemetry aims to provide APIs and SDKs in multiple languages so that applications can export various telemetry data out of the process, to any number of metrics and tracing backends. The Jaeger project is primarily the tracing backend that receives tracing telemetry data and provides processing, aggregation, data mining, and visualizations of that data. For more information, please refer to the blog post Jaeger and OpenTelemetry.

Jaeger was originally designed to support the OpenTracing standard. The terminology is still used in Jaeger UI, but the concepts have direct mapping to the OpenTelemetry data model of traces.

Capability	OpenTracing concept	OpenTelemetry concept
Represent traces as directed acyclic graphs (not just trees)	span references	span links
Strongly typed span attributes	span tags	span attributes
Strongly typed events/logs	span logs	span events

Jaeger project recommends OpenTelemetry SDKs for instrumentation, instead of now-deprecated Jaeger SDKs.

Multiple storage backends

Jaeger can be used with a growing number of storage backends:

It natively supports two popular open source NoSQL databases as trace storage backends: Cassandra and Elasticsearch.
It integrates via a gRPC API with other well known databases that have been certified to be Jaeger compliant: TimescaleDB via Promscale, ClickHouse.
There is embedded database support using Badger and simple in-memory storage for testing setups.
ScyllaDB can be used as a drop-in replacement for Cassandra since it uses the same data model and query language.
There are ongoing community experiments using other databases, such as InfluxDB, Amazon DynamoDB, YugabyteDB(YCQL).

Modern Web UI

Jaeger Web UI is implemented in JavaScript using popular open source frameworks like React. Several performance improvements were released in v1.0 to allow the UI to deal efficiently with large volumes of data and to display traces with tens of thousands of spans (e.g. we tried a trace with 80,000 spans).

Cloud Native Deployment

Jaeger backend is distributed as a collection of Docker images. The binaries support various configuration methods, including command line options, environment variables, and configuration files in multiple formats (yaml, toml, etc.).
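As an illustration of how the command line and environment variable methods relate, here is a minimal Go sketch. It assumes the convention described in the Jaeger CLI documentation, where a flag name is turned into an environment variable name by upper-casing it and replacing `-` and `.` with `_`. The helper `envVarFor` is hypothetical and not part of Jaeger.

```go
package main

import (
	"fmt"
	"strings"
)

// envVarFor converts a CLI flag name into the environment variable name it
// would map to, ASSUMING the "upper-case, replace '-' and '.' with '_'"
// convention. This helper is illustrative only.
func envVarFor(flag string) string {
	name := strings.TrimLeft(flag, "-") // accept "-flag" and "--flag"
	name = strings.NewReplacer("-", "_", ".", "_").Replace(name)
	return strings.ToUpper(name)
}

func main() {
	fmt.Println(envVarFor("--query.static-files")) // QUERY_STATIC_FILES
}
```

Check the `--help` output of the binary you run before relying on a particular variable name.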

The recommended way to deploy Jaeger in a production Kubernetes cluster is via the Jaeger Operator.

The Jaeger Operator provides a CLI to generate Kubernetes manifests from the Jaeger CR. This can be used as an alternative to maintaining plain Kubernetes manifest files.

The Jaeger ecosystem also provides a Helm chart as an alternative way to deploy Jaeger.

Observability

All Jaeger backend components expose Prometheus metrics by default (other metrics backends are also supported). Logs are written to standard out using the structured logging library zap.

Security

Third-party security audits of Jaeger are available in https://github.com/jaegertracing/security-audits. Please see Issue #1718 for the summary of available security mechanisms in Jaeger.

Backwards compatibility with Zipkin

Although we recommend instrumenting applications with OpenTelemetry, if your organization has already invested in the instrumentation using Zipkin libraries, you do not have to rewrite all that code. Jaeger provides backwards compatibility with Zipkin by accepting spans in Zipkin formats (Thrift or JSON v1/v2) over HTTP. Switching from Zipkin backend is just a matter of routing the traffic from Zipkin libraries to the Jaeger backend.

Version Compatibility Guarantees

Occasionally, CLI flags can be deprecated due to, for example, usability improvements or new functionality. In such situations, developers introducing the deprecation are required to follow these guidelines.

In short, for a deprecated CLI flag, you should expect to see the following message in the --help documentation:

(deprecated, will be removed after yyyy-mm-dd or in release vX.Y.Z, whichever is later)

A grace period of at least 3 months or two minor version bumps (whichever is later) from the first release containing the deprecation notice will be provided before the deprecated CLI flag can be deleted.

For example, consider a scenario where v1.28.0 is released on 01-Jun-2021 containing a deprecation notice for a CLI flag. This flag will remain in a deprecated state until the later of 01-Sep-2021 or v1.30.0 where it can be removed on or after either of those events. It may remain deprecated for longer than the aforementioned grace period.

Go Version Compatibility Guarantees

The Jaeger project attempts to track the currently supported versions of Go, as defined by the Go team. Removing support for an unsupported Go version is not considered a breaking change.

Starting with the release of Go 1.21, support for Go versions will be updated as follows:

Soon after the release of a new Go minor version N, updates will be made to the build and tests steps to accommodate the latest Go minor version.
Soon after the release of a new Go minor version N, support for Go version N-2 will be removed and version N-1 will become the minimum required version.
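As a small worked example of this policy, the Go sketch below computes the minimum required version and the dropped version for a given latest release. The function names are hypothetical and the sketch assumes versions of the form 1.N.

```go
package main

import "fmt"

// MinSupportedGo returns the minimum required Go version once Go 1.<latest>
// has been released, following the N / N-1 policy described above.
func MinSupportedGo(latest int) string {
	return fmt.Sprintf("1.%d", latest-1)
}

// DroppedGo returns the Go version whose support is removed at that point.
func DroppedGo(latest int) string {
	return fmt.Sprintf("1.%d", latest-2)
}

func main() {
	// After Go 1.21: minimum is 1.20, and 1.19 is dropped.
	fmt.Println(MinSupportedGo(21), DroppedGo(21))
}
```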

Related Repositories

Documentation

Published: https://www.jaegertracing.io/docs/
Source: https://github.com/jaegertracing/documentation

Instrumentation Libraries

Jaeger project recommends OpenTelemetry SDKs for instrumentation, instead of Jaeger's native SDKs that are now deprecated.

Deployment

Jaeger Operator for Kubernetes

Components

Building From Source

See CONTRIBUTING.

Contributing

See CONTRIBUTING.

Thanks to all the people who already contributed!

Maintainers

Rules for becoming a maintainer are defined in the GOVERNANCE document. Below are the official maintainers of the Jaeger project. Please use @jaegertracing/jaeger-maintainers to tag them on issues / PRs.

Some repositories under jaegertracing org have additional maintainers.

Emeritus Maintainers

We are grateful to our former maintainers for their contributions to the Jaeger project.

Project Status Meetings

The Jaeger maintainers and contributors meet regularly on a video call. Everyone is welcome to join, including end users. For meeting details, see https://www.jaegertracing.io/get-in-touch/.

Roadmap

See https://www.jaegertracing.io/docs/roadmap/

Get in Touch

Have questions, suggestions, bug reports? Reach the project community via these channels:

Slack chat room #jaeger (need to join CNCF Slack for the first time)
jaeger-tracing mail group
GitHub issues and discussions

Adopters

Jaeger as a product consists of multiple components. We want to support different types of users, whether they are only using our instrumentation libraries or full end to end Jaeger installation, whether it runs in production or you use it to troubleshoot issues in development.

Please see ADOPTERS.md for some of the organizations using Jaeger today. If you would like to add your organization to the list, please comment on our survey issue.

License

jaeger's Issues

adaptive sampling

Hi,

Is the adaptive sampling inside collector supported? I could not figure out how I could set/change sampling strategy via collector. Please help.

Thanks.
-kui

zipkin.thrift error tag value set as a string

When using zipkin.thrift, the error tag is set as a string value (maybe every value is set as a string). This may affect future data analysis pipelines.

Fix for UI was done here https://github.com/uber/jaeger-ui/pull/32

Query startup should fail when -query.static-files has no trailing slash

It turns out that -query.static-files requires a trailing slash. Without it, an HTTP 404 is returned instead of index.html, and it's not clear why. It would be good to fail at startup if it's missing.
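One possible shape for such a startup check, as a minimal Go sketch. The function `validateStaticFiles` is hypothetical and is not taken from the Jaeger code base; it only illustrates failing fast with a clear message.

```go
package main

import (
	"fmt"
	"strings"
)

// validateStaticFiles is a hypothetical startup check: it rejects a non-empty
// static files path that lacks a trailing slash, so the process can fail
// early instead of serving HTTP 404 later.
func validateStaticFiles(path string) error {
	if path == "" {
		return nil // assumption: an empty value means no static files are served
	}
	if !strings.HasSuffix(path, "/") {
		return fmt.Errorf("query.static-files %q must end with a trailing slash", path)
	}
	return nil
}

func main() {
	for _, p := range []string{"../../jaeger-ui-build/build/", "../../jaeger-ui-build/build"} {
		fmt.Printf("%s -> %v\n", p, validateStaticFiles(p))
	}
}
```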

Allow agent to use a static list to connect to multiple collectors

Revisit Duration indexing and querying logic

Currently we are using an archaic methodology for storing and querying against duration (against hourly buckets that are not timestamps). We should revisit this and fix up both the code and the Cassandra schema.

Hotrod example doesn't work

Cloned the jaeger repo
make install_examples -- worked
cd examples/hotrod -- worked
go run ./main.go all -- failed

/Users/Chris/go/src/github.com/uber/jaeger/examples/hotrod/cmd/customer.go:28:2: code in directory /Users/Chris/go/src/github.com/uber-go/zap expects import "go.uber.org/zap"

Not familiar with Go so could be missing something totally obvious ... but thought i would log this as an issue as other noobs like me will likely hit this too.

Make RPC metrics less HTTP-centric

From jaegertracing/jaeger-client-java#172 (comment):

name:       jaeger-rpc.{requests, request_latency}
service:    myservice 
endpoint:   fooendpoint 
result:     {ok,err}
error:      {4xx,5xx,app-error,sys-error,rate-limit,timeout,...}
transport:  {http,tchannel,...}
direction:  {inbound,outbound}

Integrate with Kubernetes

https://github.com/fabric8io/kubernetes-zipkin

jaeger-collector does not respect flag value 0.

Current Behavior

jaeger-collector stores spans with span context flags set to 0.

Expected Behavior

jaeger-collector should not store spans with span contexts where the flag value is 0 because these spans failed the sampling tests.
This protects the jaeger-collector from buggy clients that assume that the collector only stores sampled spans.
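The expected behavior could be sketched in Go as a filter applied before storage. This is an illustration only: the `span` type is a stand-in for the real model, and the sketch assumes that the lowest bit of `flags` is the "sampled" bit.

```go
package main

import "fmt"

// flagSampled is the bit assumed to mark a span as sampled (assumption: bit 0).
const flagSampled = 1

// span is a simplified stand-in for the collector's span model.
type span struct {
	SpanID int64
	Flags  int32
}

// keepSampled returns only the spans whose sampled bit is set, so spans
// reported with flags 0 never reach storage.
func keepSampled(spans []span) []span {
	kept := make([]span, 0, len(spans))
	for _, s := range spans {
		if s.Flags&flagSampled != 0 {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	in := []span{{SpanID: 1, Flags: 0}, {SpanID: 2, Flags: 1}}
	fmt.Println(len(keepSampled(in))) // 1
}
```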

Reproduction steps

To reproduce, build jaeger, and run locally as such

make build_ui
cd cmd/standalone/
go run ./main.go --span-storage.type=memory --query.static-files=../../jaeger-ui-build/build/

Then use yab to make a request as such. Use jaeger.thrift from here.

yab -A trace-id:9 -y payload.yaml

payload.yaml contains the following

---
method: "Collector::submitBatches"
peer: "localhost:14267"
request:
  batches:
    -
      process:
        service_name: flags-test
        tags:
          -
            key: jaeger.hostname
            vStr: localhost
            vType: string
      spans:
        -
          duration: 1294
          flags: 0
          operation_name: arbitrary
          parent_span_id: 0
          span_id: 1
          start_time: 1494509316000000
          tags:
            -
              key: span.kind
              vStr: server
              vType: string
          trace_id_high: 0
          trace_id_low: "${trace-id:1}"
service: jaeger-collector
thrift: jaeger.thrift

Note that flags is set to 0, yet the trace is retrievable using the UI at http://localhost:16686/trace/9

Jaeger agent does not fit serverless FAAS (AWS lambda etc)

The pattern of running Jaeger agents on the same host does not fit (our) use case of running (Python) functions as FaaS in AWS Lambda.
The requirement to run a separate process is a showstopper.
Is it possible to run the agent on another host, or to work around this issue in some other way?

Btw. looking forward to OsCon workshop!!! :-)

Improve Cassandra deployment for Kubernetes / OpenShift

During the initial Docker/Kubernetes/OpenShift work, we took some shortcuts regarding Cassandra that need to be sorted out.

Points to solve:

Use a more appropriate Cassandra Docker image
Wrap the Docker entry point to run a nodetool drain on shutdown, as suggested by @jsanda
Figure out the best approach regarding the replication strategy and datacenter configuration

Strange behavior for output logs

I am trying to add JSON content to logs like this:

span.LogFields(log.String("get_platform", string(resp)))

[...]

data := string(resp)
glog.V(0).Infof("Data: %s", data)
span.LogFields(log.String("response", data))

The output logs are:

Data: [{"biosUUID": "422523e2-4bf4-917f-a3ba-e65574cc18ef", "description": "4ef78da2-3552-11e7-99f4-0800270a9ecb", "disks": [{"capacity": 10, 
[...]

In the UI, the first log (get_platform) seems to be printed correctly, with key = "get_platform" and a JSON value.
But for the second and third logs, "response" and "virtual_machines", the key is "event" and the value is a plain string rather than JSON.
I don't really understand the different behavior with the same code. Only the responses are different.

Any idea how I can fix that?
Thanks

Investigate adding a new tag index that has both operation name and service name.

Verify that all packages have tests

When a package has .go files but no test files, it is not counted towards the code coverage, giving misleading total. The proposal is to do something similar to yarpc/yarpc-go#1043:

we already use .nocover files to skip packages with this file from running coverage
we should add another validation script that will fail if a package has .go files, no .nocover file, and has no tests

Provide a single Docker image to run full backend for local testing

Desired workflow:

start docker image
run hotrod
see traces in the UI

Action items:

Dockerfile for a single image running all backend and UI
- Consolidate all default ports into a single range
Add integration tests that validate the docker image is actually operational
- curl the query service with debug flag, it should create a trace
- curl it again looking for that trace by correlation ID
- curl UI endpoint for trace HTML and maybe do a dumb string search for the correlation ID
Publish the image to DockerHub

Create integration tests for zipkin.thrift over http

Add integration tests for zipkin.thrift over http using popular zipkin instrumentation libraries.

Change buckets list calculation

Currently the bucket list is a hardcoded constant string: (0,1,2,3,4,5,6,7,8,9).

This should be calculated at runtime. This will modify some queries and the new query behavior will need to be tested.
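A minimal Go sketch of computing that list at runtime. The function name `bucketList` is hypothetical, and how the resulting string is spliced into the queries is left out.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bucketList builds the comma-separated list "0,1,...,n-1" at runtime instead
// of relying on a hardcoded constant.
func bucketList(n int) string {
	if n <= 0 {
		return ""
	}
	parts := make([]string, n)
	for i := range parts {
		parts[i] = strconv.Itoa(i)
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println("(" + bucketList(10) + ")") // (0,1,2,3,4,5,6,7,8,9)
}
```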

Provide docker image for agent, collector, and query service

Assuming there is a cassandra instance we can point to, we need to provide working docker images for:

The Jaeger Agent
The Jaeger Collector
The Jaeger Query Service
This task is complete when proof can be provided that all 3 services run, connect to a Cassandra instance, and can consume traffic and display UI results.

Support jaeger-debug-id header even when there is an established trace context

Some command line tools like yab are capable of starting a trace before sending the request. They also allow setting additional headers like jaeger-debug-id, but the Jaeger client libs ignore it (at least the Java client does) since there is already a trace context.

make sure all clients respect the header regardless of the presence of the trace
add a test for this behavior to end-to-end test

How to start from config file?

Hi, we are doing some research on jaeger for tracing.
Is there a way to boot from a conf file?

seems no way to specify multiple collectors when starting agent?

Hi,

I am trying to figure out whether we could use Jaeger for trace collection in our product. So far I have it working with a single collector set via -collector.host-port. It seems that multiple collectors are set up through some auto-discovery mechanism, but I have no idea how this works. Please help.

Thanks.
-Kui

Adaptive sampling strategies not updated in certain situations

Under certain conditions, the collector might not return adaptive sampling strategies for certain operations which in turn means the operation will continue to use the existing sampling strategy and not be updated.

https://github.com/uber/jaeger-client-go/blob/master/sampler.go#L361 doesn't update existing strategies that are not a part of the passed-in strategies.

Issue exists in all clients
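One way to picture the problem and a possible remedy, as a Go sketch. This is not the client's actual code: `perOperationSampler` is a simplified hypothetical type, and resetting missing operations to the default rate is only one assumed fix among several.

```go
package main

import "fmt"

// perOperationSampler is a simplified, hypothetical stand-in for a client-side
// sampler that keeps one sampling rate per operation.
type perOperationSampler struct {
	defaultRate float64
	rates       map[string]float64
}

// update replaces the whole per-operation table with the strategies received
// from the collector. Operations missing from the response fall back to the
// default rate instead of silently keeping a stale rate.
func (s *perOperationSampler) update(defaultRate float64, perOperation map[string]float64) {
	fresh := make(map[string]float64, len(perOperation))
	for op, rate := range perOperation {
		fresh[op] = rate
	}
	s.defaultRate = defaultRate
	s.rates = fresh
}

// rateFor returns the rate for an operation, or the default if none is set.
func (s *perOperationSampler) rateFor(operation string) float64 {
	if rate, ok := s.rates[operation]; ok {
		return rate
	}
	return s.defaultRate
}

func main() {
	s := &perOperationSampler{}
	s.update(0.001, map[string]float64{"GET /customer": 0.5, "GET /route": 0.1})
	// The second response omits "GET /customer"; it should not keep 0.5.
	s.update(0.01, map[string]float64{"GET /route": 0.2})
	fmt.Println(s.rateFor("GET /customer"), s.rateFor("GET /route"))
}
```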

How to start Jaeger?

Sorry for hijacking issues but this looks like the only way to contact the team.

I don't know much about go. Could you share some basic details on how to start jaeger daemon/UI?

Thanks in advance!

REST API for collecting spans

A REST API will be necessary for collecting spans from a web browser.

see jaegertracing/jaeger-client-node#109

Collector always requires Cassandra

I'm trying to run the collector module individually, and it seems it always requires cassandra, even if the -span-storage.type memory option is passed.

$ CGO_ENABLED=0 GOOS=linux installsuffix=cgo go build -o ./cmd/collector/jaeger-collector-linux ./cmd/collector/main.go
$ ./cmd/collector/jaeger-collector-linux -span-storage.type memory
{"level":"fatal","ts":1492096339.4118035,"caller":"collector/main.go:55","msg":"Unabled to set up builder","error":"MemoryStore is not provided","stacktrace":"github.com/uber/jaeger/vendor/go.uber.org/zap.Stack\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/field.go:209\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).check\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:273\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).Fatal\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:208\nmain.main\n\t/home/jpkroehling/Projects/go/src/github.com/uber/jaeger/cmd/collector/main.go:55"}

Changing it to be almost 1-1 with the server bootstrap code from the standalone module, the message becomes:

{"level":"fatal","ts":1492096398.6600523,"caller":"collector/main.go:52","msg":"Unabled to set up builder","error":"Cassandra not configured","stacktrace":"github.com/uber/jaeger/vendor/go.uber.org/zap.Stack\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/field.go:209\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).check\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:273\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).Fatal\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:208\nmain.main\n\t/home/jpkroehling/Projects/go/src/github.com/uber/jaeger/cmd/collector/main.go:52"}

Hotrod ui doesn't launch correctly when using go get

The hotrod UI 404s with the following

go get github.com/uber/jaeger/examples/hotrod && hotrod all

Wire format change

I would like to propose a wire format change. I have two things in mind:

The headers contain "uber": uber-trace-id for the span context and the baggage prefix uberctx. This says nothing about Jaeger and could be changed to jaeger-
parentId is redundant when using the new model

The change will be problematic due to a large deployment of mixed services (jaeger and zipkin thrift used simultaneously).

I'm not sure how this could be done in the safest manner. Maybe implement a new default injector/extractor that would first inject/extract both old and new headers, and then remove the old headers after some time.

This is also related to #121

jaeger-agent zipkin compatibility

The jaeger-agent exposes the Zipkin thrift service only on a UDP port. Because thrift doesn't support a UDP transport out of the box, using Jaeger with Zipkin-instrumented services is difficult. It also makes migrating from Zipkin to Jaeger a nonstarter.

We should explore adding a HTTP or TCP thrift Zipkin receiver to jaeger-agent for compatibility and have tests with existing Zipkin instrumentation. Alternatively, we can refactor Zipkin conversions to a separate agent to reduce complexity in jaeger-agent

Make target for recreating mocks in the storage package and sub-packages

As storage interfaces are modified, we will require a make target for regenerating mocks.

Some flags seem to be unused

e.g. log-level and runtime-metrics-frequency in flags.go.

Is this project stalled?

It has been 11 days since last updated.

Support Elasticsearch as additional backend

Is there any plan to support Elasticsearch as an additional backend? It would be useful for those who want to leverage their existing Elasticsearch cluster. Zipkin and Hawkular APM use Elasticsearch as a backend.

Change auto-flush time interval in all clients to 1sec

For some reason we set it to 10 seconds in some libs. This is an umbrella issue to check that all clients are sending at least every second (they will send more often if the data exceeds 64k)

Modify the limit behavior in our queries

Since reverting away from SASI indices, we need to artificially increase our limit during queries and then discard the excess trace IDs after the results come back from Cassandra.
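The over-fetch and trim step could be sketched in Go like this. Both helpers are hypothetical, and the multiplier is an assumed tuning knob rather than a value taken from the project.

```go
package main

import "fmt"

// overFetchLimit returns how many trace IDs to request from storage for a
// user-facing limit. The multiplier is a hypothetical tuning knob.
func overFetchLimit(limit, multiplier int) int {
	if limit <= 0 || multiplier < 1 {
		return limit
	}
	return limit * multiplier
}

// trimTraceIDs removes duplicate IDs (keeping first-seen order) and cuts the
// result down to at most limit entries.
func trimTraceIDs(fetched []string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	seen := make(map[string]struct{}, len(fetched))
	out := make([]string, 0, limit)
	for _, id := range fetched {
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		out = append(out, id)
		if len(out) == limit {
			break
		}
	}
	return out
}

func main() {
	fmt.Println(overFetchLimit(3, 2)) // 6
	fetched := []string{"a", "b", "a", "c", "b", "d"}
	fmt.Println(trimTraceIDs(fetched, 3)) // [a b c]
}
```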

HotRod - traces can't be seen on the UI

It seems I got "everything" running, except that I see a message like no peers available on the standalone logs.

The server has been started like this: go run cmd/standalone/main.go -span-storage.type memory -query.port 3001
The UI has been started with npm start
And the example with cd examples/hotrod/ ; go run main.go all

After switching the LogSpans to true and clicking on Japanese Deserts, I can see this on the example's logs:

[I] 2017-04-04T17:20:54Z HTTP request received service=frontend method=GET url=/dispatch?customer=731&nonse=0.29873183383130686
[I] 2017-04-04T17:20:54Z Getting customer service=frontend component=customer_client customer_id=731
[I] 2017-04-04T17:20:54Z HTTP request received service=customer method=GET url=/customer?customer=731
[I] 2017-04-04T17:20:54Z Loading customer service=customer component=mysql customer_id=731
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:598cecebbc62b0d6:221186770f1b11e7:1 service=customer
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:221186770f1b11e7:5c56cc1caa97d99d:1 service=customer
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:221186770f1b11e7:5c56cc1caa97d99d:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:5c56cc1caa97d99d:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Found customer service=frontend customer=&{ID:731 Name:Japanese Deserts Location:728,326}
[I] 2017-04-04T17:20:54Z Finding nearest drivers service=frontend component=driver_client location=728,326
[I] 2017-04-04T17:20:54Z Searching for nearby drivers service=driver location=728,326
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 2
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 1
count till error 0
[E] 2017-04-04T17:20:54Z redis timeout service=driver driver_id=T752378C error=redis timeout
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[E] 2017-04-04T17:20:54Z Retrying GetDriver after error service=driver retry_no=1 error=redis timeout
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 4
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 3
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 2
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 1
count till error 0
[E] 2017-04-04T17:20:54Z redis timeout service=driver driver_id=T726422C error=redis timeout
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[E] 2017-04-04T17:20:54Z Retrying GetDriver after error service=driver retry_no=1 error=redis timeout
count till error 4
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 3
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 2
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[I] 2017-04-04T17:20:54Z Search successful service=driver num_drivers=10
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Found drivers service=frontend drivers=[{DriverID:T708733C Location:464,677} {DriverID:T780476C Location:523,750} {DriverID:T752378C Location:226,57} {DriverID:T761341C Location:719,409} {DriverID:T787199C Location:780,824} {DriverID:T745302C Location:897,908} {DriverID:T726422C Location:699,836} {DriverID:T713475C Location:4,833} {DriverID:T748433C Location:784,369} {DriverID:T743274C Location:603,664}]
count till error 1
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=226,57 dropoff=728,326
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=464,677 dropoff=728,326
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=523,750 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=464%2C677
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=523%2C750
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=226%2C57
+2.189036e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:687b9b4a7623cadb:630382228a29730c:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:687b9b4a7623cadb:630382228a29730c:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:630382228a29730c:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=719,409 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=719%2C409
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:114d4d6403d035ab:29cbce8d0c38fc9e:1 service=route
+2.000000e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:114d4d6403d035ab:29cbce8d0c38fc9e:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:29cbce8d0c38fc9e:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=780,824 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=780%2C824
+6.719845e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:2ebf99de852ba41:3c20c4208c588fc1:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:2ebf99de852ba41:3c20c4208c588fc1:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:3c20c4208c588fc1:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=897,908 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=897%2C908
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:29566611ca466286:69cbfbac4eb59726:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:29566611ca466286:69cbfbac4eb59726:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:69cbfbac4eb59726:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=699,836 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=699%2C836
+6.614381e+000
+3.800338e+000
+7.494734e+000
+8.702753e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:66957a3805a13c83:475bf32b7d802d8b:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:66957a3805a13c83:475bf32b7d802d8b:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:475bf32b7d802d8b:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=4,833 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=4%2C833
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:614d46e0a8120d8e:3680afd13cd2692c:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:614d46e0a8120d8e:3680afd13cd2692c:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:3680afd13cd2692c:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=784,369 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=784%2C369
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:6c76f6eb56c42165:36e83173bdfa4604:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:6c76f6eb56c42165:36e83173bdfa4604:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:36e83173bdfa4604:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=603,664 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=603%2C664
+5.915291e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:2965df47a1576fe6:12c1f6e4f2ee0c6d:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:2965df47a1576fe6:12c1f6e4f2ee0c6d:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:12c1f6e4f2ee0c6d:49db0efb135614c6:1 service=frontend
+5.578001e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:47b1c5655a3e34ce:34dc248a89e3d5f0:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:47b1c5655a3e34ce:34dc248a89e3d5f0:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:34dc248a89e3d5f0:49db0efb135614c6:1 service=frontend
+3.835140e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:13a899ccba5f141a:9383933e7be8001:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:13a899ccba5f141a:9383933e7be8001:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:9383933e7be8001:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Found routes service=frontend routes=[{driver:T708733C route:0xc42047d410 err:<nil>} {driver:T780476C route:0xc420239a10 err:<nil>} {driver:T752378C route:0xc42014c960 err:<nil>} {driver:T787199C route:0xc4202f7ad0 err:<nil>} {driver:T726422C route:0xc4202f7f50 err:<nil>} {driver:T761341C route:0xc42047dad0 err:<nil>} {driver:T745302C route:0xc420176420 err:<nil>} {driver:T713475C route:0xc42016c480 err:<nil>} {driver:T748433C route:0xc42014d980 err:<nil>} {driver:T743274C route:0xc4201772f0 err:<nil>}]
[I] 2017-04-04T17:20:54Z Dispatch successful service=frontend driver=T708733C eta=2m0s
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:49db0efb135614c6:0:1 service=frontend

And this can be seen on the standalone's logs:

{"level":"error","ts":1491326454.9346092,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.934651,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.9348862,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.935086,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.9354107,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.9355452,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326455.935414,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326455.9359097,"msg":"Could not submit zipkin batch","error":"no peers available"}

I'm not sure whether the error message is expected, but even if it is, I can't see the traces in the standalone's UI (and the service select box is empty).

Document HTTP endpoints in query and collector services

We don't really define our API anywhere, and we should do so. This is probably a good issue to bring up the idea of using swagger for self-documentation.

[storage] limitations of Cassandra search on LIMIT and complex queries

When querying for traces using serviceName, operationName and a tag with the default LIMIT of 20, some results might be omitted.

This is because of this logic which does the following:

1. Retrieve all traceIDs matching the operation name
2. Retrieve all traceIDs matching tags
3. Intersect 1 & 2

Because Cassandra doesn't guarantee ordering, the two LIMIT-ed result sets may not overlap, so the intersection could eliminate results that match both conditions.

I propose that we do the following instead (or in addition to what we do now):

Retrieve all traceIds matching tags
Filter by operation name

Retrieving the traceIds matching tags first targets the use case where somebody is searching for a jaeger-debug-id or some other tag with low cardinality, guaranteeing them a result when it exists.
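
The difference between the two strategies can be illustrated with a small in-memory sketch. It assumes that each index lookup is capped by the LIMIT and returns IDs in arbitrary order; the index contents, the function names, and the per-trace operation lookup are all hypothetical and are not Jaeger's storage code.

```python
from typing import Dict, List, Set

# Hypothetical stand-ins for the index tables. Each maps an index key to
# trace IDs in whatever (arbitrary) order the store happens to return them.
OPERATION_INDEX: Dict[str, List[str]] = {
    "GET /route": ["t1", "t2", "t3", "t4", "t5"],
}
TAG_INDEX: Dict[str, List[str]] = {
    "jaeger-debug-id=abc": ["t5"],
}
# Hypothetical per-trace lookup used to filter by operation name afterwards.
TRACE_OPERATIONS: Dict[str, Set[str]] = {
    trace_id: {"GET /route"} for trace_id in ["t1", "t2", "t3", "t4", "t5"]
}


def lookup(index: Dict[str, List[str]], key: str, limit: int) -> List[str]:
    """Return at most `limit` trace IDs for `key` (assumed capped lookup)."""
    return index.get(key, [])[:limit]


def intersect_search(operation: str, tag: str, limit: int) -> List[str]:
    """Current approach: intersect two independently limited lookups."""
    by_operation = set(lookup(OPERATION_INDEX, operation, limit))
    by_tag = lookup(TAG_INDEX, tag, limit)
    return [trace_id for trace_id in by_tag if trace_id in by_operation]


def tag_first_search(operation: str, tag: str, limit: int) -> List[str]:
    """Proposed approach: look up by tag, then filter by operation name."""
    by_tag = lookup(TAG_INDEX, tag, limit)
    return [
        trace_id
        for trace_id in by_tag
        if operation in TRACE_OPERATIONS.get(trace_id, set())
    ]
```

With a limit of 3, the operation lookup in this sketch never returns `t5`, so the intersection is empty even though `t5` matches both conditions, while the tag-first variant still finds it.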

Documentation site organization

Consider using custom theme for readthedocs like http://rexray.readthedocs.io/ (via mkdocs)

[Experimental] Provide GraphQL query service

Instead of an arbitrary "RESTful" HTTP API service, I think a better solution would be to implement the query service over GraphQL.

Here is my sample implementation of what the service would look like:

https://launchpad.graphql.com/wxn5zk8mz
HTTP Endpoint: https://wxn5zk8mz.lp.gql.zone/graphql

Some of the benefits:

Clients can request exactly what they want, resulting in smaller payloads and less unused data. This is really beneficial in the list view of traces: we don't need to guess what a client needs, as designs and needs may change.
Documentation for free (via introspection) and a strongly typed API backed by a spec: https://facebook.github.io/graphql/
A descriptive data model that is more expressive for complex queries (more robust than URL-encoded params).
GraphQL queries can be issued via HTTP, TChannel, or any other communication protocol.
Simple computed fields and object type relationships
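
As a rough illustration of the first point, a client could ask only for the fields the trace list view needs. The sketch below just builds the JSON request body commonly used for GraphQL over HTTP; the `traces` field, its arguments, and the selected field names are hypothetical and are not taken from the sample implementation linked above.

```python
import json
from typing import Dict, List


def build_trace_list_query(service: str, limit: int, fields: List[str]) -> Dict[str, object]:
    """Build a GraphQL request body selecting only the given trace fields.

    The schema (a `traces` field taking `service` and `limit`) is assumed
    purely for illustration.
    """
    selection = " ".join(fields)
    query = (
        "query TraceList($service: String!, $limit: Int) { "
        f"traces(service: $service, limit: $limit) {{ {selection} }} "
        "}"
    )
    return {"query": query, "variables": {"service": service, "limit": limit}}


# The list view only needs a summary, so it requests three fields
# instead of the full trace with all spans, tags and logs.
body = build_trace_list_query("frontend", 20, ["traceID", "duration", "spanCount"])
payload = json.dumps(body)
```

The resulting `payload` would be sent as the body of a POST request to the GraphQL endpoint; a detail view could reuse the same function with a longer field list.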

This will help solve the following issues as well:
https://github.com/uber/jaeger/issues/158
https://github.com/uber/jaeger/issues/123

Migrating from Zipkin Kafka stream

Hi, we currently have a Kafka cluster which stores Zipkin spans. Does the Jaeger collector support consuming Zipkin data from Kafka, transforming it to the Jaeger data model, and writing it into Cassandra?

Save baggage in span

Whenever a baggage item is set, we would like to record the baggage inside the span where it was set. This allows increased visibility into which service set the baggage at what point in the call graph.

Go jaegertracing/jaeger-client-go#153
Java jaegertracing/jaeger-client-java#189
Python jaegertracing/jaeger-client-python#54
Node jaegertracing/jaeger-client-node#129
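
A minimal sketch of the idea, assuming a simplified span type: setting a baggage item also appends a log entry to the span where it was set. The `Span` class and the log field names (`event`, `key`, `value`) are hypothetical and do not reflect the API of any of the clients listed above.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    """Hypothetical minimal span, for illustration only."""

    operation_name: str
    baggage: Dict[str, str] = field(default_factory=dict)
    logs: List[Dict[str, object]] = field(default_factory=list)

    def set_baggage_item(self, key: str, value: str) -> "Span":
        # Store the baggage item so it propagates with the span context.
        self.baggage[key] = value
        # Also record it on this span, so the trace shows which service
        # set the baggage and at what point in the call graph.
        self.logs.append(
            {"timestamp": time.time(), "event": "baggage", "key": key, "value": value}
        )
        return self
```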

Guaranteed throughput samplers should report the actual lower bound in the tags

Currently several implementations report the tags sampling.type=lowerbound, sampling.param=$rate when the lower bound sampler is triggered, where $rate is the sampling rate of the probabilistic sampler. We should change them to report the actual lower bound value, for consistency.

Node client
Go client
Python client
Java client
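
The proposed tagging can be sketched as follows, assuming a guaranteed throughput sampler that first tries a probabilistic decision and then falls back to a rate-limited lower bound. The function and its parameters are hypothetical; the random value and the rate limiter state are passed in so that the decision logic is easy to follow.

```python
from typing import Dict, Tuple


def guaranteed_throughput_decision(
    random_value: float,
    sampling_rate: float,
    lower_bound: float,
    lower_bound_has_credit: bool,
) -> Tuple[bool, Dict[str, object]]:
    """Return (sampled, tags) for one trace.

    random_value: a number in [0, 1) drawn by the caller.
    sampling_rate: probability used by the probabilistic sampler.
    lower_bound: guaranteed minimum traces per second.
    lower_bound_has_credit: whether the lower bound rate limiter allows a trace now.
    """
    if random_value < sampling_rate:
        return True, {"sampling.type": "probabilistic", "sampling.param": sampling_rate}
    if lower_bound_has_credit:
        # Proposed behaviour: report the lower bound itself, not the
        # sampling rate of the probabilistic sampler.
        return True, {"sampling.type": "lowerbound", "sampling.param": lower_bound}
    return False, {}
```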

Provide example cassandra3 schema

There does not exist a cql file that fits the profile of the existing cassandra/spanstore package logic.

Large traces cause problems

Currently Jaeger has problems with traces that exceed a few hundred spans.
I noticed the following:

searching for traces fetches very large payloads - basically /api/traces returns all matching traces along with spans, tags and logs, but the UI shows only a summary like number of spans per service, duration, etc. The search response should contain only the details required to render the screen. Currently the UI becomes very sluggish when the search result contains a large trace or does not render the results at all.
showing trace view for a very large trace has similar performance issues - the response payload is very large, the UI is sluggish or does not render.
- In this case it would make sense to fetch only part of the trace, e.g. the first 200 spans, or the top 200 spans when counting distance from the root span. Some way to expand additional spans would allow navigating the trace gradually. Or maybe some span filters would do the job.
- another idea that comes to mind would involve dedicating some bits in the span id for "distance from root" to make it possible to search for the top spans efficiently
- one more way to improve it would be to fetch spans without tags/logs and fetch those only when the "tags" or "logs" section on a span is "expanded"
payloads for large traces are pretty large (10MB, 100MB and more) and the browser has problems processing them
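
The "top N spans by distance from the root" idea could look roughly like the sketch below. It assumes the trace is already available as a mapping from span ID to parent span ID; the function name and input shape are hypothetical, and a real implementation would still need a way to avoid loading the whole trace first.

```python
from collections import deque
from typing import Dict, List, Optional


def top_spans_by_depth(parents: Dict[str, Optional[str]], limit: int) -> List[str]:
    """Return up to `limit` span IDs, closest to the root first.

    `parents` maps each span ID to its parent span ID (None for the root).
    Spans whose parent is missing from the trace are treated as roots.
    """
    children: Dict[str, List[str]] = {}
    roots: List[str] = []
    for span_id, parent_id in parents.items():
        if parent_id is None or parent_id not in parents:
            roots.append(span_id)
        else:
            children.setdefault(parent_id, []).append(span_id)

    # Breadth-first traversal visits spans in order of increasing depth.
    selected: List[str] = []
    queue = deque(roots)
    while queue and len(selected) < limit:
        span_id = queue.popleft()
        selected.append(span_id)
        queue.extend(children.get(span_id, []))
    return selected
```

Expanding a span in the UI could then fetch that span's subtree with the same traversal, using the expanded span as the only root.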

I was able to view traces that had more than 1k spans (it was laggy and took some time), but not ~5k.

The traces I encountered were usually created by long iterative processes - e.g. recalculating something on thousands of records.
Another more pathological reason is bad communication design - an example would be a process which emits thousands of messages instead of using some batching.

Related post on jaeger-tracing group: here

Default logging shouldn't be sampled

Logs are sampled, and anything beyond ~100 log lines per second is dropped.
Change this so that all logs are output.

Span context key and baggage prefix using 'uber'

Could the span context key and baggage prefix used for propagation be changed from 'uber' to 'jaeger'?

For example:
https://github.com/uber/jaeger-client-node/blob/master/src/constants.js#L74-L77
https://github.com/uber/jaeger-client-java/blob/master/jaeger-core/src/main/java/com/uber/jaeger/propagation/TextMapCodec.java#L37-L40

UI from standalone image does not work

UI from all-in-one image fails to start.

docker run -it --rm --network=host jaegertracing/all-in-one

Stack trace from browser:

main.7c74e600.js:2 Uncaught Error: React.PropTypes type checking code is stripped in production.
    at r (main.7c74e600.js:2)
    at Object.r (main.7c74e600.js:59)
    at Object.<anonymous> (main.7c74e600.js:45)
    at t (main.7c74e600.js:1)
    at Object.<anonymous> (main.7c74e600.js:45)
    at t (main.7c74e600.js:1)
    at Object.<anonymous> (main.7c74e600.js:25)
    at t (main.7c74e600.js:1)
    at Object.<anonymous> (main.7c74e600.js:45)
    at t (main.7c74e600.js:1)

Where can I find the "schema.cql" file?

Hello, I want to run Jaeger locally, but I cannot find "schema.cql".

Provide a publisher mechanism to pass stored span data to an HTTP endpoint for post-processing

Provide a mechanism for distributing reported span data from the Jaeger server. This can be used to trigger post processing of the tracing information.

As discussed here: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/jaeger-tracing/2ClHbZ3BwBg/tnGVfmQbBwAJ

near-zero intrusive tracing for golang

The Dapper paper explains near-zero intrusive tracing for languages like C++ and Java. For Go, the only approach I know of is passing the required tracing information down via context. This means the context has to be passed down explicitly, without any interruption. This makes migrating an old code base to tracing difficult, as there is no simple way to enforce that the context is passed down correctly. Any suggestions for a better way to achieve this?