
jaeger's Introduction



Jaeger - a Distributed Tracing System

graph TD
    SDK["OpenTelemetry SDK"] --> |HTTP or gRPC| COLLECTOR
    COLLECTOR["Jaeger Collector"] --> STORE[Storage]
    COLLECTOR --> |gRPC| PLUGIN[Storage Plugin]
    COLLECTOR --> |gRPC/sampling| SDK
    PLUGIN --> STORE
    QUERY[Jaeger Query Service] --> STORE
    QUERY --> |gRPC| PLUGIN
    UI[Jaeger UI] --> |HTTP| QUERY
    subgraph Application Host
        subgraph User Application
            SDK
        end
    end

Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing platform created by Uber Technologies and donated to Cloud Native Computing Foundation. It can be used for monitoring microservices-based distributed systems:

  • Distributed context propagation
  • Distributed transaction monitoring
  • Root cause analysis
  • Service dependency analysis
  • Performance / latency optimization


Jaeger is hosted by the Cloud Native Computing Foundation (CNCF) as the 7th top-level project (graduated in October 2019). If you are a company that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how Jaeger plays a role, read the CNCF Jaeger incubation announcement and Jaeger graduation announcement.

Get Involved

Jaeger is an open source project with open governance. We welcome contributions from the community, and we would love your help to improve and extend the project. Here are some ideas for how to get involved. Many of them do not even require any coding.

Features

High Scalability

The Jaeger backend is designed to have no single points of failure and to scale with business needs. For example, any given Jaeger installation at Uber typically processes several billion spans per day.

Relationship with OpenTelemetry

The Jaeger and OpenTelemetry projects have different goals. OpenTelemetry aims to provide APIs and SDKs in multiple languages to allow applications to export various telemetry data out of the process, to any number of metrics and tracing backends. The Jaeger project is primarily the tracing backend that receives tracing telemetry data and provides processing, aggregation, data mining, and visualizations of that data. For more information, please refer to the blog post Jaeger and OpenTelemetry.

Jaeger was originally designed to support the OpenTracing standard. The terminology is still used in Jaeger UI, but the concepts have direct mapping to the OpenTelemetry data model of traces.

| Capability | OpenTracing concept | OpenTelemetry concept |
| --- | --- | --- |
| Represent traces as directed acyclic graphs (not just trees) | span references | span links |
| Strongly typed span attributes | span tags | span attributes |
| Strongly typed events/logs | span logs | span events |

The Jaeger project recommends OpenTelemetry SDKs for instrumentation, instead of the now-deprecated Jaeger SDKs.

Multiple storage backends

Jaeger can be used with a growing number of storage backends:

  • It natively supports two popular open source NoSQL databases as trace storage backends: Cassandra and Elasticsearch.
  • It integrates via a gRPC API with other well known databases that have been certified to be Jaeger compliant: TimescaleDB via Promscale, ClickHouse.
  • There is embedded database support using Badger and simple in-memory storage for testing setups.
  • ScyllaDB can be used as a drop-in replacement for Cassandra since it uses the same data model and query language.
  • There are ongoing community experiments using other databases, such as InfluxDB, Amazon DynamoDB, and YugabyteDB (YCQL).

Modern Web UI

The Jaeger Web UI is implemented in JavaScript using popular open source frameworks like React. Several performance improvements were released in v1.0 to allow the UI to deal efficiently with large volumes of data and to display traces with tens of thousands of spans (e.g. we tried a trace with 80,000 spans).

Cloud Native Deployment

The Jaeger backend is distributed as a collection of Docker images. The binaries support various configuration methods, including command line options, environment variables, and configuration files in multiple formats (YAML, TOML, etc.).

The recommended way to deploy Jaeger in a production Kubernetes cluster is via the Jaeger Operator.

The Jaeger Operator provides a CLI to generate Kubernetes manifests from the Jaeger CR. This can be used as an alternative to maintaining plain Kubernetes manifest files.

The Jaeger ecosystem also provides a Helm chart as an alternative way to deploy Jaeger.

Observability

All Jaeger backend components expose Prometheus metrics by default (other metrics backends are also supported). Logs are written to standard out using the structured logging library zap.

Security

Third-party security audits of Jaeger are available in https://github.com/jaegertracing/security-audits. Please see Issue #1718 for the summary of available security mechanisms in Jaeger.

Backwards compatibility with Zipkin

Although we recommend instrumenting applications with OpenTelemetry, if your organization has already invested in the instrumentation using Zipkin libraries, you do not have to rewrite all that code. Jaeger provides backwards compatibility with Zipkin by accepting spans in Zipkin formats (Thrift or JSON v1/v2) over HTTP. Switching from Zipkin backend is just a matter of routing the traffic from Zipkin libraries to the Jaeger backend.
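As a rough illustration of what "routing the traffic" means, the sketch below posts one Zipkin JSON v2 span over HTTP. The endpoint `http://localhost:9411/api/v2/spans` is an assumption about a local collector with its Zipkin-compatible HTTP endpoint enabled, and the `ZipkinSpan`, `EncodeSpans`, and `PostSpans` names are hypothetical helpers, not part of any Jaeger or Zipkin library. In practice an existing Zipkin library would do this for you; only its target URL changes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// ZipkinSpan holds a minimal subset of the Zipkin JSON v2 span fields.
type ZipkinSpan struct {
	TraceID       string            `json:"traceId"`
	ID            string            `json:"id"`
	Name          string            `json:"name"`
	Timestamp     int64             `json:"timestamp"` // microseconds since epoch
	Duration      int64             `json:"duration"`  // microseconds
	LocalEndpoint map[string]string `json:"localEndpoint"`
}

// EncodeSpans serializes spans as a JSON array, the shape of a Zipkin v2 batch.
func EncodeSpans(spans []ZipkinSpan) ([]byte, error) {
	return json.Marshal(spans)
}

// PostSpans sends the batch to the given endpoint and returns the HTTP status.
func PostSpans(endpoint string, spans []ZipkinSpan) (int, error) {
	body, err := EncodeSpans(spans)
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	spans := []ZipkinSpan{{
		TraceID:       "49db0efb135614c6",
		ID:            "49db0efb135614c6",
		Name:          "get /dispatch",
		Timestamp:     1494509316000000,
		Duration:      1294,
		LocalEndpoint: map[string]string{"serviceName": "frontend"},
	}}
	// Assumed address of a collector accepting Zipkin spans.
	status, err := PostSpans("http://localhost:9411/api/v2/spans", spans)
	if err != nil {
		fmt.Println("send failed (is a collector listening?):", err)
		return
	}
	fmt.Println("collector responded with HTTP", status)
}
```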

Version Compatibility Guarantees

Occasionally, CLI flags can be deprecated due to, for example, usability improvements or new functionality. In such situations, developers introducing the deprecation are required to follow these guidelines.

In short, for a deprecated CLI flag, you should expect to see the following message in the --help documentation:

(deprecated, will be removed after yyyy-mm-dd or in release vX.Y.Z, whichever is later)

A grace period of at least 3 months or two minor version bumps (whichever is later) from the first release containing the deprecation notice will be provided before the deprecated CLI flag can be deleted.

For example, consider a scenario where v1.28.0 is released on 01-Jun-2021 containing a deprecation notice for a CLI flag. This flag will remain in a deprecated state until the later of 01-Sep-2021 or the release of v1.30.0, after which it can be removed. It may remain deprecated for longer than the aforementioned grace period.
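The grace-period rule can be worked through in a few lines. This is only a sketch of the arithmetic described above; the `Version` type and the `RemovalDate`, `RemovalVersion`, and `CanRemove` helpers are hypothetical names introduced for illustration and do not come from the Jaeger code base.

```go
package main

import (
	"fmt"
	"time"
)

// Version is a simplified semantic version (hypothetical helper type).
type Version struct{ Major, Minor, Patch int }

func (v Version) String() string {
	return fmt.Sprintf("v%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// RemovalDate is three months after the release that introduced the notice.
func RemovalDate(released time.Time) time.Time {
	return released.AddDate(0, 3, 0)
}

// RemovalVersion is two minor version bumps after the deprecating release.
func RemovalVersion(v Version) Version {
	return Version{v.Major, v.Minor + 2, 0}
}

// CanRemove reports whether a release cur published at now may delete the
// flag: both thresholds must be reached ("whichever is later").
func CanRemove(released time.Time, deprecatedIn Version, now time.Time, cur Version) bool {
	ver := RemovalVersion(deprecatedIn)
	reachedVersion := cur.Major > ver.Major ||
		(cur.Major == ver.Major && cur.Minor >= ver.Minor)
	return !now.Before(RemovalDate(released)) && reachedVersion
}

func main() {
	released := time.Date(2021, time.June, 1, 0, 0, 0, 0, time.UTC)
	deprecatedIn := Version{1, 28, 0}
	fmt.Println(RemovalDate(released).Format("02-Jan-2006"), RemovalVersion(deprecatedIn))
	// prints: 01-Sep-2021 v1.30.0
}
```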

Go Version Compatibility Guarantees

The Jaeger project attempts to track the currently supported versions of Go, as defined by the Go team. Removing support for an unsupported Go version is not considered a breaking change.

Starting with the release of Go 1.21, support for Go versions will be updated as follows:

  1. Soon after the release of a new Go minor version N, updates will be made to the build and test steps to accommodate the latest Go minor version.
  2. Soon after the release of a new Go minor version N, support for Go version N-2 will be removed and version N-1 will become the minimum required version.
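The two steps above amount to a sliding window of two Go minor versions. A minimal sketch, assuming Go 1.22 as the newest release purely for illustration; `MinSupportedGo` and `IsSupported` are hypothetical helper names:

```go
package main

import "fmt"

// MinSupportedGo returns the minimum required Go minor version once
// Go 1.<latestMinor> is out: N-2 is dropped and N-1 becomes the minimum.
func MinSupportedGo(latestMinor int) int {
	return latestMinor - 1
}

// IsSupported reports whether Go 1.<minor> falls inside the supported window.
func IsSupported(minor, latestMinor int) bool {
	return minor >= MinSupportedGo(latestMinor) && minor <= latestMinor
}

func main() {
	latest := 22 // assumption for illustration only
	for _, m := range []int{20, 21, 22} {
		fmt.Printf("Go 1.%d supported: %v\n", m, IsSupported(m, latest))
	}
}
```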

Related Repositories

Documentation

Instrumentation Libraries

The Jaeger project recommends OpenTelemetry SDKs for instrumentation, instead of Jaeger's native SDKs, which are now deprecated.

Deployment

Components

Building From Source

See CONTRIBUTING.

Contributing

See CONTRIBUTING.

Thanks to all the people who already contributed!

Maintainers

Rules for becoming a maintainer are defined in the GOVERNANCE document. Below are the official maintainers of the Jaeger project. Please use @jaegertracing/jaeger-maintainers to tag them on issues / PRs.

Some repositories under jaegertracing org have additional maintainers.

Emeritus Maintainers

We are grateful to our former maintainers for their contributions to the Jaeger project.

Project Status Meetings

The Jaeger maintainers and contributors meet regularly on a video call. Everyone is welcome to join, including end users. For meeting details, see https://www.jaegertracing.io/get-in-touch/.

Roadmap

See https://www.jaegertracing.io/docs/roadmap/

Get in Touch

Have questions, suggestions, bug reports? Reach the project community via these channels:

Adopters

Jaeger as a product consists of multiple components. We want to support different types of users, whether they use only our instrumentation libraries or a full end-to-end Jaeger installation, and whether it runs in production or is used to troubleshoot issues in development.

Please see ADOPTERS.md for some of the organizations using Jaeger today. If you would like to add your organization to the list, please comment on our survey issue.

License

Copyright (c) The Jaeger Authors. Apache 2.0 License.

jaeger's People

Contributors

afzal442, akagami-harsh, albertteoh, annanay25, ashmita152, badiib, black-adder, bobrik, burmanm, davit-y, dependabot[bot], esnible, flamingsaint, guo0693, hellspawn679, isaachier, james-ryans, jkowall, joe-elliott, jpkrohling, ledor473, mh-park, mmorel-35, objectiser, pavolloffay, pushkarm029, renovate-bot, rubenvp8510, vprithvi, yurishkuro


jaeger's Issues

adaptive sampling

Hi,

Is adaptive sampling inside the collector supported? I could not figure out how to set or change the sampling strategy via the collector. Please help.

Thanks.
-kui

Revisit Duration indexing and querying logic

Currently we are using an archaic methodology for storing and querying against duration (against hourly buckets that are not timestamps). We should revisit this and fix up both the code and the Cassandra schema.

Hotrod example doesn't work

  • Cloned the jaeger repo
  • make install_examples -- worked
  • cd examples/hotrod -- worked
  • go run ./main.go all -- failed

/Users/Chris/go/src/github.com/uber/jaeger/examples/hotrod/cmd/customer.go:28:2: code in directory /Users/Chris/go/src/github.com/uber-go/zap expects import "go.uber.org/zap"

I'm not familiar with Go, so I could be missing something totally obvious, but I thought I would log this as an issue because other newcomers like me will likely hit this too.

jaeger-collector does not respect flag value 0.

Current Behavior

jaeger-collector stores spans with span context flags set to 0.

Expected Behavior

jaeger-collector should not store spans with span contexts where the flag value is 0 because these spans failed the sampling tests.
This protects the jaeger-collector from buggy clients that assume that the collector only stores sampled spans.
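The expected behavior could be sketched as a filter applied before spans reach storage. This is a simplified illustration, not the collector's actual code: the `Span` type and `FilterSampled` function are hypothetical, and the only assumption taken from the Jaeger data model is that the lowest bit of the flags field marks a sampled span.

```go
package main

import "fmt"

// flagSampled is the lowest bit of the span flags, set when the span was sampled.
const flagSampled int32 = 1

// Span is a hypothetical, stripped-down span carrying only what the filter needs.
type Span struct {
	OperationName string
	Flags         int32
}

// FilterSampled keeps only the spans whose sampled bit is set.
func FilterSampled(spans []Span) []Span {
	kept := make([]Span, 0, len(spans))
	for _, s := range spans {
		if s.Flags&flagSampled != 0 {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	batch := []Span{
		{OperationName: "arbitrary", Flags: 0}, // failed sampling, should be dropped
		{OperationName: "sampled-op", Flags: 1},
	}
	for _, s := range FilterSampled(batch) {
		fmt.Println("storing span:", s.OperationName)
	}
}
```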

Reproduction steps

To reproduce, build Jaeger and run it locally as follows:

make build_ui
cd cmd/standalone/
go run ./main.go --span-storage.type=memory --query.static-files=../../jaeger-ui-build/build/

Then use yab to make a request as such. Use jaeger.thrift from here.

yab -A trace-id:9 -y payload.yaml

payload.yaml contains the following

---
method: "Collector::submitBatches"
peer: "localhost:14267"
request:
  batches:
    -
      process:
        service_name: flags-test
        tags:
          -
            key: jaeger.hostname
            vStr: localhost
            vType: string
      spans:
        -
          duration: 1294
          flags: 0
          operation_name: arbitrary
          parent_span_id: 0
          span_id: 1
          start_time: 1494509316000000
          tags:
            -
              key: span.kind
              vStr: server
              vType: string
          trace_id_high: 0
          trace_id_low: "${trace-id:1}"
service: jaeger-collector
thrift: jaeger.thrift

Note that flags is set to 0, yet the trace is retrievable using the UI at http://localhost:16686/trace/9

Jaeger agent does not fit serverless FAAS (AWS lambda etc)

The pattern of using Jaeger Agents running on the same host does not fit our use case of running (Python) functions as FaaS in AWS Lambda.
The requirement of running a separate process is a showstopper.
Is it possible to have the agent running on another host, or to work around this issue in some other way?

Btw. looking forward to OsCon workshop!!! :-)

Improve Cassandra deployment for Kubernetes / OpenShift

During the initial Docker/Kubernetes/OpenShift work, we took some shortcuts regarding Cassandra, and they need to be sorted out.

Points to solve:

  • Use a more appropriate Cassandra Docker image
  • Wrap the Docker entry point to run a nodetool drain on shutdown, as suggested by @jsanda
  • Figure out the best approach regarding the replication strategy and datacenter configuration

Strange behavior for output logs

I am trying to add JSON content to logs like this:

span.LogFields(log.String("get_platform", string(resp)))

[...]

data := string(resp)
glog.V(0).Infof("Data: %s", data)
span.LogFields(log.String("response", data))

The output logs are:

Data: [{"biosUUID": "422523e2-4bf4-917f-a3ba-e65574cc18ef", "description": "4ef78da2-3552-11e7-99f4-0800270a9ecb", "disks": [{"capacity": 10, 
[...]

In the UI, the first log (get_platform) seems to be printed correctly, with the key "get_platform" and a JSON value.
But for the second and third logs, "response" and "virtual_machines", the key is "event" and the value is a plain string rather than JSON content.
I don't understand why the same code behaves differently. Only the responses differ.

Any idea how I can fix this?
Thanks

Verify that all packages have tests

When a package has .go files but no test files, it is not counted towards the code coverage, giving a misleading total. The proposal is to do something similar to yarpc/yarpc-go#1043:

  • we already use .nocover files to exclude packages that contain this file from coverage runs
  • we should add another validation script that will fail if a package has .go files, no .nocover file, and no tests
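The proposed check could look roughly like the sketch below. It is an assumption about how such a validation might be written, not the script the project adopted: `MissingTests` is a hypothetical helper, and the choice to skip a `vendor` directory is an added assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// MissingTests reports whether a directory with these file names breaks the
// rule: it has .go files, no .nocover marker, and no *_test.go file.
func MissingTests(files []string) bool {
	hasGo, hasTest, hasNoCover := false, false, false
	for _, f := range files {
		switch {
		case f == ".nocover":
			hasNoCover = true
		case strings.HasSuffix(f, "_test.go"):
			hasTest = true
		case strings.HasSuffix(f, ".go"):
			hasGo = true
		}
	}
	return hasGo && !hasNoCover && !hasTest
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: checktests <root-dir>")
		return
	}
	// Group file names by the directory that contains them.
	byDir := map[string][]string{}
	err := filepath.WalkDir(os.Args[1], func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if d.Name() == "vendor" { // assumption: vendored code is not checked
				return filepath.SkipDir
			}
			return nil
		}
		dir := filepath.Dir(path)
		byDir[dir] = append(byDir[dir], d.Name())
		return nil
	})
	if err != nil {
		fmt.Println("walk failed:", err)
		os.Exit(2)
	}
	failed := false
	for dir, files := range byDir {
		if MissingTests(files) {
			fmt.Println("package has no tests and no .nocover file:", dir)
			failed = true
		}
	}
	if failed {
		os.Exit(1)
	}
}
```

Keeping the rule in a pure function over file names makes it easy to unit test separately from the directory walk.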

Provide a single Docker image to run full backend for local testing

Desired workflow:

  1. start docker image
  2. run hotrod
  3. see traces in the UI

Action items:

  • Dockerfile for a single image running all backend and UI
    • Consolidate all default ports into a single range
  • Add integration tests that validate the docker image is actually operational
    • curl the query service with debug flag, it should create a trace
    • curl it again looking for that trace by correlation ID
    • curl UI endpoint for trace HTML and maybe do a dumb string search for the correlation ID
  • Publish the image to DockerHub

Change buckets list calculation

Currently buckets is a hardcoded constant string: (0,1,2,3,4,5,6,7,8,9).

This should be calculated at runtime. This will modify some queries and the new query behavior will need to be tested.
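A minimal sketch of computing the list at runtime, assuming the buckets stay a contiguous range starting at 0 and that the query still expects the same parenthesized, comma-separated form. `BucketList` is a hypothetical name, not a function from the Jaeger code base.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BucketList builds the bucket list "(0,1,...,n-1)" for n buckets.
func BucketList(n int) string {
	if n < 0 {
		n = 0
	}
	parts := make([]string, n)
	for i := range parts {
		parts[i] = strconv.Itoa(i)
	}
	return "(" + strings.Join(parts, ",") + ")"
}

func main() {
	fmt.Println(BucketList(10)) // prints: (0,1,2,3,4,5,6,7,8,9)
}
```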

Provide docker image for agent, collector, and query service

Assuming there is a Cassandra instance we can point to, we need to provide working Docker images for:

  • The Jaeger Agent
  • The Jaeger Collector
  • The Jaeger Query Service
  • This task is complete when it can be shown that all three services run, connect to a Cassandra instance, and can consume traffic and display results in the UI.

Support jaeger-debug-id header even when there is an established trace context

Some command line tools like yab are capable of starting a trace before sending the request. They also allow setting additional headers like jaeger-debug-id, but the Jaeger client libraries ignore it (at least the Java client does) because there is already a trace context.

  • make sure all clients respect the header regardless of the presence of the trace
  • add a test for this behavior to end-to-end test

seems no way to specify multiple collectors when starting agent?

Hi,

I am trying to figure out whether we could use Jaeger for trace collection in our product. So far I have it working with a single collector set via -collector.host-port. It seems that the way to set up multiple collectors is through some auto-discovery mechanism, but I have no idea how this works. Please help.

Thanks.
-Kui

Adaptive sampling strategies not updated in certain situations

Under certain conditions, the collector might not return adaptive sampling strategies for certain operations which in turn means the operation will continue to use the existing sampling strategy and not be updated.

https://github.com/uber/jaeger-client-go/blob/master/sampler.go#L361 doesn't update existing strategies that are not part of the passed-in strategies.

The issue exists in all clients.
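One possible way to avoid stale strategies is sketched below. It is not the client's actual sampler: `PerOperationRates` is a hypothetical, simplified stand-in, and falling back to the default rate for operations missing from the response is an assumed policy rather than a confirmed fix.

```go
package main

import "fmt"

// PerOperationRates is a hypothetical stand-in for a per-operation sampler:
// it maps operation names to sampling probabilities.
type PerOperationRates struct {
	Default float64
	Rates   map[string]float64
}

// Update applies the strategies returned by the collector. Operations that
// already have a rate but are absent from the response fall back to the new
// default instead of silently keeping their old rate.
func (p *PerOperationRates) Update(defaultRate float64, strategies map[string]float64) {
	if p.Rates == nil {
		p.Rates = map[string]float64{}
	}
	p.Default = defaultRate
	for op := range p.Rates {
		if _, ok := strategies[op]; !ok {
			p.Rates[op] = defaultRate // assumed policy for missing operations
		}
	}
	for op, rate := range strategies {
		p.Rates[op] = rate
	}
}

func main() {
	p := &PerOperationRates{Rates: map[string]float64{"GET /a": 0.5, "GET /b": 0.5}}
	// The response only mentions "GET /a"; "GET /b" must not stay at 0.5.
	p.Update(0.01, map[string]float64{"GET /a": 0.9})
	fmt.Println(p.Rates["GET /a"], p.Rates["GET /b"])
}
```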

How to start Jaeger?

Sorry for hijacking issues but this looks like the only way to contact the team.

I don't know much about Go. Could you share some basic details on how to start the Jaeger daemon/UI?

Thanks in advance!

Collector always requires Cassandra

I'm trying to run the collector module individually, and it seems it always requires Cassandra, even if the -span-storage.type memory option is passed.

$ CGO_ENABLED=0 GOOS=linux installsuffix=cgo go build -o ./cmd/collector/jaeger-collector-linux ./cmd/collector/main.go
$ ./cmd/collector/jaeger-collector-linux -span-storage.type memory
{"level":"fatal","ts":1492096339.4118035,"caller":"collector/main.go:55","msg":"Unabled to set up builder","error":"MemoryStore is not provided","stacktrace":"github.com/uber/jaeger/vendor/go.uber.org/zap.Stack\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/field.go:209\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).check\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:273\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).Fatal\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:208\nmain.main\n\t/home/jpkroehling/Projects/go/src/github.com/uber/jaeger/cmd/collector/main.go:55"}

Changing it to be almost 1-1 with the server bootstrap code from the standalone module, the message becomes:

{"level":"fatal","ts":1492096398.6600523,"caller":"collector/main.go:52","msg":"Unabled to set up builder","error":"Cassandra not configured","stacktrace":"github.com/uber/jaeger/vendor/go.uber.org/zap.Stack\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/field.go:209\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).check\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:273\ngithub.com/uber/jaeger/vendor/go.uber.org/zap.(*Logger).Fatal\n\t/mnt/storage/jpkroehling/Projects/go/src/github.com/uber/jaeger/vendor/go.uber.org/zap/logger.go:208\nmain.main\n\t/home/jpkroehling/Projects/go/src/github.com/uber/jaeger/cmd/collector/main.go:52"}

Wire format change

I would like to propose a wire format change. There are two things on my mind:

  • the headers contain "uber": uber-trace-Id for the span context and the baggage prefix uberctx. This says nothing about Jaeger. It could be changed to jaeger-
  • parentId is redundant when using the new model

The change will be problematic due to a large deployment of mixed services (Jaeger and Zipkin Thrift used simultaneously).

I'm not sure how this could be done in the safest manner. One option is to implement a new default injector/extractor that first injects and extracts both the old and new headers, and then removes the old headers after some time.
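The migration idea could be sketched as follows. Everything here is illustrative: `jaeger-trace-id` is a hypothetical header name derived from the proposal, and `Carrier`, `Extract`, and `Inject` are simplified stand-ins for a real propagation API, with header names assumed to be already lower-cased.

```go
package main

import "fmt"

const (
	oldHeader = "uber-trace-id"   // header used by existing services
	newHeader = "jaeger-trace-id" // hypothetical new name from the proposal
)

// Carrier is a minimal text-map carrier keyed by lower-case header name.
type Carrier map[string]string

// Extract prefers the new header and falls back to the old one, so a service
// can read requests from both upgraded and non-upgraded callers.
func Extract(c Carrier) (string, bool) {
	if v, ok := c[newHeader]; ok && v != "" {
		return v, true
	}
	if v, ok := c[oldHeader]; ok && v != "" {
		return v, true
	}
	return "", false
}

// Inject always writes the new header. While writeLegacy is true it also
// writes the old one; turning it off later removes the old header.
func Inject(c Carrier, spanContext string, writeLegacy bool) {
	c[newHeader] = spanContext
	if writeLegacy {
		c[oldHeader] = spanContext
	}
}

func main() {
	out := Carrier{}
	Inject(out, "49db0efb135614c6:49db0efb135614c6:0:1", true)
	fmt.Println(len(out), "headers written during migration")

	legacyOnly := Carrier{oldHeader: "49db0efb135614c6:49db0efb135614c6:0:1"}
	ctx, ok := Extract(legacyOnly)
	fmt.Println(ctx, ok)
}
```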

This is also related to #121

jaeger-agent zipkin compatibility

The jaeger-agent exposes the Zipkin Thrift service only on a UDP port. Because Thrift doesn't support a UDP transport out of the box, this makes using Jaeger for Zipkin-instrumented services difficult. It also makes migrating from Zipkin to Jaeger a nonstarter.

We should explore adding an HTTP or TCP Thrift Zipkin receiver to jaeger-agent for compatibility, with tests against existing Zipkin instrumentation. Alternatively, we can refactor the Zipkin conversions into a separate agent to reduce complexity in jaeger-agent.

Support Elasticsearch as additional backend

Is there any plan to support Elasticsearch as an additional backend? It would be useful for those who want to leverage their existing Elasticsearch cluster. Zipkin and Hawkular APM use Elasticsearch as a backend.

Modify the limit behavior in our queries

Since reverting from SASI indices, we need to artificially increase the limit in our queries and then shed the excess traceIDs after the results come back from Cassandra.
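The over-fetch-then-trim approach might look like the sketch below. The multiplier and the `FetchLimit` and `ShedExcess` helpers are hypothetical; the sketch only assumes that an index query can return the same trace ID more than once, which is why the raw limit alone is not enough.

```go
package main

import "fmt"

// FetchLimit inflates the user's limit for the index query.
// The multiplier is a tuning knob assumed for illustration.
func FetchLimit(limit, multiplier int) int {
	return limit * multiplier
}

// ShedExcess removes duplicate trace IDs, preserving first-seen order,
// and keeps at most limit of them.
func ShedExcess(traceIDs []string, limit int) []string {
	if limit < 0 {
		limit = 0
	}
	seen := make(map[string]struct{}, len(traceIDs))
	unique := make([]string, 0, limit)
	for _, id := range traceIDs {
		if len(unique) == limit {
			break
		}
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		unique = append(unique, id)
	}
	return unique
}

func main() {
	limit := 2
	fmt.Println("query limit:", FetchLimit(limit, 3))
	// Pretend the index returned these rows, with repeats.
	rows := []string{"t1", "t1", "t2", "t3", "t2"}
	fmt.Println(ShedExcess(rows, limit)) // prints: [t1 t2]
}
```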

HotRod - traces can't be seen on the UI

It seems I got "everything" running, except that I see a message like no peers available on the standalone logs.

The server has been started like this: go run cmd/standalone/main.go -span-storage.type memory -query.port 3001
The UI has been started with npm start
And the example with cd examples/hotrod/ ; go run main.go all

After switching the LogSpans to true and clicking on Japanese Deserts, I can see this on the example's logs:

[I] 2017-04-04T17:20:54Z HTTP request received service=frontend method=GET url=/dispatch?customer=731&nonse=0.29873183383130686
[I] 2017-04-04T17:20:54Z Getting customer service=frontend component=customer_client customer_id=731
[I] 2017-04-04T17:20:54Z HTTP request received service=customer method=GET url=/customer?customer=731
[I] 2017-04-04T17:20:54Z Loading customer service=customer component=mysql customer_id=731
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:598cecebbc62b0d6:221186770f1b11e7:1 service=customer
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:221186770f1b11e7:5c56cc1caa97d99d:1 service=customer
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:221186770f1b11e7:5c56cc1caa97d99d:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:5c56cc1caa97d99d:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Found customer service=frontend customer=&{ID:731 Name:Japanese Deserts Location:728,326}
[I] 2017-04-04T17:20:54Z Finding nearest drivers service=frontend component=driver_client location=728,326
[I] 2017-04-04T17:20:54Z Searching for nearby drivers service=driver location=728,326
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 2
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 1
count till error 0
[E] 2017-04-04T17:20:54Z redis timeout service=driver driver_id=T752378C error=redis timeout
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[E] 2017-04-04T17:20:54Z Retrying GetDriver after error service=driver retry_no=1 error=redis timeout
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 4
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 3
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 2
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 1
count till error 0
[E] 2017-04-04T17:20:54Z redis timeout service=driver driver_id=T726422C error=redis timeout
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[E] 2017-04-04T17:20:54Z Retrying GetDriver after error service=driver retry_no=1 error=redis timeout
count till error 4
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 3
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
count till error 2
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[I] 2017-04-04T17:20:54Z Search successful service=driver num_drivers=10
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=driver
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:447d52ceebdcc70:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Found drivers service=frontend drivers=[{DriverID:T708733C Location:464,677} {DriverID:T780476C Location:523,750} {DriverID:T752378C Location:226,57} {DriverID:T761341C Location:719,409} {DriverID:T787199C Location:780,824} {DriverID:T745302C Location:897,908} {DriverID:T726422C Location:699,836} {DriverID:T713475C Location:4,833} {DriverID:T748433C Location:784,369} {DriverID:T743274C Location:603,664}]
count till error 1
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=226,57 dropoff=728,326
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=464,677 dropoff=728,326
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=523,750 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=464%2C677
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=523%2C750
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=226%2C57
+2.189036e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:687b9b4a7623cadb:630382228a29730c:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:687b9b4a7623cadb:630382228a29730c:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:630382228a29730c:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=719,409 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=719%2C409
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:114d4d6403d035ab:29cbce8d0c38fc9e:1 service=route
+2.000000e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:114d4d6403d035ab:29cbce8d0c38fc9e:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:29cbce8d0c38fc9e:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=780,824 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=780%2C824
+6.719845e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:2ebf99de852ba41:3c20c4208c588fc1:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:2ebf99de852ba41:3c20c4208c588fc1:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:3c20c4208c588fc1:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=897,908 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=897%2C908
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:29566611ca466286:69cbfbac4eb59726:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:29566611ca466286:69cbfbac4eb59726:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:69cbfbac4eb59726:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=699,836 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=699%2C836
+6.614381e+000
+3.800338e+000
+7.494734e+000
+8.702753e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:66957a3805a13c83:475bf32b7d802d8b:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:66957a3805a13c83:475bf32b7d802d8b:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:475bf32b7d802d8b:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=4,833 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=4%2C833
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:614d46e0a8120d8e:3680afd13cd2692c:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:614d46e0a8120d8e:3680afd13cd2692c:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:3680afd13cd2692c:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=784,369 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=784%2C369
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:6c76f6eb56c42165:36e83173bdfa4604:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:6c76f6eb56c42165:36e83173bdfa4604:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:36e83173bdfa4604:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Finding route service=frontend component=route_client pickup=603,664 dropoff=728,326
[I] 2017-04-04T17:20:54Z HTTP request received service=route method=GET url=/route?dropoff=728%2C326&pickup=603%2C664
+5.915291e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:2965df47a1576fe6:12c1f6e4f2ee0c6d:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:2965df47a1576fe6:12c1f6e4f2ee0c6d:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:12c1f6e4f2ee0c6d:49db0efb135614c6:1 service=frontend
+5.578001e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:47b1c5655a3e34ce:34dc248a89e3d5f0:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:47b1c5655a3e34ce:34dc248a89e3d5f0:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:34dc248a89e3d5f0:49db0efb135614c6:1 service=frontend
+3.835140e+000
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:13a899ccba5f141a:9383933e7be8001:1 service=route
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:13a899ccba5f141a:9383933e7be8001:1 service=frontend
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:9383933e7be8001:49db0efb135614c6:1 service=frontend
[I] 2017-04-04T17:20:54Z Found routes service=frontend routes=[{driver:T708733C route:0xc42047d410 err:<nil>} {driver:T780476C route:0xc420239a10 err:<nil>} {driver:T752378C route:0xc42014c960 err:<nil>} {driver:T787199C route:0xc4202f7ad0 err:<nil>} {driver:T726422C route:0xc4202f7f50 err:<nil>} {driver:T761341C route:0xc42047dad0 err:<nil>} {driver:T745302C route:0xc420176420 err:<nil>} {driver:T713475C route:0xc42016c480 err:<nil>} {driver:T748433C route:0xc42014d980 err:<nil>} {driver:T743274C route:0xc4201772f0 err:<nil>}]
[I] 2017-04-04T17:20:54Z Dispatch successful service=frontend driver=T708733C eta=2m0s
[I] 2017-04-04T17:20:54Z Reporting span 49db0efb135614c6:49db0efb135614c6:0:1 service=frontend

And this can be seen in the standalone's logs:

{"level":"error","ts":1491326454.9346092,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.934651,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.9348862,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.935086,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.9354107,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326454.9355452,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326455.935414,"msg":"Could not submit zipkin batch","error":"no peers available"}
{"level":"error","ts":1491326455.9359097,"msg":"Could not submit zipkin batch","error":"no peers available"}

I'm not sure whether the error message is expected, but even if it is, I can't see the traces in the standalone's UI (and the service select box is empty).

[storage] limitations of Cassandra search on LIMIT and complex queries

When querying for traces using serviceName, operationName and a tag with the default LIMIT of 20, some results might be omitted.

This is because of this logic which does the following:

  1. Retrieve all traceIDs matching the operation name
  2. Retrieve all traceIDs matching tags
  3. Intersect 1 & 2

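Because Cassandra doesn't guarantee ordering, each limited query may return a different subset of traceIDs, so the intersection could eliminate results that match both conditions.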

I propose that we do the following instead (or in addition to what we do now):

  1. Retrieve all traceIDs matching tags
  2. Filter by operation name

Retrieving the traceIDs matching tags first targets the use case where somebody is searching for a jaeger-debug-id or some other tag with low cardinality, guaranteeing them a result when one exists.
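
The difference between the two strategies can be sketched as follows. This is a minimal in-memory illustration, not the actual storage code: the `index` type, the `opOf` lookup, and the assumption that LIMIT is applied to each index query separately are all hypothetical stand-ins.

```go
package main

import "fmt"

// index is a hypothetical stand-in for a Cassandra index table: it maps a
// lookup key (operation name or tag) to trace IDs in no guaranteed order.
type index map[string][]string

// lookup returns at most limit trace IDs for key, mimicking a LIMIT clause.
func (ix index) lookup(key string, limit int) []string {
	ids := ix[key]
	if len(ids) > limit {
		ids = ids[:limit]
	}
	return ids
}

// intersect returns the IDs present in both a and b, keeping a's order.
func intersect(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, id := range b {
		seen[id] = true
	}
	var out []string
	for _, id := range a {
		if seen[id] {
			out = append(out, id)
		}
	}
	return out
}

// tagsFirst looks up trace IDs by tag and then filters them by operation
// name. opOf is a hypothetical traceID -> operation lookup.
func tagsFirst(byTag index, opOf map[string]string, tag, op string, limit int) []string {
	var out []string
	for _, id := range byTag.lookup(tag, limit) {
		if opOf[id] == op {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	byOp := index{"GET /route": {"t1", "t2", "t3", "t4"}}
	byTag := index{"jaeger-debug-id=abc": {"t4"}}
	opOf := map[string]string{"t1": "GET /route", "t2": "GET /route", "t3": "GET /route", "t4": "GET /route"}

	const limit = 2
	// Current logic: t4 is not among the first two operation matches, so the
	// intersection comes back empty even though t4 matches both conditions.
	current := intersect(byOp.lookup("GET /route", limit), byTag.lookup("jaeger-debug-id=abc", limit))
	// Proposed logic: the single tag match is found first, then filtered.
	proposed := tagsFirst(byTag, opOf, "jaeger-debug-id=abc", "GET /route", limit)

	fmt.Println("intersection:", current)
	fmt.Println("tags first:", proposed)
}
```

With a rare tag, the tags-first lookup stays well under the limit, so filtering by operation afterwards cannot drop a trace that exists.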

[Experimental] Provide GraphQL query service

Instead of an arbitrary "RESTful" HTTP API service, I think a better solution would be to implement the query service over GraphQL.

Here is my sample implementation of what the service would look like:

https://launchpad.graphql.com/wxn5zk8mz
HTTP Endpoint: https://wxn5zk8mz.lp.gql.zone/graphql

Some of the benefits:

  • Clients can request exactly what they want, resulting in smaller payloads and less unused data. This is really beneficial in the list view of traces: we don't need to guess what a client needs, as designs and needs may change.
  • Documentation for free (via introspection) and strongly typed API backed by a spec: https://facebook.github.io/graphql/
  • A descriptive data model to be more expressive with complex queries (more robust than URL-encoded params).
  • GraphQL queries can be issued via HTTP, TChannel, or any communication protocol.
  • Simple computed fields and object type relationships

This will help solve the following issues as well:
https://github.com/uber/jaeger/issues/158
https://github.com/uber/jaeger/issues/123

Migrating from Zipkin Kafka stream

Hi, we currently have a Kafka cluster which stores Zipkin spans. Does the Jaeger collector support consuming Zipkin data from Kafka and transforming it to the Jaeger data model in Cassandra?

Guaranteed throughput samplers should report the actual lower bound in the tags

Currently several implementations report the tags sampling.type=lowerbound, sampling.param=$rate when the lower bound sampler is triggered, where $rate is the sampling rate of the probabilistic sampler. We should change them to report the actual lower bound value, for consistency.

  • Node client
  • Go client
  • Python client
  • Java client
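
The requested behaviour can be sketched as below. This is a simplified, self-contained model and not the code of any of the clients listed above: the `guaranteedThroughputSampler` type and its fields are hypothetical, the two underlying decisions are passed in as booleans to keep the example deterministic, and the tag keys are copied from the description above.

```go
package main

import "fmt"

// decision is a sampling outcome plus the tags to attach to the root span.
type decision struct {
	Sampled bool
	Tags    map[string]interface{}
}

// guaranteedThroughputSampler is a hypothetical, simplified model: a
// probabilistic sampler backed by a rate limiter that guarantees a minimum
// number of sampled traces per second (the lower bound).
type guaranteedThroughputSampler struct {
	samplingRate float64 // probability used by the probabilistic sampler
	lowerBound   float64 // guaranteed traces per second
}

// sample combines the two underlying decisions. When only the lower bound
// sampler fires, the reported param is the lower bound, not the sampling rate.
func (s guaranteedThroughputSampler) sample(probabilisticHit, lowerBoundHit bool) decision {
	switch {
	case probabilisticHit:
		return decision{true, map[string]interface{}{
			"sampling.type":  "probabilistic",
			"sampling.param": s.samplingRate,
		}}
	case lowerBoundHit:
		return decision{true, map[string]interface{}{
			"sampling.type":  "lowerbound",
			"sampling.param": s.lowerBound, // the actual lower bound value
		}}
	}
	return decision{Sampled: false}
}

func main() {
	s := guaranteedThroughputSampler{samplingRate: 0.001, lowerBound: 2}
	fmt.Println(s.sample(true, false).Tags)
	fmt.Println(s.sample(false, true).Tags)
}
```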

Large traces cause problems

Currently Jaeger has problems with traces which exceed a few hundred spans.
I noticed the following:

  • searching for traces fetches very large payloads - basically /api/traces returns all matching traces along with spans, tags and logs, but the UI shows only a summary like number of spans per service, duration, etc. The search response should contain only the details required to render the screen. Currently the UI becomes very sluggish when the search result contains a large trace or does not render the results at all.
  • showing trace view for a very large trace has similar performance issues - the response payload is very large, the UI is sluggish or does not render.
    • In this case it would make sense to fetch only part of the trace - e.g. the first 200 spans, or the top 200 spans when counting distance from the root span. Some way to expand additional spans would allow navigating the trace gradually. Or maybe some span filters would do the job.
    • another idea that comes to mind would involve dedicating some bits in the span id to "distance from root" to make it possible to search for the top spans efficiently
    • one more way to improve it would be to fetch spans without tags/logs and fetch those only when the "tags" or "logs" section on a span is "expanded"
  • payloads for large traces are pretty large (10MB, 100MB and more) and the browser has problems with processing them
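
As a rough illustration of the first point in the list above, a search response could carry a per-trace summary instead of every span. The `span` and `traceSummary` types below are hypothetical and only cover what the search screen is described as showing (span counts per service and total duration).

```go
package main

import "fmt"

// span is a hypothetical minimal span record.
type span struct {
	Service    string
	StartUs    int64 // start time in microseconds
	DurationUs int64 // duration in microseconds
}

// traceSummary holds only what the search results screen displays.
type traceSummary struct {
	SpanCount       int
	SpansPerService map[string]int
	DurationUs      int64
}

// summarize reduces a full trace to its summary: the number of spans per
// service and the time between the earliest start and the latest end.
func summarize(spans []span) traceSummary {
	s := traceSummary{SpansPerService: map[string]int{}}
	var minStart, maxEnd int64
	for i, sp := range spans {
		end := sp.StartUs + sp.DurationUs
		if i == 0 || sp.StartUs < minStart {
			minStart = sp.StartUs
		}
		if i == 0 || end > maxEnd {
			maxEnd = end
		}
		s.SpanCount++
		s.SpansPerService[sp.Service]++
	}
	s.DurationUs = maxEnd - minStart
	return s
}

func main() {
	trace := []span{
		{Service: "frontend", StartUs: 0, DurationUs: 100},
		{Service: "route", StartUs: 10, DurationUs: 50},
		{Service: "route", StartUs: 70, DurationUs: 40},
	}
	fmt.Printf("%+v\n", summarize(trace))
}
```

The summary size depends on the number of services rather than the number of spans, so a 5k-span trace would not inflate the search payload.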

I was able to view traces that had more than 1k spans (it was laggy and took some time), but not ~5k.

The traces I encountered were usually created by long iterative processes - e.g. recalculating something on thousands of records.
Another more pathological reason is bad communication design - an example would be a process which emits thousands of messages instead of using some batching.

Related post on jaeger-tracing group: here
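
The "top 200 spans when counting distance from the root span" idea from the list above could look roughly like this. It is a sketch under stated assumptions: the `span` type is hypothetical, the root is assumed to have an empty parent ID, and the whole trace is assumed to be available in memory on the server side.

```go
package main

import "fmt"

// span is a hypothetical minimal span; the root has an empty ParentID.
type span struct {
	ID       string
	ParentID string
}

// topSpans returns up to n spans in breadth-first order starting at the
// root, so spans closest to the root are returned first.
func topSpans(spans []span, n int) []span {
	children := make(map[string][]span)
	for _, s := range spans {
		children[s.ParentID] = append(children[s.ParentID], s)
	}
	queue := append([]span(nil), children[""]...)
	var out []span
	for len(queue) > 0 && len(out) < n {
		s := queue[0]
		queue = queue[1:]
		out = append(out, s)
		queue = append(queue, children[s.ID]...)
	}
	return out
}

func main() {
	trace := []span{
		{ID: "root"},
		{ID: "a", ParentID: "root"},
		{ID: "b", ParentID: "root"},
		{ID: "c", ParentID: "a"},
	}
	for _, s := range topSpans(trace, 3) {
		fmt.Println(s.ID)
	}
}
```

Expanding a span in the UI would then translate into a follow-up request for that span's children.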

UI from standalone image does not work

UI from all-in-one image fails to start.

docker run -it --rm --network=host jaegertracing/all-in-one

Stack trace from browser:

main.7c74e600.js:2 Uncaught Error: React.PropTypes type checking code is stripped in production.
    at r (main.7c74e600.js:2)
    at Object.r (main.7c74e600.js:59)
    at Object.<anonymous> (main.7c74e600.js:45)
    at t (main.7c74e600.js:1)
    at Object.<anonymous> (main.7c74e600.js:45)
    at t (main.7c74e600.js:1)
    at Object.<anonymous> (main.7c74e600.js:25)
    at t (main.7c74e600.js:1)
    at Object.<anonymous> (main.7c74e600.js:45)
    at t (main.7c74e600.js:1)

near-zero intrusive tracing for golang

The Dapper paper explains near-zero intrusive tracing for languages like C++ and Java. For Go, the only approach I know of is passing the information needed for tracing down via context. This means the context has to be passed down explicitly, without any interruption. This makes migrating an old code base to tracing difficult, as there is no simple way to enforce that the context is passed down correctly. Any suggestions for a better way to achieve this?

Thanks.
-Kui
