openzipkin / brave

Java distributed tracing implementation compatible with Zipkin backend services.

License: Apache License 2.0

Java 99.64% Shell 0.36%

zipkin java distributed-tracing tracing zipkin-brave instrumentation openzipkin

brave's Introduction

Brave

Brave is a distributed tracing instrumentation library. Brave typically intercepts production requests to gather timing data and to correlate and propagate trace contexts. While trace data is typically sent to a Zipkin server, third-party plugins are available to send it to alternate services such as Amazon X-Ray.

This repository includes dependency-free Java libraries and instrumentation for common components used in production services. For example, this includes trace filters for Servlet and log correlation for Apache Log4J.

You can look at our example project for how to trace a simple web application.

What's included

Brave's dependency-free tracer library works against JRE6+. This is the underlying API that instrumentation uses to time operations and add tags that describe them. This library also includes code that parses X-B3-TraceId headers.

Most users won't write tracing code directly. Rather, they reuse instrumentation others have written. Check our instrumentation and Zipkin's list before rolling your own. Instrumentation for common libraries like JDBC, Servlet and Spring already exists. Instrumentation written here is tested and benchmarked.

If you are trying to trace legacy applications, you may be interested in Spring XML Configuration. This allows you to set up tracing without any custom code.

You may want to put trace IDs into your log files, or change thread local behavior. Look at our context libraries, for integration with tools such as SLF4J.

Version Compatibility policy

All Brave libraries match the minimum Java version of what's being traced or integrated with, and add no 3rd party dependencies. The goal is to neither impact your projects' choices, nor subject your project to dependency decisions made by others.

For example, even including a basic reporting library, zipkin-sender-urlconnection, Brave transitively includes no json, logging, protobuf or thrift dependency. This means zero concern if your application chooses a specific version of SLF4J, Gson or Guava. Moreover, the entire dependency tree including basic reporting in json, thrift or protobuf is less than 512KiB of jars.

There is a floor Java version of 1.6, which allows older JREs and older Android runtimes, yet may limit some applications. For example, Servlet 2.5 works with Java 1.5, but due to Brave being 1.6, you will not be able to trace Servlet 2.5 applications until you use at least JRE 1.6.

All integrations set their associated library to "provided" scope. This ensures Brave doesn't interfere with the versions you choose.

Some libraries update often which leads to api drift. In some cases, we test versions ranges to reduce the impact of this. For example, we test gRPC and Kafka against multiple library versions.

Artifacts

All artifacts publish to the group ID "io.zipkin.brave". We use a common release version for all components.

Library Releases

Releases are uploaded to Sonatype, which synchronizes with Maven Central.

Library Snapshots

Snapshots are uploaded to Sonatype after commits to master.

Version alignments

When using multiple brave components, you'll want to align versions in one place. This allows you to more safely upgrade, with less worry about conflicts.

You can use our Maven instrumentation BOM (Bill of Materials) for this:

Ex. in your dependencies section, import the BOM like this:

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>io.zipkin.brave</groupId>
        <artifactId>brave-bom</artifactId>
        <version>${brave.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

Now, you can leave off the version when choosing any supported instrumentation. Also, any indirect use will have versions aligned:

<dependency>
  <groupId>io.zipkin.brave</groupId>
  <artifactId>brave-instrumentation-okhttp3</artifactId>
</dependency>

With the above in place, you can use the property brave.version to override dependency versions coherently. This is most commonly to test a new feature or fix.

Note: If you override a version, always double check that your version is valid (equal to or later than the version you are replacing). This will avoid class conflicts.

brave's Issues

Measure epoch micros

@michaelsembwever had this suggestion, which I think we could use in brave

a quick instrumentation to check at what microsecond the cpu counter matches when System.currentTimeMillis() ticks over to a new value. then just offset.
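One way to read the suggestion above is sketched below. It is a minimal illustration, not Brave code: the class name `EpochMicrosClock` and its methods are hypothetical. It spins until `System.currentTimeMillis()` ticks over, records `System.nanoTime()` at that moment, and afterwards derives epoch microseconds from the offset.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Brave): epoch microseconds derived from nanoTime. */
final class EpochMicrosClock {
  private final long baseEpochMicros;
  private final long baseNanos;

  EpochMicrosClock() {
    // Spin until currentTimeMillis() ticks over to a new value, so the
    // nanoTime reading below is taken close to a millisecond boundary.
    long start = System.currentTimeMillis();
    long tick;
    do {
      tick = System.currentTimeMillis();
    } while (tick == start);
    this.baseNanos = System.nanoTime();
    this.baseEpochMicros = TimeUnit.MILLISECONDS.toMicros(tick);
  }

  /** Calibrated base plus the nanoTime elapsed since calibration, in microseconds. */
  long currentTimeMicros() {
    return baseEpochMicros + (System.nanoTime() - baseNanos) / 1000;
  }

  public static void main(String[] args) {
    EpochMicrosClock clock = new EpochMicrosClock();
    System.out.println(clock.currentTimeMicros());
  }
}
```

The calibration costs up to one clock tick at construction, so an instance would normally be created once and shared. The sketch does not correct for later wall-clock adjustments.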

Traced request results in NumberFormatException

I am running 2.4 and have noticed that a NumberFormatException occurs when making a request from an Apache HTTP Client to a Spring MVC server. It appears as though the TraceId is being passed from the client as hex whilst the server expects a decimal long.

Request Header Debug Log

2014-12-20 16:48:26,509 DEBUG | org.apache.http.headers                                          | http-outgoing-0 >> X-B3-Sampled: true 
2014-12-20 16:48:26,509 DEBUG | org.apache.http.headers                                          | http-outgoing-0 >> X-B3-TraceId: 9d3ce26024765927 
2014-12-20 16:48:26,509 DEBUG | org.apache.http.headers                                          | http-outgoing-0 >> X-B3-SpanId: 9d3ce26024765927 
2014-12-20 16:48:26,509 DEBUG | org.apache.http.headers                                          | http-outgoing-0 >> X-B3-SpanName: <NAME REMOVED>

Traced Apache HTTP Client to Spring MVC results

2014-12-20 16:33:20,345 ERROR | o.a.c.c.C.[.[.[<PATH REMOVED>].[dispatcherServlet]    | Servlet.service() for servlet dispatcherServlet threw exception 
java.lang.NumberFormatException: For input string: "c50c96694912f142"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[na:1.8.0_25]
    at java.lang.Long.parseLong(Long.java:592) ~[na:1.8.0_25]
    at java.lang.Long.valueOf(Long.java:776) ~[na:1.8.0_25]
    at com.github.kristofa.brave.ServletHandlerInterceptor$1.apply(ServletHandlerInterceptor.java:20) ~[brave-impl-spring-2.4.jar:na]
    at com.github.kristofa.brave.ServletHandlerInterceptor$1.apply(ServletHandlerInterceptor.java:17) ~[brave-impl-spring-2.4.jar:na]
    at com.google.common.base.Present.transform(Present.java:71) ~[guava-17.0.jar:na]
    at com.github.kristofa.brave.ServletHandlerInterceptor.updateServerState(ServletHandlerInterceptor.java:84) ~[brave-impl-spring-2.4.jar:na]
    at com.github.kristofa.brave.ServletHandlerInterceptor.beginTrace(ServletHandlerInterceptor.java:78) ~[brave-impl-spring-2.4.jar:na]
    at com.github.kristofa.brave.ServletHandlerInterceptor.preHandle(ServletHandlerInterceptor.java:36) ~[brave-impl-spring-2.4.jar:na]
    at org.springframework.web.servlet.HandlerExecutionChain.applyPreHandle(HandlerExecutionChain.java:130) ~[spring-webmvc-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:938) ~[spring-webmvc-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:877) ~[spring-webmvc-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:966) ~[spring-webmvc-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:857) ~[spring-webmvc-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:620) ~[tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:842) ~[spring-webmvc-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) ~[tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303) ~[tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) ~[tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:748) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:488) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:411) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:338) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.StandardHostValve.custom(StandardHostValve.java:467) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.StandardHostValve.status(StandardHostValve.java:338) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.StandardHostValve.throwable(StandardHostValve.java:428) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:201) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1070) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:611) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1736) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1695) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_25]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_25]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-7.0.55.jar:7.0.55]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
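The trace above is consistent with a hex string being handed to `Long.valueOf`, which only accepts decimal input. The sketch below shows a hex-aware conversion using standard JDK calls. The class `HexIds` is hypothetical and is not the fix that was applied in Brave; `Long.parseUnsignedLong` requires Java 8 or later.

```java
/** Hypothetical helper (not part of Brave): converts between long ids and lower-hex strings. */
final class HexIds {
  /** Parses up to 16 hex characters; ids with the high bit set come back negative. */
  static long hexToLong(String hex) {
    return Long.parseUnsignedLong(hex, 16);
  }

  /** Renders the id as unsigned lower-hex, without zero padding. */
  static String longToHex(long id) {
    return Long.toHexString(id);
  }

  public static void main(String[] args) {
    long id = hexToLong("c50c96694912f142");
    System.out.println(longToHex(id));
  }
}
```

An id such as "c50c96694912f142" does not fit in a signed decimal long, which is why the unsigned parse is used here instead of `Long.parseLong(hex, 16)`.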

Export ZipkinSpanCollector and SpanProcessingThread metrics

Some examples:

spanQueue size of threadPool
number of active threads
number of log entries per thread
EVENT: number of rejected span submissions due to full queue
EVENT: number of successful span submissions to the thread pool
EVENT: span submission time to the thread pool
EVENT: number of bytes sent per emission of log entries to the collector/scribe
EVENT: span batch emission time to the collector/scribe
EVENT: number of emission errors to the collector/scribe
EVENT: number of successful emissions to the collector/scribe
EVENT: number of lost connections to the collector/scribe

I would suggest having them available via JMX for simplicity (event-based metrics could use max of a sliding window or alternatively send the events to opentsdb or graphite each time the event occurs).

Scope of singleton for ClientTracer and ServerTracer

Hi, should the ClientTracerImpl and ServerTracerImpl instances wired through ClientTracerConfig and ServerTracerConfig be singletons? If they are singletons, isn't it possible that threads will step on each other?

Thanks,
Sridhar

Setting the service name for a given endpoint

@kristofa
I've integrated brave into one of my Java services. Due to some weird issue, the service name in the traces is set as "Unknown Service Name"

Is there a way to set the service name for the current host ?

HTTP Span Transport

zipkin now supports an http transport, which accepts either a list of thrift TBinary spans, or a json list. The easiest path is likely thrift.

Here's an example of how this is done in finagle:

// serialize all spans as a thrift list
val serializedSpans = try {
  val transport = new TMemoryBuffer(0)
  val oproto = new TBinaryProtocol(transport)
  oproto.writeListBegin(new TList(TType.STRUCT, spans.size))
  spans.foreach(_.toThrift.write(oproto))
  oproto.writeListEnd()
  transport.getArray()
} catch {
  case NonFatal(e) => errorReceiver.counter(e.getClass.getName).incr(); return Future.Unit
}

val request = Request(Method.Post, "/api/v1/spans")
request.headerMap.add("Host", hostname)
request.headerMap.add("Content-Type", "application/x-thrift")
request.headerMap.add("Content-Length", serializedSpans.length.toString)
request.content =  Buf.ByteArray.Owned(serializedSpans)

Id conversion from long to hex in headers

HttpServerRequestAdapter, with HttpClientRequestAdapter and ServletHandlerInterceptor doing the other way round, represents ids (longs) as strings when putting them into headers by transforming them into hex strings. The javadoc for IdConversion explains that it's for compatibility with zipkin / finagle. Do you know the reason behind designing the communication protocol this way in those technologies, transforming to hex strings even though they do keep ids as longs in their internal structures?

As a user of brave alone I see it as a minor inconvenience – it may result in wtfs for unaware people trying to manually send requests with trace headers or viewing request headers during debugging.

What do you think of making the conversion customizable? We could introduce an interface for id conversion with its default implementation being the zipkin compatible one. Brave clients not requiring such compatibility could provide their implementation, e.g. doing a direct conversion without hex mapping, in brave builder.

If you agree this idea is worth extending the brave API then I could prepare a PR for this feature.

Support logging client IP with ServerRequestInterceptor via X-Forwarded-For

The right-most ipv4 value in the X-Forwarded-For header indicates the client IP address. Seems we should be able to log that portably in ServerRequestInterceptor.

For example, in ServerRequestInterceptor, where setServerReceived() is invoked, we can call the overload when we've parsed an IP address correctly.

See #107

Service appears as "Service A,Service B"

See https://groups.google.com/forum/#!topic/zipkin-user/Q_EZp3pQXk4

I believe there's a problem with the design of the Endpoint submitter. Doesn't the service name need to be the same on both the client side and the server side?

Tracing in distributed environment every XX request

Hi Kristof!
We are having trouble using a trace rate > 1. It seems to break in our distributed microservices architecture: spans are not combined in long chained calls.

Do you have any idea where to look to diagnose the cause?

We use custom implementation for our needs based on your original version.

remove slf4j-log4j12 as transitive dependency passed onto others

Projects that use brave are not necessarily using slf4j-log4j12.

A library like brave shouldn't be bringing in such dependencies with scope=compile.

Running Brave on custom solution for preserving metadata between threads

Hello,

we're planning to adopt Brave at our company and there is one issue we try to tackle to make it possible.

We already have our own infrastructure for keeping request metadata in thread locals and transferring the information when request processing switches threads. What we'd like to do is to keep span state in our thread locals and make it available to Brave classes that operate on it. From what I see, creating our custom implementation of ServerAndClientSpanState that would delegate to our infrastructure and passing it to ServerTracer and ClientTracer could solve the problem. Unfortunately, ServerAndClientSpanState interface is not public.

Could you recommend how to approach running Brave on existing infrastructure? What do you think of making ServerAndClientSpanState interface public?

Brave submits some spans twice

Seems like brave submits some spans twice. I haven't identified which part is doing this. Testing with brave-resteasy-example (and also in my own code)

Can you make the constructor on BraveCallable public

ZipkinSpanCollector via corporate proxy

Sorry for asking questions via an issue but I couldn't find a mailing list or forum to ask on.

I'm using ZipkinSpanCollector to submit spans to a Zipkin Collector, but I need to go via a corporate proxy. Do you have any idea where I can set the proxy settings? I've had a look at ZipkinCollectorClientProvider, which creates a TSocket, but can't see anything obvious. My Thrift knowledge is more than a little lacking.

Change System.currentTimeMillis() to improve performance

X-B3-SpanId, X-B3-TraceId, X-B3-ParentSpanId should be hex

According to https://github.com/twitter/zipkin/blob/master/doc/collector-api.md#http the ids should be hex encoded.

These seem to be ordinary longs in brave as of now, so it breaks on tools like the zipkin firefox extension with a NumberFormatException from the ServletTracingFilter.

Very 1st ClientSpan(cs,cr) not submitted: unable to submit error

I'm adopting Brave to be able to trace within our project.
I do it using a ServletFilter for the ServerSpan(sr,ss) and a Client Filter for REST Hystrix API commands to trace the ClientSpan(cs,cr).
In my example I use 2 services. Each service has a public REST API and also has a client library which wraps those APIs for easier usage.
The trace path looks like:

client browser starts request(R1) which reaches Service1 (S1)
S1 ServletFilter catches it and initiates ServerSpan(sr)
S1 code create S2 API call command using client lib and invokes this command
S1 initiates new request R2 (for S2) this is caught by Client Filter and ClientSpan started
R2 reaches S2: ServletFilter starts ServerSpan for S2... some logic executed
S2 sends response to S1 (for R2)

During the 1st run of this test I always get an error saying it cannot submit the span to the queue.
It is about the ClientSpan.
I have no idea yet why this is happening.
Could you please shed some light here?

Here is error in details:

12:52:43.397 ERROR hystrix-directoryCS_DirectoryDS-1 172.27.243.217:8042 - Unable to submit span to queue: Span(trace_id:-7122290292284903165, name:/persons/qwewq34r3r3r, id:2003666217768388530, parent_id:-7122290292284903165, annotations:[Annotation(timestamp:1381236761894000, value:cs, host:Endpoint(ipv4:0, port:8042, service_name:directoryCS)), Annotation(timestamp:1381236763395000, value:cr, host:Endpoint(ipv4:0, port:8042, service_name:directoryCS))], binary_annotations:null)
! java.lang.InterruptedException: null
! at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(Unknown Source)
! at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(Unknown Source)
! at java.util.concurrent.ArrayBlockingQueue.offer(Unknown Source)
! at com.github.kristofa.brave.zipkin.ZipkinSpanCollector.collect(ZipkinSpanCollector.java:103)
! at com.github.kristofa.brave.ClientTracerImpl.setClientReceived(ClientTracerImpl.java:90)
! at filter.brave.BraveClientSpanProcessor.postProcess(BraveClientSpanProcessor.java:55)
! at filter.brave.BraveClientSpanClientFilter.handle(BraveClientSpanClientFilter.java:48)
! at com.sun.jersey.api.client.Client.handle(Client.java:648)
! at com.sun.jersey.api.client.WebResource.handle(WebResource.java:680)
! at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
! at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:507)
! at ds.directory.GetPersonHysCmd.getResponse(GetPersonHysCmd.java:44)
! at hystrix.GenericHttpHystrixCommand.runBase(GenericHttpHystrixCommand.java:36)
! at hystrix.GenericHttpHystrixCommand.runGet(GenericHttpHystrixCommand.java:61)
! at ds.directory.GetPersonHysCmd.run(GetPersonHysCmd.java:55)
! at ds.directory.GetPersonHysCmd.run(GetPersonHysCmd.java:14)
! at com.netflix.hystrix.HystrixCommand.executeCommand(HystrixCommand.java:1244)
! at com.netflix.hystrix.HystrixCommand.access$2200(HystrixCommand.java:95)
! at com.netflix.hystrix.HystrixCommand$5.call(HystrixCommand.java:1156)
! at com.netflix.hystrix.strategy.concurrency.HystrixContextCallable.call(HystrixContextCallable.java:45)
! at hystrix.O2HystrixContextCallable.call(O2HystrixContextCallable.java:24)
! at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
! at java.util.concurrent.FutureTask.run(Unknown Source)
! at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
! at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
! at java.lang.Thread.run(Unknown Source)

Brave not passing spanId to finagle client

I am trying to use the brave-jersey2 implementation with dropwizard in ContactRestServer:

SpanCollector spanCollector;
if (contactRestServerConfig.tracing != null) {
    spanCollector = new ZipkinSpanCollector(
            contactRestServerConfig.tracing.server,
            contactRestServerConfig.tracing.port);
} else {
    spanCollector = new EmptySpanCollectorImpl();
}
int sampleRate = 0;
try {
    sampleRate = Math.round(1 / contactRestServerConfig.tracing.sampleRate);
} catch (Exception ignored) {}

Brave brave = new Brave.Builder("contact-rest")
        .spanCollector(spanCollector)
        .traceFilters(ImmutableList.of((TraceFilter) new FixedSampleRateTraceFilter(sampleRate)))
        .build();

ServerTracer serverTracer = brave.serverTracer();

ServerRequestInterceptor requestInterceptor = new ServerRequestInterceptor(serverTracer);
SpanNameProvider spanNameProvider = new DefaultSpanNameProvider();
BraveContainerRequestFilter containerRequestFilter = new BraveContainerRequestFilter(requestInterceptor, spanNameProvider);
environment.jersey().register(containerRequestFilter);

ServerResponseInterceptor responseInterceptor = new ServerResponseInterceptor(serverTracer);
BraveContainerResponseFilter containerResponseFilter = new BraveContainerResponseFilter(responseInterceptor);
environment.jersey().register(containerResponseFilter);

And finagle client is used here ContactResource

Future<Contact> contactFuture = client.get().create(RestContactRequest.to(restContactRequest));

Full code available at https://github.com/rojanu/thrift-swift-finagle-example/tree/contact-service

I get a trace entry on the zipkin server. However, although the rest service (contact-rest) is consuming a finagle service (contact-server), the call made to contact-server from contact-rest looks like two different requests. They really should be one request with two spans.

It looks like the trace id generated by contact-rest is not passed to contact-server.

What did I miss, or what am I doing wrong?
Thanks

Volatile variable

Hi,

I wonder if the "stop" variable on SpanProcessingThread.java should be declared as volatile.
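For context on the question above, here is a minimal sketch of a worker loop controlled by a stop flag. The class `StoppableWorker` is hypothetical and is not the actual SpanProcessingThread. Without `volatile`, the Java memory model does not guarantee that the loop ever observes a write made by another thread.

```java
/** Hypothetical worker (not Brave's SpanProcessingThread) stopped through a flag. */
final class StoppableWorker extends Thread {
  // volatile makes a write from another thread visible to the loop in run()
  private volatile boolean stop = false;

  @Override public void run() {
    while (!stop) {
      Thread.yield(); // placeholder for real work, such as draining a span queue
    }
  }

  void requestStop() {
    stop = true;
  }

  /** Starts a worker, asks it to stop, and reports whether it exited in time. */
  static boolean stopsWithin(long timeoutMillis) {
    StoppableWorker worker = new StoppableWorker();
    worker.setDaemon(true);
    worker.start();
    worker.requestStop();
    try {
      worker.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println(stopsWithin(2000));
  }
}
```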

zipkin-sink lost span

The sink loses spans while sending them to the Zipkin collector.

This is caused by line 114 of class ZipkinSpanCollectorSink:
while ((event = channel.take()) != null && count < batchSize) {

An event is taken from the channel even when count < batchSize is false, so that event is dropped.
The condition should be changed to:
while (count < batchSize && (event = channel.take()) != null) {
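The effect of the operand order can be reproduced outside the sink. The sketch below is an illustration only: the class `BatchDrain` is hypothetical, and `Queue.poll()` stands in for `channel.take()` on the assumption that both return null when nothing is available.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical illustration of the two loop conditions; not the sink's code. */
final class BatchDrain {
  /** Original order: an event is removed before the batch limit is checked. */
  static List<String> drainTakeFirst(Queue<String> channel, int batchSize) {
    List<String> batch = new ArrayList<String>();
    String event;
    int count = 0;
    while ((event = channel.poll()) != null && count < batchSize) {
      batch.add(event);
      count++;
    }
    return batch; // one extra event was removed from the queue and is not in the batch
  }

  /** Proposed order: the limit is checked first, so nothing extra is removed. */
  static List<String> drainCheckFirst(Queue<String> channel, int batchSize) {
    List<String> batch = new ArrayList<String>();
    String event;
    int count = 0;
    while (count < batchSize && (event = channel.poll()) != null) {
      batch.add(event);
      count++;
    }
    return batch;
  }

  public static void main(String[] args) {
    Queue<String> a = new ArrayDeque<String>(Arrays.asList("a", "b", "c", "d", "e"));
    Queue<String> b = new ArrayDeque<String>(Arrays.asList("a", "b", "c", "d", "e"));
    drainTakeFirst(a, 2);
    drainCheckFirst(b, 2);
    System.out.println(a.size() + " left vs " + b.size() + " left");
  }
}
```

With five queued events and a batch size of two, both variants return a batch of two, but the first leaves only two events in the queue while the second leaves three.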

Brave HTTP headers not set when using spring-impl ServletHandlerInterceptor

I'm currently using 2.4.1 and have several Spring Boot services accessing each other using Apache HTTP client. The Spring services use spring-impl/ServletHandlerInterceptor to intercept the calls.

I am seeing individual service traces (e.g. request, response) with a span count of 1, but never spans of greater than one indicating that tracing is not continued across service calls (e.g. serviceA -> serviceB).

When I inspect the HTTP headers returned from a traced service using curl/Postman, I never see any Brave HTTP headers on the responses. Would it be expected to observe these headers?

Looking at ServletHandlerInterceptor it would appear as though the headers are never set on the exchange. At what point in the chain (using spring-impl) should these headers be set?

com.github.kristofa.brave.spring.ServletHandlerInterceptor ignores X-B3-Sampled header with value "1"

Since version 3.0.0-rc-1 the brave client sends "1" instead of "true" as the value of X-B3-Sampled, but the ServletHandlerInterceptor does not enable tracing for these requests.
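A lenient parser that accepts both forms would avoid this kind of mismatch. The following is a minimal sketch under that assumption; the class `SampledHeader` is hypothetical and is not the code in ServletHandlerInterceptor.

```java
/** Hypothetical helper (not part of Brave): lenient parsing of X-B3-Sampled. */
final class SampledHeader {
  /** Returns TRUE or FALSE for recognized values, or null when absent or unrecognized. */
  static Boolean parse(String value) {
    if (value == null) return null;
    String v = value.trim();
    if (v.equals("1") || v.equalsIgnoreCase("true")) return Boolean.TRUE;
    if (v.equals("0") || v.equalsIgnoreCase("false")) return Boolean.FALSE;
    return null; // leave the sampling decision to the receiver
  }

  public static void main(String[] args) {
    System.out.println(parse("1") + " " + parse("true") + " " + parse("0"));
  }
}
```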

Add support for the ServerAddr("sa") and ClientAddr("ca") annotations

When the server-side of a trace is not instrumented, we can log ServerAddr("sa") annotation to indicate who the server is (via endpoint/host).

Case in point. If you are writing a database driver, you might send a "query" annotation, but the host would be the local host (the client). The standard way to indicate the server's address would be to add an "sa" annotation, which would hold the destination.

The ClientAddr similarly stores the client address. Some implementations may store, for example the value of "x-forwarded-for" header.

Brave 3.0.0

While brave works quite well, I think it is time to rework some parts and to remove some legacy code.
Here is a list of improvements I'm thinking about making:

Introduce brave server abstraction similar as the client abstraction @srapp implemented ( #27 ). Just like having the client abstraction having the server abstraction that hides ServerTracer will make it less effort to integrate and improve consistency. ClientTracer / ServerTracer can still be used for non http integrations (eg database).
Update documentation on how to use brave client / server abstractions.
Get rid of complexity of dealing with span names. Currently when integrating with http frameworks we use the path part of the URL as the span name. This resulted in inconsistencies ( #32 ) between client/server and lots of effort to clean paths that contain variable content ( #33 ). Instead I would like to follow the approach of Zipkin / Finagle which uses the http method as span name (GET, PUT, POST,...) and adds the url path unchanged as a binary annotation to the span.
Compatibility with Finagle. Make sure mixing Finagle / Zipkin and Brave services works. Previous point is also related to this.
Upgrade to Java 7. Java 6 is not supported anymore and Java 7 is getting pretty old already. Require at least Java 7.

These are the main topics I can think of now. Those changes will change the preferred way of integrating with Brave and the visualization of traces in zipkin-web, which is why the version will be 3.0.0.

brave Jersey integration

I’m trying to integrate Brave into this Gradle template https://github.com/Netflix/gradle-template/blob/master/build.gradle, and am having trouble with some of the Spring and Jersey integration. I added the Spring DispatcherServlet with the brave package on the context path in web.xml, and added SpanCollector and TraceFilters configuration classes in source. When I ran it, I got a bunch of "Rejected bean name" messages in the log:
org.springframework.web.servlet.mvc.annotation.DefaultAnnotationHandlerMapping - Rejected bean name 'org.springframework.context.annotation.internalConfigurationAnnotationProcessor': no URL paths identified
09:57:58.201 [Daemon] DEBUG org.springframework.web.servlet.mvc.annotation.DefaultAnnotationHandlerMapping - Rejected bean name 'org.springframework.context.annotation.internalAutowiredAnnotationProcessor': no URL paths identified
09:57:58.201 [Daemon] DEBUG org.springframework.web.servlet.mvc.annotation.DefaultAnnotationHandlerMapping - Rejected bean name 'org.springframework.context.annotation.internalRequiredAnnotationProcessor': no URL paths identified
09:57:58.202 [Daemon] DEBUG org.springframework.web.servlet.mvc.annotation.DefaultAnnotationHandlerMapping - Rejected bean name 'org.springframework.context.annotation.internalCommonAnnotationProcessor': no URL paths identified
09:57:58.203 [Daemon] DEBUG org.springframework.web.servlet.mvc.annotation.DefaultAnnotationHandlerMapping - Rejected bean name 'spanCollectorConfiguration': no URL paths identified
09:57:58.204 [Daemon] DEBUG org.springframework.web.servlet.mvc.annotation.DefaultAnnotationHandlerMapping - Rejected bean name 'traceFiltersConfiguration': no URL paths identified

Issue on ZipkinSpanCollector and Zipkin collector communication

Hi,

It's been a month that I discovered Brave and I am currently struggling with Brave and the Zipkin backend components.

My architecture is as follows:

Client 1 (inside JVM 1)
Server 1 (inside JVM 2)
Server 2 (inside JVM 3)

Ok so let's say that I have the following calls that are made (my program entry point is Client 1):

Client 1 requests Server 1 (first trace)
Server 1 requests Server 2 (second trace)
Server 2 requests Server 1 (third trace)

My problem is that the second trace and the third trace are correctly displayed on the zipkin UI but the first trace shows only the SS and SR annotations. I added some logging (with log4j) inside the collect method of the ZipkinSpanCollector class and I can see the CS and CR annotation of my Client 1 in my logs (with the same span-id and trace-id as the server-side).
Plus: the spanQueue.offer(span) line returns true.

Do you have any idea where my problem could come from?

Best regards,

PS: I posted my question as an issue since I didn't find any google-group or IRC channel talking only about brave. I hope it's not a problem.

Hugo

Is there a way that an application using Brave can startup if Zipkin collector is down

If the Zipkin collector is down, I keep getting a ton of autowiring issues, which may impact application functionality. Is there a way to ensure that the availability of Brave and the tracing functionality does not impact the application?

Thanks,
Sridhar

Thread specific Endpoint in ServerAndClientSpanStateImpl?

This comment is based on existing codebase in master. The Endpoint instance in ServerAndClientSpanStateImpl is shared across threads while other attributes like currentServerSpan, currentClientSpan and currentClientServiceName are stored on ThreadLocal instances.
For client spans submitted using the ClientTracer implementation, IMO the endpoint should point to the remote endpoint whereas it should point to the service/server endpoint for server spans submitted via ServerTracer.
In a sample server/service trace that calls a number of client servers/services (in different threads and in parallel possibly), this single endpoint variable gets updated by a number of threads and messes it up.
Would you consider storing the Endpoint variable also on ThreadLocal? Or does it go against the original design of this class?
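The difference between a shared field and a thread-bound one can be sketched as follows. This is an illustration only: the class `PerThreadEndpoint` is hypothetical, and a String stands in for the Endpoint type.

```java
/** Hypothetical illustration (not Brave's ServerAndClientSpanStateImpl). */
final class PerThreadEndpoint {
  // Each thread reads and writes its own slot, so parallel client calls
  // cannot overwrite one another's endpoint.
  private final ThreadLocal<String> endpoint = new ThreadLocal<String>();

  void set(String serviceName) {
    endpoint.set(serviceName);
  }

  String get() {
    return endpoint.get();
  }

  /** Demo: index 0 is what the calling thread sees, index 1 what the other thread sees. */
  static String[] demo() {
    final PerThreadEndpoint state = new PerThreadEndpoint();
    final String[] seen = new String[2];
    state.set("service-a");
    Thread other = new Thread(new Runnable() {
      @Override public void run() {
        state.set("service-b"); // does not affect the calling thread's value
        seen[1] = state.get();
      }
    });
    other.start();
    try {
      other.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    seen[0] = state.get();
    return seen;
  }

  public static void main(String[] args) {
    String[] seen = demo();
    System.out.println(seen[0] + " / " + seen[1]);
  }
}
```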

Application of Brave to Spring MVC with WebAsyncTask

Hi,

With Servlet 3.0, Spring MVC and WebAsyncTask, the ServerReceived and ServerSend need to execute on two separate threads. I have extended HandlerInterceptorAdapter to do something similar to what you have done in BravePre and BravePost interceptors. I started passing some of the serverspanstate through request attributes (as Spring dispatches the request twice). In trying to do so, I had to wrap some of your classes to make them public. I am wondering what would be an easier way to transfer ServerSpanState from one thread to another?

Thanks,
Sridhar
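
One possible way to hand state from the thread that receives the request to the thread that completes it is to capture the value when the task is created and restore it around the task body. The sketch below assumes the state is a single value held in a ThreadLocal; SpanStateHandoff and CURRENT_SPAN are hypothetical names, and a plain String stands in for the real server span state.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: a String stands in for Brave's server span state.
final class SpanStateHandoff {

  static final ThreadLocal<String> CURRENT_SPAN = new ThreadLocal<>();

  /** Wraps a task so it runs with the span state of the thread that called wrap(). */
  static <T> Callable<T> wrap(final Callable<T> task) {
    // Read on the submitting thread, before the task moves elsewhere.
    final String captured = CURRENT_SPAN.get();
    return () -> {
      String previous = CURRENT_SPAN.get();
      CURRENT_SPAN.set(captured);
      try {
        return task.call();
      } finally {
        // Restore whatever the worker thread had before, so pooled threads stay clean.
        if (previous == null) {
          CURRENT_SPAN.remove();
        } else {
          CURRENT_SPAN.set(previous);
        }
      }
    };
  }
}
```

A WebAsyncTask could then be given the wrapped Callable, assuming the wrapper is created on the request thread.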

No directions/example for how to add own annotations to traces

I'd like to have two key/value annotations for my traces and I'm not sure how one is intended to set this up. (I'd probably provide documentation or an example in a PR once I figure this out).

From the code it seems that ServerTracer (my primary use case is annotations that are determined by the entry point service) would be the correct place to call submitAnnotation() on the AnnotationSubmitter. Should ServerTracer have an extension point for this or am I missing something?

Also, this would mean that the provided ServletTraceFilter cannot be used in this case.

HttpClientRequestAdapter sends sampled header through as true/false rather than 0/1

As commented in commit b71d743, the sampled header should be 0 or 1 rather than true or false for it to be compatible with the Zipkin spec.

In com.github.kristofa.brave.http.HttpClientRequestAdapter, the sampled header gets set to true or false rather than 0 or 1.

using Tracefilters with ZipkinMetricsSink's Reservoir policy

If using fixed-rate sampling, should the sample rate be recorded in an annotation for the metrics processing?
Otherwise, the metrics would not be correct.

Cannot set Scribe category

Right now, the Scribe category is hardcoded to "zipkin". Some environments use a different Scribe category, so it would be nice if this was configurable.

https://github.com/kristofa/brave/blob/fea470c2a248cf7ef3a45d33f2322876c447e128/brave-zipkin-spancollector/src/main/java/com/github/kristofa/brave/zipkin/SpanProcessingThread.java#L141

Allow for dynamic ServiceNames on the "server"

Hi,

this issue somehow relates to #50 and #18

In order to avoid the "Service appears as ServiceA,ServiceB" issue, we have to override the serviceName on the client to match the server's serviceName (using setCurrentClientServiceName).

However suppose that we have a (server) application "app" which provides several fine-grained services as follows:

greetingService
userService
etc.

In zipkin-ui I'd like to see services as "greetingService" and "userService" rather than "app". This seems currently impossible as we cannot override the endpoint's serviceName on the "server" side.

I got this working by adding the functionality that is already available on the client side to the server (using setCurrentServerServiceName), and I could also provide a pull request for this.

Another option would be to make the Endpoint also a ThreadLocal as already discussed in #50

thank you,
daniel

accessing the current Span on the client

I want to access the current SpanId from the client, but the ServerAndClientSpanState and the AbstractAnnotationSubmitter.getSpan() methods are not accessible to me. Is there another way to obtain access to the current span? Or would it be reasonable to modify AbstractAnnotationSubmitter.getSpan() to be public? I'm happy to submit a PR if the latter is reasonable.

Pass raw IP to ThreadLocalServerAndClientSpanState

ThreadLocalServerAndClientSpanState requires you use an InetAddress class. In many cloud environments, you can lookup your public ip from an environment variable or a metadata service. I think it would be nice if we had a constructor that either took the int form or byte[] form of an IP, avoiding ns lookup.
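
As a rough sketch of the conversion involved, the helper below packs a 4-byte IPv4 address into an int and back using only the JDK. RawIpv4 is a hypothetical name, not a Brave class. InetAddress.getByAddress is a standard JDK method that builds an address from raw bytes without a name service lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.ByteBuffer;

// Hypothetical helper: not part of Brave.
final class RawIpv4 {

  /** Packs a 4-byte IPv4 address (network byte order) into an int. */
  static int toInt(byte[] address) {
    if (address == null || address.length != 4) {
      throw new IllegalArgumentException("expected a 4-byte IPv4 address");
    }
    return ByteBuffer.wrap(address).getInt(); // ByteBuffer is big-endian by default
  }

  /** Unpacks an int into a 4-byte IPv4 address (network byte order). */
  static byte[] toBytes(int ip) {
    return ByteBuffer.allocate(4).putInt(ip).array();
  }

  /** Builds an InetAddress from the raw bytes; no name service lookup is performed. */
  static InetAddress toInetAddress(int ip) throws UnknownHostException {
    return InetAddress.getByAddress(toBytes(ip));
  }
}
```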

Expected value for X-B3-Sampled header

The HTTP header used to determine if a trace should be sampled ("X-B3-Sampled") is either "1" or "0" as per the information provided at https://twitter.github.io/zipkin/Instrumenting.html

However, with the current implementation, passing either "0" or "1" is interpreted as false.
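
The issue does not show the parsing code, but one plausible cause (an assumption) is the use of Boolean.parseBoolean, which returns true only for the string "true" and therefore returns false for "1". A lenient codec could look like the sketch below; SampledHeader is a hypothetical helper, not a Brave class.

```java
// Hypothetical helper: not part of Brave.
final class SampledHeader {

  static final String NAME = "X-B3-Sampled";

  /**
   * Returns TRUE for "1" (or "true", as sent by older clients), FALSE for "0"
   * (or "false"), and null when the header is absent or unrecognised.
   */
  static Boolean parse(String value) {
    if (value == null) {
      return null;
    }
    String v = value.trim();
    if (v.equals("1") || v.equalsIgnoreCase("true")) {
      return Boolean.TRUE;
    }
    if (v.equals("0") || v.equalsIgnoreCase("false")) {
      return Boolean.FALSE;
    }
    return null;
  }

  /** Always writes the 0/1 form. */
  static String format(boolean sampled) {
    return sampled ? "1" : "0";
  }
}
```

Accepting both forms on input while always writing "1" or "0" would keep older Brave clients working during a migration.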

Negative hex spanIds

Currently you can get negative span ids like this: -140dcf65fae75f53
I don't think this is compatible with zipkin, e.g. if a zipkin/finagle client calls a brave service (tell me if I'm wrong here).

I think we should use Long.toHexString when we serialize span/traceIds, and Long.parseUnsignedLong when deserializing them, or some similar method.
See how finagle generates ids:
https://github.com/twitter/finagle/blob/master/finagle-core/src/main/scala/com/twitter/finagle/tracing/Id.scala
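
A small sketch of the suggested round trip, using only JDK methods. HexIds is a hypothetical helper name. Long.parseUnsignedLong requires Java 8, and Long.toHexString does not left-pad with zeros.

```java
// Hypothetical helper: not part of Brave.
final class HexIds {

  /** Renders the id as unsigned lower-case hex, so negative longs never produce a leading '-'. */
  static String toHex(long id) {
    return Long.toHexString(id);
  }

  /** Parses unsigned hex; values above Long.MAX_VALUE wrap to negative longs. */
  static long fromHex(String hex) {
    return Long.parseUnsignedLong(hex, 16);
  }
}
```

For the id quoted above, toHex(-0x140dcf65fae75f53L) yields "ebf2309a0518a0ad", and fromHex of that string returns the original long.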

Set endpoint for remote service

I want to implement a JDBC pool monitor that intercepts SQL requests and traces them. However, they are logged as coming from my own service (naturally), but I want them to appear in Zipkin as "Postgresql", just like Twitter has overridden traces and called them "Memcached" in this example:

http://twitter.github.io/zipkin/images/web-screenshot.png

How can I override the endpoint configuration from the client side?

Apache and Jersey Client Inconsistencies

I was messing a bit more with the Apache and Jersey implementations in Brave and noticed a few discrepancies between the two. Most notably, the headers are calculated differently and the attributes associated with the spans were inconsistent as well. It seems like it would be good for the client side tracing to be as consistent as possible between the libraries Brave offers, and I'd like to offer a plan.

We could introduce a shared 'brave-client' module that contains some interceptors which do the appropriate work of computing the header and span info, and also send the CR and CS events, agnostic to the web client implementation. How a header is added to a request depends on which client library you're using, but we could add a RequestAdaptor interface that the shared module delegates to, and then the Jersey and Apache libraries can implement that accordingly.

I admittedly have started down this path in code to see if it could pan out, and I think there's some real promise with it, but I'd like to see what you thought first. I have a branch up in my fork called refactor-clients that has my current progress on the idea and may provide a bit more context about my proposal.

Kafka Span Transport

We need a kafka span transport, similar to the one in htrace.

Here's some example code that works against zipkin server infrastructure:

import com.github.kristofa.brave.SpanCollector;
import java.io.ByteArrayOutputStream;
import java.io.Flushable;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TIOStreamTransport;

public class KafkaSpanCollector implements SpanCollector, Flushable {
  private final Producer<byte[], byte[]> producer;
  // in real life, this would be bounded.
  private BlockingQueue<com.twitter.zipkin.gen.Span> queue = new LinkedBlockingQueue<>();
  private int limit = 200;

  public KafkaSpanCollector() {
    Properties props = new Properties();
    props.put("metadata.broker.list", System.getenv("METADATA_BROKER_LIST"));
    props.put("request.required.acks", "0");
    props.put("producer.type", "async");
    props.put("serializer.class", "kafka.serializer.DefaultEncoder");
    props.put("compression.codec", "1");
    this.producer = new Producer<>(new ProducerConfig(props));
  }

  @Override
  public void collect(com.twitter.zipkin.gen.Span span) {
    queue.offer(span);
    if (queue.size() >= limit) {
      flush();
    }
  }

  @Override
  public void flush() {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    final TProtocol streamProtocol = new TBinaryProtocol.Factory().getProtocol(new TIOStreamTransport(baos));

    List<KeyedMessage<byte[], byte[]>> messages = new ArrayList<>(queue.size());
    while (!queue.isEmpty()) {
      com.twitter.zipkin.gen.Span span = queue.poll();
      if (span != null) {
        baos.reset();
        try {
          // Serialize the span as Thrift binary and queue it for the "zipkin" topic.
          span.write(streamProtocol);
          messages.add(new KeyedMessage<>("zipkin", baos.toByteArray()));
        } catch (TException e) {
          throw new AssertionError(e);
        }

      }
    }
    if (!messages.isEmpty()) {
      producer.send(messages);
    }
  }

  @Override
  public void addDefaultAnnotation(String key, String value) {
  }

  @Override
  public void close() {
    producer.close();
  }
}

End-to-end doc/example

It would be great to have an example that fully configures brave against a zipkin collector. For example, given a zipkin collector running via docker or something, the README of brave-zipkin-spancollector would show how you use and verify it, or we'd have an end-to-end example that does that.

cc @benidroe

Question regarding the span detail screen

Hi Kristof,

First off, thank you for your amazing work getting Brave implemented. I managed to get Zipkin up and running and have integrated your Brave impl into my messaging framework. Everything appears to be working correctly, but I have a quick question regarding the span detail screen for a process that is both a server and a client. Do you know why the annotations appear in this order: CS, SR, SS and CR? I would have thought it would be SR, CS, CR, SS. At first I thought I had done something wrong, but looking at other images on the web, I see everyone else has CS, SR, SS and CR.

Thanks in advance

Gianni

Add support for nested client calls

Currently Brave does not allow creating nested spans on the client side.

E.g. after executing this code

clientTracer.startNewSpan("span1");
clientTracer.setClientSend();
clientTracer.startNewSpan("span2");
clientTracer.setClientSend();

remoteCall(clientTracer);

clientTracer.setClientReceived();
clientTracer.setClientReceived();

only span2 will be logged at the client side. Are there any plans to add support for nested client spans? An example of how it can be done is available here.

If you want, I can add nested client spans support myself and create a pull request.
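
To make the idea concrete, here is a minimal sketch of how nesting could be tracked with a per-thread stack. NestedClientSpans is a hypothetical class, not Brave's ClientTracer, and span names stand in for real span objects: starting a span pushes it, and each "client received" closes the innermost open span.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: span names stand in for real span objects.
final class NestedClientSpans {

  // One stack of open client spans per thread; the innermost span is on top.
  private final ThreadLocal<Deque<String>> open = ThreadLocal.withInitial(ArrayDeque::new);

  /** Opens a new span nested inside whatever span is currently open. */
  void startNewSpan(String name) {
    open.get().push(name);
  }

  /** Returns the innermost open span, or null if none is open. */
  String currentSpan() {
    return open.get().peek();
  }

  /** Closes the innermost open span and returns its name, or null if none is open. */
  String setClientReceived() {
    return open.get().poll();
  }
}
```

With the call sequence from the example above, the first setClientReceived would close span2 and the second would close span1, so both spans could be reported.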

Multithreading issue in ZipkinSpanCollector

Hi,

I can see that in ZipkinSpanCollector, if you have more than one SpanProcessingThread, all the threads share the same instance of ZipkinCollectorClientProvider. Hence there is a serious multithreading issue.

Client interceptors depend on server Endpoint set

Hi,

I have a test with a structure Browser -> Service2.Method2 -> Service1.Method1

On Service2, for some reason, interceptors are installed separately for receiving the request for Method2 and for sending the request to Service1.Method1 as a client. Those interceptors use ServerAndClientSpanStateImpl.

If I don't install interceptors for receiving request for Method2 the following happens:

ClientRequestInterceptor.handle is called by Out interceptor for client request
it tries to submit "request" binary annotation by clientTracer.submitBinaryAnnotation
it calls ClientTracerImpl.getClientEndPoint()
it checks serviceName and if it is not null, tries to create an EndPoint using copy constructor
the last one assumes that Endpoint was already stored in ThreadLocal because probably we've sent some annotations like "sr" already and so filled the Endpoint
but in my case the Endpoint was not submitted yet, it has null value, so I get "null reference exception" inside copy constructor
just adding a test if (endPoint != null) probably will not work well since the serviceName on client will be lost. And it is not clear what should be passed as "spanname", "ip" if I use client only and override serviceName? Is EndPoint capable of storing such a state and Zipkin to process it?

I don't understand what the source of the problem is here:

1. incorrect getClientEndPoint implementation?
2. incorrect logic in ClientTracerImpl in an untested scenario, so it incorrectly assumes that EndPoint must be already submitted without testing it?
3. incorrect usage of class ServerAndClientSpanStateImpl in the case when the Method2 is not traced itself so I have to use only ClientState and ClientTracer?

I hope this is 1 or 2. If it's 3, that's bad, since I will have to write different code for client tracing depending on whether the current method is traced.

P.S. Also, the whole algorithm for providing service/span names from the client and server side is not clear, including the purpose of the X-B3-Span-Name header. What happens if they don't match on the client and server side? And what are the best practices? I think it's worth having it all described in one readme.

Thank you for your attention!

ServerTracer class Javadoc refers to unknown method

The javadoc of the com.github.kristofa.brave.ServerTracer class refers to the setStateExistingTrace method which does not exist; I presume it's supposed to be the setStateCurrentTrace method?

Question: Integration with Flume (for Zipkin and Graphite)

Hi Kristof,

I've got a graphite server and flume running with your example flume.conf from the flume-zipkin-metrics-sink module (which I am looking to use to keep directing regular spans to the Zipkin backend but direct spans with durations to the Graphite server).

Have a couple of quick questions before I try it out:

Do I need to put both of your dist jars on the flume classpath (i.e. flume-zipkin-collector-sink & flume-zipkin-metrics-sink, or do I just need the flume-zipkin-metrics-sink)?
With regards to changes required at my application which uses your brave impl:

Do I still keep using the ZipkinSpanCollector but just re-configure it to point to the flume scribe source 1463 instead?

i.e. is the only change, from the application's point of view, simply to change the port configured for the ZipkinSpanCollector from 9410 to 1463?

Regards

Gianni

Incorrect project description

As far as I can tell, brave is a zipkin client implementation, not a zipkin implementation. The project description on github is misleading.
