Comments (17)

gbbr commented on July 30, 2024

@manuel-huez thanks a lot for taking the effort to move this. Hopefully it will be solved soon!

gbbr commented on July 30, 2024

Looks like a problem with the agent. Do you have access to the logs? It would be great to see what happened at the moment you got those messages. They are in /var/log/datadog/trace-agent.log, and in c:\programdata\datadog\logs\trace-agent.log on Windows.

manuel-huez commented on July 30, 2024

Thanks for the prompt reply. Here's a part of that log file containing errors:

2018-02-20 15:46:35 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: http: unexpected EOF reading trailer
2018-02-20 16:03:14 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: http: unexpected EOF reading trailer
2018-02-20 16:15:46 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: http: unexpected EOF reading trailer
2018-02-20 16:23:27 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"tessellate" name:"middlewares.init" traceID:3108756104495211551 spanID:3108756104495211551 start:15191438...
2018-02-20 16:23:59 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"tessellate" name:"middlewares.init" traceID:7722660901761579816 spanID:7722660901761579816 start:15191438...
2018-02-20 16:24:30 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"tessellate" name:"middlewares.init" traceID:5273839187162951140 spanID:5273839187162951140 start:15191438...
2018-02-20 16:47:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:462730192269898272 spanID:462730192269898272 start:151914525074349209...
2018-02-20 16:52:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:8229154722886646623 spanID:8229154722886646623 start:1519145550660763...
2018-02-20 17:02:47 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:6913225961392150914 spanID:6913225961392150914 start:1519146167667694...
2018-02-20 17:07:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:2321761317779871275 spanID:2321761317779871275 start:1519146450288150...
2018-02-20 17:12:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:2285384087229003712 spanID:2285384087229003712 start:1519146750482210...
2018-02-20 17:12:47 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:785244854339271482 spanID:785244854339271482 start:151914676768520044...
2018-02-20 17:16:51 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: unexpected EOF
2018-02-20 17:22:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:4294281343579544319 spanID:4294281343579544319 start:1519147350673618...
2018-02-20 17:22:47 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:4852484717936411477 spanID:4852484717936411477 start:1519147367679504...
2018-02-20 17:28:45 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:6983175358650587154 spanID:6983175358650587154 start:1519147723825698...
2018-02-20 17:38:30 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: unexpected EOF
2018-02-20 17:39:11 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: EOF
2018-02-20 17:42:49 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:8705007339549081199 spanID:8705007339549081199 start:1519148567437578...

gbbr commented on July 30, 2024

No problem. I will look into it soon.

gbbr commented on July 30, 2024

@manuel-huez is there any chance you could provide a full line out of that log? (More would be even better.) They are truncated with an ellipsis (...).

manuel-huez commented on July 30, 2024

Unfortunately, this output is exactly what's stored in the log file (that's the raw cat output), so I don't have access to the full lines.

gbbr commented on July 30, 2024

I can see that the trace is dropped during the normalization process in the agent because some of the data in one of the fields can't be processed for some reason. It is hard to say more without seeing it. Is there any way you could obtain some examples of those spans? Perhaps a fmt.Printf("%#v", ...) output or similar? It would help in tracking down the problem.

As context, the returned error can be seen here, but it would be nice to track down exactly which field in the Normalize function is causing the problem.
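
Since the agent log truncates the offending spans, one option is to dump a full span from the client side just before it is sent. The snippet below is only a minimal sketch of that idea; dumpSpan and fakeSpan are hypothetical stand-ins, since the concrete span type depends on the dd-trace-go version in use.

package main

import "fmt"

// dumpSpan prints the full Go-syntax representation of a span-like value,
// which is the kind of output that would help pinpoint the field rejected
// by the agent's normalization step.
func dumpSpan(span interface{}) {
	fmt.Printf("span dump: %#v\n", span)
}

func main() {
	// fakeSpan is a hypothetical stand-in; in practice you would pass the
	// tracer's span value (or a copy of its fields) to dumpSpan.
	type fakeSpan struct {
		Service, Name, Resource string
		TraceID, SpanID         uint64
		Start, Duration         int64
	}
	dumpSpan(fakeSpan{Service: "api", Name: "gin.request", Resource: "GET /users"})
}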

manuel-huez commented on July 30, 2024

The issue is intermittent, so I'm not keen on adding logging like that, especially since it would be extremely heavy given that we're trying to use as many spans as possible.
However, wouldn't it be possible to update the trace agent so that it logs full lines?

gbbr commented on July 30, 2024

Hey @manuel-huez, yes of course that wouldn't make sense if the issue is intermittent. I assumed it was easy to reproduce.

However, wouldn't it be possible to update the trace agent so it logs full lines?

Sorry, I wasn't aware that the agent was doing this. I assumed it was something specific to your environment.

Would you mind moving this issue to the datadog-trace-agent repository? I am happy to do it myself, but I'd like you to be the author so that you can get updates. I think the turnaround for a solution will be faster. Simply go here and copy/paste the issue description, including the logs that you've shown me.

That would be very helpful. Thanks!

mhratson commented on July 30, 2024

Still seeing the errors, and a 1-second timeout is way too short.

commit 4eb4e
Author:     Maryan

    ddtrace: patch: avoid dropping traces with longer timeout
    
    ```
    Datadog Tracer v1.19.0 ERROR: lost 275 traces: Post http://10.0.0.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 22:09 UTC)
    ```

diff --git a/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/transport.go b/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/transport.go
index 9334ca3..738eb38 100644
--- a/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/transport.go
+++ b/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/transport.go
@@ -43,7 +43,7 @@ const (
        defaultHostname    = "localhost"
        defaultPort        = "8126"
        defaultAddress     = defaultHostname + ":" + defaultPort
-       defaultHTTPTimeout = time.Second             // defines the current timeout before giving up with the send process
+       defaultHTTPTimeout = 30 * time.Second        // defines the current timeout before giving up with the send process
        traceCountHeader   = "X-Datadog-Trace-Count" // header containing the number of traces in the payload
 )

gbbr commented on July 30, 2024

Hi @mhratson!

Please tell us a bit more. Did upping the timeout change anything? Also, is your trace-agent running on the local network? If so, one second should be plenty for a payload smaller than 10 MB. The timeout usually happens when there is a disconnect, so increasing it might not always help.

Let's explore this and figure it out. If we can't, we might want to improve the error message so that it gives us more insight into what kind of payload failed to send.

mhratson commented on July 30, 2024

Hey @gbbr,

Here are some logs. Let me know if there are better ways to debug this…

Client

Debug output:

Nov 12 20:03:26.175 | podX |   | Datadog Tracer v1.19.0 DEBUG: Sending payload: size: 4010 traces: 10
Nov 12 20:03:24.159 | podX |   | Datadog Tracer v1.19.0 DEBUG: Sending payload: size: 69897 traces: 173
Nov 12 20:03:24.151 | podX |   | Datadog Tracer v1.19.0 DEBUG: Sending payload: size: 32214 traces: 77
Nov 12 20:03:22.338 | podX |   | Datadog Tracer v1.19.0 DEBUG: Sending payload: size: 104760 traces: 218

And occasional errors:


Nov 13 14:20:21.305	podX		Datadog Tracer v1.19.0 ERROR: lost 235 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 22:19 UTC)
Nov 13 14:17:59.280	podX		Datadog Tracer v1.19.0 ERROR: lost 30 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 22:16 UTC)
Nov 13 14:10:21.074	podX		Datadog Tracer v1.19.0 ERROR: lost 275 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 22:09 UTC)
Nov 13 14:00:23.137	podX		Datadog Tracer v1.19.0 ERROR: lost 2 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:59 UTC)
Nov 13 13:50:21.154	podX		Datadog Tracer v1.19.0 ERROR: lost 223 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:49 UTC)
Nov 13 13:37:59.308	podX		Datadog Tracer v1.19.0 ERROR: lost 23 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:36 UTC)
Nov 13 13:35:21.218	podX		Datadog Tracer v1.19.0 ERROR: lost 118 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:34 UTC)
Nov 13 13:15:21.189	podX		Datadog Tracer v1.19.0 ERROR: lost 146 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:14 UTC)
Nov 13 12:57:59.195	podX		Datadog Tracer v1.19.0 ERROR: lost 107 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:56 UTC)
Nov 13 12:42:57.329	podX		Datadog Tracer v1.19.0 ERROR: lost 3 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:41 UTC)
Nov 13 12:25:23.054	podX		Datadog Tracer v1.19.0 ERROR: lost 226 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:24 UTC)
Nov 13 12:12:58.221	podX		Datadog Tracer v1.19.0 ERROR: lost 4 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:11 UTC)
Nov 13 12:08:00.437	podX		Datadog Tracer v1.19.0 ERROR: lost 3 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:06 UTC)
Nov 13 11:32:58.618	podX		Datadog Tracer v1.19.0 ERROR: lost 15 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers), 1 additional messages skipped (first occurrence: 13 Nov 19 19:31 UTC)
Nov 13 11:30:23.049	podX		Datadog Tracer v1.19.0 ERROR: lost 256 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 19:29 UTC)

The host is 1 hop away:

$ traceroute 169.254.1.1
traceroute to 169.254.1.1 (169.254.1.1), 30 hops max, 46 byte packets
 1  169.254.1.1 (169.254.1.1)  0.024 ms  0.004 ms  0.004 m

There have been no Client.Timeout exceeded while awaiting headers errors since deploying the patch…

gbbr commented on July 30, 2024

We could add an option to specify a custom timeout when the tracer starts. Something like:

tracer.Start(tracer.WithHTTPTimeout(5*time.Second))

... which you could use to increase the timeout, if that fixes the problem. What do you think?

mhratson commented on July 30, 2024

I did some more instrumentation, and here is one example of how requests may fail:

DNS lookup:           0 ms
TCP connection:     102 ms
TLS handshake:        0 ms
Server processing:    0 ms
Content transfer:   399 ms
Name Lookup:       0 ms
Connect:         102 ms
Pre Transfer:    102 ms
Start Transfer:  699 ms
Total:          1098 ms
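
For reference, numbers like the ones above can be collected with Go's standard net/http/httptrace package. The sketch below is illustrative only, not the exact tooling used here; the agent address is taken from the logs in this thread.

package main

import (
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func main() {
	start := time.Now()
	var dnsDone, connDone, firstByte time.Time

	// Record the end of the main client-side phases of the request.
	trace := &httptrace.ClientTrace{
		DNSDone:              func(httptrace.DNSDoneInfo) { dnsDone = time.Now() },
		ConnectDone:          func(network, addr string, err error) { connDone = time.Now() },
		GotFirstResponseByte: func() { firstByte = time.Now() },
	}

	// Empty POST to the agent's trace endpoint, address as seen in the logs above.
	req, err := http.NewRequest("POST", "http://169.254.1.1:8126/v0.4/traces", nil)
	if err != nil {
		panic(err)
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	if dnsDone.IsZero() {
		dnsDone = start // no DNS lookup happens for an IP literal
	}
	fmt.Println("DNS lookup:     ", dnsDone.Sub(start))
	fmt.Println("TCP connection: ", connDone.Sub(dnsDone))
	fmt.Println("Start transfer: ", firstByte.Sub(start))
	fmt.Println("Total:          ", time.Since(start))
}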

mhratson commented on July 30, 2024

tracer.Start(tracer.WithHTTPTimeout(5*time.Second))

How about having an option for a custom http.Client, similar to WithHTTPRoundTripper?

mhratson commented on July 30, 2024

Another finding: #536

gbbr commented on July 30, 2024

Did more instrumentation and one of examples how requests may fail:

Thank you for that. This really helps.

Another finding: #536

Not exactly sure what you were trying to do here, but we were closing the body.

How about having option for custom http.Client, similar to WithHTTPRoundTripper?

This means that WithHTTPClient would overlap with WithHTTPRoundTripper, so we would have to deprecate the latter. That's probably the best option, though, since it provides full flexibility to the user.

Let's do that then. I'm happy to add this change. We should also probably increase the default timeout to 2 seconds.
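
For reference, this is roughly what the proposed option would look like from the user's side: start the tracer with a custom http.Client whose Timeout is raised above the default. The option name WithHTTPClient follows the proposal in this thread; check the dd-trace-go release you use before relying on it.

package main

import (
	"net/http"
	"time"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func main() {
	// Custom client with a longer timeout than the 1-second default discussed above.
	client := &http.Client{Timeout: 30 * time.Second}
	tracer.Start(tracer.WithHTTPClient(client))
	defer tracer.Stop()

	// ... application code ...
}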
