Comments (17)
@manuel-huez thanks a lot for the effort of moving this. Hopefully it will be solved soon!

from dd-trace-go.
Looks like a problem with the agent. Do you have access to the logs? It would be great to see what happened at the moment you got those messages. They are in /var/log/datadog/trace-agent.log on Linux and C:\ProgramData\Datadog\logs\trace-agent.log on Windows.
Thanks for the prompt reply. Here's a part of that log file containing errors:
```
2018-02-20 15:46:35 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: http: unexpected EOF reading trailer
2018-02-20 16:03:14 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: http: unexpected EOF reading trailer
2018-02-20 16:15:46 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: http: unexpected EOF reading trailer
2018-02-20 16:23:27 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"tessellate" name:"middlewares.init" traceID:3108756104495211551 spanID:3108756104495211551 start:15191438...
2018-02-20 16:23:59 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"tessellate" name:"middlewares.init" traceID:7722660901761579816 spanID:7722660901761579816 start:15191438...
2018-02-20 16:24:30 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"tessellate" name:"middlewares.init" traceID:5273839187162951140 spanID:5273839187162951140 start:15191438...
2018-02-20 16:47:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:462730192269898272 spanID:462730192269898272 start:151914525074349209...
2018-02-20 16:52:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:8229154722886646623 spanID:8229154722886646623 start:1519145550660763...
2018-02-20 17:02:47 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:6913225961392150914 spanID:6913225961392150914 start:1519146167667694...
2018-02-20 17:07:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:2321761317779871275 spanID:2321761317779871275 start:1519146450288150...
2018-02-20 17:12:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:2285384087229003712 spanID:2285384087229003712 start:1519146750482210...
2018-02-20 17:12:47 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:785244854339271482 spanID:785244854339271482 start:151914676768520044...
2018-02-20 17:16:51 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: unexpected EOF
2018-02-20 17:22:31 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:4294281343579544319 spanID:4294281343579544319 start:1519147350673618...
2018-02-20 17:22:47 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:4852484717936411477 spanID:4852484717936411477 start:1519147367679504...
2018-02-20 17:28:45 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:6983175358650587154 spanID:6983175358650587154 start:1519147723825698...
2018-02-20 17:38:30 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: unexpected EOF
2018-02-20 17:39:11 ERROR (receiver.go:387) - cannot decode v0.3 traces payload: EOF
2018-02-20 17:42:49 ERROR (receiver.go:249) - dropping trace reason: invalid span service:"api" resource:"gin.request" traceID:8705007339549081199 spanID:8705007339549081199 start:1519148567437578...
```
No problem. I will look into it soon.
@manuel-huez is there any chance you could provide a full line out of that log? (More would be even better.) They are truncated with an ellipsis (...).
Unfortunately, this output is what's getting stored in the log file (that's the raw cat output), so I don't seem to have access to the full lines.
I can see that the trace is dropped during the normalization process in the agent because some of the data contained in one of the fields is not being processed for some reason. It is hard to say without seeing it. Do you think there is any way to obtain some examples of those spans? Perhaps a fmt.Printf("%#v", ...) output or similar? It would help in tracking down the problem.
For context, the returned error can be seen here, but it would be nice to track down exactly which field in the Normalize function is causing the problem.
The issue is intermittent, so I'm not keen on adding new logging like that, especially since it would be extremely heavy given that we're trying to trace as many spans as possible.
However, wouldn't it be possible to update the trace agent so it logs full lines?
Hey @manuel-huez, yes, of course that wouldn't make sense if the issue is intermittent. I assumed it was easy to reproduce.

> However, wouldn't it be possible to update the trace agent so it logs full lines?

Sorry, I wasn't aware that the agent was doing this. I assumed it was something specific to your environment.
Would you mind moving this issue to the datadog-trace-agent repository? I am happy to do it myself, but I'd like you to be the author so that you get updates. I think the turnaround for a solution will be faster. Simply go here and copy/paste the issue description, including the logs that you've shown me.
That would be very helpful. Thanks!
Still seeing the errors, and the 1-second timeout is way too short.

commit 4eb4e
Author: Maryan

    ddtrace: patch: avoid dropping traces with longer timeout
```
Datadog Tracer v1.19.0 ERROR: lost 275 traces: Post http://10.0.0.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 22:09 UTC)
```
```
diff --git a/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/transport.go b/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/transport.go
index 9334ca3..738eb38 100644
--- a/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/transport.go
+++ b/vendor/gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/transport.go
@@ -43,7 +43,7 @@ const (
 	defaultHostname    = "localhost"
 	defaultPort        = "8126"
 	defaultAddress     = defaultHostname + ":" + defaultPort
-	defaultHTTPTimeout = time.Second      // defines the current timeout before giving up with the send process
+	defaultHTTPTimeout = 30 * time.Second // defines the current timeout before giving up with the send process
 	traceCountHeader   = "X-Datadog-Trace-Count" // header containing the number of traces in the payload
 )
```
Hi @mhratson!
Please tell us a bit more. Did upping the timeout change anything? Also, is your trace-agent running in the local network? If so, one second should be plenty for a payload smaller than 10 MB. The timeout usually happens if there is a disconnect, so increasing it might not always help.
Let's explore this and figure it out. If we can't, we might want to improve the error message so it gives us more insight into what kind of payload failed to send.
Hey @gbbr,
Here are some logs. Let me know if there are better ways to debug it…

Client debug output:
```
Nov 12 20:03:26.175 | podX | | Datadog Tracer v1.19.0 DEBUG: Sending payload: size: 4010 traces: 10
Nov 12 20:03:24.159 | podX | | Datadog Tracer v1.19.0 DEBUG: Sending payload: size: 69897 traces: 173
Nov 12 20:03:24.151 | podX | | Datadog Tracer v1.19.0 DEBUG: Sending payload: size: 32214 traces: 77
Nov 12 20:03:22.338 | podX | | Datadog Tracer v1.19.0 DEBUG: Sending payload: size: 104760 traces: 218
```
And occasional errors:
```
Nov 13 14:20:21.305 podX Datadog Tracer v1.19.0 ERROR: lost 235 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 22:19 UTC)
Nov 13 14:17:59.280 podX Datadog Tracer v1.19.0 ERROR: lost 30 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 22:16 UTC)
Nov 13 14:10:21.074 podX Datadog Tracer v1.19.0 ERROR: lost 275 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 22:09 UTC)
Nov 13 14:00:23.137 podX Datadog Tracer v1.19.0 ERROR: lost 2 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:59 UTC)
Nov 13 13:50:21.154 podX Datadog Tracer v1.19.0 ERROR: lost 223 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:49 UTC)
Nov 13 13:37:59.308 podX Datadog Tracer v1.19.0 ERROR: lost 23 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:36 UTC)
Nov 13 13:35:21.218 podX Datadog Tracer v1.19.0 ERROR: lost 118 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:34 UTC)
Nov 13 13:15:21.189 podX Datadog Tracer v1.19.0 ERROR: lost 146 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 21:14 UTC)
Nov 13 12:57:59.195 podX Datadog Tracer v1.19.0 ERROR: lost 107 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:56 UTC)
Nov 13 12:42:57.329 podX Datadog Tracer v1.19.0 ERROR: lost 3 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:41 UTC)
Nov 13 12:25:23.054 podX Datadog Tracer v1.19.0 ERROR: lost 226 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:24 UTC)
Nov 13 12:12:58.221 podX Datadog Tracer v1.19.0 ERROR: lost 4 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:11 UTC)
Nov 13 12:08:00.437 podX Datadog Tracer v1.19.0 ERROR: lost 3 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 20:06 UTC)
Nov 13 11:32:58.618 podX Datadog Tracer v1.19.0 ERROR: lost 15 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers), 1 additional messages skipped (first occurrence: 13 Nov 19 19:31 UTC)
Nov 13 11:30:23.049 podX Datadog Tracer v1.19.0 ERROR: lost 256 traces: Post http://169.254.1.1:8126/v0.4/traces: net/http: request canceled (Client.Timeout exceeded while awaiting headers) (occurred: 13 Nov 19 19:29 UTC)
```
The host is 1 hop away:

```
$ traceroute 169.254.1.1
traceroute to 169.254.1.1 (169.254.1.1), 30 hops max, 46 byte packets
 1  169.254.1.1 (169.254.1.1)  0.024 ms  0.004 ms  0.004 ms
```

There have been no `Client.Timeout exceeded while awaiting headers` errors since deploying the patch…
We could add an option when the tracer starts to specify a custom timeout. Something like:

```go
tracer.Start(tracer.WithHTTPTimeout(5 * time.Second))
```

…which you could use to increase the timeout, if that fixes the problem. What do you think?
Did more instrumentation; here is one example of how a request may fail:
```
DNS lookup:         0 ms
TCP connection:   102 ms
TLS handshake:      0 ms
Server processing:  0 ms
Content transfer: 399 ms
Name lookup:        0 ms
Connect:          102 ms
Pre transfer:     102 ms
Start transfer:   699 ms
Total:           1098 ms
```
> tracer.Start(tracer.WithHTTPTimeout(5 * time.Second))

How about having an option for a custom http.Client, similar to WithHTTPRoundTripper?
Another finding: #536
> Did more instrumentation; here is one example of how a request may fail:

Thank you for that. This really helps.

> Another finding: #536

I'm not exactly sure what you were trying to do there, but we were already closing the body.

> How about having an option for a custom http.Client, similar to WithHTTPRoundTripper?

That would mean WithHTTPClient overlaps with WithHTTPRoundTripper, so we would have to deprecate the latter. I guess that's probably the best option, since it provides full flexibility to the user.
Let's do that then. I'm happy to add this change. We should also probably increase the default timeout to 2 seconds.