Comments (18)
There's no open PR yet; we still need to follow through and make the change in #288 (comment).
@jlind23 we can keep this open but move it to a future sprint. I will edit the description with the latest information. I don't see a point in opening a separate implementation issue since this one has all the history already.
from elastic-agent-shipper.
> Just to confirm I am interpreting this correctly, using the MultiField example as a reference:
>
> * The existing protobuf implementation is 60% slower than mapstr and stdlib json: `17851260.0 / 11121605.0 = 1.60`
> * The realistic protobuf unmarshal/marshal fields case is narrowly faster than mapstr and stdlib json: `10299411.0 / 11121605.0 = 0.92`

yes

> I would then conclude that we should absolutely switch to using JSON serialization for the metadata and fields. We can iterate on making the JSON serialization faster, or attempting to optimistically avoid unmarshalling the JSON if we don't need to look at the event metadata or fields later.

One big problem with avoiding unmarshalling the `Fields` field: the data we need to run processors, even a drop processor or adding a tag, is in there. So I think it is unlikely that we can avoid unmarshalling `Fields`.

> Is there any reason you tested with the fields as `string` instead of `bytes`? I would have used `bytes` so we don't require a UTF-8 encoding, but maybe there is a reason we want this. The UTF-8 requirement is the only difference between `string` and `bytes`.

Brain fart. I've switched to `bytes` and updated results, with some more encoders for comparison coming soon.
If the benchmarks are better using serialized JSON instead of google.protobuf.Struct (which is what we expect), then let's make the change to the protocol to use JSON going forward.
We can also explore other tweaks to the protocol if the performance tests reveal any other new information we need to take into account.
Initial results
I tested round tripping (Marshal then Unmarshal) events because in our use case clients will have to Marshal and the shipper will Unmarshal, so the impact of both is important. I also used two types of events. The first (SingleField) stored the data for the event in a single field ("message").
example:

```json
{"@timestamp":"2023-04-20T14:51:58.913Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"log":{"offset":0,"file":{"path":"/Users/hinman/tmp/clf_1765593378.log"}},"message":"66.4.203.154 - - [20/Apr/2023:09:49:46 -0500] \"GET /random-47.html HTTP/2\" 200 1318 - \"Mozilla/5.0 (Macintosh; Intel Mac OS X 12.3; rv:98.0) Gecko/20100101 Firefox/98.0\"","input":{"type":"filestream"},"host":{"mac":["36-B9-EA-B0-ED-00","36-B9-EA-B0-ED-04","36-B9-EA-B0-ED-08","44-67-52-02-50-BD","A2-64-0B-12-7F-B6","A2-64-0B-12-7F-B7","A2-64-0B-12-7F-B8","A2-64-0B-12-7F-D6","A2-64-0B-12-7F-D7","A2-64-0B-12-7F-D8","B6-E3-AB-15-A1-BB","BC-D0-74-23-C9-F2","BE-D0-74-23-C9-F2"],"name":"elastic2","hostname":"elastic2","architecture":"arm64","os":{"version":"13.3.1","family":"darwin","name":"macOS","kernel":"22.4.0","build":"22E261","type":"macos","platform":"darwin"},"id":"3F8C45C1-BAF2-5AED-8484-947CD6BAD36A","ip":["fe80::a064:bff:fe12:7fd8","fe80::a064:bff:fe12:7fd7","fe80::a064:bff:fe12:7fd6","fe80::5d:9a80:93b8:ab9","172.16.1.205","fe80::b4e3:abff:fe15:a1bb","fe80::b4e3:abff:fe15:a1bb","fe80::a040:495d:28b:9e86","fe80::7fda:39e1:2de:127a","fe80::ce81:b1c:bd2c:69e","fe80::1cbd:4d85:36d5:a9c8","172.16.1.240"]},"agent":{"version":"8.7.0","ephemeral_id":"6a1557ad-d218-4d77-bc2a-d5d2d218e158","id":"21cb6281-9e90-4723-8d58-319249607e90","name":"elastic2","type":"filebeat"},"ecs":{"version":"8.0.0"}}
```
The second type of event was where the data for the event was added to the root of the message as fields (MultiField).
example:

```json
{"@timestamp":"2023-04-19T21:01:07.001Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.7.0"},"agent":{"id":"d2b3c240-5d88-41c6-a03e-f7401342dbea","name":"elastic2","type":"filebeat","version":"8.7.0","ephemeral_id":"b2b538e8-7f35-4648-b938-9c5a3cff1989"},"event":{"event_type":"netflow","src_ip":"118.9.14.112","proto":"UDP","timestamp":"2023-04-19T15:52:43.671103-0500","src_port":34177,"dst_ip":"12.163.211.175","dst_port":52025,"flow_id":6129484611666145821,"netflow":{"max_ttl":72,"pkts":94,"bytes":64579,"start":"2023-04-19T15:52:43.671103-0500","end":"2023-04-19T15:52:43.671103-0500","age":0,"min_ttl":72}},"event_timestamp":"1681937563","firewall_name":"Firewall-81","input":{"type":"filestream"},"host":{"os":{"build":"22E261","type":"macos","platform":"darwin","version":"13.3.1","family":"darwin","name":"macOS","kernel":"22.4.0"},"id":"3F8C45C1-BAF2-5AED-8484-947CD6BAD36A","name":"elastic2","ip":["fe80::a064:bff:fe12:7fd8","fe80::a064:bff:fe12:7fd7","fe80::a064:bff:fe12:7fd6","fe80::5d:9a80:93b8:ab9","172.16.1.205","fe80::b4e3:abff:fe15:a1bb","fe80::b4e3:abff:fe15:a1bb","fe80::a040:495d:28b:9e86","fe80::7fda:39e1:2de:127a","fe80::ce81:b1c:bd2c:69e","fe80::1cbd:4d85:36d5:a9c8","172.16.1.240"],"mac":["36-B9-EA-B0-ED-00","36-B9-EA-B0-ED-04","36-B9-EA-B0-ED-08","44-67-52-02-50-BD","A2-64-0B-12-7F-B6","A2-64-0B-12-7F-B7","A2-64-0B-12-7F-B8","A2-64-0B-12-7F-D6","A2-64-0B-12-7F-D7","A2-64-0B-12-7F-D8","B6-E3-AB-15-A1-BB","BC-D0-74-23-C9-F2","BE-D0-74-23-C9-F2"],"hostname":"elastic2","architecture":"arm64"},"availability_zone":"eu-west-1c","log":{"file":{"path":"/Users/hinman/tmp/firewall_2474911009.log"},"offset":0},"ecs":{"version":"8.0.0"}}
```
Both types of events were generated by running filebeat against sample data and using the file output. `benchtime` was set to 30 seconds.
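The round-trip pattern used by these benchmarks can be sketched with stdlib JSON only. This is a minimal illustration of the methodology, not the actual benchmark code from the branch; `roundTrip` and the sample event are made up for the sketch:

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// roundTrip marshals an event (the client-side cost) and unmarshals it
// again (the shipper-side cost), so one iteration reflects both.
func roundTrip(ev map[string]any) map[string]any {
	buf, err := json.Marshal(ev)
	if err != nil {
		panic(err)
	}
	var out map[string]any
	if err := json.Unmarshal(buf, &out); err != nil {
		panic(err)
	}
	return out
}

func main() {
	sample := map[string]any{
		"@timestamp": "2023-04-20T14:51:58.913Z",
		"message":    "GET /random-47.html HTTP/2",
	}
	// testing.Benchmark lets us time the round trip outside `go test`.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			roundTrip(sample)
		}
	})
	fmt.Println(res.N > 0)
}
```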
SingleField Results
Condition | Iterations | ns/op | B/op | allocs/op |
---|---|---|---|---|
existing protobuf | 6516 | 5449813 | 1736473 | 45540 |
mapstr and stdlib json | 10000 | 3168283 | 1535909 | 33771 |
protobuf with Fields and Metadata as strings | 216559 | 169403 | 371089 | 1320 |
protobuf Fields/Metadata Marshal/Unmarshal | 10000 | 3149967 | 2059934 | 35201 |
MultiField Results
Condition | Iterations | ns/op | B/op | allocs/op |
---|---|---|---|---|
existing protobuf | 2030 | 17851260 | 5402970 | 144140 |
mapstr and stdlib json | 3241 | 11121605 | 4814398 | 106575 |
protobuf with Fields and Metadata as strings | 69279 | 521275 | 1110870 | 3720 |
protobuf Fields/Metadata Marshal/Unmarshal | 3520 | 10299411 | 6400501 | 110595 |
The fourth condition was added because clients will still have to spend time marshalling the fields and metadata to strings to send the data, and the shipper will have to spend time unmarshalling to use the fields for processors or access the metadata.
https://github.com/leehinman/elastic-agent-shipper-client/tree/benchmark_base
https://github.com/leehinman/elastic-agent-shipper-client/tree/benchmark_string_fields
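What that fourth condition measures can be sketched with the stdlib alone. The type and function names below are illustrative stand-ins, not the shipper's actual generated types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ShallowEvent stands in for the proposed protobuf message where Fields
// and Metadata are opaque pre-serialized JSON instead of
// google.protobuf.Struct.
type ShallowEvent struct {
	Fields   []byte
	Metadata []byte
}

// encode is the client-side cost: marshal Fields/Metadata to JSON.
func encode(fields, meta map[string]any) (ShallowEvent, error) {
	fb, err := json.Marshal(fields)
	if err != nil {
		return ShallowEvent{}, err
	}
	mb, err := json.Marshal(meta)
	if err != nil {
		return ShallowEvent{}, err
	}
	return ShallowEvent{Fields: fb, Metadata: mb}, nil
}

// decodeFields is the shipper-side cost: unmarshal Fields so
// processors can read them.
func decodeFields(ev ShallowEvent) (map[string]any, error) {
	var out map[string]any
	err := json.Unmarshal(ev.Fields, &out)
	return out, err
}

func main() {
	ev, _ := encode(
		map[string]any{"message": "GET /random-47.html HTTP/2"},
		map[string]any{"beat": "filebeat"},
	)
	fields, _ := decodeFields(ev)
	fmt.Println(fields["message"])
}
```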
Just to confirm I am interpreting this correctly, using the MultiField example as a reference:

- The existing protobuf implementation is 60% slower than mapstr and stdlib json: `17851260.0 / 11121605.0 = 1.60`
- The realistic protobuf unmarshal/marshal fields case is narrowly faster than mapstr and stdlib json: `10299411.0 / 11121605.0 = 0.92`

I would then conclude that we should absolutely switch to using JSON serialization for the metadata and fields. We can iterate on making the JSON serialization faster, or attempting to optimistically avoid unmarshalling the JSON if we don't need to look at the event metadata or fields later.

Is there any reason you tested with the fields as `string` instead of `bytes`? I would have used `bytes` so we don't require a UTF-8 encoding, but maybe there is a reason we want this. The UTF-8 requirement is the only difference between `string` and `bytes`.
Updated Results
Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/benchmark_string_fields
Command: `go test -bench=. -benchmem -benchtime 15s -timeout 30m`
Run on an Apple M1 Max, macOS 13.3.1, Go 1.17.9.
Raw Results
```
goos: darwin
goarch: arm64
pkg: github.com/elastic/elastic-agent-shipper-client/benchmark
BenchmarkMarshalUnmarshal/SingleOriginalProtobuf-10             3115   5667029 ns/op   1736485 B/op    45540 allocs/op
BenchmarkMarshalUnmarshal/SingleMapStrStdJSON-10                5421   3299549 ns/op   1537192 B/op    33773 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobuf-10            111667    161508 ns/op    372850 B/op     1320 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobufFull-10          5642   3191448 ns/op   1901476 B/op    34984 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobufFullGoJSON-10    9440   1908256 ns/op   1479525 B/op    20594 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventStdJSON-10             5060   3573773 ns/op   1571119 B/op    34214 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventGoJSON-10              9351   1937137 ns/op   1162826 B/op    18823 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventCBORL-10               8058   2229902 ns/op   1088005 B/op    30253 allocs/op
BenchmarkMarshalUnmarshal/MultiOriginalProtobuf-10              1044  17318237 ns/op   5402877 B/op   144140 allocs/op
BenchmarkMarshalUnmarshal/MultiMapStrStdJSON-10                 1710  10541866 ns/op   4809868 B/op   106568 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobuf-10              35442    507303 ns/op   1115833 B/op     3720 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobufFull-10           1622  11070076 ns/op   5921757 B/op   109993 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobufFullGoJSON-10     2640   6799568 ns/op   4632009 B/op    64192 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventStdJSON-10              1614  11151869 ns/op   4904499 B/op   107802 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventGoJSON-10               2942   6134171 ns/op   3643867 B/op    59202 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventCBORL-10                2569   7019086 ns/op   3405332 B/op    96056 allocs/op
PASS
ok   github.com/elastic/elastic-agent-shipper-client/benchmark   484.142s
```
Comparing to the original protobuf implementation: "Shallow Protobuf" is with Fields & Metadata as `bytes`. "Deep Event" is a Go struct that mimics the original protobuf implementation (Fields & Metadata as mapstr, timestamp as string). "goccy"/"GoJSON" is using https://github.com/goccy/go-json. "CBORL" is using https://github.com/fxamacker/cbor.
Condition | Normalized to Original Protobuf (orig/condition) ns/op |
---|---|
Single Field, mapstr, stdlib json | 1.72 |
Single Field, Shallow Protobuf | 35.09 |
Single Field, Shallow Protobuf, stdlib json Fields, Meta | 1.78 |
Single Field, Shallow Protobuf, goccy json Fields, Meta | 2.97 |
Single Field, Deep Event, stdlib json | 1.59 |
Single Field, Deep Event, goccy json | 2.93 |
Single Field, Deep Event, CBORL | 2.54 |
Multi Field, mapstr, stdlib json | 1.64 |
Multi Field, Shallow Protobuf | 34.14 |
Multi Field, Shallow Protobuf, stdlib json Fields, Meta | 1.56 |
Multi Field, Shallow Protobuf, goccy json Fields, Meta | 2.55 |
Multi Field, Deep Event, stdlib json | 1.55 |
Multi Field, Deep Event, goccy json | 2.82 |
Multi Field, Deep Event, CBORL | 2.47 |
Observations
- The deep protobuf is worse than using mapstr and stdlib json encode/decode.
- The shallow protobuf is significantly faster (~35x).
- But once you add back decoding Fields & Metadata, that drops to 1.5x - 3x (depending on the number and depth of fields).
- Using the goccy json encoder provides similar benefits to shallow protobuf with full decode.
- CBORL is similar in speed to goccy; this might be useful if using the disk queue, since it stores events in this format.
> CBORL is similar in speed to goccy, this might be useful if using the disk queue since it stores events in this format
One advantage of serializing to JSON for the shipper is that users and teams writing to Elasticsearch already heavily use JSON and have invested time in making it efficient. We need to consider the code that already exists for this purpose. I don't like the idea of throwing away this existing efficiency when using the shipper.
There are multiple alternative JSON serialization libraries in use for efficiency and convenience reasons, and switching to CBORL would force everyone to pick a CBORL library and invest time in performance tuning it.
The disk queue isn't the most common use case, and I think it is generally accepted that writing to disk will be slower. It seems acceptable to pay the cost to serialize from JSON to CBORL there like we do in Beats today, and we generally don't get complaints about it.
The only other thing it may be worth investigating here is trying to use an alternative protobuf generator as described in #263.
Specifically it could be worth quickly testing if using https://github.com/planetscale/vtprotobuf makes a difference.
Results with vtprotobuf
Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/vtprotobuf
Command: `go test -bench=. -benchmem -benchtime 15s -timeout 30m`
Run on an Apple M1 Max, macOS 13.3.1, Go 1.17.9.
Raw Results
```
goos: darwin
goarch: arm64
pkg: github.com/elastic/elastic-agent-shipper-client/benchmark
BenchmarkMarshalUnmarshal/SingleOriginalProtobuf-10               2834   6243650 ns/op   1736471 B/op    45540 allocs/op
BenchmarkMarshalUnmarshal/SingleVTProtobuf-10                    10000   1633667 ns/op   1243770 B/op    27060 allocs/op
BenchmarkMarshalUnmarshal/SingleMapStrStdJSON-10                  5431   3296384 ns/op   1537780 B/op    33774 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobuf-10              112160    160555 ns/op    372850 B/op     1320 allocs/op
BenchmarkMarshalUnmarshal/SingleVTShallowProtobuf-10            144440    124353 ns/op    362290 B/op     1320 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobufFull-10            5661   3159657 ns/op   1901542 B/op    34984 allocs/op
BenchmarkMarshalUnmarshal/SingleVTShallowProtobufFull-10          5778   3100950 ns/op   1890367 B/op    34983 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowProtobufFullGoJSON-10      9375   1907575 ns/op   1479526 B/op    20594 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowVTProtobufFullGoJSON-10    9735   1845337 ns/op   1468934 B/op    20594 allocs/op
BenchmarkMarshalUnmarshal/SingleMapStrGoJSON-10                   9420   1917613 ns/op   1108078 B/op    19334 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowEventStdJSON-10           13436   1340617 ns/op    586345 B/op     1982 allocs/op
BenchmarkMarshalUnmarshal/SingleShallowEventGoJSON-10            38572    467163 ns/op    652993 B/op      554 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventStdJSON-10               5029   3580827 ns/op   1571243 B/op    34214 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventSructFormCBORL-10        6268   2880505 ns/op   1968003 B/op    31083 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventGoJSON-10                9337   1949360 ns/op   1162966 B/op    18823 allocs/op
BenchmarkMarshalUnmarshal/SingleDeepEventCBORL-10                 8090   2230057 ns/op   1088256 B/op    30254 allocs/op
BenchmarkMarshalUnmarshal/MultiOriginalProtobuf-10                 933  19333760 ns/op   5402882 B/op   144140 allocs/op
BenchmarkMarshalUnmarshal/MultiVTProtobuf-10                      3607   5001397 ns/op   3840957 B/op    83660 allocs/op
BenchmarkMarshalUnmarshal/MultiMapStrStdJSON-10                   1708  10538655 ns/op   4810026 B/op   106568 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobuf-10                35932    501989 ns/op   1115832 B/op     3720 allocs/op
BenchmarkMarshalUnmarshal/MultiVTShallowProtobuf-10              47934    375503 ns/op   1086071 B/op     3720 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobufFull-10             1634  11023805 ns/op   5920839 B/op   109992 allocs/op
BenchmarkMarshalUnmarshal/MultiVTShallowProtobufFull-10           1663  10791995 ns/op   5891310 B/op   109992 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowProtobufFullGoJSON-10       2664   6739939 ns/op   4629145 B/op    64184 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowVTProtobufFullGoJSON-10     2758   6540159 ns/op   4598962 B/op    64183 allocs/op
BenchmarkMarshalUnmarshal/MultiMapStrGoJSON-10                    2924   6153722 ns/op   3498135 B/op    60679 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowEventStdJSON-10             4365   4117105 ns/op   1750289 B/op     5591 allocs/op
BenchmarkMarshalUnmarshal/MultiShallowEventGoJSON-10             12261   1467770 ns/op   1992242 B/op     1566 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventStdJSON-10                1602  11206129 ns/op   4904679 B/op   107802 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventSructFormCBORL-10         2084   8615209 ns/op   5840472 B/op    92738 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventGoJSON-10                 2943   6140325 ns/op   3641569 B/op    59197 allocs/op
BenchmarkMarshalUnmarshal/MultiDeepEventCBORL-10                  2551   7022336 ns/op   3405496 B/op    96056 allocs/op
PASS
ok   github.com/elastic/elastic-agent-shipper-client/benchmark   634.441s
```
Comparison against original protobuf implementation
Condition | Normalized to Original Protobuf (orig/condition) ns/op |
---|---|
Single Field, mapstr, stdlib json | 1.89 |
Single Field, vtprotobuf | 3.82 |
Single Field, Shallow Protobuf, goccy json Fields, Meta | 3.27 |
Single Field, Shallow Protobuf, vtprotobuf, goccy json Fields, Meta | 3.38 |
Single Field, Deep Event, goccy json | 3.20 |
Multi Field, mapstr, stdlib json | 1.83 |
Multi Field, vtprotobuf | 3.87 |
Multi Field, Shallow Protobuf, goccy json Fields, Meta | 2.87 |
Multi Field, Shallow Protobuf, vtprotobuf, goccy json Fields, Meta | 2.96 |
Multi Field, Deep Event, goccy json | 3.15 |
Observations
- vtprotobuf gives us the best speed with access to all the fields in `Fields` and `Meta`.
We spoke about this in the shipper project team meeting, and the vtprotobuf implementation is so much faster that it seems obvious that we should use it. It is, however, not compatible with messages serialized using the default protobuf compiler: https://github.com/planetscale/vtprotobuf#mixing-protobuf-implementations-with-grpc
One way to take advantage of the vtprotobuf optimizations for Go code that can use it is to keep the existing protobuf message definitions, but also support the "wrapped JSON" message proposed in the issue description for clients that can't support an optimized protobuf implementation. #288 (comment)
This would allow us to use the fastest implementation for the majority of the shipper clients, which will be Beats or other Go processes, but gives us a fallback for clients like endpoint security that wouldn't be compatible with the optimized vtprotobuf RPC. There was general support for this approach.
@leehinman one thing I'm not sure we considered is that in the real system Beats will be converting from `beat.Event` -> protobuf and then from protobuf -> `beat.Event` again on the server side.
Should we benchmark the performance of these `beat.Event` conversions specifically? Or are you confident that your benchmark accurately represents the performance we'll see already?
> @leehinman one thing I'm not sure we considered is that in the real system Beats will be converting from `beat.Event` -> protobuf and then from protobuf -> `beat.Event` again on the server side.
>
> Should we benchmark the performance of these `beat.Event` conversions specifically? Or are you confident that your benchmark accurately represents the performance we'll see already?
The tests include Marshal & Unmarshal, so messages.Event -> Marshal -> Unmarshal -> messages.Event:

```go
func rtMessagesEvent(m *messages.Event) {
	b, err := proto.Marshal(m)
	if err != nil {
		panic(err)
	}
	new := messages.Event{}
	err = proto.Unmarshal(b, &new)
	if err != nil {
		panic(err)
	}
}
```

We could test beat.Event -> messages.Event -> Marshal -> Unmarshal -> messages.Event -> beat.Event, but I think the largest time is spent in Marshal/Unmarshal.
The conversion from `beat.Event` to `messages.Event` is fairly complex, so I think it is worth testing. The conversion of the mapstrs to `messages.Struct` might be quite costly.
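To see why that conversion is costly: a google.protobuf.Struct-style tree wraps every nested map, list, and scalar in its own node, one allocation each. A stdlib-only sketch of that shape (the real code goes through helpers.NewValue; the `Value` type and `fromAny` below are hypothetical stand-ins):

```go
package main

import "fmt"

// Value mimics the shape of a google.protobuf.Value: a tagged union
// that wraps every scalar, list, and map in a separate allocation.
type Value struct {
	Str  *string
	Num  *float64
	List []*Value
	Map  map[string]*Value
}

// fromAny recursively wraps a decoded-JSON-style value. Each node is a
// fresh allocation, which is where the conversion cost comes from.
func fromAny(v any) *Value {
	switch t := v.(type) {
	case string:
		return &Value{Str: &t}
	case float64:
		return &Value{Num: &t}
	case []any:
		out := &Value{List: make([]*Value, 0, len(t))}
		for _, e := range t {
			out.List = append(out.List, fromAny(e))
		}
		return out
	case map[string]any:
		out := &Value{Map: make(map[string]*Value, len(t))}
		for k, e := range t {
			out.Map[k] = fromAny(e)
		}
		return out
	}
	return nil
}

func main() {
	ev := map[string]any{
		"agent": map[string]any{"type": "filebeat", "version": "8.7.0"},
		"tags":  []any{"netflow", "firewall"},
	}
	v := fromAny(ev)
	// Two top-level fields, each a separate node tree.
	fmt.Println(len(v.Map))
}
```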
Results adding beat.Event conversion
Updated code at https://github.com/leehinman/elastic-agent-shipper-client/tree/vtprotobuf
The new test case is called `BeatVTmessageEvent`.
Raw Results
```
% go test -bench=. -benchmem -benchtime 15s -timeout 30m
goos: darwin
goarch: arm64
pkg: github.com/elastic/elastic-agent-shipper-client/benchmark
BenchmarkMarshalUnmarshal/SingleOriginalProtobuf-10       4322   4049273 ns/op   1736506 B/op    43560 allocs/op
BenchmarkMarshalUnmarshal/SingleVTProtobuf-10            13071   1372157 ns/op   1243713 B/op    27060 allocs/op
BenchmarkMarshalUnmarshal/SingleMapStrGoJSON-10          10000   1562388 ns/op   1106850 B/op    19332 allocs/op
BenchmarkMarshalUnmarshal/SingleBeatVTmessageEvent-10     6410   2809052 ns/op   2827978 B/op    57088 allocs/op
BenchmarkMarshalUnmarshal/MultiOriginalProtobuf-10        1418  12723039 ns/op   5403007 B/op   138160 allocs/op
BenchmarkMarshalUnmarshal/MultiVTProtobuf-10              4140   4368486 ns/op   3840886 B/op    83660 allocs/op
BenchmarkMarshalUnmarshal/MultiMapStrGoJSON-10            3626   4912205 ns/op   3499173 B/op    60677 allocs/op
BenchmarkMarshalUnmarshal/MultiBeatVTmessageEvent-10      2091   8558858 ns/op   8751094 B/op   174991 allocs/op
PASS
ok   github.com/elastic/elastic-agent-shipper-client/benchmark   159.178s
```
Observations
- Adding the beat.Event conversions came pretty close to doubling ns/op (VTProtobuf vs BeatVTmessageEvent: 4368486 -> 8558858 for Multi).
- Adding the beat.Event conversions came pretty close to doubling allocs/op (VTProtobuf vs BeatVTmessageEvent: 83660 -> 174991 for Multi).
- Even with the beat.Event conversion, vtprotobuf is still faster than the original protobuf implementation.
Follow up questions
- Is the messages.Event struct bringing significant value over the beat.Event struct to warrant the extra conversions?
The round-trip code is a copy of what is currently done in the Beats shipper client and server:
```go
func rtBeatsVTMessagesEvent(e *beat.Event) {
	meta, err := helpers.NewValue(e.Meta)
	if err != nil {
		panic(err)
	}
	fields, err := helpers.NewValue(e.Fields)
	if err != nil {
		panic(err)
	}
	source := &messages.Source{}
	ds := &messages.DataStream{}
	inputIDVal, err := e.Meta.GetValue("input_id")
	if err != nil {
		panic(err)
	}
	source.InputId, _ = inputIDVal.(string)
	streamIDVal, err := e.Meta.GetValue("stream_id")
	if err != nil {
		panic(err)
	}
	source.StreamId, _ = streamIDVal.(string)
	dsType, err := e.Fields.GetValue("data_stream.type")
	if err != nil {
		panic(err)
	}
	ds.Type, _ = dsType.(string)
	dsNamespace, err := e.Fields.GetValue("data_stream.namespace")
	if err != nil {
		panic(err)
	}
	ds.Namespace, _ = dsNamespace.(string)
	dsDataset, err := e.Fields.GetValue("data_stream.dataset")
	if err != nil {
		panic(err)
	}
	ds.Dataset, _ = dsDataset.(string)
	m := &messages.Event{
		Timestamp:  timestamppb.New(e.Timestamp),
		Metadata:   meta.GetStructValue(),
		Fields:     fields.GetStructValue(),
		Source:     source,
		DataStream: ds,
	}
	b, err := m.MarshalVT()
	if err != nil {
		panic(err)
	}
	newMessage := &messages.Event{}
	err = newMessage.UnmarshalVT(b)
	if err != nil {
		panic(err)
	}
	be := &beat.Event{}
	be.Timestamp = newMessage.Timestamp.AsTime()
	be.Fields = helpers.AsMap(newMessage.Fields)
	be.Meta = helpers.AsMap(newMessage.Metadata)
}
```
> Is the messages.Event struct bringing significant value over the beat.Event struct to warrant the extra conversions?
Probably not. `messages.Event` was proposed before we realized that we were reinventing the Beat event pipeline in a standalone shipper. Now that the shipper itself will be a Beat, we could define a message type that more closely maps to `beat.Event` and minimizes the overhead.
We could also consider just serializing `beat.Event` to JSON and wrapping it in protobuf to be able to pull the metadata out easily.
Spoke about this today, and decided the shipper should just accept the same JSON that an input would be writing to ES without the shipper.
This would give us an event format like the following:
```proto
message Event {
    // JSON serialized event to be written to ES.
    oneof event {
        bytes json = 1;
    }
}
```
This is using a oneof type because we may find or decide that we could benefit from a more specialized protobuf event in the future (particularly given the results above showing vtprotobuf serialization is faster than JSON), but we can start from where we are today.
We will have to document the structure of the JSON event in the comments, for example the data stream fields are mandatory for the shipper to be able to construct the index name and apply processors.
@leehinman @cmacknz now that we have measured the performances differences: #288 (comment)
Can we close this issue as done?
> Spoke about this today, and decided the shipper should just accept the same JSON that an input would be writing to ES without the shipper.
Is there an active PR for this change?