Brief Summary
The Ruby Server SDK uses Ruby Eventsource to stream large JSON payloads, and processing them causes Rails Puma workers to exceed the default 90s worker timeout. We have identified likely causes, highlighted below.
Buffer Indexing
The code for handling streaming responses splits on record breaks (approximately /\r\n/), but uses a mechanism that leads to quadratic scanning work on large data rows:
  def read_line
    loop do
      @lock.synchronize do
        i = @buffer.index(/[\r\n]/)
        if !i.nil? && !(i == @buffer.length - 1 && @buffer[i] == "\r")
          i += 1 if (@buffer[i] == "\r" && @buffer[i + 1] == "\n")
          return @buffer.slice!(0, i + 1).force_encoding(Encoding::UTF_8)
        end
      end
      return nil if !read_chunk_into_buffer
    end
  end
Specifically, this line is worrying:

  i = @buffer.index(/[\r\n]/)
When the LaunchDarkly client is initialized, I believe the server sends the entire payload as a single line of data, while the HTTP response itself arrives as a chunked stream. If the full response is 150 MB (not a theoretical number) and the chunk size is, say, 1 MB (a guess, for illustration), the scan won't hit a record break until all 150 MB have been buffered.
In other words, it does this (each '-' is one chunk):

  buffer = ''
  buffer.index(/[\r\n]/)
  buffer = '-'
  buffer.index(/[\r\n]/)
  buffer = '--'
  buffer.index(/[\r\n]/)
  buffer = '---'
  buffer.index(/[\r\n]/)
  buffer = '----'
  buffer.index(/[\r\n]/)
  # ...
  buffer = '-' * 150
  buffer.index(/[\r\n]/)
Repeat until the buffer reaches 150 MB: if this is what's happening, by that point the same data has been re-scanned a substantial number of times.
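The cumulative cost can be sketched with a small, self-contained Ruby example (illustrative chunk sizes, not the SDK's real ones). Because each call to index restarts from position 0, the total number of bytes examined grows quadratically with the number of chunks:

```ruby
# Count worst-case bytes examined when every chunk append triggers a
# full re-scan of the accumulated buffer from position 0.
def bytes_scanned_with_restart(chunk_size, chunk_count)
  buffer = +''
  scanned = 0
  chunk_count.times do
    buffer << ('-' * chunk_size)
    buffer.index(/[\r\n]/)   # finds nothing; examines the whole buffer
    scanned += buffer.length # bytes examined by this pass
  end
  scanned
end

# 150 chunks of 1 KB: ~11.3 MB of scanning for only 150 KB of input.
total = bytes_scanned_with_restart(1_000, 150)
```

Scale the same arithmetic to 1 MB chunks and a 150 MB payload and you get roughly 11 GB of regex scanning over a response that is only 150 MB long.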
The problem is that Eventsource re-scans the entire buffer from the start on every chunk, while also accumulating the entire response in memory, rather than resuming the search for record breaks where the previous scan left off.
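One possible mitigation, sketched below under my own assumptions (the class and names are hypothetical, not the SDK's actual code), is to remember how far the previous scan got, so each byte of the buffer is examined at most once:

```ruby
# Hypothetical line buffer that avoids re-scanning already-checked bytes.
class LineBuffer
  def initialize
    @buffer = +''
    @scan_pos = 0 # everything before this index is known terminator-free
  end

  def append(chunk)
    @buffer << chunk
  end

  # Returns the next complete line with its terminator stripped, or nil
  # if no full line has arrived yet.
  def next_line
    i = @buffer.index(/[\r\n]/, @scan_pos)
    if i.nil? || (i == @buffer.length - 1 && @buffer[i] == "\r")
      # No terminator yet (a trailing \r may still be half of \r\n);
      # resume the next scan here instead of at position 0.
      @scan_pos = i.nil? ? @buffer.length : i
      return nil
    end
    i += 1 if @buffer[i] == "\r" && @buffer[i + 1] == "\n"
    line = @buffer.slice!(0, i + 1)
    @scan_pos = 0
    line.sub(/\r\n|\r|\n/, '')
  end
end
```

With this, appending 150 terminator-free chunks costs one pass over the data in total, instead of ~150 passes over an ever-growing buffer.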
Stream Init
Why does that become an issue? I believe this chunk of code is relevant:
  message = JSON.parse(message.data, symbolize_names: true)
  all_data = Impl::Model.make_all_store_data(message[:data])
  @feature_store.init(all_data)
  @initialized.make_true
...and that code won't run until conn.on_event { |event| process_message(event) } (src) fires, which only happens once the entire JSON payload has been assembled from the buffered chunks.
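This is inherent to SSE framing: an event is only dispatched once its terminating blank line arrives, so a single giant data: line produces no events until every chunk of it has been buffered. A minimal toy dispatcher (my own illustration, not the Eventsource implementation) makes the point:

```ruby
require 'json'

# Toy SSE dispatcher: collects data: lines and emits an event only when
# the blank separator line is seen.
def dispatch_events(lines)
  events = []
  pending = +''
  lines.each do |line|
    if line.empty?
      events << pending.dup unless pending.empty?
      pending.clear
    elsif line.start_with?('data:')
      pending << line.sub(/\Adata: ?/, '')
    end
  end
  events
end

# A payload arriving as one data: line yields nothing until the blank
# line after it appears in the stream.
dispatch_events(['data: {"data":{"flags":{}}}'])      # no events yet
dispatch_events(['data: {"data":{"flags":{}}}', ''])  # one complete event
```

For a 150 MB flag payload on one data: line, the blank line is the very last thing to arrive, so process_message sees nothing until the whole response is in memory.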
Summary
We believe that the Ruby Server SDK is retrieving the entirety of the flag data as a single response from Ruby Eventsource, and for large clients (150 MB+) this will cause timeouts and crashes.
My current experimentation has focused on preventing full re-reads of a partial buffer, but I believe the deeper issue is that the entirety of the flag data is delivered in a single message. If it is indeed all sent in one response, this will be a severe issue for larger clients.