Comments (7)
I have the same problem. My records are in Avro format and the topic has over two million records. Filtering also takes extremely long: a couple of hours. My topic has 2 partitions, and by comparison, paging through records in Kafka is very fast.
from kafka-webview.
Hey @xstephen95x, apologies for the slow reply; I've been traveling.
I haven't seen such poor performance from the filtering logic, even in topics with hundreds of millions of records. The filtering is done via a Kafka interceptor here.
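For readers unfamiliar with the pattern, here is a minimal sketch of interceptor-style filtering. In the real Kafka client, a ConsumerInterceptor's onConsume(...) method receives each batch returned by poll() and may return a modified batch; this sketch models that step with plain Java types so it runs without kafka-clients. SimpleRecord and FilteringInterceptorSketch are hypothetical names and are not Kafka-WebView's actual RecordFilterInterceptor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a Kafka ConsumerRecord, holding only what a filter might inspect.
final class SimpleRecord {
    final int partition;
    final long offset;
    final String value;

    SimpleRecord(int partition, long offset, String value) {
        this.partition = partition;
        this.offset = offset;
        this.value = value;
    }
}

// Models the onConsume(...) step: receive one polled batch, return only the records
// that pass every configured filter.
final class FilteringInterceptorSketch {
    private final List<Predicate<SimpleRecord>> filters;

    FilteringInterceptorSketch(List<Predicate<SimpleRecord>> filters) {
        this.filters = new ArrayList<>(filters);
    }

    List<SimpleRecord> onConsume(List<SimpleRecord> batch) {
        List<SimpleRecord> kept = new ArrayList<>();
        for (SimpleRecord rec : batch) {
            // allMatch stops at the first filter that rejects the record.
            if (filters.stream().allMatch(filter -> filter.test(rec))) {
                kept.add(rec);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Predicate<SimpleRecord> partitionZero = r -> r.partition == 0;
        Predicate<SimpleRecord> mentionsError = r -> r.value.contains("error");
        FilteringInterceptorSketch sketch =
            new FilteringInterceptorSketch(List.of(partitionZero, mentionsError));

        List<SimpleRecord> batch = List.of(
            new SimpleRecord(0, 0L, "ok"),
            new SimpleRecord(0, 1L, "error: timeout"),
            new SimpleRecord(1, 0L, "error: refused"));

        for (SimpleRecord r : sketch.onConsume(batch)) {
            System.out.println(r.partition + "/" + r.offset + ": " + r.value); // prints "0/1: error: timeout"
        }
    }
}
```

Because every configured filter runs against every polled record, any per-record cost inside a filter (or inside the deserializer that runs before it) is multiplied by the size of the topic.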
How many partitions does your topic have? Do you see similar performance issues using the WebSocket streams versus just paging through records in Kafka?
I have a Python script for reading from Kafka topics (Protobuf messages), and it reads a couple thousand records per second, so there's a considerable slowdown somewhere in the stack.
Perhaps it's the deserializer? Perhaps it's the interceptor you linked? There's no way to know without profiling. Do you have a recommended way to do performance analysis on this? I've only used perf on C/C++, never on Java.
The topic only has 1 or 2 partitions, but I don't think that's related.
I just timed it: it filters about 300 records per minute, so with millions of records it will take days.
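The arithmetic behind "days", taking two million records (the figure from the first comment) as an example and reading "a couple thousand records per second" as roughly 2,000:

```latex
\[
\frac{300~\text{records/min}}{60~\text{s/min}} = 5~\text{records/s},
\qquad
\frac{2\,000\,000~\text{records}}{300~\text{records/min}} \approx 6\,667~\text{min} \approx 111~\text{h} \approx 4.6~\text{days}
\]
\[
\text{slowdown vs. the Python script} \approx \frac{2\,000~\text{records/s}}{5~\text{records/s}} = 400
\]
```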
I've tried using https://github.com/jvm-profiling-tools/perf-map-agent to run perf-top, but I haven't been able to get anything useful.
In terms of streaming vs. paging, I haven't been able to stream successfully. I get a NullPointerException every time I try to stream:
java.lang.NullPointerException: null
at org.sourcelab.kafka.webview.ui.controller.stream.StreamController.getLoggedInUser(StreamController.java:189) ~[classes/:na]
And yes, I have anonymous access set up.
I have found that during the filtering of each record:
2019-02-27 17:29:02.961 WARN 25762 --- [p-nio-80-exec-3] o.a.k.clients.consumer.ConsumerConfig : The configuration 'RecordFilterInterceptor.recordFilterDefinitions' was supplied but isn't a known config.
gets logged to stdout.
Hey @xstephen95x, thanks for the detailed responses! I'll try to address all of them here, but let me know if I've missed something.
In terms of streaming vs. paging, I haven't been able to stream successfully. I get a NullPointerException every time I try to stream:
Which version of Kafka-WebView are you running? That sounds a lot like a bug fixed in 2.1.3 (Issue-127). If you're running 2.1.3 or newer, let me know, and I may need to revisit this. If you're running 2.1.2 or older, upgrading should resolve the issue.
I have found that during the filtering of each record:
2019-02-27 17:29:02.961 WARN 25762 --- [p-nio-80-exec-3] o.a.k.clients.consumer.ConsumerConfig : The configuration 'RecordFilterInterceptor.recordFilterDefinitions' was supplied but isn't a known config.
gets logged to stdout.
I think that is considered "normal". Basically, if you define any non-standard configuration property that the Kafka library isn't explicitly aware of, it will log that warning. In this case, I set a custom property to configure Kafka-WebView's record filter.
RE: the performance issue. The fact that you have a small number of partitions, and that it sounds like paging through the topic without filtering enabled performs fine, makes me believe something is up with the filtering logic. I must be doing something silly; I just can't spot it by reading the code. I believe you're right that performance profiling is going to be the best way to determine the cause here. Short of that, I may be able to put together a custom build for you that adds debug timing log statements to help track down the source. Would you be interested in trying it if I put one together?
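A rough model of the behavior described above: the client compares the supplied properties against the names it knows about and warns about the rest. This is a simplified sketch, not Kafka's code; Kafka's actual check (AbstractConfig.logUnused()) reports supplied keys the client never read, whereas this sketch compares against a fixed, hypothetical list of names. It illustrates the content of the warning only, not how often it gets logged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of the "supplied but isn't a known config" warning.
final class UnknownConfigSketch {
    // Hypothetical, heavily abbreviated list; the real consumer defines many more names.
    static final Set<String> KNOWN_KEYS = Set.of(
        "bootstrap.servers", "group.id", "key.deserializer",
        "value.deserializer", "interceptor.classes");

    static List<String> unknownKeys(Map<String, Object> supplied) {
        List<String> unknown = new ArrayList<>();
        // Sorted iteration so the output order is stable.
        for (String key : new TreeSet<>(supplied.keySet())) {
            if (!KNOWN_KEYS.contains(key)) {
                unknown.add(key);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Map<String, Object> props = Map.of(
            "bootstrap.servers", "localhost:9092",
            "interceptor.classes", "RecordFilterInterceptor", // illustrative value
            "RecordFilterInterceptor.recordFilterDefinitions", "[]"); // illustrative value
        for (String key : unknownKeys(props)) {
            System.out.println("WARN The configuration '" + key
                + "' was supplied but isn't a known config.");
        }
    }
}
```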
Thank you for all of your responses.
So, I upgraded to 2.1.4 and ran from the compiled jar instead of ./buildAndRun.sh, and I'm now getting about 800 records filtered per minute. That's a good speedup, but it's still going to take days to filter millions of records. I believe that's roughly 75 ms per record, which isn't great.
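Worked out, again taking two million records as an example:

```latex
\[
\frac{60\,000~\text{ms/min}}{800~\text{records/min}} = 75~\text{ms/record},
\qquad
\frac{2\,000\,000~\text{records}}{800~\text{records/min}} = 2\,500~\text{min} \approx 41.7~\text{h} \approx 1.7~\text{days}
\]
\[
\text{speedup over the earlier rate} = \frac{800}{300} \approx 2.7
\]
```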
I've been working on getting a perf analysis, but I'm having a hard time getting it to work with the JVM.
I've attached a flamegraph from my last attempt.
If perf isn't going to cooperate, then yes, perhaps the best option is to start logging timestamps.
However, I would also need to add them to my deserializer and filter, so I'm not sure of the best way to go about that.
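A minimal sketch of one way to collect those timestamps without a profiler: wrap the deserializer and the filter in small timing decorators that accumulate System.nanoTime() deltas and report an average per call. Here java.util.function.Function and Predicate stand in for the real Kafka Deserializer and Kafka-WebView filter interfaces, and TimedStage, timeFunction, and timePredicate are hypothetical names; adapting it means calling the real deserialize and filter methods inside the wrappers.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.Predicate;

// Accumulates call count and total elapsed time for one processing stage.
final class TimedStage {
    final String name;
    final AtomicLong calls = new AtomicLong();
    final AtomicLong totalNanos = new AtomicLong();

    TimedStage(String name) {
        this.name = name;
    }

    // Wraps a deserializer-like function so that every call is timed.
    <T, R> Function<T, R> timeFunction(Function<T, R> inner) {
        return input -> {
            long start = System.nanoTime();
            try {
                return inner.apply(input);
            } finally {
                addSample(System.nanoTime() - start);
            }
        };
    }

    // Wraps a filter-like predicate so that every call is timed.
    <T> Predicate<T> timePredicate(Predicate<T> inner) {
        return input -> {
            long start = System.nanoTime();
            try {
                return inner.test(input);
            } finally {
                addSample(System.nanoTime() - start);
            }
        };
    }

    private void addSample(long elapsedNanos) {
        calls.incrementAndGet();
        totalNanos.addAndGet(elapsedNanos);
    }

    double averageMillis() {
        long n = calls.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1_000_000.0 / n;
    }

    String summary() {
        return String.format("%s: %d calls, avg %.3f ms", name, calls.get(), averageMillis());
    }

    public static void main(String[] args) {
        TimedStage deserStage = new TimedStage("deserialize");
        TimedStage filterStage = new TimedStage("filter");

        // Placeholder stages; swap in calls to the real deserializer and filter.
        Function<byte[], String> deserializer =
            deserStage.timeFunction(bytes -> new String(bytes, StandardCharsets.UTF_8));
        Predicate<String> filter = filterStage.timePredicate(value -> value.contains("error"));

        for (int i = 0; i < 1_000; i++) {
            byte[] raw = ("record-" + i).getBytes(StandardCharsets.UTF_8);
            filter.test(deserializer.apply(raw));
        }
        System.out.println(deserStage.summary());
        System.out.println(filterStage.summary());
    }
}
```

Comparing each stage's average against the observed per-record time shows whether the user-supplied code accounts for it; if both averages come out tiny, the cost is likely somewhere else in the stack.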