Comments (27)
I've been observing the CPU usage and it seems that it keeps going upwards every day. From 6-8% 4 days ago it went up to around 10-12%. I'm really afraid to deploy it in production, I see it as a huge risk. I have to revert back to 1.3.1.
If anyone figures out a solution to this, please, share it with us
from surrealdb.
1. In my case and since I've been using SurrealDB, no.
2. In my current case and for this app, no.
3. Difficult to assess, but in my case, I can say well before version 1.0.0, i.e. v0.x.x, so since October/November 2023.
4. 0, I've had no use for it.
5. I uploaded this video as soon as SurrealDB was launched with the --trace
flag #3906 (comment). The logs are still the same, just like the ticket author's (for docker). If you need anything else, give us instructions on how to get exactly the logs you're interested in.
6. In my case, I've been running SurrealDB since I started using it without logs (--trace
) because everything was working normally, there was nothing special to observe. I only used the logs to make the video above. In dev mode, here's the command i use surrealdb start --allow-all -u root -p root file:///...
7. In my case, no no, not necessary for the application in its current state.
The novelty of this bug is that when I CTRL+C to stop the server, I get a Failed to send shutdown signal to task: sending on a closed channel
error as in the video, which I didn't get before. Another observation is that this bug is variable: one day everything's fine, and another day we see excessive memory consumption.
from surrealdb.
That's really useful, thank you! Particularly about the channel - I saw something like that in development and believed it was resolved; Will have a look as you shouldn't see the channel closed
from surrealdb.
- number of namespaces: 6 created by me
- number of databases: each namespace contains one database
- number of tables: each database contains 4 tables (SCHEMALESS)
- number of records :
- 2 tables with 300 records (which I've limited to 300, otherwise there are many more - Select (frequently), Delete and Create (Not very frequent), when a new entry is added, the oldest is automatically deleted (1 to 2 changes per day depending on interface). no Update)
- 2 tables with currently 7 records (creation, modification, deletion - Not very frequent - 1 to 2 changes or not per day)
- do you use events: no
- approximate number of open database connections: Only one in dev mode
- nature of queries (number of selections, updates, creations and others) : C.R.U.D and especially Select. Creations, deletions and modifications are rare. Data is known and pre-loaded, transformed once from CSV into RUST..
I took the opportunity to run a test with and without a backup created with older versions of surrealdb.
- Running surrealdb v1.4.x without a backup, starting from 0 in a new folder with no load ramp-up, high memory consumption (just launch, no select, delete or update...).
without.backup.mp4
- Running surrealdb v1.4.x with a backup created from previous versions and then modified (export/import) as surrealdb is updated, with an immediate increase in load and memory consumption (just launch, no selection, deletion or update...).
with.backup.mp4
from surrealdb.
1.5.0-beta1 is out, please try it if you have time. It should resolve the issue.
https://deploy-preview-564--surrealdb-web.netlify.app/releases/#v1-5-0-beta-1
from surrealdb.
Closing as this is resolved in 1.5.0-beta1, but can be mitigated on 1.4.x with --tick-interval 10s
from surrealdb.
I just tried removing my old database file (the one I created with v1.0.0) and started 1.4.0. The cpu usage is back to the 0-1% range. It seems like the problem is with something in the database file itself.
from surrealdb.
I noticed after about a week of normal operation the process started idling at 6-8%. I updated to 1.4.2 a few days ago, but not sure if it has anything to do with it.
from surrealdb.
Hi @phughk
I'm not using LiveQueries, ChangeFeeds
(3) I've exported and imported my data in a new Instance of SurrealDB 1.4.0, the CPU usage was high after 12 hours (overnight) when I checked.
(6) Running without logs has no effect in CPU usage, it remains high either way.
(5) Sure, I did a backup after the clean import, and another backup after 12 hours (docker volumes)
No matter which machine I use (arm/x86), the new one has 0% CPU usage, and the aged one has 16% constant CPU usage, the logs seem the same.
I've attached both logs, also included the tree structure of the database folder at both snapshots in the top.
High-CPU-Usage-(12H-aged).txt
No-CPU-Usage-(New-Instance).txt
Hope this helps.
I did not see Failed to send shutdown signal to task: sending on a closed channel
in my instance when stopping.
I suspected if Events were causing this, testing by inserting some data in tables with events set up, but to no avail.
from surrealdb.
- Yes, I am using Live Queries
- A few months, exported only about 2MB... (High usage is usually after a restart)
- Not sure, I'm using JS beta5 and the lib always logs this (haven't used them since the beta got released):
from surrealdb.
Hi @phughk
I ran the instance only for a minute, the CPU usage is high in a few seconds (the problem won't go away with restarts once started)
The first time to spot the issue was after 12 hours in a clean imported instance.
I've got the flamegraphs (for linux-amd64 bare-metal) highlighting the CPU usage from both versions 1.4.0 and 1.4.2
flamegraphs.tar.gz (Download XML)
I did see the Failed to send shutdown signal to task: sending on a closed channel
message this time.
I'll be happy to provide more details if needed.
Edit: Also confirmed this issue with v1.4.0.beta.1 release.
from surrealdb.
Hi @phughk
- 1 Namespace
- 6 Databases (2 databases have Events).
- 28 to 30 Tables.
- 500K+ records.
- I do use events.
- max 2 to 3 connections for a short time (then no connections most of the time, this issue started probably when there was no connections)
- 60% select, 30% create, 6% update, 4% delete (CPU usage will be high even without any connections / running queries)
Was able to get a flamegraph from a debug build of 1.4.0-beta-1, but limited samples for 10 seconds (CPU usage was high the entire time)
flamegraph-debug-10sec.tar.gz (Download XML)
I can try getting a full minute flamegraph from debug build with even fewer samples, if the above one is insufficient.
I've created another clean instance (1.4.0-beta-1) with a single database including events, if the issue crops up there, I'll have full logs to share.
from surrealdb.
As expected, I was able to recreate this time, after letting the instance (1.4.0-beta.1 arm64-docker) run for about 12 hours, I saw 2% constant CPU usage, without any connections / queries.
I backed up this instance (entire database file) at 4 points in time.
- Immediately after creation (0 min)
- After importing 1 database with events (15 min)
- After performing all operations in that database (30 min and 200+ queries)
- After 12 hours of idling (12 hours and 20 queries)
I used these backups to run with a debug build (1.4.0-beta.1 amd64-local) for each snapshots (each 1 minute runtime).
I'm attaching only the perfs this time, (flamegraph
takes really long time to generate .svg from debug perfs)
debug-perfs.tar.gz
Thank you for introducing me to flamegraphs
, I've found yet another amazing tool Hotspot
perfs after 12 hours (snapshot 4)
perfs before 12 hours (snapshot 3)
perfs before and after 12h, despite very little activity in between the snapshots, CPU usage became high.
from surrealdb.
Hey @micisse , thanks. Those are just error messages, it isn't actually a failure. When a channel is closed on one side, the other side failed to send and stopped as well. We will remove the error message, but they are not a major concern.
The delr is not a concern and will also be removed from trace logs.
from surrealdb.
My environment is same, surrealdb in docker.
Firstly, I tried exporting (in 1.4.0 itself) and importing all my databases manually (to a new instance), this did lower the CPU usage back to near 0%, but still had the logs as before.
This new instance got to same 25% CPU usage after a day or so.
I just created another new instance with only one database imported. Seems fine for more than a day. The logs were present nevertheless.
This log surrealdb_core::kvs::tx: Delr page was empty
, seems to be printed for the number of databases in the instance.
In addition to the that.
I could see some other failed attempts to connect to port 4317, not sure if this relates to CPU usage.
surrealdb | 2024-04-18T12:32:41.355797Z TRACE tonic::transport::service::reconnect: poll_ready; connecting
surrealdb | 2024-04-18T12:32:41.355811Z TRACE hyper::client::connect::http: connect error for 127.0.0.1:4317: ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })
surrealdb | 2024-04-18T12:32:41.355821Z TRACE tonic::transport::service::reconnect: poll_ready; error
surrealdb | 2024-04-18T12:32:41.355824Z DEBUG tonic::transport::service::reconnect: reconnect::poll_ready: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }))
surrealdb | 2024-04-18T12:32:41.355830Z DEBUG tower::buffer::worker: service.ready=true processing request
surrealdb | 2024-04-18T12:32:41.355836Z TRACE tonic::transport::service::reconnect: Reconnect::call
surrealdb | 2024-04-18T12:32:41.355839Z DEBUG tonic::transport::service::reconnect: error: error trying to connect: tcp connect error: Connection refused (os error 111)
from surrealdb.
also seeing this
from surrealdb.
Same here 1.4.0
and 1.4.2
, sitting between 75 and 100% CPU usage. RAM keeps cycling between 220 and 320MB... No queries running
Screen.Recording.2024-04-21.at.10.45.21.mov
from surrealdb.
It's not just related to docker, just running surrealdb start ...
causes this problem and hogs all the memory, when I stop it the machine goes back to normal. It seems to launch several instances in a single run.
sample.mp4
Linux
SurrealDB 1.4.0 for linux on x86_64 & SurrealDB 1.4.2 for linux on x86_64
from surrealdb.
+1
from surrealdb.
Don't dare deploy it in production, it's an existential problem that has appeared since certain versions. I've never had this problem before. We tried to downgrade to earlier versions, but there were compatibility errors with our backups. New versions work with backups created with older versions, but not older versions, which display compatibility errors. This problem also exists on Windows, apparently describing the same problem on the same releases. Since opening the ticket, the team has not yet reacted, so you'll have to wait if you don't have a database alternative.
Bug: Massive performance degradation on v1.4.2 #3935
from surrealdb.
Hi @iteroji, @Ragavendaran, @MerryOscar, @Micnubinub, @micisse, @KristjanVall
We have some ideas, but don't have access to your environments - could you please share some information on the above to help us debug?
- Are you using live queries
- Are you using change feeds
- How long have you been running the DBMS (SurrealDB) (creation data, approximate size)
- How long have you been running change feeds and the retention size
- Could you please attach the output logs at trace logging level
- Could you try running without trace logs to see if that reduces usage please
- Are you using Live queries v2?
We have some ideas what it could be, but we would like to get some confirmation so any help will be useful on the above questions.
Thanks for the help in finding this and helping resolve,
Hugh
from surrealdb.
Hi @Ragavendaran thank you for providing those logs. I am a bit surprised that the timestamps don't indicate that the db was running for a long time. For example the first log message is
47 surrealdb | 2024-05-02T11:45:17.700722Z TRACE tower::buffer::worker: worker polling for next message
and the last is
1072 surrealdb | 2024-05-02T11:45:59.903767Z TRACE surrealdb_core::kvs::tx: Delr page was empty
As far as I can see, that indicates about a single minute of logging.
We are going to try and reproduce this internally, but were wondering if you may have better log samples. Also if you are able to get a DTrace from the instance, that would be fantastic. A tool I have been using for that is flamegraph https://github.com/flamegraph-rs/flamegraph?tab=readme-ov-file
from surrealdb.
Hi everyone thank you for sharing all the information, it is very useful.
The flamegraphs indicate that there are 3 main parts to this, and the largest part seems to be accessing rocksdb (libc.so.6). Because you are all running release builds, we don't have debug information in the flamegraphs and cannot decipher easily exactly what the above functions are. However we can speculate that since we have polling functions present, this is to do with that. And the Delr
message may be part of the issue.
Can you please confirm @iteroji, @Ragavendaran, @MerryOscar, @Micnubinub, @micisse, @KristjanVall
- number of namespaces
- number of databases
- number of tables
- number of records
- are you using events
- approximately how many connections you have open to the database
- nature of requests (number of select vs update vs create vs other)
We are continuing trying to reproduce this locally but can't immediately see how this is happening for you. We can confirm #3952 and have a fix in the works for that.
from surrealdb.
Hey @Ragavendaran 2% passive usage is tolerable, as the database is doing liveness checks. I can see you mentioned that after 12h the CPU was high, but how high? 2% or higher? and was this constant or simply spikes? Thank you for providing this information and captures, looking at them now. The perfdata files are missing the corresponding stack files - do you have these by the chance?
from surrealdb.
Hi @phughk, there were no changes to the database before and after 12hours of idle, there was no CPU usage in the beginning (snapshot 3), and 2 to 4 % after the 12hour idle (snapshot 4). It spikes every 1 second and seems constant in htop
.
Yes, 2 to 4 % CPU usage in this test instance, lower than the ~25% CPU usage in the original instance with 6 databases.
I believe, the perf command records stack frames
perf record --call-graph dwarf ...
I'm not sure about the stack files. If there is a specific perf command to use, and some directions to get the stack files, I'll be able to get them.
After your suggestion, I'll try getting the perfs again for this test instance and the original instance with 25% CPU usage.
from surrealdb.
Yeah, this is the polling rate for change feeds and live queries. We have changed it back to the 10s polling rate that it was in 1.3, as 1.4 changed it to 1s; 1.5 will have the fix for this. Thanks all for your cooperation in resolving this
@iteroji @Ragavendaran @MerryOscar @Micnubinub @micisse @KristjanVall
Will close this ticket after 1.5 is tested by you all
from surrealdb.
Hi @phughk, i've been using v1.5.0-beta.1 since this Saturday afternoon and I haven't noticed any load increase. I've only been using it for a day and a half and so far there's nothing to report about the problem that caused the ticket.
However, I'm still seeing these messages and/or errors, as in the video below (view in full):
- ...TRACE surrealdb_core::kvs::tx: Delr page was empty, but this time with a certain delay and still in bursts.
- Server shutdown error (CTRL+C) ...ERROR surreal::cli::start: Failed to send shutdown signal to task: sending on a closed channel
perhaps planned in another fix
Surrealdb.v1.5.0-beta.1.test.mp4
Have a nice day
from surrealdb.
Related Issues (20)
- Feature: Omit in fetch
- Feature: Live Query WHERE clause should process Params HOT 6
- Bug: Delr is displayed in trace logs of releases HOT 1
- Bug: I upgraded my server surreal version and lost all my data HOT 9
- Bug: OPTION IMPORT causes bugs with FTS indexes
- Bug: subquery runs create twice
- Feature: Embedded replicas
- Bug: TIMEOUT not respected inside RETURN statement
- Bug: Rust fails to build surrealdb HOT 11
- Bug: Parser interpretes string id with starting with number followed by 'e' as exponenets HOT 2
- Bug: COMPOSITE INDEX does not work
- Bug: 'Unsupported value' in no-index WHERE condition, causes INDEX to not be used HOT 1
- Bug: The alphabetical order of the properties of an object-based Record ID is query-significant HOT 1
- Bug: HTTP CREATE not working when using TOKEN auth HOT 3
- Bug: Can't define a schema with flexible object keys HOT 3
- Bug: database fails and stop on attempt to delete namespace HOT 1
- Bug: the parse::url::port() does not work on https url
- Bug: Parse error when using RELATE query from docs
- Feature: NN Vector search and similarity masking (partial vector search)
- Bug: Floats treated as integers and duplicate records not checked when floats used as record id
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from surrealdb.