Comments (12)
@pomidoroshev I guess it is not related to bolt.
Try this one:
package main
import (
	"strconv"
)
func main() {
	// Spawn 10 million goroutines with no limit on how many exist at once.
	for i := 10000000; i < 20000000; i++ {
		go func(i int) {
			strconv.Itoa(i)
		}(i)
	}
}
This consumes about 20 GB of RAM.
from bbolt.
@alexmironof ...aaand I was wrong. This implementation allocates just ~20 MB of RAM:
for i := 10000; i < 20000; i++ {
	// One transaction per 1000 keys, so each commit stays small.
	db.Update(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("test"))
		for j := 0; j < 1000; j++ {
			key := strconv.Itoa(i*1000 + j)
			b.Put([]byte(key), []byte(""))
		}
		return nil
	})
}
Still trying to rewrite it with the magic db.Batch() and goroutines.
@alexmironof
"When I add time.Sleep(time.Microsecond), I have to wait forever for the 10M loop to end; it goes VERY slow."
Really? This code runs in about 2:30 minutes on my machine:
package main
import (
	"log"
	"runtime"
	"strconv"
	"time"
	bolt "github.com/coreos/bbolt"
)
func main() {
	db, err := bolt.Open("test6.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("test"))
		return nil
	})
	for i := 10000000; i < 20000000; i++ {
		go func(i int) {
			db.Batch(func(tx *bolt.Tx) error {
				key := strconv.Itoa(i)
				b := tx.Bucket([]byte("test"))
				b.Put([]byte(key), []byte(""))
				if i%100000 == 0 {
					alloc, sys := getMemUsage()
					log.Printf("Key = '%v', Alloc = %4vM, Sys = %4vM", key, alloc, sys)
				}
				return nil
			})
		}(i)
		// Throttle goroutine creation so pending batches can drain.
		time.Sleep(time.Microsecond)
	}
}
// getMemUsage returns the allocated heap and the total memory obtained from the OS, in MiB.
func getMemUsage() (uint64, uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return bToMb(m.Alloc), bToMb(m.Sys)
}
func bToMb(b uint64) uint64 {
	return b / 1024 / 1024
}
Output:
2018/05/15 22:33:18 Key = '10000000', Alloc = 0M, Sys = 3M
2018/05/15 22:33:20 Key = '10100000', Alloc = 2M, Sys = 12M
2018/05/15 22:33:21 Key = '10200000', Alloc = 2M, Sys = 13M
...
2018/05/15 22:35:51 Key = '19700000', Alloc = 6M, Sys = 16M
2018/05/15 22:35:52 Key = '19800000', Alloc = 5M, Sys = 16M
2018/05/15 22:35:54 Key = '19900000', Alloc = 4M, Sys = 16M
With your workaround it takes ~40 seconds and uses more RAM:
2018/05/15 22:40:12 Key = '10000000', Alloc = 0M, Sys = 1M
2018/05/15 22:40:12 Key = '10100000', Alloc = 54M, Sys = 301M
2018/05/15 22:40:13 Key = '10200000', Alloc = 53M, Sys = 307M
2018/05/15 22:40:13 Key = '10300000', Alloc = 93M, Sys = 308M
...
2018/05/15 22:40:50 Key = '19800000', Alloc = 90M, Sys = 328M
2018/05/15 22:40:50 Key = '19900000', Alloc = 90M, Sys = 328M
"By the way, db.Batch doesn't free up memory after the loop ends, so the only way I found is to use debug.FreeOSMemory()."
Yes, I also found an answer about it on SO:
"The Go runtime will however return memory to the OS if it is not used for some time (which is usually around 5 minutes)."
Thank you very much!
Summary:
- db.Batch() is a poorly documented feature without examples that should be used with caution;
- 10 million Put() calls inside one db.Update() - ok if you have enough RAM;
- 10 million goroutines in a loop - bad thing, and there's no magic pill.
@pomidoroshev That's because you are trying to put all your values into one transaction, but bolt writes to disk and frees memory only when the transaction finishes. Inside a transaction, all keys and values are held in memory.
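A minimal sketch of splitting the key range into fixed-size chunks, so that each transaction only has to hold one chunk in memory. It uses only the standard library: inChunks and its commit callback are hypothetical names, and the callback stands in for one db.Update call that would Put the keys lo..hi-1. The chunk size of 1000 is an assumption, not a tuned value.

```go
package main

import "fmt"

// inChunks splits [from, to) into consecutive ranges of at most size items
// and calls commit once per range. It stops at the first error.
func inChunks(from, to, size int, commit func(lo, hi int) error) error {
	if size <= 0 {
		return fmt.Errorf("chunk size must be positive, got %d", size)
	}
	for lo := from; lo < to; lo += size {
		hi := lo + size
		if hi > to {
			hi = to
		}
		if err := commit(lo, hi); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	txCount, keyCount := 0, 0
	err := inChunks(10000000, 20000000, 1000, func(lo, hi int) error {
		// Stand-in for one db.Update that writes keys lo..hi-1.
		txCount++
		keyCount += hi - lo
		return nil
	})
	fmt.Println(txCount, keyCount, err) // prints "10000 10000000 <nil>"
}
```

With these numbers the 10 million keys are spread over 10,000 transactions of 1000 keys each.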
@alexmironof Yes, it's a bad example. But it's interesting that memory is not released after this transaction, or even after db.Close().
My original problem was with db.Batch(), and maybe I still do not understand how it works.
E.g.:
var wg sync.WaitGroup
for i := 10000000; i < 20000000; i++ {
	wg.Add(1)
	go func(i int) {
		key := strconv.Itoa(i)
		db.Batch(func(tx *bolt.Tx) error {
			b := tx.Bucket([]byte("test"))
			b.Put([]byte(key), []byte(""))
			return nil
		})
		if i%200000 == 0 {
			alloc, sys := getMemUsage()
			log.Printf("Key = '%v', Alloc = %4vM, Sys = %4vM", key, alloc, sys)
		}
		wg.Done()
	}(i)
}
wg.Wait()
And it eats memory as well (and even more).
Maybe I should split the batch logic, but I see that db.Batch() commits the transaction when len(db.batch.calls) >= db.MaxBatchSize (https://github.com/coreos/bbolt/blob/master/db.go#L742). It uses a mutex and, in theory, should wait for the commit before starting a new transaction. But memory still leaks.
I'm new to Go and do not understand some things about concurrency. Could you please describe the proper way to add 10M objects to a bucket in Bolt?
@alexmironof Nanosecond fixes everything!
for i := 10000000; i < 20000000; i++ {
	go func(i int) {
		strconv.Itoa(i)
		return
	}(i)
	time.Sleep(time.Nanosecond)
}
@alexmironof ...but not in this case. I still think it's about key length, but I don't know what to do.
I'll try to resolve this tomorrow, but nice results already.
I think that strconv.Itoa executes in less than a nanosecond, so that trick with the for loop works, but db.Batch doesn't...
Also, if you write 10M keys inside db.Update, it consumes only 1.5 GB of RAM.
@alexmironof
"Also, if you write 10M keys inside db.Update, it consumes only 1.5 GB of RAM."
Yes, my benchmark output is in the second part of the original gist:
...
2018/05/14 10:06:40 Key = '19600000', Alloc = 1423M, Sys = 2099M
2018/05/14 10:06:40 Key = '19800000', Alloc = 1433M, Sys = 2099M
But memory is not released after the transaction, or even after return if you put this code in a function. I have a bad feeling about this.
"I think that strconv.Itoa executes in less than a nanosecond, so that trick with the for loop works, but db.Batch doesn't..."
Meet... the MICROSECOND!
for i := 10000000; i < 20000000; i++ {
	go func(i int) {
		db.Batch(func(tx *bolt.Tx) error {
			// ...
		})
	}(i)
	time.Sleep(time.Microsecond)
}
...
2018/05/15 00:46:06 Key = '19800000', Alloc = 6M, Sys = 17M
2018/05/15 00:46:07 Key = '19900000', Alloc = 4M, Sys = 17M
Stupid workaround, but it works. Also, for my case with UUIDs as keys I had to increase it to 100 microseconds :)
So key length really matters: long key → slow Put() → slow goroutine → crappy stack.
@pomidoroshev When I add time.Sleep(time.Microsecond), I have to wait forever for the 10M loop to end; it goes VERY slow.
But this workaround works for me:
var wg sync.WaitGroup
const limit = 100000
for i := 10000000; i <= 20000000; i++ {
	if i%limit == 0 {
		wg.Add(1)
	}
	go func(i int) {
		db.Batch(func(tx *bolt.Tx) error {
			// ...
		})
		if i%limit == 0 {
			wg.Done()
		}
	}(i)
	// Every limit iterations, wait for that iteration's goroutine before spawning more.
	if i%limit == 0 {
		wg.Wait()
	}
}
By the way, db.Batch doesn't free up memory after the loop ends, so the only way I found is to use debug.FreeOSMemory().
Also, runtime.ReadMemStats shows incorrect stats after debug.FreeOSMemory.
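One possible reading of the "incorrect stats": the Sys field that the thread's getMemUsage prints counts address space the runtime has obtained from the OS, and that figure is not expected to drop after debug.FreeOSMemory. Memory that was actually handed back shows up in HeapReleased instead. The sketch below compares the two; heapStats, readHeapStats, churn and freeAndCompare are illustrative names, not part of the thread or of bbolt, and the exact numbers depend on the Go version and the OS.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapStats holds a few runtime.MemStats fields, converted to MiB.
type heapStats struct {
	AllocMB    uint64 // bytes of live heap objects (MemStats.Alloc)
	SysMB      uint64 // memory obtained from the OS (MemStats.Sys)
	IdleMB     uint64 // heap spans not in use (MemStats.HeapIdle)
	ReleasedMB uint64 // physical memory returned to the OS (MemStats.HeapReleased)
}

func readHeapStats() heapStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	const mb = 1024 * 1024
	return heapStats{
		AllocMB:    m.Alloc / mb,
		SysMB:      m.Sys / mb,
		IdleMB:     m.HeapIdle / mb,
		ReleasedMB: m.HeapReleased / mb,
	}
}

// churn allocates about n MiB and drops every reference when it returns.
func churn(n int) {
	bufs := make([][]byte, n)
	for i := range bufs {
		bufs[i] = make([]byte, 1<<20)
		bufs[i][0] = 1 // touch the buffer so the page is really used
	}
}

// freeAndCompare samples the stats before and after debug.FreeOSMemory,
// which forces a garbage collection and then tries to return memory to the OS.
func freeAndCompare() (before, after heapStats) {
	before = readHeapStats()
	debug.FreeOSMemory()
	after = readHeapStats()
	return before, after
}

func main() {
	churn(64)
	before, after := freeAndCompare()
	fmt.Printf("before: %+v\n", before)
	fmt.Printf("after:  %+v\n", after)
}
```

If ReleasedMB grows while SysMB stays flat, the runtime did give memory back even though a Sys-based log line looks unchanged.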
@pomidoroshev Yeah, that's weird, but I have a top-tier workstation with 24 GB of RAM and the newest Core i7 processor.
In my example you can control the number of concurrent goroutines that run before wg.Wait executes with the limit constant. It slows the program down a bit, but consumes less memory.
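Another way to express the same limit idea is a buffered channel used as a counting semaphore, which caps how many goroutines exist at any moment instead of pausing at fixed intervals. This is a sketch under assumptions, not a tested bbolt recipe: runBounded is a hypothetical helper, the work callback stands in for the db.Batch call, and the cap of 1000 is an arbitrary choice.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
	"sync/atomic"
)

// runBounded calls work(i) for every i in [from, to) while keeping at most
// maxInFlight goroutines alive at once. It returns the highest number of
// goroutines observed running concurrently.
func runBounded(from, to, maxInFlight int, work func(i int)) int64 {
	var (
		wg      sync.WaitGroup
		running int64
		peak    int64
	)
	sem := make(chan struct{}, maxInFlight)
	for i := from; i < to; i++ {
		sem <- struct{}{} // blocks while maxInFlight goroutines hold a slot
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // give the slot back
			n := atomic.AddInt64(&running, 1)
			for { // record a new peak if this is the highest count so far
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			work(i)
			atomic.AddInt64(&running, -1)
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	var stored int64
	// store is a stand-in for a db.Batch call that writes one key.
	store := func(i int) {
		_ = strconv.Itoa(i)
		atomic.AddInt64(&stored, 1)
	}
	peak := runBounded(0, 100000, 1000, store)
	fmt.Println("stored:", stored, "peak goroutines:", peak)
}
```

Because the loop blocks on the channel send, a slow Put (for example with long keys) slows the producer down rather than letting goroutines pile up.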