Comments (13)
@sanimej any idea?
Sounds like the exclusive lock on the file descriptor is not released properly and locks the whole database on subsequent calls to New.
from libkv.
@abronan I didn't get a chance to look into this today, but one possibility is that when the daemon is killed ungracefully it leaves the boltdb lock in an inconsistent state.
I've tried several times to kill docker using
kill -9 `cat /var/run/docker.pid`
Everything seems fine. I can successfully start the daemon again.
I suspect another process is holding the lock, because a file lock is released when the process holding the file descriptor terminates. See the flock documentation: http://man7.org/linux/man-pages/man2/flock.2.html
Furthermore, the lock is released either by an explicit LOCK_UN
operation on any of these duplicate descriptors, or when all such
descriptors have been closed.
@chenchun Have you tried to do a few save/restores before stopping the process?
@mavenugo Any steps to reproduce that issue (even if it's hard to trigger)? I have the same feeling as described above: another process could have been running during your tests and holding the lock, preventing any other process from accessing the DB (the stacktrace clearly shows that it blocks at flock).
Yes, with the localstore change for libnetwork, save/restores happen when networks are created during daemon startup.
@abronan I couldn't think of a particular test case to reproduce this consistently. I hit it once and haven't seen it again, but I am concerned about this issue as save/restore becomes fundamental to the design.
@abronan @sanimej @chenchun okay, I know exactly how to reproduce this issue now :)
It's the case of another application holding the lock on boltdb.
In my case, I was running the dnet daemon in libnetwork, which was holding the lock on the boltdb database. When the Docker daemon tried to get a lock on it, it just waited forever.
This would obviously cause an issue when running multiple docker daemons in parallel.
In fact, I think it is incorrect to have the same boltdb database shared between these docker daemons. Hence we need a way to dedicate a db per daemon instance.
ping @mrjana as we were discussing a similar situation for another case.
At this point, I don't think it is a boltdb or libkv issue. libnetwork must handle this situation.
I will keep this issue open so that we can continue discussing it.
@mavenugo OK, yes, in this case I think it makes sense to use a separate DB for each role. It is not really a boltdb (nor libkv) issue in that case, as this is by design.
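One way to picture the separate-DB-per-role idea is to derive the database file path from the role name, so two daemons never open the same file. This is only a sketch under assumptions: the function storePath, the directory layout and the file name local-kv.db are hypothetical here and are not taken from libnetwork or libkv.

```go
package main

import "path/filepath"

// storePath returns a boltdb file path that is unique per role
// (for example "dockerd" or "dnet"), so each daemon locks its own file.
// dataRoot and the "local-kv.db" file name are illustrative assumptions.
func storePath(dataRoot, role string) string {
	return filepath.Join(dataRoot, role, "local-kv.db")
}
```

With distinct paths, the exclusive file lock taken by one daemon can no longer block the other.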
@abronan I am just wondering whether we could instead fail the request if another process is already holding the lock, rather than waiting forever.
@abronan I will push a patch to the boltdb driver to honor the timeout set by the caller.
@mavenugo Yes, I guess we can include a timeout for boltdb, like the other backends. It is easy to implement and gives users of the lib more information if something goes wrong.
@abronan PR on its way :)
👍