
Comments (13)

abronan commented on July 20, 2024

@sanimej any idea?

Sounds like the exclusive lock on the file descriptor is not released properly and locks the whole database on subsequent calls to New.

from libkv.

sanimej commented on July 20, 2024

@abronan I didn't get a chance to look into this today, but one possibility is that when the daemon is killed ungracefully, it leaves the boltdb lock in an inconsistent state.

chenchun commented on July 20, 2024

I've tried several times to kill docker using

kill -9 `cat /var/run/docker.pid`

Everything seems fine. I can successfully start the daemon again.

I suspect another process is holding the lock, because a file lock is released when the process that holds the file descriptor terminates. See the flock documentation: http://man7.org/linux/man-pages/man2/flock.2.html

Furthermore, the lock is released either by an explicit LOCK_UN
operation on any of these duplicate descriptors, or when all such
descriptors have been closed.

abronan commented on July 20, 2024

@chenchun Have you tried to do a few save/restores before stopping the process?

@mavenugo Any steps to reproduce the issue (even if it's hard to trigger)? I have the same feeling as the one described above: another process could have been running during your tests, holding the lock and preventing any other process from accessing the DB (the stack trace clearly shows that it blocks at flock).

chenchun commented on July 20, 2024

Yes, with the localstore change for libnetwork, save/restores happen when networks are created during daemon startup.

mavenugo commented on July 20, 2024

@abronan I couldn't think of a particular test case that reproduces this consistently. I hit it once and haven't seen it again, but I am concerned about this issue because save/restore is becoming fundamental to the design.

mavenugo commented on July 20, 2024

@abronan @sanimej @chenchun Okay, I know exactly how to reproduce this issue now :)
It's the case of another application holding the lock on boltdb.

In my case, I was running the dnet daemon in libnetwork, which was holding the lock on the boltdb database. When the Docker daemon tried to get a lock on it, it just waited forever.

This would obviously cause a problem when running multiple Docker daemons in parallel.
In fact, I think it is incorrect to share the same boltdb database between these Docker daemons, so we need a way to dedicate a DB per daemon instance.
ping @mrjana, as we were discussing a similar situation for another case.

At this point, I don't think it is a boltdb or libkv issue. libnetwork must handle this situation.

I will keep this issue open so that we can continue discussing it.

abronan commented on July 20, 2024

@mavenugo OK, yes, in this case I think it makes sense to use a separate DB for each role. It's not really a boltdb (or libkv) issue, as this is by design.

mavenugo commented on July 20, 2024

@abronan I'm just wondering whether we could instead fail the request if another process is already holding the lock, rather than waiting forever.

mavenugo commented on July 20, 2024

@abronan I will push a patch to the boltdb driver to honor the timeout set by the caller.

abronan commented on July 20, 2024

@mavenugo Yes, I guess we can include a timeout for boltdb like the other backends. It's easy to implement and gives users of the lib more information if something goes wrong.

mavenugo commented on July 20, 2024

@abronan PR on its way :)

chenchun commented on July 20, 2024

👍
