genode-checkpointrestore-sharedmemory's Introduction

Genode real-time capable checkpoint/restore mechanism

General workflow in migration

  1. Checkpoint component A

  2. Serialize data

  3. Transfer to new node

  4. Deserialize data

  5. Restore state of component A and restart it
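The five steps above can be sketched in plain C++. This is only an illustration under stated assumptions: `Snapshot`, `serialize`, `deserialize`, and `migrate` are hypothetical names that do not come from Rtcr, the wire format (little-endian length prefixes) is invented for the example, and the network transfer is modelled as a simple buffer copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical snapshot of a checkpointed component (step 1 output).
// The state Rtcr really stores is far richer than this.
struct Snapshot {
    std::string name;
    std::vector<std::uint8_t> memory;
};

// Step 2: flatten the snapshot into a length-prefixed byte stream.
std::vector<std::uint8_t> serialize(const Snapshot &s) {
    std::vector<std::uint8_t> out;
    auto put_u32 = [&out](std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    };
    put_u32(static_cast<std::uint32_t>(s.name.size()));
    out.insert(out.end(), s.name.begin(), s.name.end());
    put_u32(static_cast<std::uint32_t>(s.memory.size()));
    out.insert(out.end(), s.memory.begin(), s.memory.end());
    return out;
}

// Step 4: rebuild the snapshot on the receiving node.
// Throws std::out_of_range if the stream is truncated.
Snapshot deserialize(const std::vector<std::uint8_t> &in) {
    std::size_t pos = 0;
    auto get_u32 = [&]() {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(in.at(pos++)) << (8 * i);
        return v;
    };
    auto get_bytes = [&](std::size_t n) {
        if (n > in.size() - pos) throw std::out_of_range("truncated snapshot");
        std::vector<std::uint8_t> bytes(in.begin() + pos, in.begin() + pos + n);
        pos += n;
        return bytes;
    };
    Snapshot s;
    const std::vector<std::uint8_t> name = get_bytes(get_u32());
    s.name.assign(name.begin(), name.end());
    s.memory = get_bytes(get_u32());
    return s;
}

// Steps 2 to 4 chained together; the result is what step 5 would restore from.
Snapshot migrate(const Snapshot &checkpointed) {
    const std::vector<std::uint8_t> wire = serialize(checkpointed);
    const std::vector<std::uint8_t> received = wire;  // step 3: transfer (assumed lossless)
    return deserialize(received);
}
```

Calling `migrate` on a snapshot should yield an equal snapshot on the "new node"; restoring and restarting the component from it (step 5) is outside the scope of this sketch.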

Accessing target component's resources

  • Parent/child approach
      • Target component = child component
      • Parent provides custom services which are used by the child (i.e. the parent intercepts the services used by the child)
          • Child creation: PD, CPU, and RAM sessions
          • Child runtime: all other sessions, such as RM, LOG, and Timer sessions
      • Custom services use the real services in the background
      • Parent stores information about the state of each session
          • Creation arguments
          • Update arguments
          • Method invocations
      • Parent restores the inner state of used sessions from this information
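The interception idea can be illustrated with a small wrapper that records what the child does and forwards each call to the real service. This is a sketch, not Genode's session API: `LogService`, `RealLog`, `InterceptedLog`, and `SessionRecord` are hypothetical names, and session arguments are modelled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal LOG-like service interface (not Genode's real API).
struct LogService {
    virtual ~LogService() = default;
    virtual void write(const std::string &msg) = 0;
};

// Stand-in for the real service that does the actual work in the background.
struct RealLog : LogService {
    std::vector<std::string> lines;
    void write(const std::string &msg) override { lines.push_back(msg); }
};

// What the parent remembers about one intercepted session.
struct SessionRecord {
    std::string creation_args;             // arguments used at session creation
    std::string update_args;               // latest update arguments
    std::vector<std::string> invocations;  // method invocations seen so far
};

// Custom service handed to the child: it records first, then forwards.
class InterceptedLog : public LogService {
public:
    // Fresh session created on behalf of the child.
    InterceptedLog(LogService &backend, std::string creation_args)
        : backend_(backend) {
        record_.creation_args = std::move(creation_args);
    }

    // Session recreated from previously stored information (restore path).
    InterceptedLog(LogService &backend, SessionRecord restored)
        : backend_(backend), record_(std::move(restored)) {}

    void update(const std::string &args) { record_.update_args = args; }

    void write(const std::string &msg) override {
        record_.invocations.push_back("write:" + msg);
        backend_.write(msg);  // the real service still handles the request
    }

    const SessionRecord &record() const { return record_; }

private:
    LogService &backend_;
    SessionRecord record_;
};
```

In this sketch the restore path only reinstates the recorded information against a new backend; whether recorded invocations must also be replayed depends on the service and is left open here.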

Checkpoint/Restore

  • Component's name: Rtcr (Real-time checkpointer/restorer)
  • Checkpoint in userland
      • Service method: checkpoint() => it uses Checkpointer::checkpoint
          • Pause target during checkpoint
          • Read information about the capability space and map
          • Store intercepted session information (not dataspace content) to parent's address space
          • Store dataspace content to parent's address space
          • Resume target after checkpoint
  • Restore in userland
      • Service method: restore() => it uses Restorer::restore
          • Recreate empty child without sessions
          • Recreate sessions and their RPC objects
          • Restore state of sessions and their RPC objects
          • Restore capability space and map with new capabilities
  • Incremental checkpointing as optimization
      • Approach
          • At checkpoint time, store only the changes made since the last checkpoint
          • Mark/trace "dirty pages" by using page-fault exceptions
          • Parent provides a custom RAM session to the child which allocates managed dataspaces (= region maps) instead of usual dataspaces
          • The managed dataspace is filled with usual dataspaces (called designated dataspaces)
          • Each designated dataspace occupies an exclusive area in the managed dataspace, so that the whole space of the managed dataspace is covered
          • To be able to mark accessed dataspaces, all designated dataspaces are detached from the managed dataspace
          • When a region in the managed dataspace is accessed, a page fault is triggered
          • The page fault is resolved by a thread which attaches the corresponding designated dataspace to the faulting region
          • Now the target component can use (read, write, execute) the region in the managed dataspace without disruption
          • When a checkpoint is performed, this designated dataspace is stored to parent's address space and detached from the managed dataspace
          • Now the managed dataspace is ready to mark/trace accessed regions again
      • Tweaks
          • The granularity of the marking mechanism can be modified by changing the size of the designated dataspaces
              • Increasing the designated dataspace size
                  • Decreases the chance that a page fault occurs, which lowers the overhead while the target component is running (runtime overhead)
                  • Increases the duration of the checkpoint, because the dataspace is larger and needs more time for copying (checkpoint overhead)
              • Decreasing the designated dataspace size
                  • Increases the chance that a page fault occurs, which increases the runtime overhead
                  • Decreases the duration of the checkpoint, because the dataspace is smaller
          • A balance between runtime and checkpoint overhead has to be found with regard to the locality of the target's memory usage
              • Accessing adjacent memory regions benefits from large designated dataspaces
              • Accessing scattered memory regions benefits from small designated dataspaces
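The marking mechanism and the granularity trade-off can be explored with a small user-level simulation. It is a sketch under stated assumptions: `ManagedDataspace` is a hypothetical class, page faults are modelled by an explicit "attached" check instead of real hardware faults, and only writes are shown, although in the mechanism described above any access to a detached region faults.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated managed dataspace split into equally sized designated dataspaces.
class ManagedDataspace {
public:
    ManagedDataspace(std::size_t total, std::size_t designated_size)
        : ds_size_(designated_size),
          memory_(total, 0),
          attached_((total + designated_size - 1) / designated_size, false),
          faults_(0) {}

    // Access by the target: the first touch of a detached region "faults",
    // and the simulated fault handler attaches the designated dataspace.
    void write(std::size_t addr, std::uint8_t value) {
        const std::size_t idx = addr / ds_size_;
        if (!attached_.at(idx)) {
            ++faults_;              // runtime overhead
            attached_[idx] = true;  // region is now marked as accessed
        }
        memory_.at(addr) = value;
    }

    // Checkpoint: copy only attached (accessed) designated dataspaces into
    // the snapshot, then detach them again. Returns the bytes copied.
    std::size_t checkpoint(std::vector<std::uint8_t> &snapshot) {
        snapshot.resize(memory_.size());  // untouched regions keep old content
        std::size_t copied = 0;
        for (std::size_t i = 0; i < attached_.size(); ++i) {
            if (!attached_[i]) continue;
            const std::size_t begin = i * ds_size_;
            const std::size_t end = std::min(begin + ds_size_, memory_.size());
            std::copy(memory_.begin() + begin, memory_.begin() + end,
                      snapshot.begin() + begin);
            copied += end - begin;  // checkpoint overhead
            attached_[i] = false;   // ready to trace accesses again
        }
        return copied;
    }

    std::size_t faults() const { return faults_; }

private:
    std::size_t ds_size_;
    std::vector<std::uint8_t> memory_;
    std::vector<bool> attached_;
    std::size_t faults_;
};
```

Running the same access pattern against two instances with different designated dataspace sizes makes the trade-off visible: the instance with larger designated dataspaces reports fewer faults but copies more bytes at checkpoint time, and vice versa.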

genode-checkpointrestore-sharedmemory's People

Contributors

denishubertum

genode-checkpointrestore-sharedmemory's Issues

Missing session creation

Output when using the parent service:
[init] session construct
[init] route
[init] resolve_session_request
[init] Service denied
[init] create session
[init] Child target_restorer-tester create session PD
This output is completely missing when using a local service.

Some new errors

transfer cap<kcap=0x341000,key=1240> amount 3
Error: init -> target_restorer-tester -> virt_space=1, ram_quota=1048576, cap_quota=100, label=: attempt to transfer initial quota
Warning: PD (init -> target_restorer-tester -> virt_space=1, ram_quota=1048576, cap_quota=100, label=) cap limit (used=2, limit=12) exceeded during transfer_quota(3)
[init -> target_restorer-tester] Warning: sheep_counter: could not revert session cap quota (service=ROM cid=0 args='cap_quota=3, label="sheep_counter -> config", ram_quota=5500, diag=0' state=CLOSED ram_quota=5500, cap_quota=3)

cap index out of bounds

Kernel: assertion failed: cap index out of bounds at operating-system/genode/repos/base-focnados/src/include/base/internal/cap_alloc.h:108

PD environment session denied

Using https://github.com/malsami/genode/tree/focnados_18.02_r78
combined with branch fixing_restore-AR-18.02 of this repo,
executing run/rtcr_restore_child
leads to:

[init -> target_restorer-tester] Error: sheep_counter: PD environment session denied
[init -> target_restorer-tester] Error: Uncaught exception of type 'Genode::Reconstructible<Genode::Local_connection<Genode::Pd_connection> >::Deref_unconstructed_object'

Problem:
resolve_session_request does not arrive at target_child
