checkpoint-restore / checkpointctl
A tool for in-depth analysis of container checkpoints
License: Apache License 2.0
├── CRIU dump statistics
│ ├── Freezing time: 539 us
│ ├── Frozen time: 116647 us
│ ├── Memdump time: 11168 us
│ ├── Memwrite time: 7195 us
│ ├── Pages scanned: 7867 us
│ └── Pages written: 2297 us
Pages scanned and pages written should not have "us". The number is just the number of actual memory pages scanned or written. Just remove "us".
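The requested fix can be sketched as follows. This is only an illustration: the dumpStats struct and renderDumpStats function are hypothetical names, not checkpointctl's actual types, and the numbers are taken from the output quoted above. Durations keep the "us" suffix, page counters are printed as plain numbers.

```go
package main

import (
	"fmt"
	"strings"
)

// dumpStats is a hypothetical stand-in for the CRIU dump statistics.
// The field names are illustrative, not checkpointctl's real types.
type dumpStats struct {
	FreezingTime uint32 // microseconds
	FrozenTime   uint32 // microseconds
	MemdumpTime  uint32 // microseconds
	MemwriteTime uint32 // microseconds
	PagesScanned uint64 // number of pages, no unit
	PagesWritten uint64 // number of pages, no unit
}

// renderDumpStats prints durations with a "us" suffix and
// page counters as plain numbers.
func renderDumpStats(s dumpStats) string {
	var b strings.Builder
	b.WriteString("CRIU dump statistics\n")
	fmt.Fprintf(&b, "├── Freezing time: %d us\n", s.FreezingTime)
	fmt.Fprintf(&b, "├── Frozen time: %d us\n", s.FrozenTime)
	fmt.Fprintf(&b, "├── Memdump time: %d us\n", s.MemdumpTime)
	fmt.Fprintf(&b, "├── Memwrite time: %d us\n", s.MemwriteTime)
	fmt.Fprintf(&b, "├── Pages scanned: %d\n", s.PagesScanned)
	fmt.Fprintf(&b, "└── Pages written: %d\n", s.PagesWritten)
	return b.String()
}

func main() {
	fmt.Print(renderDumpStats(dumpStats{539, 116647, 11168, 7195, 7867, 2297}))
}
```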
The checkpointctl show command currently supports displaying the information for a single checkpoint file. However, it would be very useful for this command to be extended with support for multiple files (similar to other Unix commands).
Example:
checkpointctl show /var/lib/kubelet/checkpoints/*.tar
Using --ps-tree on a container checkpoint I get the following output:
counter
└── [1] bash
├── [7] bash
├── [7] counter.py
├── [8] bash
└── [8] tee
That doesn't look like what I expected. I would expect something like:
counter
└── [1] bash
├── [7] counter.py
└── [8] tee
@snprajwal PTAL
The checkpointctl.go file has grown after adding the new inspect command, and will grow further as we expand the CLI. It would be better to refactor it into its own package, and use the existing file only to invoke the main function.
Hi,
I tried to use this tool. I cloned the repository to my machine and ran "make", since the repository has a Makefile. I am trying to follow the video kubectl-drain-checkpoint.mp4.
For some reason, I cannot run the command "checkpointctl", but I can run ./checkpointctl. With that command I got this error:
Error: Target /var/lib/kubelet/checkpoints access error: stat /var/lib/kubelet/checkpoints: no such file or directory
I don't know how to deal with this. Any recommendations?
Running make gives the following error:
❯ make
go build -o checkpointctl -ldflags "-X main.name=checkpointctl -X main.version=1.1.0"
# github.com/containers/storage/pkg/unshare
unshare.c: In function 'try_bindfd':
unshare.c:196:30: error: 'O_PATH' undeclared (first use in this function)
196 | ret = open(template, O_PATH | O_CLOEXEC);
| ^~~~~~
unshare.c:196:30: note: each undeclared identifier is reported only once for each function it appears in
unshare.c: In function 'copy_self_proc_exe':
unshare.c:236:24: error: 'SYS_memfd_create' undeclared (first use in this function); did you mean 'SYS_timerfd_create'?
236 | mmfd = syscall(SYS_memfd_create, exename, (long) MFD_ALLOW_SEALING | MFD_CLOEXEC);
| ^~~~~~~~~~~~~~~~
| SYS_timerfd_create
make: *** [Makefile:36: checkpointctl] Error 1
System details:
OS: Linux Mint 21.1 x86_64
Kernel: 5.15.0-91-generic
CPU: 11th Gen Intel i5-11320H (8) @ 4.500GHz
Currently, the output format of checkpointctl is in a table format. However, due to the limitations of horizontal space and the need to display multiple checkpoints, it would be beneficial to add a tree view format. This offers several advantages:
systemctl status), would enhance the usability of the output.
Here's an example of how the output could look:
containername
├── IMAGE: registry/name:latest
├── ID: 209872364234
├── RUNTIME: runc
├── CREATED: 2023-05-19T08:44:58.018444733Z
├── ENGINE: CRI-O
├── IP: 10.88.0.11
├── CHECKPOINT SIZE:
├── ROOT FS DIFF SIZE:
├── NAMESPACES:
│ ├── UTSNS
│ │ ├── NODENAME: hostname01
│ │ └── DOMAINNAME: (none)
│ └── IPC
└── PROCESSES
└── [1] bash
├── [7] counter.py
└── [8] tee
Suggested by @adrianreber
Hi, I noticed that there is an indirect dependency vulnerability on your 1.1.0 release that is correlated to a CVE regarding runc versions <= 1.1.11
Lines 9 to 21 in 3ba5cad
The use of github.com/opencontainers/runtime-spec v1.1.0 is the cause and it looks like you have patched it on main.
Line 9 in 6c3a263
Is there any timetable for the next release/patch of checkpointctl?
There are several standard locations where container engines like Podman and CRI-O store container checkpoints. However, these locations are not well documented and it might be difficult for users to find them. It would be very useful to have a command like checkpointctl list that would be able to list container checkpoints created by Podman and CRI-O.
spec.dump often contains env and mounts as a part of the dump.
@adrianreber do you think displaying it in the show output or adding a new flag makes sense?
I'd like to take this up and send in a PR if this seems valuable.
Currently checkpointctl is not really clever when it comes to showing information about a checkpoint archive. Checkpoint archives are always unpacked unconditionally and completely.
It would be enough, depending on the options used, to only unpack one or two files. This would require less space in /tmp and probably be faster.
The size of the checkpoint can also be figured out from the tar headers and does not require actual unpacking of the files.
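A minimal sketch of the tar-header idea, using Go's standard archive/tar package. It assumes an uncompressed archive (a compressed one would first need a decompressing reader). The function names archiveSize and sampleArchive are hypothetical; sampleArchive only builds a small in-memory archive so the example runs on its own.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// archiveSize sums the sizes recorded in the tar headers of regular files.
// Nothing is written to disk; tar.Reader skips over the file contents.
func archiveSize(r io.Reader) (int64, error) {
	tr := tar.NewReader(r)
	var total int64
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return total, nil
		}
		if err != nil {
			return 0, err
		}
		if hdr.Typeflag == tar.TypeReg {
			total += hdr.Size
		}
	}
}

// sampleArchive builds a small uncompressed tar archive in memory (demo only).
func sampleArchive() (*bytes.Buffer, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	files := []struct{ name, body string }{
		{"spec.dump", "hello"},
		{"config.dump", "checkpoint"},
	}
	for _, f := range files {
		hdr := &tar.Header{Name: f.name, Mode: 0600, Size: int64(len(f.body)), Typeflag: tar.TypeReg}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(f.body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := sampleArchive()
	if err != nil {
		panic(err)
	}
	size, err := archiveSize(buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("total size of regular files: %d bytes\n", size)
}
```

For a real checkpoint, the same archiveSize function could be given an opened file instead of the in-memory buffer.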
Currently, we only run tests with BATS. This does not let us test changes in real-world scenarios with containers. An integration test where a container is checkpointed and the resulting archive is then used for testing would be much more robust.
Hello @adrianreber, I am new to K8s and am currently working on application migration between two different clusters for my Master's thesis. This tool looks very promising and I would like to try it to manipulate checkpoints. However, I am unable to install it as I do not see any instructions anywhere.
Currently we don't support compatibility between container runtimes such as runc and crun, or engines such as Podman and CRI-O. For example, if a container checkpoint is created with crun, it is not currently possible to restore it with runc, or a checkpoint created with Podman can not be restored with CRI-O. Since all these container runtimes and engines use CRIU, it is technically possible and this functionality would be very useful.
For example, the following two files contain the implementation for container checkpointing with crun and runc:
For instance, in the case of runc and crun, the difference comes from implementation-specific format of configuration file stored in the checkpoint. For example, the code in [1] is used to save a configuration file and information about file descriptors specific to runc, while the code in [2] implements the equivalent in crun but with different format. The easiest way to see the difference would be to create a container checkpoint [3] using both runc and crun [4] and see what files are included in the checkpoint.
An effort in a similar vein is the proposal to standardize the checkpoint image definition format.
I believe adding the functionality to convert from one container archive format to another through checkpointctl will be quite useful. A sample invocation could look like this (the from clause can be optional):
$checkpointctl migrate/convert --from {podman,cri-o,kubernetes} --to {podman,cri-o,kubernetes} /tmp/ubuntu_looper.tar.gz
I'm happy to contribute to this issue. Please let me know your thoughts here!
CC: @rst0git
Podman is going to abstract CNI from libpod (see containers/podman#11232).
When the work is done, libpod will no longer use the CNI result as network status. I am currently wiring the new interface into libpod and noticed that the CNI result is dumped into the checkpoint dir and that this library uses the result on restore to set the IP and MAC address. In my opinion this library should not have to know anything about the network status used in Podman. I think it would make more sense to handle setting the correct IP and MAC address entirely in libpod and not here.
The second link is dead:
Details on how to create checkpoints with the help of CRIU can be found at:
Did it get moved to https://podman.io/docs/checkpoint?
checkpointctl currently shows IP address information for checkpoints created with CRI-O but not with Podman. For checkpoints created with Podman, this information is stored in network.status, which has content in a JSON format like the following example:
{
"podman": {
"interfaces": {
"eth0": {
"subnets": [
{
"ipnet": "10.88.0.9/16",
"gateway": "10.88.0.1"
}
],
"mac_address": "f2:99:8d:fb:5a:57"
}
}
}
}
To enable this feature, we need to extract the network.status file from the checkpoint and parse its content.
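The parsing step could look roughly like the sketch below. The struct and function names (subnet, netInterface, network, parseNetworkStatus) are made up for illustration and only cover the fields visible in the example above; the real file may contain more fields, which encoding/json simply ignores.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types that mirror only the fields shown in the example.
type subnet struct {
	IPNet   string `json:"ipnet"`
	Gateway string `json:"gateway"`
}

type netInterface struct {
	Subnets    []subnet `json:"subnets"`
	MacAddress string   `json:"mac_address"`
}

type network struct {
	Interfaces map[string]netInterface `json:"interfaces"`
}

// parseNetworkStatus decodes the content of network.status into a map
// from network name (e.g. "podman") to its interfaces.
func parseNetworkStatus(data []byte) (map[string]network, error) {
	var status map[string]network
	if err := json.Unmarshal(data, &status); err != nil {
		return nil, fmt.Errorf("parsing network.status: %w", err)
	}
	return status, nil
}

// sample is the example content quoted in the issue.
const sample = `{
  "podman": {
    "interfaces": {
      "eth0": {
        "subnets": [
          {"ipnet": "10.88.0.9/16", "gateway": "10.88.0.1"}
        ],
        "mac_address": "f2:99:8d:fb:5a:57"
      }
    }
  }
}`

func main() {
	status, err := parseNetworkStatus([]byte(sample))
	if err != nil {
		panic(err)
	}
	for name, nw := range status {
		for ifName, iface := range nw.Interfaces {
			for _, s := range iface.Subnets {
				fmt.Printf("%s/%s: IP %s, MAC %s\n", name, ifName, s.IPNet, iface.MacAddress)
			}
		}
	}
}
```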
Currently checkpointctl inspect only supports the tree format. It might be useful to also be able to render the output in a JSON format (--format=json). This would enable easier integration with other tools to further process the output of checkpointctl.
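One possible shape for such JSON output, sketched with the standard encoding/json package. The node type, its field names, and renderJSON are assumptions for illustration only and do not describe how checkpointctl structures its data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node is a hypothetical process tree node; the JSON keys are illustrative.
type node struct {
	PID      int    `json:"pid"`
	Comm     string `json:"comm"`
	Children []node `json:"children,omitempty"`
}

// renderJSON serializes the tree as indented JSON.
func renderJSON(root node) (string, error) {
	out, err := json.MarshalIndent(root, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	tree := node{PID: 1, Comm: "bash", Children: []node{
		{PID: 7, Comm: "counter.py"},
		{PID: 8, Comm: "tee"},
	}}
	out, err := renderJSON(tree)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Leaf nodes omit the children key thanks to omitempty, which keeps the output compact for tools that consume it.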
Suggested by: @rst0git
Currently, the --ps-tree and --files flags print the entire process tree, along with the associated information. These flags can optionally use a PID, and prune the tree to show just that process and its children. This allows for reduced noise when the process that needs to be inspected is already known. E.g. the PID of a known process can be retrieved with --ps-tree, and the finer details can be printed specifically for that process with checkpointctl inspect --files --sk --pid <PID>. A short form -p can also be provided.
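The pruning itself amounts to a depth-first search for the node with the requested PID. Here is a rough sketch, where the process type, findSubtree, and sampleTree are hypothetical names and the tree mirrors the bash / counter.py / tee example from earlier issues:

```go
package main

import "fmt"

// process is a hypothetical process tree node.
type process struct {
	PID      int
	Comm     string
	Children []*process
}

// findSubtree searches depth-first for the process with the given PID and
// returns it together with its children, or nil if the PID is not in the tree.
func findSubtree(root *process, pid int) *process {
	if root == nil {
		return nil
	}
	if root.PID == pid {
		return root
	}
	for _, child := range root.Children {
		if found := findSubtree(child, pid); found != nil {
			return found
		}
	}
	return nil
}

// sampleTree builds a small demo tree.
func sampleTree() *process {
	return &process{PID: 1, Comm: "bash", Children: []*process{
		{PID: 7, Comm: "counter.py"},
		{PID: 8, Comm: "tee"},
	}}
}

func main() {
	if p := findSubtree(sampleTree(), 7); p != nil {
		fmt.Printf("[%d] %s (%d children)\n", p.PID, p.Comm, len(p.Children))
	}
}
```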
The examples in the README.md are outdated since PR #76. To accurately reflect the current state of checkpointctl, it is necessary to update the example about the usage of checkpointctl show. Additionally, it may be good to include new examples showcasing the usage of checkpointctl inspect to provide users with up-to-date information.
Hi there,
I would like to create an AUR package (Arch Linux User Repository) for checkpointctl, but I saw that there is no version at all, nor any git tag.
May I ask why?
Do you plan to release a version soon?
I can create the package from the latest git commit, but it's better to have versioning.
Thanks
Currently it is necessary to run a couple of buildah commands to convert a checkpoint archive to an OCI image which can be used by CRI-O or Podman.
It should be possible to do those steps in checkpointctl with one single command. If this is implemented, it is important that including lib/ in another Go project does not pull in any additional dependencies (like buildah).