checkpoint-restore / checkpointctl

A tool for in-depth analysis of container checkpoints

License: Apache License 2.0

Languages: Makefile 4.22%, Go 46.97%, Shell 44.79%, C 4.02%

checkpointctl's People

Contributors

adrianreber tacinight kolyshkin dcbw luap99 rst0git chansushin-konkuk machworklab snprajwal sankalp-12 behouba parthiba-hazra shikharish bsbiradar simardeepsingh-zsh

checkpointctl's Issues

Extend `inspect` output format with JSON

Currently, checkpointctl inspect only supports the tree format. It might be useful to also be able to render the output as JSON (--format=json). This would make it easier to integrate checkpointctl with other tools that further process its output.

Suggested by: @rst0git

Do not use the podman network status

Podman is going to abstract CNI from libpod (see containers/podman#11232).
When that work is done, libpod will no longer use the CNI result as the network status. I am currently wiring the new interface into libpod and noticed that the CNI result is dumped into the checkpoint directory and that this library uses the result on restore to set the IP and MAC address. In my opinion, this library should not need to know anything about the network status used in Podman. I think it would make more sense to handle setting the correct IP and MAC address entirely in libpod and not here.

Statistics output adds `us` everywhere

├── CRIU dump statistics
│   ├── Freezing time: 539 us
│   ├── Frozen time: 116647 us
│   ├── Memdump time: 11168 us
│   ├── Memwrite time: 7195 us
│   ├── Pages scanned: 7867 us
│   └── Pages written: 2297 us

Pages scanned and Pages written should not have us: these values are simply the number of memory pages actually scanned or written, not durations. Just remove us from those two lines.
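A minimal sketch of the fix, assuming a hypothetical dumpStat type that records whether a statistic is a duration. Only time statistics get the us suffix; page counts are printed as plain numbers.

```go
package main

import "fmt"

// dumpStat is a hypothetical pair of a dump statistic and its value.
// IsTime marks durations (shown in microseconds in the output above);
// page counts are plain counts and carry no unit.
type dumpStat struct {
	Label  string
	Value  uint64
	IsTime bool
}

// formatStat appends the "us" unit only to time-based statistics.
func formatStat(s dumpStat) string {
	if s.IsTime {
		return fmt.Sprintf("%s: %d us", s.Label, s.Value)
	}
	return fmt.Sprintf("%s: %d", s.Label, s.Value)
}

func main() {
	stats := []dumpStat{
		{"Freezing time", 539, true},
		{"Pages scanned", 7867, false},
		{"Pages written", 2297, false},
	}
	for _, s := range stats {
		fmt.Println(formatStat(s))
	}
}
```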

Do not unpack the whole checkpoint archive

Currently checkpointctl is not really clever when it comes to showing information about a checkpoint archive. Checkpoint archives are always unpacked unconditionally and completely.

It would be enough, depending on the options used, to only unpack one or two files. This would require less space in /tmp and probably be faster.

The size of the checkpoint can also be figured out from the tar headers and does not require actual unpacking of the files.

Update `README.md`

The examples in the README.md have been outdated since PR #76. To accurately reflect the current state of checkpointctl, the example showing the usage of checkpointctl show needs to be updated. Additionally, it may be good to include new examples showing the usage of checkpointctl inspect, so that users have up-to-date information.

Add integration test with podman

Currently, we only run tests with BATS. This does not let us test changes in real-world scenarios with containers. An integration test in which a container is checkpointed and the resulting archive is then used for testing would be much more robust.

Add support for checkpoint conversion for different container runtimes/engines

Currently we don't support compatibility between container runtimes such as runc and crun, or engines such as Podman and CRI-O. For example, if a container checkpoint is created with crun, it is not currently possible to restore it with runc, and a checkpoint created with Podman cannot be restored with CRI-O. Since all these container runtimes and engines use CRIU, it is technically possible, and this functionality would be very useful.

For example, the following two files contain the implementation for container checkpointing with crun and runc:

- https://github.com/containers/crun/blob/main/src/checkpoint.c
- https://github.com/opencontainers/runc/blob/main/libcontainer/criu_linux.go

In Podman, the problem is solved by extracting the runtime used to create a checkpoint:

- https://github.com/containers/podman/blob/7eaedaf3/pkg/checkpoint/crutils/checkpoint_restore_utils.go#L238
- https://github.com/containers/podman/blob/7eaedaf3/cmd/podman/root.go#L260

For instance, in the case of runc and crun, the difference comes from the implementation-specific format of the configuration file stored in the checkpoint. For example, the code in [1] is used to save a configuration file and information about file descriptors specific to runc, while the code in [2] implements the equivalent in crun but with a different format. The easiest way to see the difference would be to create a container checkpoint [3] using both runc and crun [4] and compare the files included in each checkpoint.

An effort in a similar vein is the proposal to standardize the checkpoint image definition format.

I believe adding the functionality to convert from one container archive format to another through checkpointctl would be quite useful. A sample invocation could look like this (the --from clause could be optional):

$ checkpointctl migrate/convert --from {podman,cri-o,kubernetes} --to {podman,cri-o,kubernetes} /tmp/ubuntu_looper.tar.gz

I'm happy to contribute to this issue. Please let me know your thoughts here!

CC: @rst0git

Add support for list command

There are several standard locations where container engines like Podman and CRI-O store container checkpoints. However, these locations are not well documented and it might be difficult for users to find them. It would be very useful to have a command like checkpointctl list that would be able to list container checkpoints created by Podman and CRI-O.

Dead link to Podman checkpoint

The second link is dead:

Details on how to create checkpoints with the help of CRIU can be found at:

Did it get moved to https://podman.io/docs/checkpoint?

Release 1.1.X+ to remove indirect dependency related to CVE-2024-21626

Hi, I noticed that your 1.1.0 release has an indirect dependency vulnerability related to a CVE affecting runc versions <= 1.1.11:

checkpointctl/go.mod, lines 9 to 21 in 3ba5cad:

	github.com/opencontainers/runtime-spec v1.1.0
	github.com/spf13/cobra v1.7.0
	github.com/xlab/treeprint v1.2.0
)

require (
	github.com/docker/go-units v0.5.0 // indirect
	github.com/inconshreveable/mousetrap v1.1.0 // indirect
	github.com/klauspost/compress v1.16.7 // indirect
	github.com/klauspost/pgzip v1.2.6 // indirect
	github.com/mattn/go-runewidth v0.0.9 // indirect
	github.com/moby/sys/mountinfo v0.6.2 // indirect
	github.com/opencontainers/runc v1.1.9 // indirect

The use of github.com/opencontainers/runtime-spec v1.1.0 is the cause and it looks like you have patched it on main.

checkpointctl/go.mod, line 9 in 6c3a263:

	github.com/opencontainers/runtime-spec v1.2.0

Is there any timetable for the next release/patch of checkpointctl?

Add support for showing information for multiple checkpoints

The checkpointctl show command currently supports displaying the information for a single checkpoint file. However, it would be very useful for this command to be extended with support for multiple files (similar to other Unix commands).

Example:

checkpointctl show /var/lib/kubelet/checkpoints/*.tar

Make command fails

Running make gives the following error:

❯ make
go build -o checkpointctl -ldflags "-X main.name=checkpointctl -X main.version=1.1.0"
# github.com/containers/storage/pkg/unshare
unshare.c: In function 'try_bindfd':
unshare.c:196:30: error: 'O_PATH' undeclared (first use in this function)
  196 |         ret = open(template, O_PATH | O_CLOEXEC);
      |                              ^~~~~~
unshare.c:196:30: note: each undeclared identifier is reported only once for each function it appears in
unshare.c: In function 'copy_self_proc_exe':
unshare.c:236:24: error: 'SYS_memfd_create' undeclared (first use in this function); did you mean 'SYS_timerfd_create'?
  236 |         mmfd = syscall(SYS_memfd_create, exename, (long) MFD_ALLOW_SEALING | MFD_CLOEXEC);
      |                        ^~~~~~~~~~~~~~~~
      |                        SYS_timerfd_create
make: *** [Makefile:36: checkpointctl] Error 1

System details:
OS: Linux Mint 21.1 x86_64
Kernel: 5.15.0-91-generic
CPU: 11th Gen Intel i5-11320H (8) @ 4.500GHz

Extend the output format with a tree view

Currently, checkpointctl prints its output as a table. However, due to the limited horizontal space and the need to display multiple checkpoints, it would be beneficial to add a tree view format. This offers several advantages:

- The tree format takes advantage of unlimited vertical space, which is more suitable for displaying detailed information.
- The tree format can be used with multiple checkpoints, allowing additional details to be presented for each checkpoint.
- A built-in terminal pager (as in, e.g., systemctl status) would enhance the usability of the output.

Here's an example of how the output could look:

containername
├── IMAGE: registry/name:latest
├── ID: 209872364234
├── RUNTIME: runc
├── CREATED: 2023-05-19T08:44:58.018444733Z
├── ENGINE: CRI-O
├── IP: 10.88.0.11
├── CHECKPOINT SIZE:
├── ROOT FS DIFF SIZE:
├── NAMESPACES:
│   ├── UTSNS
│   │   ├── NODENAME: hostname01
│   │   └── DOMAINNAME: (none)
│   └── IPC
└── PROCESSES
    └── [1]  bash
        ├── [7]  counter.py
        └── [8]  tee

Suggested by @adrianreber
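checkpointctl's go.mod already lists github.com/xlab/treeprint for this kind of output. The standard-library-only sketch below just illustrates how such a tree is rendered: each child gets ├── or, if it is the last child, └──, and its descendants are indented with │ or spaces accordingly. The node type is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one line of the tree view together with its children.
type node struct {
	label    string
	children []*node
}

// renderTree prints the root label followed by its descendants using
// box-drawing prefixes.
func renderTree(root *node) string {
	var b strings.Builder
	b.WriteString(root.label + "\n")
	writeChildren(&b, root.children, "")
	return b.String()
}

func writeChildren(b *strings.Builder, children []*node, prefix string) {
	for i, c := range children {
		branch, indent := "├── ", "│   "
		if i == len(children)-1 { // the last child closes its branch
			branch, indent = "└── ", "    "
		}
		b.WriteString(prefix + branch + c.label + "\n")
		writeChildren(b, c.children, prefix+indent)
	}
}

func main() {
	tree := &node{label: "containername", children: []*node{
		{label: "RUNTIME: runc"},
		{label: "PROCESSES", children: []*node{
			{label: "[1]  bash", children: []*node{{label: "[7]  counter.py"}, {label: "[8]  tee"}}},
		}},
	}}
	fmt.Print(renderTree(tree))
}
```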

Extend checkpointctl to convert checkpoint archives to OCI images

Currently it is necessary to run a couple of buildah commands to convert a checkpoint archive to an OCI image which can be used by CRI-O or Podman.

It should be possible to perform those steps in checkpointctl with a single command. If this is implemented, it is important that importing lib/ into another Go project does not pull in any additional dependencies (like buildah).

Prune process tree by PID if specified

Currently, the --ps-tree and --files flags print the entire process tree along with the associated information. These flags could optionally accept a PID and prune the tree to show only that process and its children. This reduces noise when the process that needs to be inspected is already known. For example, the PID of a known process can first be retrieved with --ps-tree, and the finer details can then be printed for just that process with checkpointctl inspect --files --sk --pid <PID>. A short form -p could also be provided.
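Pruning amounts to finding the node with the requested PID and printing only its subtree. A minimal sketch with a hypothetical proc type (checkpointctl's real process-tree types differ) and a depth-first search, which suffices because each PID should appear only once in the tree:

```go
package main

import "fmt"

// proc is a hypothetical process-tree node.
type proc struct {
	PID      int
	Name     string
	Children []*proc
}

// findByPID returns the subtree rooted at the process with the given PID,
// or nil if no such process exists.
func findByPID(root *proc, pid int) *proc {
	if root == nil {
		return nil
	}
	if root.PID == pid {
		return root
	}
	for _, c := range root.Children {
		if found := findByPID(c, pid); found != nil {
			return found
		}
	}
	return nil
}

func main() {
	tree := &proc{PID: 1, Name: "bash", Children: []*proc{
		{PID: 7, Name: "counter.py"},
		{PID: 8, Name: "tee"},
	}}
	if sub := findByPID(tree, 7); sub != nil {
		fmt.Printf("[%d]  %s\n", sub.PID, sub.Name)
	}
}
```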

Display network information of checkpoints created with Podman

checkpointctl currently shows IP address information for checkpoints created with CRI-O but not with Podman. For checkpoints created with Podman, this information is stored in network.status, which has content in a JSON format like the following example:

{
  "podman": {
    "interfaces": {
      "eth0": {
        "subnets": [
          {
            "ipnet": "10.88.0.9/16",
            "gateway": "10.88.0.1"
          }
        ],
        "mac_address": "f2:99:8d:fb:5a:57"
      }
    }
  }
}

To enable this feature, we need to extract the network.status file from the checkpoint and parse its content.

Process tree output does not look entirely correct

Using --ps-tree on a container checkpoint, I get the following output:

counter
└── [1]  bash
    ├── [7]  bash
    ├── [7]  counter.py
    ├── [8]  bash
    └── [8]  tee

That doesn't look like what I expected. I would expect something like:

counter
└── [1]  bash
    ├── [7]  counter.py
    └── [8]  tee

@snprajwal PTAL

How can I install?

Hi,

I tried to use this tool. I cloned the repository to my machine and ran "make", since there is a Makefile. I tried to follow the video kubectl-drain-checkpoint.mp4.

For some reason, I cannot run the command "checkpointctl" directly, so I used ./checkpointctl instead.

I got this error:

Error: Target /var/lib/kubelet/checkpoints access error
: stat /var/lib/kubelet/checkpoints: no such file or directory

I don't know how to deal with this. Any recommendations?

Versioning?

Hi there,

I would like to create an AUR package (Arch User Repository) for checkpointctl, but I saw that there is no version at all, nor any git tag.
May I ask why?
Do you plan to release a version soon?
I can create the package from the latest git commit, but it would be better to have versioned releases.

Thanks

Doubt - Relation with Kubernetes

1) Is this a custom kubectl operator?
2) Why was kubernetes/kubernetes#104907 limited to forensic checkpointing and not restore? When can restore support be expected?
3) Are there any pull requests for native Kubernetes checkpoint/restore support?

How to install checkpointctl tool

Hello @adrianreber, I am new to Kubernetes and am currently working on application migration between two different clusters for my master's thesis. This tool looks very promising and I would like to try it to work with checkpoints. However, I am unable to install it, as I cannot find any installation instructions.

Refactor `checkpointctl.go` and move commands into individual files

The checkpointctl.go file has grown after adding the new inspect command, and will grow further as we expand the CLI. It would be better to move the commands into their own package, with one file per command, and use the existing file only to invoke the main function.

Showing env and mounts

spec.dump often contains env and mounts as a part of the dump.

@adrianreber, do you think it makes sense to display them in the show output, or to add a new flag for them?
I'd like to take this up and send in a PR if this seems valuable.
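A sketch of extracting the environment and mounts, assuming spec.dump holds the container's OCI runtime spec (consistent with it containing env and mounts). Only the needed fields are declared, using the OCI config JSON keys process.env and mounts[].destination/type/source; the full type is specs.Spec from the runtime-spec specs-go package, which checkpointctl already depends on.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// ociSpec declares only the OCI runtime spec fields needed here.
type ociSpec struct {
	Process *struct {
		Env []string `json:"env"`
	} `json:"process"`
	Mounts []struct {
		Destination string `json:"destination"`
		Type        string `json:"type"`
		Source      string `json:"source"`
	} `json:"mounts"`
}

// envAndMounts returns the environment and a one-line summary of each
// mount from the contents of spec.dump.
func envAndMounts(data []byte) ([]string, []string, error) {
	var spec ociSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return nil, nil, err
	}
	var env []string
	if spec.Process != nil {
		env = spec.Process.Env
	}
	var mounts []string
	for _, m := range spec.Mounts {
		mounts = append(mounts, fmt.Sprintf("%s on %s type %s", m.Source, m.Destination, m.Type))
	}
	return env, mounts, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: envmounts <path to spec.dump>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	env, mounts, err := envAndMounts(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("Environment:")
	for _, e := range env {
		fmt.Println("  " + e)
	}
	fmt.Println("Mounts:")
	for _, m := range mounts {
		fmt.Println("  " + m)
	}
}
```

Since environment variables can contain secrets, putting them behind a separate flag rather than in the default show output may be the safer choice.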
