rust-vmm / vm-memory
Virtual machine's guest memory crate
License: Apache License 2.0
The `as_slice`/`as_mut_slice` methods on `GuestMemoryRegion` are not safe to use even though they are marked `unsafe`. This is because a Rust primitive slice assumes that a shared slice has no mutable aliases (which the guest does have) and that a mutable slice is exclusive (but the guest can still modify the underlying memory).
I'm trying to wrap my head around the different read/write functions in the `Bytes` trait. It seems a bit needlessly intimidating and confusing. For reference, I'm referring to these functions: https://docs.rs/vm-memory/0.7.0/vm_memory/bytes/trait.Bytes.html#tymethod.read
`write` writes to the `Bytes` object, but `write_to` actually reads from it; `read_into` could have been a better name. It would also have been nice to simply implement the standard `Read`/`Write` traits, but that seems to be prevented by the configurable error type. Are these desirable changes?
A few useful methods were removed; we should copy them back from firecracker to ease porting.
Currently `mmap::check_file_offset` attempts to validate the backing object by checking, among other things, its size, to confirm the mapping will fit. This check is implemented using `std::fs::metadata` to obtain the file size, but that approach doesn't work when the backing object is a block or char device, as the returned size is always zero.
This was discovered when virtiofsd users noticed that adding a memory object in QEMU backed by a device (e.g. /dev/kvmfr0) would cause the daemon to fail with `Waiting for daemon failed: HandleRequest(ReqHandlerError(Custom { kind: Other, error: MappingPastEof }))` (issue 46).
Replace fandree/rust-vmm-dev with rustvmm/dev:v1 in Buildkite pipeline.
dev:v1 has exactly the same packages and rust version as fandree/rust-vmm-dev so it should be a seamless change.
Feature request
Add support for huge pages guest memory. For a specific use case, please consult this issue.
Using `?` on `VolatileMemory::get_slice` (applied to a `MmapRegion`) in a function returning a generic error doesn't work, because `VolatileMemory::get_slice` returns a `Result` with an error type of `volatile_memory::Error`, which doesn't impl `std::error::Error`.
Fixing this would make it easier to avoid using `.unwrap()`.
Use rust-vmm-ci as a git submodule to test the code. Since vm-memory has non-default features, a custom pipeline is also required.
For more details about how to make the switch, please check the rust-vmm-ci readme.
It'd be useful to add a general constructor to `MmapRegion` in mmap_unix.rs (here), since this would provide a proper wrapper for the `mmap()` syscall. The point would be to be able to provide `PROT_NONE/READ/WRITE` along with `MAP_SHARED/PRIVATE/...`, so that we're not tied to the predefined protection and mapping types.
We're incorrectly using `VirtualFree` in the `MmapRegion` `Drop` implementation, which means it silently leaks memory: `VirtualFree` fails if a non-zero size is specified when releasing memory.
I'm trying to implement the `Bytes` trait for the abstractions of our hypervisor and I'm really struggling. My current problem is that `Bytes` has a `store` method that takes the value to store as an `AtomicAccess`. Our internal abstractions really want a `u64` to store data. That didn't seem like much of an issue until I tried converting it:

```rust
fn store<T: AtomicAccess>(
    &self,
    val: T,
    addr: MemoryRegionAddress,
    order: Ordering,
) -> Result<(), Self::E> {
    let actual_value: <<T as AtomicAccess>::A as AtomicInteger>::V = val.into();
    // At this point actual_value has type V that is unconstrained,
    // and thus we can't do anything with it?
    let an_u64: u64 = actual_value as u64;
    // Error: an `as` expression can only be used to convert between
    // primitive types or to coerce to a specific trait object
}
```

So the initially innocent-looking problem of getting an actual integer out of `AtomicAccess` turns out to be really complex. Am I missing something obvious? How do others solve this?
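One general way out (a sketch of the technique, not vm-memory's actual API) is to constrain the associated type with a trait bound instead of an `as` cast: `as` only works on concrete primitives, but `Into<u64>` works on generics. The stand-in traits below mimic the shape of `AtomicAccess`/`AtomicInteger`:

```rust
// Stand-in traits mimicking the shape of vm-memory's AtomicAccess
// and AtomicInteger; the real ones differ, but the conversion
// technique is the same.
trait AtomicInteger {
    type V;
}

trait AtomicAccess {
    type A: AtomicInteger;
    fn into_inner(self) -> <Self::A as AtomicInteger>::V;
}

struct AtomicU16Like;
impl AtomicInteger for AtomicU16Like {
    type V = u16;
}

impl AtomicAccess for u16 {
    type A = AtomicU16Like;
    fn into_inner(self) -> u16 {
        self
    }
}

// The key: bound the otherwise-unconstrained associated type `V`
// with Into<u64>, so the compiler knows how to produce a u64.
fn store_as_u64<T>(val: T) -> u64
where
    T: AtomicAccess,
    <T::A as AtomicInteger>::V: Into<u64>,
{
    val.into_inner().into()
}

fn main() {
    assert_eq!(store_as_u64(0xabcd_u16), 0xabcd_u64);
    println!("ok");
}
```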
Tracking dirty bitmap is needed for features such as snapshot, live migration, or live update.
The `copy_slice()` wrapper is used to access guest memory in most `VolatileMemory` methods, but there's an exception: `copy_to_volatile_slice()` uses `std::ptr::copy()`. The `copy_to_volatile_slice()` method should be refined to use `copy_slice()` so that we never mix `std::ptr::read/write_volatile()` and `std::ptr::copy()`.
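A sketch of the kind of helper `copy_slice()` stands for (the real one copies in larger chunks where alignment allows): every access goes through `read_volatile`/`write_volatile`, so the compiler cannot elide, merge, or reorder the accesses the way a plain `ptr::copy()` (memcpy) would allow.

```rust
use std::ptr;

// Byte-wise volatile copy: each byte is read and written with
// volatile semantics, unlike ptr::copy() which lowers to memcpy.
unsafe fn copy_slice_volatile(dst: *mut u8, src: *const u8, len: usize) {
    for i in 0..len {
        let byte = ptr::read_volatile(src.add(i));
        ptr::write_volatile(dst.add(i), byte);
    }
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut dst = [0u8; 4];
    unsafe { copy_slice_volatile(dst.as_mut_ptr(), src.as_ptr(), 4) };
    assert_eq!(dst, src);
    println!("ok");
}
```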
The only provided interfaces right now force MAP_SHARED on all file mappings.
According to Virtio spec 1.1, 2.6.5.3.2 Device Requirements: Indirect Descriptors:
The device MUST ignore the write-only flag (flags&VIRTQ_DESC_F_WRITE) in the descriptor that refers
to an indirect table.
The device MUST handle the case of zero or more normal chained descriptors followed by a single descriptor
with flags&VIRTQ_DESC_F_INDIRECT.
Note: While unusual (most implementations either create a chain solely using non-indirect descriptors, or
use a single indirect element), such a layout is valid.
And commit "a83152da14554cf810f672f6f4079e943fe6ce28 : simplify descriptor chain handling" breaks the rules defined above.
The lack of `PartialEq` on errors in vm-memory is forcing weird patterns across all rust-vmm crates (where errors need to be compared using `match`, or, even worse, are not compared at all). The problem with implementing `PartialEq` resides in the underlying `io::Error`, which does not implement `PartialEq`. Since the use cases for errors are pretty well defined in vm-memory, we should be able to implement `PartialEq` manually, because we can reason about which errors should actually be equal.
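A sketch of such a manual impl, using an illustrative error enum (not vm-memory's actual type): compare wrapped `io::Error` values by `ErrorKind` and raw OS error, which is usually what "these errors are equal" means in practice.

```rust
use std::io;

// Illustrative error type wrapping io::Error.
#[derive(Debug)]
enum MemError {
    InvalidGuestAddress(u64),
    IoError(io::Error),
}

impl PartialEq for MemError {
    fn eq(&self, other: &Self) -> bool {
        use MemError::*;
        match (self, other) {
            (InvalidGuestAddress(a), InvalidGuestAddress(b)) => a == b,
            // io::Error itself is not PartialEq, so compare the
            // parts we can reason about.
            (IoError(a), IoError(b)) => {
                a.kind() == b.kind() && a.raw_os_error() == b.raw_os_error()
            }
            _ => false,
        }
    }
}

fn main() {
    let a = MemError::IoError(io::Error::from(io::ErrorKind::NotFound));
    let b = MemError::IoError(io::Error::from(io::ErrorKind::NotFound));
    assert_eq!(a, b);
    assert_ne!(
        MemError::InvalidGuestAddress(1),
        MemError::InvalidGuestAddress(2)
    );
    println!("ok");
}
```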
We have identified an issue in the rust-vmm vm-memory crate that leads to a denial-of-service (DoS) issue if the crate is used in a VMM in conjunction with virtio. The issue affects both vm-memory releases (v0.1.0 and v0.2.0). In our environment, we reproduced this with musl builds on x86_64, and with all aarch64 builds.
In vm-memory, the functions read_obj and write_obj are not doing atomic accesses for all combinations of platform and libc implementations. These reads and writes translate to memcpy, which may be performing byte-by-byte copies. Using vm-memory in the virtio implementation can cause undefined behavior, as descriptor indexes require 2-byte atomic accesses.
The issue can affect any virtio/emulated device which expects atomic writes for base types longer than 1 byte.
Observed impact: When the network stack is under load, the driver will try to clear a used descriptor before the index of the descriptor is fully written by the device. When this issue is triggered, the virtio-net device will be unable to transmit packets. This leads to VMs using rust-vmm/vm-memory having their network effectively disconnected by outside network traffic, resulting in both a DoS vector and an availability issue under normal at-load operations.
For a VMM to be affected, it must run on aarch64 (built with either musl or glibc), or on x86_64 with a musl build. All VMMs using rust-vmm/vm-memory (any release) in a production scenario that take arbitrary traffic over the virtio-net device are confirmed to be at risk of a DoS. All VMMs using rust-vmm/vm-memory (any release) in a production scenario with a virtio-net device are under availability risk. All VMMs using rust-vmm/vm-memory (any release) in a production scenario using other devices that expect atomic reads for values longer than 1 byte may also be affected, but we are unaware of any risk for other devices (beyond the guest freezing its own virtio stack).
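The distinction at the heart of the advisory can be sketched with std atomics. A virtio used-ring index is a 16-bit value the driver polls while the device updates it: a 2-byte atomic store guarantees the driver sees either the old or the new index, never a torn mix, whereas a byte-by-byte memcpy offers no such guarantee.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Device side: publish a new used-ring index with a single 2-byte
// atomic store, so readers can never observe a half-written value.
fn publish_used_idx(idx_slot: &AtomicU16, new_idx: u16) {
    idx_slot.store(new_idx, Ordering::Release);
}

// Driver side: read the index atomically.
fn read_used_idx(idx_slot: &AtomicU16) -> u16 {
    idx_slot.load(Ordering::Acquire)
}

fn main() {
    let idx = AtomicU16::new(0);
    publish_used_idx(&idx, 0x0102);
    assert_eq!(read_used_idx(&idx), 0x0102);
    println!("ok");
}
```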
Please add a CODEOWNERS file with the details of the maintainers. Please use the following format as outlined in: https://help.github.com/en/articles/about-code-owners
Most projects can simply follow the wildcard syntax. e.g.
* @owner1 @owner2
Not only does this provide a way to see who is responsible for this repository, but the owners will also automatically be notified of incoming PRs for review.
`with_regions`, `with_regions_mut` and `map_and_fold` were just papering over the lack of iterator-based access to the regions in `GuestMemory`. After #130 closes out #8, they will no longer be needed and should be deprecated and eventually removed.
GuestMemoryMmap only has a single constructor:
pub fn new(ranges: &[(GuestAddress, usize)]) -> std::result::Result<Self, MmapError>
We need a way to build a GuestMemoryMmap that's full of GuestRegionMmaps (e.g. ones created from file descriptors). So we need either a new constructor or an insert method.
The `GuestMemory` implementation of the `read_from` function of the `Bytes` trait is equivalent to `read_exact_from`. Analogously, the `write_to` implementation is equivalent to `write_all_to`. This is because both functions rely on a call to `try_access`, which will run until an unrecoverable error occurs (one that isn't `ErrorKind::Interrupted`) or `count` bytes are transferred. This semantic is correct for `read_exact_from` and `write_all_to`, assuming they have similar meaning to the `read_exact` and `write_all` functions of the `std::io::Read` and `std::io::Write` traits. For the "non-exact" counterparts, only one read/write should occur.
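The two semantics can be contrasted with plain `std::io` types (a sketch, not vm-memory's implementation): the "non-exact" variant issues a single `read()` and may return fewer bytes than requested, while the "exact" variant loops until the buffer is full or a real error occurs.

```rust
use std::io::{self, Read};

// "Non-exact": at most one underlying read; may be short.
fn read_once<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    src.read(buf)
}

// "Exact": loop until the buffer is full, retrying on Interrupted.
fn read_exact_loop<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match src.read(&mut buf[filled..]) {
            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];
    let mut buf = [0u8; 5];
    assert_eq!(read_once(&mut &data[..], &mut buf).unwrap(), 5);

    let mut buf2 = [0u8; 5];
    read_exact_loop(&mut &data[..], &mut buf2).unwrap();
    assert_eq!(buf2, data);
    println!("ok");
}
```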
As the code evolves, we now have several constructors for GuestMemoryMmap:

```rust
pub fn new(ranges: &[(GuestAddress, usize)]) -> result::Result<Self, Error>;
pub fn with_files<A, T>(ranges: T) -> result::Result<Self, Error>;
pub fn from_regions(mut regions: Vec<GuestRegionMmap>) -> result::Result<Self, Error>;
pub fn from_arc_regions(regions: Vec<Arc<GuestRegionMmap>>) -> result::Result<Self, Error>;
```

Despite these four constructors, we still have no way to create an empty GuestMemoryMmap instance. All of them assume that memory regions are static and created at boot time, and memory hotplug breaks this assumption. So how about refining the constructors as:

```rust
pub fn new() -> Self;
pub fn from_ranges(ranges: &[(GuestAddress, usize)]) -> result::Result<Self, Error>;
pub fn from_ranges_with_files<A, T>(ranges: T) -> result::Result<Self, Error>;
pub fn from_regions(mut regions: Vec<GuestRegionMmap>) -> result::Result<Self, Error>;
pub fn from_arc_regions(regions: Vec<Arc<GuestRegionMmap>>) -> result::Result<Self, Error>;
```

Note: these changes may break existing code in the crosvm, Firecracker and Cloud Hypervisor projects.
After #19 gets merged, it looks like we'll have an extra Cargo.toml feature (`integer-atomics`) and an extra dependency (`cast`) because of missing Rust features that were added/stabilized in 1.34. We should remove these as soon as we can assume everyone uses 1.34 or a newer version.
According to rust doc for Read::read():
If this function encounters any form of I/O or other error, an error
variant will be returned. If an error is returned then it must be
guaranteed that no bytes were read.
An error of the `ErrorKind::Interrupted` kind is non-fatal and the
read operation should be retried if there is nothing else to do.
So we should silently ignore `ErrorKind::Interrupted` and retry; otherwise we will get false alarms/wrong results.
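A sketch of the retry-on-EINTR pattern the issue asks for, with a test double that fails once before succeeding:

```rust
use std::io::{self, Read};

// Read helper that transparently retries when the underlying read is
// interrupted by a signal, instead of surfacing a spurious error.
fn read_retrying<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    loop {
        match src.read(buf) {
            // Non-fatal: the call was interrupted; just try again.
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            other => return other,
        }
    }
}

// Test double: fails once with Interrupted, then yields one byte.
struct FlakyReader {
    interrupted_once: bool,
}

impl Read for FlakyReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.interrupted_once {
            self.interrupted_once = true;
            return Err(io::ErrorKind::Interrupted.into());
        }
        buf[0] = 42;
        Ok(1)
    }
}

fn main() {
    let mut r = FlakyReader { interrupted_once: false };
    let mut buf = [0u8; 1];
    assert_eq!(read_retrying(&mut r, &mut buf).unwrap(), 1);
    assert_eq!(buf[0], 42);
    println!("ok");
}
```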
While tempfile is a nice utility that creates temp files which are automatically deleted when the object is out of scope, it is also a burden in terms of dependency.
Having the tempfile crate as a dependency adds 17 dependencies to the vm-memory crate. I propose to get rid of it and instead have a dummy wrapper that we can use in tests.
I just got some more time to look to the vm-memory code. Should the memory regions and the guest memory traits depend on traits instead of actual implementations?
We have Address which is a trait and GuestAddress which implements the trait. Then when creating the memory region trait instead of using the address trait we are using the GuestAddress. Is this intended? If yes then what is the point of having Address as a trait? It looks like we are limited to using u64 addresses.
Can we add a changelog to this repository? It helps in keeping track of changes between releases.
Can we also tag the commit from which you publish to crates.io so that people can easily identify the code version starting from the crates.io version? This is what we've done for the other crates that are published as well. I can do that if the maintainers are on board with this.
When using the `GuestMemory` implementation of `read_exact_from` and `write_all_to`, I was expecting them to handle the retry, but they simply return an error saying they couldn't read or write the entire buffer. That's the wrong behavior IMO, especially because the naming suggests they should behave exactly like the Rust `Read` and `Write` traits.
The proper solution would be to call the underlying `read_exact_from` and `write_all_to` functions from the `GuestRegionMmap` implementation after we have found the regions affected by the range, leading to a proper retry when not all data has been read or written.
We should fix logic that handles guest memory access but does not guarantee the operations will have volatile semantics. One example is the `copy` here:
vm-memory/src/volatile_memory.rs, line 277 in 08b24bf
For details check rust-vmm/community#14
Evaluate the possibility to test vm-memory interfaces & implementation using fuzzing techniques.
#95 fixed the problem of torn reads/writes caused by the implementation of `read/write_obj` essentially leveraging the environment-specific `memcpy`, and by the implicit assumption that `read/write_obj` perform atomic accesses up to a certain size at aligned addresses. Meanwhile, we added the `load/store` operations to `Bytes`, which provide explicit atomic access semantics. We should document that `read/write_obj` must not be used when atomicity is required, and revert the related changes from #95.
Here are some next steps we should consider:
- Document `read/write_obj` with respect to atomicity.
- Update users of `vm-memory` and/or discuss here about the implications of the transition.
- Replace `read/write_obj`s that are supposed to be atomic with `load/store`s. Let's do a 0.3.x (or later) release right before removing anything, to help with the transition.
- Do the removal in 0.4.x (or later).

The mmap API returns `MmapError`, but that type is not exported. For callers to propagate the right errors from the vm-memory crate, it needs to be exported.
As we merge the remaining PRs and the code base appears to settle down, we should start thinking about publishing `vm-memory` on crates.io (or identify any reasons to hold back, and for how long). One semantic versioning question is whether we want to bump the version to 1.0.0 first, or we're fine with staying at 0.something for the time being.
Let's track any work/concerns related to publishing on this issue. From our side, I'm currently working towards merging #29 and going through another full code review of the crate to make sure things look fine for the upcoming Firecracker integration.
In theory it is possible to take a pointer to a mmaped memory region and read/write outside of the region's bounds.
We could prevent this behavior by adding a guard page at the beginning and at the end of each region.
Hi,
Although there are few changes, releasing vm-memory early would allow to merge rust-vmm/vm-virtio#33.
Thanks
This function gets called a lot and shows up in performance reports when running Cloud Hypervisor.
It currently uses an O(n) algorithm; a simple fix, I think, would be to keep the vector of regions sorted and then use `Vec::binary_search_by` to get an O(log n) lookup. The added cost of sorting the regions when new ones are added would be minimal compared to the number of times `.find_region()` is called.
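The proposed lookup can be sketched with stdlib types (`Region` is a stand-in for `GuestRegionMmap`): keep the regions sorted by start address, binary-search for the candidate region, then check that the address actually falls inside it.

```rust
// Stand-in for a guest memory region: start address + length.
#[derive(Debug)]
struct Region {
    start: u64,
    len: u64,
}

// O(log n) lookup over a slice sorted by `start`.
fn find_region(regions: &[Region], addr: u64) -> Option<&Region> {
    // Locate the last region whose start is <= addr...
    let idx = match regions.binary_search_by(|r| r.start.cmp(&addr)) {
        Ok(i) => i,            // addr is exactly a region start
        Err(0) => return None, // addr precedes all regions
        Err(i) => i - 1,       // candidate is the previous region
    };
    // ...then confirm addr falls within its bounds.
    let r = &regions[idx];
    if addr < r.start + r.len {
        Some(r)
    } else {
        None
    }
}

fn main() {
    // Regions must stay sorted by start address for this to work.
    let regions = vec![
        Region { start: 0x0000, len: 0x1000 },
        Region { start: 0x4000, len: 0x2000 },
    ];
    assert_eq!(find_region(&regions, 0x0800).unwrap().start, 0x0000);
    assert_eq!(find_region(&regions, 0x4000).unwrap().start, 0x4000);
    assert!(find_region(&regions, 0x2000).is_none()); // hole between regions
    println!("ok");
}
```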
These would return a `Result<>`, similar to `get_host_address`, but the resulting object would be range-checked and would allow accessing guest memory without unsafe code.
When implementing the fix for #95, we introduced a performance degradation on (some?) glibc builds. On Firecracker, using iperf3, we observed a performance degradation of 5% for glibc builds.
On Cloud Hypervisor, the performance degradation is significantly worse. Observed impact is up to 50%. More details here: cloud-hypervisor/cloud-hypervisor#1258
Opening this issue so that we can decide on the next steps for fixing the performance degradation.
My 2-cents: I wouldn't want to introduce this fix only for x86_64 musl builds & aarch64 glibc & musl builds because we cannot know for sure what glibc versions are people using out there, hence we cannot know for sure that glibc is doing the right thing (i.e. optimizing memcpy at a higher granularity than 1 byte).
I would say that the underlying problem, which is the reason for both the performance degradation and the bug, is that we lose type information about the object that needs to be written/read. I would rather have us work on improving the interface so that type information doesn't need to be sort-of inferred (by checking alignments and reading/writing in the largest possible chunks). I would need to do some experiments before proposing a solution here.
Another thing that I believe is of paramount importance at this moment is to add performance testing to vm-memory. Pretty much every other function in vm-memory is on the critical performance path. We should make sure to not introduce regression here as we continue the development.
CC: @sboeuf @rbradford @sameo @alexandruag @serban300 @bonzini
It just makes the documentation look nicer.
We should clarify some aspects of the `GuestMemory` interface, and a good starting point is around atomicity semantics. As Gerry mentioned in this comment, there are a couple of ways to do an atomic access, but I think all of them have some downsides right now.
`get_atomic_ref/array_ref` are a bit unwieldy, but more importantly, they're only available when the address falls within a guest region that's backed by host memory. This means that users of generic `M: GuestMemory` objects must either add a fallback code path, or rely exclusively on `read/write_obj`.
The main issue with `read/write_obj` is that they don't have well-defined semantics at the interface level. One simple contract we could go for is "`read/write_obj` provide atomic accesses for certain sizes at aligned memory locations" (with the caveat that different archs provide atomicity guarantees up to different widths). A more verbose (but clearer and overall safer, IMO) alternative is to expand the interface around `read/write_obj` to something like the following pseudocode:
```rust
// A type (easily convertible to/from GuestAddress) which guarantees the
// inner value is aligned with respect to T.
pub struct AlignedGuestAddress<T>(/*...*/);

// Just a marker.
pub unsafe trait AtomicAccess: ByteValued {}

impl GuestMemory {
    // ...
    // Always atomic!
    fn read_atomic<T: AtomicAccess>(addr: AlignedGuestAddress<T>) -> Result<T>;

    // The next two methods are not guaranteed to be atomic.
    fn read_obj<T: ByteValued>(addr: AlignedGuestAddress<T>) -> Result<T>;
    fn read_obj_unaligned<T: ByteValued>(addr: GuestAddress) -> Result<T>;

    // Similar for write ...
}
```
I think adding something like `AlignedGuestAddress` is a useful abstraction, because many implementations (i.e. device model building blocks) provide/demand alignment guarantees for certain addresses, and it makes sense to explicitly leverage the additional information or model the extra requirement. It also helps with making operation semantics more explicit.
WDYT?
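The pseudocode above can be made concrete with a small runnable sketch (names follow the proposal; the behavior is assumed, not vm-memory's): the newtype can only be constructed when the address is aligned for `T`, so alignment is checked once at the type boundary.

```rust
use std::marker::PhantomData;
use std::mem::align_of;

#[derive(Clone, Copy, Debug, PartialEq)]
struct GuestAddress(u64);

// Can only be built from an address aligned for T.
#[derive(Debug)]
struct AlignedGuestAddress<T>(u64, PhantomData<T>);

impl<T> AlignedGuestAddress<T> {
    fn new(addr: GuestAddress) -> Result<Self, &'static str> {
        if addr.0 % align_of::<T>() as u64 == 0 {
            Ok(AlignedGuestAddress(addr.0, PhantomData))
        } else {
            Err("address is not aligned for T")
        }
    }
}

fn main() {
    // 0x1000 is 4-byte aligned, 0x1002 is not.
    assert!(AlignedGuestAddress::<u32>::new(GuestAddress(0x1000)).is_ok());
    assert!(AlignedGuestAddress::<u32>::new(GuestAddress(0x1002)).is_err());
    // 0x1002 is still fine for a 2-byte type.
    assert!(AlignedGuestAddress::<u16>::new(GuestAddress(0x1002)).is_ok());
    println!("ok");
}
```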
Currently MmapRegion provides several methods to create MmapRegion objects, but we often need to change the existing methods to support new functionality. It would be better to adopt the Builder pattern and implement an MmapRegionBuilder, so we could add new functionality without affecting existing users.
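A sketch of the suggested builder (the names are modeled on the proposal; the real MmapRegionBuilder may differ): new knobs become new `with_*` methods instead of new constructor signatures, so existing call sites keep compiling.

```rust
// Stand-in for the mapped region; a real build() would call mmap().
#[derive(Debug, PartialEq)]
struct MmapRegion {
    size: usize,
    prot: i32,
    flags: i32,
}

struct MmapRegionBuilder {
    size: usize,
    prot: i32,
    flags: i32,
}

impl MmapRegionBuilder {
    fn new(size: usize) -> Self {
        // Placeholder defaults standing in for PROT_READ|PROT_WRITE
        // and MAP_PRIVATE.
        MmapRegionBuilder { size, prot: 0x3, flags: 0x2 }
    }

    fn with_prot(mut self, prot: i32) -> Self {
        self.prot = prot;
        self
    }

    fn with_flags(mut self, flags: i32) -> Self {
        self.flags = flags;
        self
    }

    fn build(self) -> Result<MmapRegion, &'static str> {
        if self.size == 0 {
            return Err("size must be non-zero");
        }
        Ok(MmapRegion { size: self.size, prot: self.prot, flags: self.flags })
    }
}

fn main() {
    let region = MmapRegionBuilder::new(0x1000).with_prot(0x1).build().unwrap();
    assert_eq!(region.prot, 0x1);
    assert!(MmapRegionBuilder::new(0).build().is_err());
    println!("ok");
}
```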
The front page of the vm-memory crate on crates.io is pretty much blank.
We should update the manifest to specify the path of the readme so that crates.io can display it as the front page of the crate.
I think the next step for the vm-memory is to define types (Acpi, Ram, Persistent, ...) of memory regions.
The first use case I have for this is to support virtio-pmem properly from the cloud-hypervisor VMM. The idea is that virtio-pmem memory-maps the backing file into the VMM address space, to make it available to the VM. So this is a clear use case where we want the portion of memory exposed by virtio-pmem to be part of the GuestMemory as a separate region. But this region cannot be treated the same as the region representing RAM, otherwise the virtio device implementations will be broken, thinking that it is part of the VM RAM.
That's why having types associated with regions could be a simple way for any code in the VMM to check if the region should be considered or ignored, depending on what you're trying to achieve.
I think this should be added once #29 will be merged.
@bonzini @jiangliu @alexandruag WDYT?
Some asserts (such as `assert!(offset < count);` in `read_from` and `write_to`) get triggered when the respective methods are called with `count == 0`. It's an edge case, but still valid, and we should not treat it as an exceptional situation.
Using `?` on `GuestMemoryMmap::new` and similar, in a function returning a generic error, doesn't work, because `GuestMemoryMmap::new` returns a `Result` with an error type of `MmapError`, and `MmapError` doesn't impl `std::error::Error`.
Fixing this would allow cleaning up many uses of `.unwrap()`.
Currently we have methods such as `with_regions`, `with_regions_mut` and `map_and_fold` that are just papering over the lack of iterator-based access to the regions in GuestMemory. If possible we should try to add a better and more powerful abstraction.
See also rust-vmm/linux-loader#20 and rust-vmm/linux-loader#23 (comment).
`struct_util` is a generic utility module that reads structs from bytes. It should be included in this utility crate and consumed by `linux-loader` from here.