rust-vmm / vm-memory
Virtual machine's guest memory crate
License: Apache License 2.0
The `as_slice`/`as_mut_slice` methods on `GuestMemoryRegion` are not safe to use even though they are marked `unsafe`. This is because a Rust primitive slice assumes that a shared slice has no mutable aliases (which the guest does have) and that a mutable slice is exclusive (but the guest can still modify the underlying memory).
I'm trying to wrap my head around the different read/write functions in the `Bytes` trait. It seems a bit needlessly intimidating and confusing. For reference, I'm referring to these functions: https://docs.rs/vm-memory/0.7.0/vm_memory/bytes/trait.Bytes.html#tymethod.read
`write` writes to the `Bytes` object, but `write_to` actually reads from it; `read_into` could have been a better name. It would also have been nice to simply implement the standard `Read`/`Write` traits, but that seems to be prevented by the configurable error type. Are these desirable changes?
A few useful methods were removed; we should copy them back from firecracker to ease porting.
Currently `mmap::check_file_offset` attempts to validate the backing object by checking, among other things, its size, to confirm the mapping will fit. This check is implemented using `std::fs::metadata` to obtain the file size, but that approach doesn't work when the backing object is a block or char device, as the returned size is always zero.
This was discovered when virtiofsd users noticed that adding a memory object in QEMU backed by a device (e.g. /dev/kvmfr0) would cause the daemon to fail with `Waiting for daemon failed: HandleRequest(ReqHandlerError(Custom { kind: Other, error: MappingPastEof }))` (issue 46).
Replace fandree/rust-vmm-dev with rustvmm/dev:v1 in Buildkite pipeline.
dev:v1 has exactly the same packages and rust version as fandree/rust-vmm-dev so it should be a seamless change.
Feature request
Add support for huge pages guest memory. For a specific use case, please consult this issue.
Using `?` on `VolatileMemory::get_slice` (applied to a `MmapRegion`) in a function returning a generic error doesn't work, because `VolatileMemory::get_slice` returns a `Result` with an error type of `volatile_memory::Error`, which doesn't impl `std::error::Error`.
Fixing this would make it easier to avoid using `.unwrap()`.
Use rust-vmm-ci as a git submodule to test the code. Since vm-memory has non-default features, a custom pipeline is also required.
For more details about how to make the switch, please check the rust-vmm-ci readme.
It'd be useful to add a general constructor to `MmapRegion` in mmap_unix.rs (here), since this would provide a proper wrapper for the `mmap()` syscall. The point would be to be able to provide `PROT_NONE/READ/WRITE` along with `MAP_SHARED/PRIVATE/...`, so that we're not tied to the predefined protection and mapping types.
We're incorrectly using `VirtualFree` in the `MmapRegion` `Drop` implementation, which means it silently leaks memory: `VirtualFree` fails if a non-zero size is specified when releasing memory.
I'm trying to implement the `Bytes` trait for the abstractions of our hypervisor and I'm really struggling. My current problem is that `Bytes` has a `store` method that takes the value to store as an `AtomicAccess`. Our internal abstractions really want a `u64` to store data. That didn't seem like much of an issue until I tried converting it:

```rust
fn store<T: AtomicAccess>(
    &self,
    val: T,
    addr: MemoryRegionAddress,
    order: Ordering,
) -> Result<(), Self::E> {
    let actual_value: <<T as AtomicAccess>::A as AtomicInteger>::V = val.into();
    // At this point actual_value has type V that is unconstrained,
    // and thus we can't do anything with it?
    let an_u64: u64 = actual_value as u64;
    // Error: an `as` expression can only be used to convert between
    // primitive types or to coerce to a specific trait object
}
```

So the initially innocent-looking problem of getting an actual integer out of `AtomicAccess` turns out to be really complex. Am I missing something obvious? How do others solve this?
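One general way out (a sketch of the technique, not vm-memory's actual API) is to constrain the associated type with a trait bound instead of an `as` cast: `as` only works on concrete primitives, but `Into<u64>` works on generics. The stand-in traits below mimic the shape of `AtomicAccess`/`AtomicInteger`:

```rust
// Stand-in traits mimicking the shape of vm-memory's AtomicAccess
// and AtomicInteger; the real ones differ, but the conversion
// technique is the same.
trait AtomicInteger {
    type V;
}

trait AtomicAccess {
    type A: AtomicInteger;
    fn into_inner(self) -> <Self::A as AtomicInteger>::V;
}

struct AtomicU16Like;
impl AtomicInteger for AtomicU16Like {
    type V = u16;
}

impl AtomicAccess for u16 {
    type A = AtomicU16Like;
    fn into_inner(self) -> u16 {
        self
    }
}

// The key: bound the otherwise-unconstrained associated type `V`
// with Into<u64>, so the compiler knows how to produce a u64.
fn store_as_u64<T>(val: T) -> u64
where
    T: AtomicAccess,
    <T::A as AtomicInteger>::V: Into<u64>,
{
    val.into_inner().into()
}

fn main() {
    assert_eq!(store_as_u64(0xabcd_u16), 0xabcd_u64);
    println!("ok");
}
```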
Tracking dirty bitmap is needed for features such as snapshot, live migration, or live update.
The `copy_slice()` wrapper is used to access guest memory in most `VolatileMemory` methods, but there's an exception: `copy_to_volatile_slice()` uses `std::ptr::copy()`. The `copy_to_volatile_slice()` method should be refined to use `copy_slice()` so that we never mix `std::ptr::read/write_volatile()` and `std::ptr::copy()`.
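A sketch of the kind of helper `copy_slice()` stands for (the real one copies in larger chunks where alignment allows): every access goes through `read_volatile`/`write_volatile`, so the compiler cannot elide, merge, or reorder the accesses the way a plain `ptr::copy()` (memcpy) would allow.

```rust
use std::ptr;

// Byte-wise volatile copy: each byte is read and written with
// volatile semantics, unlike ptr::copy() which lowers to memcpy.
unsafe fn copy_slice_volatile(dst: *mut u8, src: *const u8, len: usize) {
    for i in 0..len {
        let byte = ptr::read_volatile(src.add(i));
        ptr::write_volatile(dst.add(i), byte);
    }
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut dst = [0u8; 4];
    unsafe { copy_slice_volatile(dst.as_mut_ptr(), src.as_ptr(), 4) };
    assert_eq!(dst, src);
    println!("ok");
}
```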
The only provided interfaces right now force MAP_SHARED on all file mappings.
According to Virtio spec 1.1, 2.6.5.3.2 Device Requirements: Indirect Descriptors:
The device MUST ignore the write-only flag (flags&VIRTQ_DESC_F_WRITE) in the descriptor that refers
to an indirect table.
The device MUST handle the case of zero or more normal chained descriptors followed by a single descriptor
with flags&VIRTQ_DESC_F_INDIRECT.
Note: While unusual (most implementations either create a chain solely using non-indirect descriptors, or
use a single indirect element), such a layout is valid.
And commit "a83152da14554cf810f672f6f4079e943fe6ce28 : simplify descriptor chain handling" breaks the rules defined above.
The lack of `PartialEq` on errors in vm-memory is forcing weird patterns across all rust-vmm crates (where errors need to be compared using `match`, or, even worse, are not compared at all). The problem with implementing `PartialEq` resides in the underlying `io::Error`, which does not implement `PartialEq`. Since the use cases for errors are pretty well defined in vm-memory, we should be able to implement `PartialEq` manually, because we can reason about which errors should actually be equal.
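A sketch of such a manual impl, using an illustrative error enum (not vm-memory's actual type): compare wrapped `io::Error` values by `ErrorKind` and raw OS error, which is usually what "these errors are equal" means in practice.

```rust
use std::io;

// Illustrative error type wrapping io::Error.
#[derive(Debug)]
enum MemError {
    InvalidGuestAddress(u64),
    IoError(io::Error),
}

impl PartialEq for MemError {
    fn eq(&self, other: &Self) -> bool {
        use MemError::*;
        match (self, other) {
            (InvalidGuestAddress(a), InvalidGuestAddress(b)) => a == b,
            // io::Error itself is not PartialEq, so compare the
            // parts we can reason about.
            (IoError(a), IoError(b)) => {
                a.kind() == b.kind() && a.raw_os_error() == b.raw_os_error()
            }
            _ => false,
        }
    }
}

fn main() {
    let a = MemError::IoError(io::Error::from(io::ErrorKind::NotFound));
    let b = MemError::IoError(io::Error::from(io::ErrorKind::NotFound));
    assert_eq!(a, b);
    assert_ne!(
        MemError::InvalidGuestAddress(1),
        MemError::InvalidGuestAddress(2)
    );
    println!("ok");
}
```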
We have identified an issue in the rust-vmm vm-memory crate that leads to a denial-of-service (DoS) issue if the crate is used in a VMM in conjunction with virtio. The issue affects both vm-memory releases (v0.1.0 and v0.2.0). In our environment, we reproduced this with musl builds on x86_64, and with all aarch64 builds.
In vm-memory, the functions read_obj and write_obj are not doing atomic accesses for all combinations of platform and libc implementations. These reads and writes translate to memcpy, which may be performing byte-by-byte copies. Using vm-memory in the virtio implementation can cause undefined behavior, as descriptor indexes require 2-byte atomic accesses.
The issue can affect any virtio/emulated device which expects atomic writes for base types longer than 1 byte.
Observed impact: When the network stack is under load, the driver will try to clear a used descriptor before the index of the descriptor is fully written by the device. When this issue is triggered, the virtio-net device will be unable to transmit packets. This leads to VMs using rust-vmm/vm-memory having their network effectively disconnected by outside network traffic, resulting in both a DoS vector and an availability issue under normal at-load operations.
For a VMM to be affected, it must run on aarch64 (built with either musl or glibc), or on x86_64 with a musl build. All VMMs using rust-vmm/vm-memory (any release) in a production scenario that take arbitrary traffic over the virtio-net device are confirmed to be at risk of a DoS. All VMMs using rust-vmm/vm-memory (any release) in a production scenario with a virtio-net device are under availability risk. All VMMs using rust-vmm/vm-memory (any release) in a production scenario using other devices that expect atomic reads for values longer than 1 byte may also be affected, but we are unaware of any risk for other devices (beyond the guest freezing its own virtio stack).
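The distinction at the heart of the advisory can be sketched with std atomics. A virtio used-ring index is a 16-bit value the driver polls while the device updates it: a 2-byte atomic store guarantees the driver sees either the old or the new index, never a torn mix, whereas a byte-by-byte memcpy offers no such guarantee.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Device side: publish a new used-ring index with a single 2-byte
// atomic store, so readers can never observe a half-written value.
fn publish_used_idx(idx_slot: &AtomicU16, new_idx: u16) {
    idx_slot.store(new_idx, Ordering::Release);
}

// Driver side: read the index atomically.
fn read_used_idx(idx_slot: &AtomicU16) -> u16 {
    idx_slot.load(Ordering::Acquire)
}

fn main() {
    let idx = AtomicU16::new(0);
    publish_used_idx(&idx, 0x0102);
    assert_eq!(read_used_idx(&idx), 0x0102);
    println!("ok");
}
```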
Please add a CODEOWNERS file with the details of the maintainers. Please use the following format as outlined in: https://help.github.com/en/articles/about-code-owners
Most projects can simply follow the wildcard syntax. e.g.
* @owner1 @owner2
Not only does this provide a way to see who is responsible for this repository, but the owners will also automatically be notified of incoming PRs for review.
`with_regions`, `with_regions_mut` and `map_and_fold` were just papering over the lack of iterator-based access to the regions in `GuestMemory`. After #130 closes out #8, they will no longer be needed and should be deprecated and eventually removed.
GuestMemoryMmap only has a single constructor:
pub fn new(ranges: &[(GuestAddress, usize)]) -> std::result::Result<Self, MmapError>
We need a way to build a GuestMemoryMmap that's full of GuestRegionMmaps (e.g. ones created from file descriptors). So we need either a new constructor or an insert method.
The `GuestMemory` implementation of the `read_from` function of the `Bytes` trait is equivalent to `read_exact_from`. Analogously, the `write_to` implementation is equivalent to `write_all_to`. This is because both functions rely on a call to `try_access`, which will run until an unrecoverable error occurs (one that isn't `ErrorKind::Interrupted`) or `count` bytes are transferred. This semantic is correct for `read_exact_from` and `write_all_to`, assuming they have similar meaning to the `read_exact` and `write_all` functions of the `std::io::Read` and `std::io::Write` traits. For the "non-exact" counterparts, only one read/write should occur.
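The two semantics can be contrasted with plain `std::io` types (a sketch, not vm-memory's implementation): the "non-exact" variant issues a single `read()` and may return fewer bytes than requested, while the "exact" variant loops until the buffer is full or a real error occurs.

```rust
use std::io::{self, Read};

// "Non-exact": at most one underlying read; may be short.
fn read_once<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    src.read(buf)
}

// "Exact": loop until the buffer is full, retrying on Interrupted.
fn read_exact_loop<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<()> {
    let mut filled = 0;
    while filled < buf.len() {
        match src.read(&mut buf[filled..]) {
            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];
    let mut buf = [0u8; 5];
    assert_eq!(read_once(&mut &data[..], &mut buf).unwrap(), 5);

    let mut buf2 = [0u8; 5];
    read_exact_loop(&mut &data[..], &mut buf2).unwrap();
    assert_eq!(buf2, data);
    println!("ok");
}
```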
As the code evolves, we now have several constructors for GuestMemoryMmap:

```rust
pub fn new(ranges: &[(GuestAddress, usize)]) -> result::Result<Self, Error>;
pub fn with_files<A, T>(ranges: T) -> result::Result<Self, Error>;
pub fn from_regions(mut regions: Vec<GuestRegionMmap>) -> result::Result<Self, Error>;
pub fn from_arc_regions(regions: Vec<Arc<GuestRegionMmap>>) -> result::Result<Self, Error>;
```

Despite these four constructors, we still have no way to create an empty GuestMemoryMmap instance. All of them assume that memory regions are static and created at boot time, and memory hotplug breaks this assumption. So how about refining the constructors as:

```rust
pub fn new() -> Self;
pub fn from_ranges(ranges: &[(GuestAddress, usize)]) -> result::Result<Self, Error>;
pub fn from_ranges_with_files<A, T>(ranges: T) -> result::Result<Self, Error>;
pub fn from_regions(mut regions: Vec<GuestRegionMmap>) -> result::Result<Self, Error>;
pub fn from_arc_regions(regions: Vec<Arc<GuestRegionMmap>>) -> result::Result<Self, Error>;
```

Note: these changes may break existing code in the crosvm, Firecracker and Cloud Hypervisor projects.
After #19 gets merged, it looks like we'll have an extra Cargo.toml feature (`integer-atomics`) and an extra dependency (`cast`) because of missing Rust features that were added/stabilized in 1.34. We should remove these as soon as we can assume everyone uses 1.34 or a newer version.
According to rust doc for Read::read():
If this function encounters any form of I/O or other error, an error
variant will be returned. If an error is returned then it must be
guaranteed that no bytes were read.
An error of the `ErrorKind::Interrupted` kind is non-fatal and the
read operation should be retried if there is nothing else to do.
So we should silently ignore `ErrorKind::Interrupted` and retry; otherwise we will get false alarms/wrong results.
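A sketch of the retry-on-EINTR pattern the issue asks for, with a test double that fails once before succeeding:

```rust
use std::io::{self, Read};

// Read helper that transparently retries when the underlying read is
// interrupted by a signal, instead of surfacing a spurious error.
fn read_retrying<R: Read>(src: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    loop {
        match src.read(buf) {
            // Non-fatal: the call was interrupted; just try again.
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            other => return other,
        }
    }
}

// Test double: fails once with Interrupted, then yields one byte.
struct FlakyReader {
    interrupted_once: bool,
}

impl Read for FlakyReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.interrupted_once {
            self.interrupted_once = true;
            return Err(io::ErrorKind::Interrupted.into());
        }
        buf[0] = 42;
        Ok(1)
    }
}

fn main() {
    let mut r = FlakyReader { interrupted_once: false };
    let mut buf = [0u8; 1];
    assert_eq!(read_retrying(&mut r, &mut buf).unwrap(), 1);
    assert_eq!(buf[0], 42);
    println!("ok");
}
```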
While tempfile is a nice utility that creates temp files which are automatically deleted when the object is out of scope, it is also a burden in terms of dependency.
Having the tempfile crate as a dependency adds 17 dependencies to the vm-memory crate. I propose to get rid of it and instead have a dummy wrapper that we can use in tests.
I just got some more time to look to the vm-memory code. Should the memory regions and the guest memory traits depend on traits instead of actual implementations?
We have Address which is a trait and GuestAddress which implements the trait. Then when creating the memory region trait instead of using the address trait we are using the GuestAddress. Is this intended? If yes then what is the point of having Address as a trait? It looks like we are limited to using u64 addresses.
Can we add a changelog to this repository? It helps in keeping track of changes between releases.
Can we also tag the commit from which you publish to crates.io so that people can easily identify the code version starting from the crates.io version? This is what we've done for the other crates that are published as well. I can do that if the maintainers are on board with this.
When using the `GuestMemory` implementation of `read_exact_from` and `write_all_to`, I was expecting them to handle the retry, but they simply return an error saying they couldn't read or write the entire buffer. That's the wrong behavior IMO, especially because the naming suggests they should behave exactly like the Rust `Read` and `Write` traits.
The proper solution would be to call the underlying `read_exact_from` and `write_all_to` functions from the `GuestRegionMmap` implementation after we have found the regions affected by the range, leading to a proper retry when not all data has been read or written.
We should fix logic that handles guest memory access but does not guarantee the operations will have volatile semantics. One example is the `copy` here:
vm-memory/src/volatile_memory.rs, line 277 in 08b24bf
For details check rust-vmm/community#14
Evaluate the possibility to test vm-memory interfaces & implementation using fuzzing techniques.
#95 fixed the problem of torn reads/writes caused by the implementation of `read/write_obj` essentially leveraging the environment-specific `memcpy`, and by the implicit assumption that `read/write_obj` perform atomic accesses up to a certain size at aligned addresses. Meanwhile, we added the `load/store` operations to `Bytes`, which provide explicit atomic access semantics. We should document that `read/write_obj` must not be used when atomicity is required, and revert the related changes from #95.
Here are some next steps we should consider:
- Document `read/write_obj` with respect to atomicity.
- Update users of `vm-memory` and/or discuss here about the implications of the transition.
- Replace `read/write_obj`s that are supposed to be atomic with `load/store`s. Let's do a 0.3.x (or later) release right before removing anything, to help with the transition.
- Do the removal in 0.4.x (or later).

The mmap API returns `MmapError`, but that type is not exported. For callers to propagate the right errors from the vm-memory crate, it needs to be exported.
As we merge the remaining PRs and the code base appears to settle down, we should start thinking about publishing `vm-memory` on crates.io (or identify any reasons to hold back, and for how long). One semantic versioning question is whether we want to bump the version to 1.0.0 first, or we're fine with staying at 0.something for the time being.
Let's track any work/concerns related to publishing on this issue. From our side, I'm currently working towards merging #29 and going through another full code review of the crate to make sure things look fine for the upcoming Firecracker integration.
In theory it is possible to take a pointer to a mmaped memory region and read/write outside of the region's bounds.
We could prevent this behavior by adding a guard page at the beginning and at the end of each region.
Hi,
Although there are few changes, releasing vm-memory early would allow to merge rust-vmm/vm-virtio#33.
Thanks
This function gets called a lot and shows up in performance reports when running Cloud Hypervisor.
It currently uses an O(n) algorithm; a simple fix, I think, would be to keep the vector of regions sorted and then use `Vec::binary_search_by` to get an O(log n) lookup. The added cost of sorting the regions when new ones are added would be minimal compared to the number of times `.find_region()` is called.
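The proposed lookup can be sketched with stdlib types (`Region` is a stand-in for `GuestRegionMmap`): keep the regions sorted by start address, binary-search for the candidate region, then check that the address actually falls inside it.

```rust
// Stand-in for a guest memory region: start address + length.
#[derive(Debug)]
struct Region {
    start: u64,
    len: u64,
}

// O(log n) lookup over a slice sorted by `start`.
fn find_region(regions: &[Region], addr: u64) -> Option<&Region> {
    // Locate the last region whose start is <= addr...
    let idx = match regions.binary_search_by(|r| r.start.cmp(&addr)) {
        Ok(i) => i,            // addr is exactly a region start
        Err(0) => return None, // addr precedes all regions
        Err(i) => i - 1,       // candidate is the previous region
    };
    // ...then confirm addr falls within its bounds.
    let r = &regions[idx];
    if addr < r.start + r.len {
        Some(r)
    } else {
        None
    }
}

fn main() {
    // Regions must stay sorted by start address for this to work.
    let regions = vec![
        Region { start: 0x0000, len: 0x1000 },
        Region { start: 0x4000, len: 0x2000 },
    ];
    assert_eq!(find_region(&regions, 0x0800).unwrap().start, 0x0000);
    assert_eq!(find_region(&regions, 0x4000).unwrap().start, 0x4000);
    assert!(find_region(&regions, 0x2000).is_none()); // hole between regions
    println!("ok");
}
```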
These would return a `Result<>`, similar to `get_host_address`, but the resulting object would be range-checked and would allow accessing guest memory without unsafe code.
When implementing the fix for #95, we introduced a performance degradation on (some?) glibc builds. On Firecracker, using iperf3, we observed a performance degradation of 5% for glibc builds.
On Cloud Hypervisor, the performance degradation is significantly worse. Observed impact is up to 50%. More details here: cloud-hypervisor/cloud-hypervisor#1258
Opening this issue so that we can decide on the next steps for fixing the performance degradation.
My 2-cents: I wouldn't want to introduce this fix only for x86_64 musl builds & aarch64 glibc & musl builds because we cannot know for sure what glibc versions are people using out there, hence we cannot know for sure that glibc is doing the right thing (i.e. optimizing memcpy at a higher granularity than 1 byte).
I would say that the underlying problem, which is the reason for both the performance degradation and the bug, is that we lose type information about the object that needs to be written/read. I would rather have us work on improving the interface so that type information doesn't need to be sort-of inferred (by checking alignments and reading/writing in the largest possible chunks). I would need to do some experiments before proposing a solution here.
Another thing that I believe is of paramount importance at this moment is to add performance testing to vm-memory. Pretty much every other function in vm-memory is on the critical performance path. We should make sure to not introduce regression here as we continue the development.
CC: @sboeuf @rbradford @sameo @alexandruag @serban300 @bonzini
It just makes the documentation look nicer.
We should clarify some aspects of the `GuestMemory` interface, and a good starting point is around atomicity semantics. As Gerry mentioned in this comment, there are a couple of ways to do an atomic access, but I think all of them have some downsides right now.
`get_atomic_ref/array_ref` are a bit unwieldy, but more importantly, they're only available when the address falls within a guest region that's backed by host memory. This means that users of generic `M: GuestMemory` objects must either add a fallback code path, or rely exclusively on `read/write_obj`.
The main issue with `read/write_obj` is that they don't have well-defined semantics at the interface level. One simple contract we could go for is "`read/write_obj` provide atomic accesses for certain sizes at aligned memory locations" (with the caveat that different archs provide atomicity guarantees up to different widths). A more verbose (but clearer and overall safer, IMO) alternative is to expand the interface around `read/write_obj` to something like the following pseudocode:
```rust
// A type (easily convertible to/from GuestAddress) which guarantees the
// inner value is aligned with respect to T.
pub struct AlignedGuestAddress<T>(/*...*/);

// Just a marker.
pub unsafe trait AtomicAccess: ByteValued {}

impl GuestMemory {
    // ...
    // Always atomic!
    fn read_atomic<T: AtomicAccess>(addr: AlignedGuestAddress<T>) -> Result<T>;

    // The next two methods are not guaranteed to be atomic.
    fn read_obj<T: ByteValued>(addr: AlignedGuestAddress<T>) -> Result<T>;
    fn read_obj_unaligned<T: ByteValued>(addr: GuestAddress) -> Result<T>;

    // Similar for write ...
}
```
I think adding something like `AlignedGuestAddress` is a useful abstraction, because many implementations (i.e. device model building blocks) provide/demand alignment guarantees for certain addresses, and it makes sense to explicitly leverage the additional information or model the extra requirement. It also helps with making operation semantics more explicit.
WDYT?
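The pseudocode above can be made concrete with a small runnable sketch (names follow the proposal; the behavior is assumed, not vm-memory's): the newtype can only be constructed when the address is aligned for `T`, so alignment is checked once at the type boundary.

```rust
use std::marker::PhantomData;
use std::mem::align_of;

#[derive(Clone, Copy, Debug, PartialEq)]
struct GuestAddress(u64);

// Can only be built from an address aligned for T.
#[derive(Debug)]
struct AlignedGuestAddress<T>(u64, PhantomData<T>);

impl<T> AlignedGuestAddress<T> {
    fn new(addr: GuestAddress) -> Result<Self, &'static str> {
        if addr.0 % align_of::<T>() as u64 == 0 {
            Ok(AlignedGuestAddress(addr.0, PhantomData))
        } else {
            Err("address is not aligned for T")
        }
    }
}

fn main() {
    // 0x1000 is 4-byte aligned, 0x1002 is not.
    assert!(AlignedGuestAddress::<u32>::new(GuestAddress(0x1000)).is_ok());
    assert!(AlignedGuestAddress::<u32>::new(GuestAddress(0x1002)).is_err());
    // 0x1002 is still fine for a 2-byte type.
    assert!(AlignedGuestAddress::<u16>::new(GuestAddress(0x1002)).is_ok());
    println!("ok");
}
```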
Currently MmapRegion provides several methods to create MmapRegion objects, but we often need to change the existing methods to support new functionality. It would be better to adopt the Builder pattern and implement an MmapRegionBuilder, so we could add new functionality without affecting existing users.
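A sketch of the suggested builder (the names are modeled on the proposal; the real MmapRegionBuilder may differ): new knobs become new `with_*` methods instead of new constructor signatures, so existing call sites keep compiling.

```rust
// Stand-in for the mapped region; a real build() would call mmap().
#[derive(Debug, PartialEq)]
struct MmapRegion {
    size: usize,
    prot: i32,
    flags: i32,
}

struct MmapRegionBuilder {
    size: usize,
    prot: i32,
    flags: i32,
}

impl MmapRegionBuilder {
    fn new(size: usize) -> Self {
        // Placeholder defaults standing in for PROT_READ|PROT_WRITE
        // and MAP_PRIVATE.
        MmapRegionBuilder { size, prot: 0x3, flags: 0x2 }
    }

    fn with_prot(mut self, prot: i32) -> Self {
        self.prot = prot;
        self
    }

    fn with_flags(mut self, flags: i32) -> Self {
        self.flags = flags;
        self
    }

    fn build(self) -> Result<MmapRegion, &'static str> {
        if self.size == 0 {
            return Err("size must be non-zero");
        }
        Ok(MmapRegion { size: self.size, prot: self.prot, flags: self.flags })
    }
}

fn main() {
    let region = MmapRegionBuilder::new(0x1000).with_prot(0x1).build().unwrap();
    assert_eq!(region.prot, 0x1);
    assert!(MmapRegionBuilder::new(0).build().is_err());
    println!("ok");
}
```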
The front page of the vm-memory crate on crates.io is pretty much blank.
We should update the manifest to specify the path of the readme so that crates.io can display it as the front page of the crate.
I think the next step for the vm-memory is to define types (Acpi, Ram, Persistent, ...) of memory regions.
The first use case I have for this is to support virtio-pmem properly from the cloud-hypervisor VMM. The idea is that virtio-pmem memory-maps the backing file into the VMM address space, to make it available to the VM. So this is a clear use case where we want the portion of memory exposed by virtio-pmem to be part of the GuestMemory as a separate region. But this region cannot be treated the same as the region representing RAM, otherwise the virtio device implementations will be broken, thinking that it is part of the VM RAM.
That's why having types associated with regions could be a simple way for any code in the VMM to check if the region should be considered or ignored, depending on what you're trying to achieve.
I think this should be added once #29 will be merged.
@bonzini @jiangliu @alexandruag WDYT?
Some asserts (such as `assert!(offset < count);` in `read_from` and `write_to`) get triggered when the respective methods are called with `count == 0`. It's an edge case, but still valid, and we should not treat it as an exceptional situation.
Using `?` on `GuestMemoryMmap::new` and similar, in a function returning a generic error, doesn't work, because `GuestMemoryMmap::new` returns a `Result` with an error type of `MmapError`, and `MmapError` doesn't impl `std::error::Error`.
Fixing this would allow cleaning up many uses of `.unwrap()`.
Currently we have methods such as `with_regions`, `with_regions_mut` and `map_and_fold` that are just papering over the lack of iterator-based access to the regions in GuestMemory. If possible we should try to add a better and more powerful abstraction.
See also rust-vmm/linux-loader#20 and rust-vmm/linux-loader#23 (comment).
`struct_util` is a generic utility module that reads structs from bytes. It should be included in this utility crate and consumed by `linux-loader` from here.