I am trying to implement a display driver wrapper in (no_std) Rust that is responsible for writing a sequence of pixels into a frame buffer. The pixel data comes from the C world and already represents valid 16-bit Rgb565 pixels. The pixel buffer is referred to by a void* pointer to the first pixel, and its length is known. My intention is to create a slice of Rgb565 values from the pixel data, turn that into an iterator, and call embedded_graphics_core::draw_target::fill_contiguous() with it.
My (unsafe) code:
if !color_p.is_null() {
    let color_p = color_p as *const Rgb565;
    let colors = core::slice::from_raw_parts(color_p, (w * h) as usize);
    let colors_it = colors.iter();
    display.fill_contiguous(&r, colors_it);
}
Unfortunately, the fill_contiguous() function has a very specific trait bound on the iterator:
fn fill_contiguous<I>(
    &mut self,
    area: &Rectangle,
    colors: I,
) -> Result<(), Self::Error>
where
    I: IntoIterator<Item = Self::Color> { ... }
This leads to the following compile error, because my iterator yields references to the elements instead of the elements themselves:
40 | display.fill_contiguous(&r, colors_it);
| --------------- ^^^^^^^^^ expected `Rgb565`, found `&Rgb565`
| |
| required by a bound introduced by this call
The compile error is easily fixed by changing the 4th line of my code snippet to call copied() on the iterator, which copies each pixel; the code then compiles and works fine:
let colors_it = colors.iter().copied();
However, this unnecessary copy is highly undesirable for performance reasons, as this code is executed for every changed pixel at the refresh rate of the display. Is there any more elegant way to deal with this issue without copying every pixel?
I think you're misunderstanding what a "copy" in this situation is – .copied() is not only the correct way to write this, it is actually desirable for performance purposes (the code will be no slower and might even be faster).
Rust Iterators are lazy – they only provide data at the moment it's requested. If you have a chain of iterator adaptors, then (in most cases, including the case of .copied()) they don't do anything as a bulk operation in advance: instead, they modify how the iterator returns the data. So, if you place .copied() on an iterator, it means "this iterator copies the data out of the reference as it returns it". There's no bulk copy done in advance – instead, it basically implies a copy into the code that makes use of the iterator values.
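The laziness described above can be observed directly. The following is a minimal sketch, not code from the question: it uses std rather than no_std, plain u16 values as stand-ins for pixels, and a hypothetical helper named reads_performed that counts how many elements are actually pulled through a .copied() chain.

```rust
use std::cell::Cell;

/// Hypothetical helper: counts how many elements are actually read from
/// `data` when only the first `take` values are consumed from a
/// `.copied()` iterator.
fn reads_performed(data: &[u16], take: usize) -> usize {
    let reads = Cell::new(0usize);
    let it = data
        .iter()
        .inspect(|_| reads.set(reads.get() + 1)) // runs once per element pulled
        .copied();
    // Building the adaptor chain has not touched any element yet.
    assert_eq!(reads.get(), 0);
    // Only now, while the consumer pulls values, are elements read.
    let _sum: u32 = it.take(take).map(u32::from).sum();
    reads.get()
}
```

Calling reads_performed(&[1, 2, 3, 4], 2) should report two reads, which illustrates that .copied() performs its copies one element at a time as the consumer asks for them, not as a bulk operation in advance.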
Now think about how a reference is represented on the machine at runtime: it's a memory address that lets you know where to find the particular piece of data. So if you were iterating over references, the code would, for each pixel, work like this:
colors_it finds the location in memory that contains the pixel;
colors_it passes the location in memory to fill_contiguous;
fill_contiguous needs to know what color the pixel has, so it reads the memory at that location to find out. This is a copy – the pixel has been copied out of memory by fill_contiguous but also still remains in memory at its original location.
With .copied(), the code instead works like this:
colors_it finds the location in memory that contains the pixel;
.copied() reads the memory at that location, to find out what color the pixel has, and passes that to fill_contiguous;
fill_contiguous needs to know what color the pixel has, but fortunately .copied() has already calculated that for it. So it does not need to do any additional reading of memory itself.
The important point here is that in both cases, the code has to read the memory for every pixel once, and adding .copied() didn't cause it to read memory any more than it otherwise would. You can think of this as a "copy-during-read" optimisation: you are reading the memory anyway, so if you have a small Copy value, it is very cheap to do the copies as you read it. After all, once the processor reads the data from memory, it will be in a processor register and also in the original memory, so you get the copy for free.
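The two paths can be put side by side in a small sketch. Everything here is an assumption made for illustration: Pixel is a hypothetical stand-in for a 16-bit color type (it is not the embedded-graphics Rgb565), and fill_from_refs and fill_from_values are made-up consumers, the second one shaped like the IntoIterator<Item = Self::Color> bound quoted in the question.

```rust
/// Hypothetical stand-in for a 16-bit RGB565 pixel (not the embedded-graphics type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pixel(u16);

/// Consumer that takes references: it has to dereference each one
/// to learn the color, so the read of the pixel happens in here.
fn fill_from_refs<'a, I: IntoIterator<Item = &'a Pixel>>(fb: &mut [u16], colors: I) {
    for (dst, src) in fb.iter_mut().zip(colors) {
        *dst = src.0; // one memory read per pixel
    }
}

/// Consumer that takes values: the read already happened inside
/// `.copied()`, so no further read of the source is needed here.
fn fill_from_values<I: IntoIterator<Item = Pixel>>(fb: &mut [u16], colors: I) {
    for (dst, src) in fb.iter_mut().zip(colors) {
        *dst = src.0; // value is already in hand
    }
}
```

With a source array src, calling fill_from_refs(&mut a, src.iter()) and fill_from_values(&mut b, src.iter().copied()) fills both buffers identically. In each version every source pixel is read exactly once; only the place where the read happens differs.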
It is therefore very likely that the .copied() version will lead to the same code as you would have written otherwise (and the reason that fill_contiguous is written to take values rather than references is probably to steer you into writing the code with .copied() and getting optimal performance). If it's different, it is probably faster: the name Rgb565 makes me think that this is a 16-bit value, whereas on most platforms references are 32 bits or larger (so the reference version would be trying to pass larger values into the function than the copied version, and there would otherwise be no difference).
As a quick summary, .copied() is almost always faster if the things you are copying are no larger than a reference: the only exception is if the function that consumes the iterator ignores most of the values from it, and even that case isn't always an exception because the Rust standard library has a number of optimisations designed to handle that case specifically. (.copied() can sometimes be slower if you are copying large things.) The reason is that you have to read the values from memory anyway, and you get the copies "for free" while you're doing that. In fact, adding .copied() is often a good way to improve the performance of code, because usually there has to be a copy done anyway, and using .copied() helps to ensure that it happens at the best possible moment.
If you want to fill a rectangle with colors, you need to copy those colors from whatever source to the target rectangle. This is exactly what .copied does; there is no additional copy here.
Moreover, note that Rgb565 colors are 16 bits, while references are typically at least 32 bits (and more likely 64 bits on most modern platforms), so iterating with .copied is likely faster than iterating without it.
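The size argument can be checked on a given target with core::mem::size_of. This is only a sketch: it uses u16 as a stand-in on the assumption that an Rgb565 value occupies 16 bits of storage, and item_sizes is a hypothetical helper name.

```rust
use core::mem::size_of;

/// Hypothetical helper: returns (size of a 16-bit pixel value,
/// size of a reference to it) in bytes for the current target.
fn item_sizes() -> (usize, usize) {
    // `u16` stands in for a 16-bit pixel; a `&u16` is pointer-sized.
    (size_of::<u16>(), size_of::<&u16>())
}
```

The first number is 2 everywhere, while the second equals the pointer width of the target, for example 4 bytes on a 32-bit microcontroller and 8 bytes on a 64-bit host.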