[DllImport("foo.dll", CharSet = CharSet.Unicode)]
static extern void Process_utf16(string text, int text_length);
The native function is written to receive UTF-16 data (which is also what .NET strings use). It does not use the string data after it returns. Therefore, I'm trying to ensure that the pointer to the string's buffer is passed directly, without any unnecessary allocation or copying.
In this declaration, is a pointer to the string's buffer passed with no allocation? Or is a temporary buffer allocated and the string copied to it? If an allocation occurs, is it to the native or managed heap? And who is responsible for deallocating it?
Note that the code above has been tested and works; I'm just trying to find out whether it incurs an allocation and copy and, if so, how to avoid it.
The rules for marshaling strings are described in "Default Marshaling for Strings". For native functions (platform interop), the documentation specifies:
Platform invoke copies string arguments, converting from the .NET Framework format (Unicode) to the platform unmanaged format. Strings are immutable and are not copied back from unmanaged memory to managed memory when the call returns.
However, as can be trivially established experimentally, this is not true if no conversion is necessary at all, that is, if the method is decorated with CharSet.Unicode or the string is explicitly marked as MarshalAs(UnmanagedType.LPWStr). In this case, a pointer to the string contents is passed directly. This sounds very efficient, and it is, but it's also dangerous because there's nothing to stop the unmanaged function from modifying the string passed to it. This is bad because .NET strings are supposed to be immutable, and code may depend on that. It's especially bad if this ends up overwriting the string intern pool.
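trample.c:
#include <string.h>
#include <wchar.h>

__declspec(dllexport) void __stdcall Trample(wchar_t* text) {
    /* Overwrite the first five characters in place, excluding the terminator. */
    memcpy(text, L"Adios", sizeof L"Adios" - sizeof(wchar_t));
}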
Program.cs:
using System;
using System.Runtime.InteropServices;

static class NativeMethods {
    [DllImport("trample.dll", CharSet = CharSet.Unicode)]
    public static extern void Trample(string text);
}

class Program {
    static void Main(string[] args) {
        Console.WriteLine("Hello, world!");
        // The native code overwrites the interned literal in place.
        NativeMethods.Trample("Hello, world!");
        Console.WriteLine("Hello, world!");
    }
}
Output:
Hello, world!
Adios, world!
Since "Hello, world!" is a string literal, all instances of it end up in the string intern pool, and every time it's used we're using "the same" string. Our unmanaged function overwrites this, so now whenever we think we're writing "Hello, world!" in our managed code, we end up with something else instead. Oops.
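The "same string" point can be checked without any native code. The following is a minimal sketch; the class and method names (InternDemo, First, Second, LiteralsShareOneInstance) are made up for illustration. It compares two separate occurrences of the literal by reference, and then compares the literal against a string with the same contents built at run time.

```csharp
using System;

static class InternDemo
{
    // Two separate occurrences of the same literal in one assembly.
    public static string First() => "Hello, world!";
    public static string Second() => "Hello, world!";

    // Reference comparison: do both literals resolve to one string object?
    public static bool LiteralsShareOneInstance() =>
        object.ReferenceEquals(First(), Second());

    static void Main()
    {
        Console.WriteLine(LiteralsShareOneInstance());

        // A string built at run time has equal contents but is a separate object.
        string built = new string("Hello, world!".ToCharArray());
        Console.WriteLine(object.ReferenceEquals(built, First()));
    }
}
```

If the first comparison reports that both literals are one object, then a native function that writes through the pointer it was given changes what every use of that literal sees, which is exactly what the Trample example shows.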
The way to avoid this, if you know the unmanaged function changes the string, is to pass a StringBuilder instead. Here you can even choose whether you want the copy to be in, out or both (with InAttribute/OutAttribute). This does involve copying to/from buffers -- specifically, CoTaskMemAlloc will be used to allocate memory for the unmanaged code (and CoTaskMemFree will be called when the call is done). This code is invoked by the marshaler as part of the call; neither the managed caller nor the unmanaged callee needs to concern itself with this.
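As a sketch of the StringBuilder approach, here is a declaration against a real Win32 function that writes into a caller-supplied buffer. GetWindowsDirectory is used only as a stand-in for "an unmanaged function that modifies the text"; the choice of [Out] and the 260-character capacity (MAX_PATH) are assumptions made for this example, and it only runs on Windows.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

static class NativeMethods
{
    // [Out] asks the marshaler to copy the native buffer back into the
    // StringBuilder after the call, without copying its contents in first.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern uint GetWindowsDirectory([Out] StringBuilder lpBuffer, uint uSize);
}

static class Program
{
    static void Main()
    {
        // The capacity determines how much room the native side is given.
        var buffer = new StringBuilder(260); // MAX_PATH
        uint length = NativeMethods.GetWindowsDirectory(buffer, (uint)buffer.Capacity);
        if (length == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        Console.WriteLine(buffer.ToString());
    }
}
```

Because the native code only ever sees the marshaler's temporary buffer, no managed string (interned or otherwise) can be overwritten this way.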
Calling a function that expects an ANSI string also involves allocating a buffer, but in this case the buffer is allocated on the stack using the localloc instruction instead of CoTaskMemAlloc, and that is far more efficient. That said -- if you are actually going for efficiency, what you want to do is eliminate calling unmanaged code altogether if possible, not merely optimize string passing. Even ignoring copying memory, there's quite a bit of overhead involved in managed/unmanaged transitions. If you find yourself calling unmanaged code in a loop, it pays off to see if that code can be ported to managed code.
Source: coreclr/src/vm/ilmarshalers.cpp, specifically ILWSTRMarshaler::EmitConvertSpaceAndContentsCLRToNativeTemp.
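If you would rather not depend on what the marshaler happens to do for string parameters, one option is to pin the string yourself and declare the parameter as a raw pointer. The sketch below uses the real Win32 function lstrlenW purely as a stand-in so that it runs without foo.dll; the same pattern should apply to the question's Process_utf16 if its first parameter is declared as char*, but that is an assumption, not something tested here. It needs to be compiled with unsafe code enabled and only runs on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // A pointer parameter is handed to native code as-is; the marshaler
    // neither allocates a buffer nor copies the characters.
    [DllImport("kernel32.dll", ExactSpelling = true)]
    public static extern unsafe int lstrlenW(char* text);
}

static class Program
{
    static unsafe void Main()
    {
        string text = "Hello, world!";

        // fixed pins the string for the duration of the block and yields
        // a pointer to its first character.
        fixed (char* p = text)
        {
            Console.WriteLine(NativeMethods.lstrlenW(p));
        }
    }
}
```

The same caveat as above applies: the native function receives a pointer into the live managed string, so it must treat the data as read-only and must not keep the pointer after it returns.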