Reading a "string in little-endian UTF-16 encoding" with BinaryReader

I am following this specification of the file format: https://github.com/rouault/dump_gdbtable/wiki/FGDB-Spec

utf16: string in little-endian UTF-16 encoding

How do I read this? I tried BinaryReader.ReadString(), but it returns something like:

"\0e\0y\0w\0o\0r\0d\0\0 \0\0\0\0\rP\0a\0r\0a\0m\0e\0t\0e\0r\0N\0a\0m\0e\0\0 \0\0\0\0\fC\0o\0n\0f\0i\0g\0S\0t\0r\0"

That definitely isn't right.


From the specification:

ubyte: number of UTF-16 characters (not bytes) of the name of the field
utf16: name of the field
ubyte: number of UTF-16 characters (not bytes) of the alias of the field. Might be 0
utf16: alias of the field (omitted if previous field is 0)
ubyte: field type ( 0 = int16, 1 = int32, 2 = float32, 3 = float64, 4 = string, 5 = datetime, 6 = objectid, 7 = geometry, 8 = binary, 9=raster, 10/11 = UUID, 12 = XML )

Could I somehow use the number of UTF-16 characters to read the name of the field?

Evan Parsons asked Mar 18 '26 04:03



2 Answers

BinaryReader's ReadString() method doesn't provide an overload that lets you specify the string length. Instead it expects a 7-bit encoded length prefix giving the size in bytes, which doesn't match the format in the spec you linked (a single byte giving the length in characters).
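To see the mismatch, here is a small sketch (not part of the original answer) that writes a string with BinaryWriter.Write(string), which produces exactly the layout ReadString() reads back. The sample text "Name" is an arbitrary choice for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// BinaryWriter.Write(string) emits the format ReadString() expects:
// a 7-bit encoded length in BYTES, followed by the encoded bytes.
var ms = new MemoryStream();
using (var w = new BinaryWriter(ms, Encoding.Unicode, leaveOpen: true))
    w.Write("Name");

byte[] raw = ms.ToArray();
Console.WriteLine(BitConverter.ToString(raw)); // 08-4E-00-61-00-6D-00-65-00
```

The prefix 8 is a byte count. The FGDB spec instead stores 4 (characters) for "Name", so ReadString() would consume only 4 bytes; with the default UTF-8 reader those decode to "N\0a\0", the same interleaved \0 pattern seen in the question.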

Therefore, you cannot use ReadString() directly, but you can:

  1. use ReadByte() to get the string length in characters,
  2. multiply it by 2 to get the length in bytes (each UTF-16 code unit is 2 bytes),
  3. read that many bytes with ReadBytes(count),
  4. decode them with Encoding.Unicode.GetString(bytes).
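The four steps above could be sketched as follows. This is a minimal example, assuming the 1-byte character count from the spec excerpt; ReadUtf16 is a hypothetical helper name, and the in-memory sample buffer stands in for a real .gdbtable file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads a string stored as a 1-byte UTF-16 character
// count followed by that many little-endian UTF-16 code units.
static string ReadUtf16(BinaryReader br)
{
    int charCount = br.ReadByte();                  // 1. length in characters
    byte[] bytes = br.ReadBytes(charCount * 2);     // 2-3. two bytes per code unit
    if (bytes.Length != charCount * 2)
        throw new EndOfStreamException("UTF-16 string is truncated.");
    return Encoding.Unicode.GetString(bytes);       // 4. Encoding.Unicode = UTF-16 LE
}

// Sample field descriptor: name "Name" (4 chars), alias length 0 (alias omitted),
// field type 4 (string).
var buffer = new MemoryStream();
using (var writer = new BinaryWriter(buffer, Encoding.Unicode, leaveOpen: true))
{
    writer.Write((byte)4);
    writer.Write(Encoding.Unicode.GetBytes("Name"));
    writer.Write((byte)0);
    writer.Write((byte)4);
}
buffer.Position = 0;

using var reader = new BinaryReader(buffer);
string name = ReadUtf16(reader);
string fieldAlias = ReadUtf16(reader);   // count 0: empty string, no bytes consumed
byte fieldType = reader.ReadByte();
Console.WriteLine($"{name} / '{fieldAlias}' / type {fieldType}");
```

Because a zero count makes ReadBytes(0) return an empty array, the same helper also covers the omitted-alias case without a special branch.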
ulrichb answered Mar 23 '26 02:03



Construct the BinaryReader with an explicit UTF-16 encoding:

using System.IO;
using System.Text;

BinaryReader br = new BinaryReader(File.Open("C:\\florida.gdb\\a00000002.gdbtable",
                                   FileMode.Open,
                                   FileAccess.Read,
                                   FileShare.Read | FileShare.Delete),
                      Encoding.Unicode);

Where Encoding is System.Text.Encoding.


For various historical reasons, Microsoft/Windows refer to UTF-16 (and, specifically, the little-endian variant) as "Unicode" rather than UTF-16.
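A related sketch, assuming the same 1-byte character count: with a reader constructed on Encoding.Unicode, BinaryReader.ReadChars(count) decodes count UTF-16 characters directly. Note that ReadString() on such a reader still expects its own 7-bit encoded byte-length prefix, so with this format the encoding argument pays off through ReadChars() rather than ReadString(). The sample bytes below are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Sample data: 1-byte character count, then "Alias" encoded as UTF-16 LE.
byte[] data = new byte[1 + 5 * 2];
data[0] = 5;
Encoding.Unicode.GetBytes("Alias").CopyTo(data, 1);

// Passing Encoding.Unicode makes ReadChars decode UTF-16 little-endian.
using var br = new BinaryReader(new MemoryStream(data), Encoding.Unicode);
int count = br.ReadByte();
string fieldAlias = new string(br.ReadChars(count)); // count chars = 2 * count bytes here
Console.WriteLine(fieldAlias);
```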

Damien_The_Unbeliever answered Mar 23 '26 03:03
