Jasper22.NET: XmlWriter, encodings and BOM

Posted by jasper22 at 10:41 | Thursday, March 18, 2010

Today I want to talk about XmlWriter and the generation of a Byte Order Mark (BOM).

XmlWriter provides an API that generates, unsurprisingly, XML. This XML will typically end up as a managed string of characters or possibly a sequence of bytes. Of course, text transformed into bytes implies an encoding, as previously discussed.
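To make the two destinations concrete, here is a minimal sketch that writes the same tiny document once into a string and once into a byte array. The class and method names (XmlWriterTargets, WriteToString, WriteToBytes) are made up for illustration; only the XmlWriter, XmlWriterSettings, StringBuilder and MemoryStream calls come from the framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlWriterTargets
{
    // Writes a one-element document into a managed string.
    public static string WriteToString()
    {
        StringBuilder builder = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(builder))
        {
            writer.WriteStartDocument();
            writer.WriteElementString("root", "hello");
            writer.WriteEndDocument();
        }
        return builder.ToString();
    }

    // Writes the same document as bytes, using the requested encoding.
    public static byte[] WriteToBytes(Encoding encoding)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = encoding;
        using (MemoryStream stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartDocument();
                writer.WriteElementString("root", "hello");
                writer.WriteEndDocument();
            }
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // The string version declares utf-16, because managed strings are UTF-16.
        Console.WriteLine(WriteToString());
        Console.WriteLine("UTF-8 bytes:  {0}", WriteToBytes(Encoding.UTF8).Length);
        Console.WriteLine("UTF-16 bytes: {0}", WriteToBytes(Encoding.Unicode).Length);
    }
}
```

The string case needs no encoding choice from us, while the byte case only makes sense once an Encoding has been picked, which is where the rest of this post is headed.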

Now XML has its own ways of determining the encoding of a document: a parser can peek at the first bytes that make up the opening <?xml declaration or, more explicitly, read the encoding attribute of that declaration.
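As a rough illustration of the "peeking" part, the sketch below looks at the first four bytes of a document that starts directly with <?xml and guesses the encoding family from them. It is loosely modeled on the non-normative autodetection appendix of the XML 1.0 specification and is not how any particular parser is implemented; XmlEncodingSniffer and GuessFamily are hypothetical names, and only a few cases are covered.

```csharp
using System;
using System.Text;

public static class XmlEncodingSniffer
{
    // Hypothetical helper: guesses the encoding family from the bytes
    // of the leading "<?xm" characters of an XML declaration.
    public static string GuessFamily(byte[] bytes)
    {
        if (StartsWith(bytes, 0x3C, 0x3F, 0x78, 0x6D))
        {
            return "8-bit, ASCII-compatible (for example UTF-8)";
        }
        if (StartsWith(bytes, 0x3C, 0x00, 0x3F, 0x00))
        {
            return "UTF-16, little-endian";
        }
        if (StartsWith(bytes, 0x00, 0x3C, 0x00, 0x3F))
        {
            return "UTF-16, big-endian";
        }
        return "unknown";
    }

    // Returns true when the buffer begins with the given byte values.
    private static bool StartsWith(byte[] bytes, params byte[] prefix)
    {
        if (bytes.Length < prefix.Length)
        {
            return false;
        }
        for (int i = 0; i < prefix.Length; i++)
        {
            if (bytes[i] != prefix[i])
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        string xml = "<?xml version=\"1.0\"?><root/>";

        // Encoding.GetBytes encodes the characters only; it adds no extra leading bytes.
        Console.WriteLine(GuessFamily(Encoding.ASCII.GetBytes(xml)));
        Console.WriteLine(GuessFamily(Encoding.Unicode.GetBytes(xml)));
        Console.WriteLine(GuessFamily(Encoding.BigEndianUnicode.GetBytes(xml)));
    }
}
```

Once the family is known, a reader can decode enough of the declaration to find the encoding attribute and settle on the exact encoding.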

Unicode is used for all sorts of purposes, not just XML encoding, and so it also has its own mechanism, the byte order mark, to distinguish between little-endian and big-endian encodings, which determine which byte comes first in UTF-16 and UTF-32. A BOM is also allowed for UTF-8, for that matter, even though byte order is not an issue there.
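In .NET, the byte order mark of an encoding is available through Encoding.GetPreamble. The small sketch below prints it for a few of the built-in encodings; BomDemo and ToHex are names I made up for the example.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Formats a byte array as space-separated hex, e.g. "EF BB BF".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace('-', ' ');
    }

    public static void Main()
    {
        Encoding[] encodings =
        {
            Encoding.UTF8,             // preamble EF BB BF
            Encoding.Unicode,          // UTF-16 little-endian, preamble FF FE
            Encoding.BigEndianUnicode, // UTF-16 big-endian, preamble FE FF
            Encoding.UTF32,            // UTF-32 little-endian, preamble FF FE 00 00
            new UTF8Encoding(false)    // UTF-8 configured to emit no BOM
        };

        foreach (Encoding encoding in encodings)
        {
            Console.WriteLine("{0,-10} [{1}]", encoding.WebName, ToHex(encoding.GetPreamble()));
        }
    }
}
```

Note that the last entry has an empty preamble: whether a BOM is produced is a property of the Encoding instance, not of UTF-8 as such.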

How do these mechanisms interact when using the .NET Framework classes? Let's write some code!

First, we'll write a short helper method to display the contents of a byte array.

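    // Requires "using System;" for Console.
    // Writes the first 'length' bytes of 'bytes' as hex, 16 per line,
    // starting each line with 'linePrefix'.
    private static void ShowBuffer(string linePrefix, byte[] bytes, long length) {
        int bytesOnLine = 0;
        for (long i = 0; i < length; i++) {
            if (bytesOnLine == 0) {
                Console.Write(linePrefix);
            }

            Console.Write("{0:X2} ", bytes[i]);
            bytesOnLine++;
            // Break after 16 bytes (the original test, > 16, allowed 17 per line).
            if (bytesOnLine >= 16) {
                Console.WriteLine();
                bytesOnLine = 0;
            }
        }
    }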
