This is a mirror of official site: http://jasper-net.blogspot.com/

XmlWriter, encodings and BOM

Thursday, March 18, 2010
Today I want to talk about XmlWriter and the generation of a Byte Order Mark (BOM).

XmlWriter provides an API that generates, unsurprisingly, XML. This XML will typically end up as a managed string of characters or possibly a sequence of bytes. Of course, text transformed into bytes implies an encoding, as previously discussed.
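
As a rough illustration of those two destinations, here is a small sketch that writes the same tiny document once into a StringBuilder and once into a MemoryStream. The class and method names (XmlWriterTargets, WriteToString, WriteToBytes) are made up for this example; only XmlWriter.Create, XmlWriterSettings and the System.Text encodings come from the framework.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class XmlWriterTargets
{
    // Writes <root/> into a StringBuilder: the result is managed text,
    // so no byte-level encoding is applied at this point.
    public static string WriteToString()
    {
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb))
        {
            writer.WriteStartElement("root");
            writer.WriteEndElement();
        }
        return sb.ToString();
    }

    // Writes the same document into a stream: here the characters have to
    // be turned into bytes, using the encoding supplied in the settings.
    public static byte[] WriteToBytes(Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartElement("root");
                writer.WriteEndElement();
            }
            return stream.ToArray();
        }
    }
}
```

Calling WriteToString() and WriteToBytes(Encoding.UTF8) side by side is a quick way to see that the string form carries no bytes at all, while the stream form depends entirely on the Encoding that was passed in.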

Now, XML has its own ways of determining a document's encoding: by peeking at the first bytes that make up the opening <?xml declaration or, more explicitly, by reading the encoding attribute on that declaration.
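
To make the "peeking at the first bytes" idea concrete, here is a minimal sketch of that kind of detection for a document that starts with <?xml and has no BOM. The byte patterns are simply the characters "<?xm" (or "<" and "?") in each encoding family; the class and method names (XmlEncodingSniffer, GuessFamily) are hypothetical, and a real parser handles more cases than this.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class XmlEncodingSniffer
{
    // Guesses the encoding family from the first four bytes of a document
    // that begins with "<?xml" and carries no byte order mark.
    public static string GuessFamily(byte[] head)
    {
        if (head == null || head.Length < 4) return "unknown";

        // '<' is 0x3C, '?' is 0x3F, 'x' is 0x78, 'm' is 0x6D.
        if (StartsWith(head, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible";
        if (StartsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        if (StartsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (StartsWith(head, 0x3C, 0x00, 0x00, 0x00)) return "UTF-32LE";
        if (StartsWith(head, 0x00, 0x00, 0x00, 0x3C)) return "UTF-32BE";
        return "unknown";
    }

    private static bool StartsWith(byte[] data, byte b0, byte b1, byte b2, byte b3)
    {
        return data[0] == b0 && data[1] == b1 && data[2] == b2 && data[3] == b3;
    }
}
```

Once the family is known, the declaration itself can be read in that family to find the exact name given in its encoding attribute.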

Unicode is used for all sorts of purposes, not just XML encoding, and so it also has its own mechanism, the byte order mark, to distinguish between little-endian and big-endian encodings, which determine which byte comes first in UTF-16 and UTF-32. A BOM is also allowed for UTF-8, for that matter, even though UTF-8 has no byte-order ambiguity to resolve.
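
In .NET, the BOM for an encoding is exposed through Encoding.GetPreamble(). The sketch below prints the preamble of a few built-in encodings; the class and method names (BomDemo, PreambleHex, PrintAll) are made up for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class BomDemo
{
    // Returns the encoding's preamble (its BOM, if any) as dash-separated hex.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void PrintAll()
    {
        Console.WriteLine("UTF-8:          " + PreambleHex(Encoding.UTF8));             // EF-BB-BF
        Console.WriteLine("UTF-16 little:  " + PreambleHex(Encoding.Unicode));          // FF-FE
        Console.WriteLine("UTF-16 big:     " + PreambleHex(Encoding.BigEndianUnicode)); // FE-FF
        Console.WriteLine("UTF-32 little:  " + PreambleHex(Encoding.UTF32));            // FF-FE-00-00
        // An encoding instance can also be built so that it emits no BOM.
        Console.WriteLine("UTF-8, no BOM:  " + PreambleHex(new UTF8Encoding(false)));   // (empty)
    }
}
```

The UTF-16 lines show the byte-order role directly: the same code point, U+FEFF, comes out as FF FE or FE FF depending on which byte is written first.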

How do these mechanisms interact when using the .NET Framework classes? Let's write some code!

First, we'll write a short helper method to display the contents of a byte array.

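// Writes the first `length` bytes of `bytes` to the console as hex,
// 16 bytes per line, starting each line with `linePrefix`.
// Requires `using System;`.
private static void ShowBuffer(string linePrefix, byte[] bytes, long length) {
  int bytesOnLine = 0;
  for (long i = 0; i < length; i++) {
    if (bytesOnLine == 0) {
      Console.Write(linePrefix);
    }

    Console.Write("{0:X2} ", bytes[i]);
    bytesOnLine++;
    // Break after exactly 16 bytes (the original `> 16` check let 17 through).
    if (bytesOnLine == 16) {
      Console.WriteLine();
      bytesOnLine = 0;
    }
  }
}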
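
As a guess at how a helper like this might be put to work (this is my own sketch, not the continuation of the original post), the code below serializes a tiny document through XmlWriter with two UTF-8 encodings, one that emits a BOM and one that does not, and dumps the resulting bytes. BomDump, WriteDocument, Dump and Run are hypothetical names, and ShowBuffer is repeated in condensed form only so that the sketch compiles on its own.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class, for illustration only.
public static class BomDump
{
    // Condensed copy of the helper so the sketch compiles on its own.
    private static void ShowBuffer(string linePrefix, byte[] bytes, long length)
    {
        for (long i = 0; i < length; i++)
        {
            if (i % 16 == 0) Console.Write(linePrefix);
            Console.Write("{0:X2} ", bytes[i]);
            if (i % 16 == 15) Console.WriteLine();
        }
    }

    // Serializes <root/> with the given encoding and returns the stream.
    public static MemoryStream WriteDocument(Encoding encoding)
    {
        var stream = new MemoryStream();
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (XmlWriter writer = XmlWriter.Create(stream, settings))
        {
            writer.WriteStartElement("root");
            writer.WriteEndElement();
        }
        return stream;
    }

    private static void Dump(string title, Encoding encoding)
    {
        using (MemoryStream stream = WriteDocument(encoding))
        {
            Console.WriteLine(title);
            // GetBuffer() exposes the whole backing array, so Length is
            // passed to limit the dump to the bytes actually written.
            ShowBuffer("  ", stream.GetBuffer(), stream.Length);
            Console.WriteLine();
        }
    }

    public static void Run()
    {
        Dump("UTF-8 with BOM:", new UTF8Encoding(true));
        Dump("UTF-8 without BOM:", new UTF8Encoding(false));
    }
}
```

The long length parameter on ShowBuffer fits this pattern well: MemoryStream.Length is a long, and the buffer returned by GetBuffer() is usually larger than the data written to it.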

Read more: Marcelo's WebLog
