This is a mirror of official site: http://jasper-net.blogspot.com/

WCF Binary XML and dictionaries

| Thursday, January 27, 2011
One of the encodings which come with WCF (since its first version, in .NET Framework 3.0) is a fast and lightweight encoding for XML documents. The WCF Binary XML format (“officially” called .NET Binary Format: XML Data Structure – http://msdn.microsoft.com/en-us/library/cc219210(v=PROT.10).aspx). It is essentially a new way of representing XML, without using the “normal” angle-bracket notation (<element attr="value">the content</element>), resulting in a smaller document (for example, all “end element” nodes are represented by a single byte in the binary format (or even none, as some text nodes have the information that they are followed by an end element). That already results in a significant compression of the data.

However, where the binary format can really shine is when we use its dictionaries. Essentially, a dictionary string is a string which can be identified by a small integer. So instead of having to output the whole name (for example, the name ‘Person’ would be encoded in 7 bytes – 1 for the string length and 6 for each of the letters), if this string belonged to one of the dictionaries agreed upon by the two parties (one which is encoding the XML in the binary format, the other which will read the encoded binary into XML), it would normally use 1 (or 2) bytes to be represented.

An example can show it easier. The following “.NET Binary XML” document represents the XML <Envelope></Envelope>:

40 08 45 6E 76 65 6C 6F 70 65 01

The first byte (0x40) indicates that this is a “ShortElement” record, followed by the length of the element name (0x08), followed by the UTF-8 encoding of the local name (0x45 … 0x65). Finally, the last byte (0x01) represents an end element.

If the string Envelope is part of a dictionary, however, the same document can be rewritten replacing the 9 bytes used for the element with its dictionary id. Assuming that the dictionary id of “Envelope” is 0x02, this is what the same document would be encoded:

42 02 01

The first byte was changed to indicate that this is a new type of record: a “ShortDictionaryElement”. It’s followed by the dictionary id of the string it represents.

Two dictionaries
Replacing the strings with its dictionary ids is without question a great size reduction strategy for the binary encoding.

Read more: Carlos' blog

Posted via email from Jasper-net

0 comments: