This is a mirror of the official site: http://jasper-net.blogspot.com/

Strings, immutability and persistence

Thursday, July 21, 2011
Today's post is based on a question from StackOverflow; I liked it so much I figured hey, let's just blog it today.

When you look at a string in C#, it looks to you like a collection of characters, end of story. But of course, behind the scenes there is a data structure in memory somewhere to implement that collection of characters. In the .NET CLR, strings are laid out in memory pretty much the same way that BSTRs were implemented in OLE Automation: as a word-aligned memory buffer consisting of a four-byte integer giving the length of the string, followed by the characters of the string in two-byte chunks of UTF-16 data, followed by two zero bytes. (Recall that BSTR originally stood for "BASIC string", because the OLE Automation team was actually part of the Visual Basic team; this is the format that Visual Basic used.)
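
To make that layout concrete, here is a minimal sketch that adds up only the pieces described above: the four-byte length, two bytes per UTF-16 code unit, and the two trailing zero bytes. `LayoutSketch` and `DescribedBufferSize` are names invented for this illustration, and the sketch makes no attempt to count any other per-object overhead the runtime adds, so treat the result as arithmetic on the description rather than a measurement.

```csharp
using System;

static class LayoutSketch
{
    // Hypothetical helper: sums only the fields described in the text.
    //   4 bytes            length prefix
    //   2 bytes per char   UTF-16 code units
    //   2 bytes            trailing zero terminator
    public static int DescribedBufferSize(string s)
    {
        return sizeof(int) + s.Length * sizeof(char) + 2;
    }

    public static void Main()
    {
        // "hello" has five code units: 4 + 5 * 2 + 2 = 16
        Console.WriteLine(DescribedBufferSize("hello"));
        // The empty string still needs the length and the terminator: 4 + 0 + 2 = 6
        Console.WriteLine(DescribedBufferSize(""));
    }
}
```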

Using this as the internal implementation of strings has a number of benefits. For example: it only requires one heap allocation per string. The length can be determined without counting characters. The string can contain embedded zero bytes, unlike formats that use zero bytes as end-of-string markers. If you disregard surrogate pairs then the nth individual character can be fetched in O(1) time, unlike in variable-width encodings like UTF-8. If the string is pinned in place and contains no zero characters then the address of the string data can be passed unmodified to unmanaged code that takes a WCHAR*. And so on.
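
A few of these properties are easy to see from ordinary C# code. The following is a small sketch, not anything from the original question; `StringLayoutDemo` is just an illustrative name. It shows that an embedded zero character counts toward the length, that indexing works per UTF-16 code unit, and why surrogate pairs are the caveat on "nth character".

```csharp
using System;

static class StringLayoutDemo
{
    public static void Main()
    {
        // An embedded zero character is ordinary data, not an end-of-string marker.
        string withZero = "ab\0cd";
        Console.WriteLine(withZero.Length);      // 5
        Console.WriteLine((int)withZero[2]);     // 0

        // Indexing fetches one UTF-16 code unit.
        Console.WriteLine(withZero[3]);          // c

        // The surrogate-pair caveat: U+1F600 takes two code units,
        // so Length and the indexer see two "characters".
        string emoji = "\U0001F600";
        Console.WriteLine(emoji.Length);                      // 2
        Console.WriteLine(char.IsHighSurrogate(emoji[0]));    // True
        Console.WriteLine(char.IsLowSurrogate(emoji[1]));     // True
    }
}
```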

Strings are immutable in .NET, which also has many benefits. As I've discussed many times, immutable data types are easier to reason about, are threadsafe, and are more secure. (*)
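
As a quick illustration of what immutability means in practice, here is a short sketch (`ImmutabilityDemo` is an illustrative name). Methods that appear to change a string return a new one, and `+=` rebinds the variable rather than modifying the string that other references still point to.

```csharp
using System;

static class ImmutabilityDemo
{
    public static void Main()
    {
        string original = "hello";
        string alias = original;   // second reference to the same string

        // ToUpperInvariant returns a new string; the original is untouched.
        string upper = original.ToUpperInvariant();
        Console.WriteLine(original);   // hello
        Console.WriteLine(upper);      // HELLO

        // += builds a new string and rebinds the variable;
        // the string that alias refers to does not change.
        original += " world";
        Console.WriteLine(alias);      // hello
        Console.WriteLine(original);   // hello world
    }
}
```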

One of the benefits of the immutable data types I've talked about here previously is that they are not just immutable, they are also "persistent". By "persistent", I mean an immutable data type such that common operations on that type (like adding a new item to a queue, or removing an item from a tree) can re-use most or all of the memory of an existing data structure. Since it is all immutable, you can re-use its parts without worrying about them changing on you.
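
Here is a minimal sketch of what "persistent" means, using a stack rather than the queue or tree mentioned above because it is the shortest structure that shows the idea. `PersistentStack<T>` is a hypothetical type written for this example, not a library class. Pushing allocates one new node that points at the existing stack, so the old stack is re-used in its entirety and both versions remain valid.

```csharp
using System;

// Hypothetical illustration type; not part of the .NET libraries.
sealed class PersistentStack<T>
{
    public static readonly PersistentStack<T> Empty =
        new PersistentStack<T>(default(T), null);

    private readonly T head;
    private readonly PersistentStack<T> tail;   // null only for Empty

    private PersistentStack(T head, PersistentStack<T> tail)
    {
        this.head = head;
        this.tail = tail;
    }

    public bool IsEmpty { get { return tail == null; } }

    // One new node; the existing stack becomes the tail, unchanged.
    public PersistentStack<T> Push(T value)
    {
        return new PersistentStack<T>(value, this);
    }

    public T Peek()
    {
        if (IsEmpty) throw new InvalidOperationException("Stack is empty.");
        return head;
    }

    // No allocation at all: popping just hands back the shared tail.
    public PersistentStack<T> Pop()
    {
        if (IsEmpty) throw new InvalidOperationException("Stack is empty.");
        return tail;
    }
}

static class PersistenceDemo
{
    public static void Main()
    {
        var one = PersistentStack<int>.Empty.Push(1);
        var two = one.Push(2);

        Console.WriteLine(one.Peek());   // 1 (the older version is still intact)
        Console.WriteLine(two.Peek());   // 2

        // The newer stack re-uses the older one as its tail.
        Console.WriteLine(object.ReferenceEquals(two.Pop(), one));   // True
    }
}
```

Because nothing is ever mutated, it is safe for `one` and `two` to share memory this way.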

Read more: Fabulous Adventures In Coding