Introduction
Errare humanum est (Latin for "to err is human") is a well known proverb which is applicable in nearly every aspect of life. You probably won't be able to find a programmer who has never encountered an error when running a program and has searched for ages to find the cause of it. Today, where code is often widely spread or even outsourced to the Cloud (think of Windows Azure, for example), this task gets even more complicated. That is the reason why efforts were always made to find mechanics that simplify debugging. When working with C++, there is also another problem that makes error finding very hard. After compilation, there is nothing left that lets us map an error onto a particular line of source code or even to a certain function. All names are "lost".
With PDB files, Windows offers us a powerful tool to deal with the problem mentioned in the last paragraph as they preserve information about the executable after compilation. With the help of those files, you are able to recover the source code as it was before compilation from the bits and bytes at runtime. The whole API that covers the work with PDB files and symbols is extremely huge and powerful. It is my goal to provide you an introduction to every aspect of that API in this article. But due to the fact that it is so big, this may take some time, and if you are interested in the topic, it is a good idea to recheck if there are new versions of this article with more things explained.
Loading symbols
Loading symbols for a process is a very easy task. All you have to do is call SymInitialize with the handle of the process for which the symbols should be loaded. If you are debugging a foreign process, you should pass the handle of that process to the function and not the handle of the application that debugs the process, because you want to load the symbols of the process that gets debugged. As a process may have loaded additional modules (DLLs) which also could have symbols, it is important to see if the function also loads the symbols for those modules. In fact, you are free to choose if they should be loaded or not. The third parameter of SymInitialize is a boolean indicating if the process should be "invaded", which means that symbols for all loaded DLLs will be loaded too, or not.
Usually, SymInitialize is called right after the process has started, and SymCleanup is used to unload all symbols when the process finishes. When calling SymInitialize multiple times for the same process, it fails, giving ERROR_INVALID_PARAMETER with GetLastError. This is a bit confusing as you cannot really distinguish multiple calls from an invalid process handle. Nevertheless, in the code attached to this article, you will see that SymInitialize is called in the constructor of a class and can therefore be called multiple times, and ERROR_INVALID_PARAMETER is used like it only indicates multiple calls. When using GetCurrentProcess, this basically is safe enough.
Now, let's imagine you load the symbols for the process at the startup and perform some tasks in that process. This may load new DLLs in the address space of the application and the symbols for those modules may not yet be loaded as they were not present at startup. To make sure symbols for those modules are also loaded, the function SymRefreshModuleList can be used, which is pretty simple to understand. It is also possible to load the symbols for a specific module only using SymLoadModule64, but I recommend you to use it only if you are debugging a process and receive an event that a module was loaded, because in that case, you have all the information to represent the symbols for the module as it is currently loaded.
Obtaining and working with a symbol
After symbols are loaded, you may want to actually get a specific symbol. To obtain symbols, there are various functions which use different approaches:
- SymFromName - Searches a symbol by its name
- SymFromAddr - Searches a symbol by its address
- SymFromIndex - Searches a symbol by its index
- SymFromToken - Searches a symbol by its managed token
Depending on how you have received the symbol, the first things you probably want to know are the name of the symbol and/or its address. Both of them can be accessed in the SYMBOL_INFO structure that gets filled by the above functions. But there is an important thing with that structure: it has a variable length and you are responsible to allocate the space. This is because the name is stored in a variable length array and truncated to fit its size (indicated by a member). As this array is the last member of the struct, you can just allocate as much space as you need to store the whole struct plus the full name, and pass that pointer to the functions. Have a look at the following example:
char memory[sizeof(SYMBOL_INFO) + MAX_SYM_NAME];
PSYMBOL_INFO sym = reinterpret_cast<PSYMBOL_INFO>(memory);
sym->NameLen = MAX_SYM_NAME;
sym->SizeOfStruct = sizeof(SYMBOL_INFO);
SymFromName(GetCurrentProcess(), "myFunction", sym);
//...
// or using malloc:
PSYMBOL_INFO sym = reinterpret_cast<PSYMBOL_INFO>(
malloc(sizeof(SYMBOL_INFO) + MAX_SYM_NAME));
sym->NameLen = MAX_SYM_NAME;
sym->SizeOfStruct = sizeof(SYMBOL_INFO);
SymFromName(GetCurrentProcess(), "myFunction", sym);
//...
free(sym);
Read more: Codeproject
0 comments:
Post a Comment