When working with a large code base, finding the cause of a bizarre bug can often be like finding a needle in a haystack. Figuring out why an object gets corrupted for no apparent reason can be quite daunting, especially when it seems to happen randomly and completely out of context.
Take the following scenario as an example. You have defined a class that contains an array of characters that is 256 characters long. You then implement a method for filling this buffer with a string passed as an argument. The method assumes, reasonably enough for now, that the buffer is 256 characters long.
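The scenario might look something like this. The class and member names are hypothetical, made up for illustration:

```cpp
#include <cstring>

// Hypothetical class from the scenario: a fixed 256-character buffer
// and a fill method that assumes that size.
class Record {
public:
    void SetName(const char *name) {
        // No bounds check: the copy relies on the buffer holding
        // 256 characters, which is true -- for now.
        strcpy(m_name, name);
    }
    const char *Name() const { return m_name; }
private:
    char m_name[256];
};
```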
At some point you notice that you need another character buffer, so you add one after the first in the class definition. You also figure that you don't need all 256 characters the first member can hold, and you shorten it to 128 to conserve space. At this point you should realize that you also have to modify the method above to safeguard against a buffer overflow. It so happens, however, that in this not-so-perfect world this does not cross your mind.
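After the change, the class might look like this (again with hypothetical names). The fill method is untouched, so any string longer than 127 characters now silently overflows into the second buffer:

```cpp
#include <cstring>

// The revised layout from the scenario: the first buffer shrunk to 128
// characters, with a second buffer added right after it.
class Record {
public:
    void SetName(const char *name) {
        // Unchanged from before: still assumes 256 characters.
        // A name longer than 127 characters now overruns m_name
        // and corrupts m_description.
        strcpy(m_name, name);
    }
    const char *Name() const { return m_name; }
private:
    char m_name[128];        // was 256
    char m_description[64];  // the new buffer, right behind it
};
```

Short strings still work fine, which is exactly why the bug stays hidden until a long one comes along.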
Buffer overflow is one of the most frequent sources of errors in software and often one of the most difficult to detect, especially when data is read from an outside source. Many copy functions in the C run-time library come in versions with bounds checking (denoted by the _s suffix), but even they cannot guard against hard-coded buffer lengths that change at some later point.
Finding the bug
Getting back to the scenario, you're now wondering why the second string gets modified with data that makes no sense at all. Luckily, Visual Studio provides a tool to help you find exactly these kinds of errors: data breakpoints.
To add a data breakpoint, first run your application in debug mode or attach to it in the usual way, then go to Debug, select New Breakpoint, and then New Data Breakpoint. In the dialog that opens, you can type in the memory address and the number of bytes you wish to monitor. You can also use an expression here, but it's often difficult to come up with an expression for data in a heap-allocated object when you're not in the context of a particular stack frame.
There are a couple of things to note about data breakpoints, however. First, Visual Studio supports a maximum of four data breakpoints at any given time. Second, some C run-time functions modify memory from kernel mode, which does not trigger the data breakpoint. For instance, calling ReadFile on a buffer that is monitored by a data breakpoint will not trigger the breakpoint.
The application will now break whenever the memory at the address you specified is modified. Often you will spot the issue immediately, but at the very least this feature can point you in the right direction in the search for the real reason the memory gets inadvertently modified.
Data breakpoints are a great feature, especially when doing a lot of low-level work where multiple locations modify the same data. With the exception of some special cases, like modifications from kernel mode, you can use them whenever you need to find out when memory at a certain location gets changed, whether on purpose or inadvertently.
Just recently I bumped into a very nasty bug that I had been unfortunate enough to conjure up myself. Memory alignment has never been my primary concern when working on the PC; as a typical C++ programmer you rarely have to think about such things. On the PC "rarely" usually means "almost never" (when not optimizing, that is), and in a managed environment it truly should become "never". On ARM, however, "never" becomes "almost never" again.
Having your memory aligned means storing values of different sizes at addresses that are multiples of a certain number. The typical desktop CPU rewards you when your memory is properly aligned but does not punish you when it's not. ARM, on the other hand, is not that forgiving.
Issues arise when you write to a memory block at an arbitrary offset by interpreting that location as a value of a given type, as in the following example.
void *pMyBuffer = malloc( sizeof( int ) * 2 );  /* block itself is suitably aligned */
*(int *)( (char *)pMyBuffer + 2 ) = 10;         /* int write at a 2-byte offset: misaligned */
malloc returns memory aligned for the worst case, so writing at the beginning of the block would be fine. Writing an int at a two-byte offset, however, means writing to an address that is not a multiple of four. Optimized code on ARM does not like this.
The easiest way around this is to write byte by byte when absolutely necessary. When compiling with GCC, disabling compiler optimizations also appears to work around the issue, though I think we can all agree that's not the best solution to this problem.
The problem I recently ran into was related to this issue, but it occurred in a managed environment! I was running Mono on Android and was using a Vector3 structure for which I had explicitly specified the memory layout by telling the run-time to pack the structure at one-byte boundaries. The issue arose when I accessed an instance of the structure embedded in another class.
I was initially puzzled, since I had tested the structure with another class and everything had worked fine. After rigorous testing, and after fixing the issue by moving the structure to a different location within the class, I finally figured that it had to be a memory alignment issue.
What I find somewhat interesting is that I hadn't specified the packing of the containing class, so the run-time should still have been able to align my Vector3 instance correctly within it while respecting the structure's own layout. This led me to believe there is a bug in Mono. Feel free to comment on this post if you have any insight on this.
The lesson that we all should learn from this? Sometimes it helps to know what happens under the hood, even in a managed environment.