I was doing a code review for a coworker this morning, and his change relied heavily on the Win32 InterlockedXXX() functions so that he wouldn't need a critical section.  He thought this would be faster.  I have used those functions before but never really understood how they worked, so I did a little detective work and discovered this old MSDN Magazine article.

It turns out that the InterlockedXXX() functions simply use atomic instructions supplied by the underlying CPU (such as XADD on x86), which the hardware implements to be multi-processor safe.  You learn something new every day! :)
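
To make that concrete for myself, here is a minimal sketch of a lock-free shared counter. Note the assumptions: it uses the portable C++ `std::atomic` rather than the Win32 calls (I believe `fetch_add` plays roughly the role of `InterlockedExchangeAdd()`, but this is an analogy, not my coworker's code), and the function name `count_with_atomics` is made up for illustration. On x86, compilers typically turn the `fetch_add` into a single LOCK-prefixed instruction such as LOCK XADD, though the exact instruction depends on the compiler and on whether the return value is used.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each bump a shared counter `iters`
// times with an atomic read-modify-write, using no critical section or mutex.
long count_with_atomics(int threads, int iters) {
    assert(threads >= 0 && iters >= 0);

    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                // Atomic add; returns the previous value, which we ignore here.
                counter.fetch_add(1);
            }
        });
    }

    // Wait for every worker before reading the final value.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Calling `count_with_atomics(4, 100000)` should always return 400000. With a plain `long` and `++` in place of the atomic, the increments could race and the total could come up short.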