Alois Kraus


Programming

Did you ever wonder why your system hangs at random times? Sometimes it comes back after a few seconds (which could simply be paging), but at least once a day I wish I knew why the system is responding so slowly. Before going into kernel land I must confess that I have never written a device driver, so my knowledge of kernel mode debugging is quite limited. On the other hand, if you have not done this either, you will have a much easier time following me.

Some hangs seem to be Heisenbugs which disappear when you start looking at them. I have found that when I leave Process Explorer running on my machine, it seems to resolve some issues by its mere presence. It could also be that some malware and Trojans do not even install when Sysinternals tools are running.

Did you know that you can view the call stack of every application on your system with Process Explorer? Simply right-click a process, select Properties, and open the Threads tab, where you can view the stack of each thread with full function names.

[Screenshot: thread call stacks in Process Explorer]

Wrong Symbols

If the function names do not appear on your machine, or they have the form xxxx.dll+0xdddd where dddd is a rather big number (see mvfs51.sys below, for which we do not have symbols), then you are missing the symbols. First of all you need to download Windbg. Why Windbg? To resolve the symbol names you need a good version of dbghelp.dll, which is part of Windbg. Most Sysinternals tools let you configure the symbol path and the path to the Windbg version of dbghelp.dll (Options – Configure Symbols …). To make it easier to copy and paste, here is the one and only

Symbol Server Path: SRV*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols

The first part is the cache directory in which the downloaded symbols are stored for later retrieval; the second part is the Microsoft symbol server they are downloaded from. Armed with this knowledge you should be able to find the root cause of quite a lot of hangs by simply examining the call stacks.
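If you set up several machines, it can help to build the symbol path from its two parts instead of retyping it. The following is a minimal sketch: the helper names are made up for illustration, and it assumes the tools honor the _NT_SYMBOL_PATH environment variable, which the Microsoft debugging tools read when no symbol path is configured explicitly.

```python
import os

# Public Microsoft symbol server, as given in the article.
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server_url=MS_SYMBOL_SERVER):
    """Return a symbol path of the form SRV*<cache dir>*<server url>."""
    return "SRV*{}*{}".format(cache_dir, server_url)


def export_symbol_path(cache_dir, env=None):
    """Store the symbol path in _NT_SYMBOL_PATH of the given environment.

    'env' defaults to the environment of the current process, so tools
    started from this process inherit the setting (hypothetical helper).
    """
    env = os.environ if env is None else env
    env["_NT_SYMBOL_PATH"] = build_symbol_path(cache_dir)
    return env["_NT_SYMBOL_PATH"]
```

Calling build_symbol_path(r"C:\Windows\Symbols") yields exactly the path shown above.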

Kernel Debugging?

On Windows XP you get the full stack including the kernel by simply looking at the process call stack in Process Explorer. With Windows Vista and later you need to run Process Explorer with elevated privileges (File – Run As Administrator in Process Explorer) to see the kernel stack as well. The call stack above is typical for user mode only. If your thread stack contains frames of the form

ntkrnlpa.exe!KiFastCallEntry+0x12a

then you are seeing the full stack including the kernel. When you find .sys files in the stack you are interested in, you have just found a device driver. That is very useful for finding out why something locks up.

Managed applications will find the native stack view less useful since dbghelp.dll is not able to show the managed call stack. There was one version of Windbg which was able to show the mixed mode stack with the kv command, but Microsoft withdrew it a few days later. I tried it with Process Explorer of course, but it retrieves the call stack in a different way, so I was not able to see the mixed mode stack there (yet).

The deeper reason why this feature has been held back (at least this is what the rumors say) is a legal one. The debugger team used deep CLR know-how to walk the managed stack. Because units within MS are not allowed to use internals of other MS products, those internals would have to be made public. I am not interested in how illegal this might be, but the MS lawyers are very well paid and should be able to sort this out. Seamless call stack tracking with Process Explorer and related tools would be one of my number one feature requests.

A surprisingly simple way to resolve hangs is to search Google for the names of the device drivers in your hang call stack and check for updated versions. In my department, for example, I saw quite a lot of hangs with the following call stack:

 

 

ntkrnlpa.exe!KiSwapContext+0x2f
ntkrnlpa.exe!KiSwapThread+0x8a
ntkrnlpa.exe!KeWaitForSingleObject+0x1c2
TmXPFlt.sys+0xc90d  // Trend Micro Virus Scanner
TmXPFlt.sys+0x306e
ntkrnlpa.exe!ObpCaptureObjectCreateInformation+0x19c
ntkrnlpa.exe!IopfCallDriver+0x31
ntkrnlpa.exe!IopParseDevice+0xa12
ntkrnlpa.exe!ObpLookupObjectName+0x53c
ntkrnlpa.exe!ObOpenObjectByName+0xea
ntkrnlpa.exe!IopCreateFile+0x407
ntkrnlpa.exe!IoCreateFile+0x8e
ntkrnlpa.exe!NtOpenFile+0x27
ntkrnlpa.exe!KiFastCallEntry+0xfc
ntkrnlpa.exe!ZwOpenFile+0x11

mvfs51.sys+0x2df70  // Rational ClearCase Source Control Driver –> Google mvfs51.sys
mvfs51.sys+0x2f850

TmXPFlt.sys+0x1039 // Trend Micro Virus Scanner –> Google TmXPFLT.sys
mvfs51.sys+0x309e6
mvfs51.sys+0x12197


ntdll.dll!NtQueryAttributesFile+0xc
kernel32.dll!GetFileAttributesW+0x79  // User mode call to get the file attributes
csproj.dll!LUtilFileExists+0xe
csproj.dll!CVsProjHostProcInstance::PrepareHostProcExecutable+0x39
csproj.dll!CVsProjHostProcInstance::StartHostingProcessHelper+0xa0
csproj.dll!CVsProjHostProcInstance::StartHostingProcess+0x72

What can we learn from this one? The hanging process was Visual Studio, which was about to start its hosting process. It first checks whether the executable exists by reading its file attributes. Since the file is located on a drive managed by the source control system, the mvfs51.sys driver from ClearCase does some work. Then the Trend Micro virus scanner hooks in and triggers further ClearCase driver calls, which go back into the kernel and end up in the virus scanner again, which seems to cause the deadlock. In the end the virus scanner won the hook fight and locked up the process, which will never recover.

Now you have a hanging process that cannot be killed by any means. If you try to kill it you will end up with a process with one thread left that is still stuck in the device driver call. If you ever encounter an unkillable process that is still alive after you tried to terminate it via Task Manager, it is most likely stuck in a device driver call.
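Picking the driver names out of a long stack by eye gets tedious. Here is a small sketch that scans a stack copied from Process Explorer for .sys modules and notes whether any frame of that driver resolved to a function name. The function names and the regular expression are my own illustration and assume frames of the form module!function+0xoffset or module+0xoffset as shown above.

```python
import re

# module, optional !function, then +0x<hex offset>; trailing comments are ignored.
FRAME_RE = re.compile(
    r"^\s*(?P<module>[\w.]+)(?:!(?P<symbol>[\w:~<>]+))?\+0x(?P<offset>[0-9a-fA-F]+)"
)


def parse_frame(line):
    """Return (module, symbol or None, offset) for a stack line, else None."""
    match = FRAME_RE.match(line)
    if match is None:
        return None
    return match.group("module"), match.group("symbol"), int(match.group("offset"), 16)


def drivers_in_stack(stack_text):
    """Map each .sys module in the stack to True if any frame had a symbol.

    Modules are kept in order of first appearance. A value of False means
    that only raw offsets were shown, i.e. symbols for that driver are missing.
    """
    drivers = {}
    for line in stack_text.splitlines():
        frame = parse_frame(line)
        if frame is None:
            continue  # blank lines or anything that is not a stack frame
        module, symbol, _offset = frame
        if module.lower().endswith(".sys"):
            drivers[module] = drivers.get(module, False) or symbol is not None
    return drivers
```

For the stack above this returns TmXPFlt.sys and mvfs51.sys, both without symbols, which are exactly the two names worth searching for.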

High CPU Spikes / Hanging Process

OK, that was the easy part. Now we are getting closer to Windbg. If you have an application which behaves strangely (e.g. shows high CPU spikes from time to time) I have another Sysinternals gem: ProcDump can take memory snapshots of an arbitrary application. It is especially useful if you want to know what state an application was in when it hung or ate up all the CPU time.

ProcDump v1.1 - Writes process dump files
Copyright (C) 2009 Mark Russinovich
Sysinternals - www.sysinternals.com

Monitors a process and writes a dump file when the process exceeds the
specified CPU usage.

usage: procdump [-64] [-c CPU usage [-u] [-s seconds] [-n exceeds]] [-h] [-e] [-ma] [-r] [-o] [[<process name or PID> [dump file]] | [-x <image file> <dump file> [arguments]]


   -c      CPU threshold at which to create a dump of the process.
   -e      Write a dump when the process encounters an unhandled exception.

   -h      Write dump if process has a hung window.
….

Example: Write up to 3 dumps of a process named 'consume' when it exceeds
         20% CPU usage for three seconds to the directory
         c:\dump\consume with the name consume.dmp:
            C:\>procdump -c 20 -n 3 -o consume c:\dump\consume
Example: Write a dump for a process named 'hang.exe' when one of it's
         windows is unresponsive for more than 5 seconds:
            C:\>procdump -h hang.exe hungwindow.dmp

The generated .dmp files can be analyzed quite easily with Windbg if you have matching symbols. This is pure user mode debugging, but it is easier to start in user mode and dig deeper only if you need to.
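If you want to start ProcDump from a script, for example as part of a test run, the command line can be assembled like this. This is only a sketch: the function name and its parameters are hypothetical, and it uses only the -c, -s, -n and -h switches from the v1.1 usage text quoted above, so check the help output of your ProcDump version before relying on it.

```python
def procdump_args(target, dump_path=None, cpu_threshold=None,
                  seconds=None, count=None, hung_window=False):
    """Build an argument list for procdump (switches as in the v1.1 usage text).

    target        -- process name or PID
    dump_path     -- optional dump file or directory
    cpu_threshold -- value for -c (CPU usage in percent)
    seconds       -- value for -s, only used together with -c
    count         -- value for -n, only used together with -c
    hung_window   -- add -h to dump when a window stops responding
    """
    args = ["procdump"]
    if cpu_threshold is not None:
        args += ["-c", str(cpu_threshold)]
        if seconds is not None:
            args += ["-s", str(seconds)]
        if count is not None:
            args += ["-n", str(count)]
    if hung_window:
        args.append("-h")
    args.append(str(target))
    if dump_path:
        args.append(dump_path)
    return args
```

The resulting list can be handed to subprocess.run(procdump_args(...), check=True) on a machine where procdump is on the PATH.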

 

Kernel Debugging / Hanging System

 

When your system has frozen you cannot start any new processes, so starting a debugger is of little use. Luckily there is a nice trick to force the generation of a kernel dump with a magic key combination: hold Right Ctrl and press Scroll Lock twice to get a nice looking, real blue screen. See the instructions below for how to enable it. Technically speaking it is a user initiated kernel dump. Please read that again: only the RIGHT Ctrl key in combination with a double press of Scroll Lock will do the trick.

Before you can generate the blue screen (= kernel dump) you need to set the dump mode to Complete Memory Dump. You can find this setting by pressing Windows Key + Pause and then going to Advanced System Settings – Advanced – Startup and Recovery.


To enable the magic key combination you need to edit some registry settings, which are explained in more depth on MSDN and on a much more elaborate page dedicated to dump file generation and common pitfalls on Windows Server 2008 (especially on computers with a lot of installed memory).

PS/2 Keyboard

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters

DWORD CrashOnCtrlScroll 1

USB Keyboard

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters

DWORD CrashOnCtrlScroll 1
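To avoid clicking through regedit on every machine, the two values above can be written into a .reg file that is imported with a double click. The sketch below only generates the file content; the function and dictionary names are made up for illustration, and the registry keys and the value are the ones listed above.

```python
# Keyboard driver service names as listed above (PS/2 and USB).
KEYBOARD_SERVICES = {"ps2": "i8042prt", "usb": "kbdhid"}


def crash_on_ctrl_scroll_reg(kinds=("ps2", "usb")):
    """Return the text of a .reg file that sets CrashOnCtrlScroll to 1."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for kind in kinds:
        service = KEYBOARD_SERVICES[kind]
        lines.append(
            "[HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\"
            "{}\\Parameters]".format(service)
        )
        lines.append('"CrashOnCtrlScroll"=dword:00000001')
        lines.append("")
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)
```

Write the returned text to a file such as crashonctrlscroll.reg (the file name is arbitrary) and import it with administrative rights.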

 

OK, now we can generate a kernel memory dump and examine it. It is actually quite simple to pinpoint common problems like crashing or hanging drivers with a few commands, without fully understanding how the kernel works. After the reboot you can open the generated dump file (normally located at C:\Windows\Memory.dmp) with Windbg. Then you need to set up the symbol path (see Wrong Symbols at the beginning of the article), and now you can execute the !analyze -v command to find out why the blue screen occurred.

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
MANUALLY_INITIATED_CRASH (e2)
The user manually initiated this crash dump.
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000
Debugging Details:
------------------
BUGCHECK_STR:  MANUALLY_INITIATED_CRASH
DEFAULT_BUCKET_ID:  DRIVER_FAULT
PROCESS_NAME:  Idle
LAST_CONTROL_TRANSFER:  from f754e7fa to 804f8925
STACK_TEXT:  
80548d38 f754e7fa 000000e2 00000000 00000000 nt!KeBugCheckEx+0x1b
80548d54 f754e032 00c0f0d8 0190e0c6 00000000 i8042prt!I8xProcessCrashDump+0x237
80548d9c 8054071d 85904b20 85c0f020 00010009 i8042prt!I8042KeyboardInterruptService+0x21c
80548d9c f758dc46 85904b20 85c0f020 00010009 nt!KiInterruptDispatch+0x3d
80548e50 80540cc0 00000000 0000000e 00000000 processr!AcpiC1Idle+0x12
80548e54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x10

 

At first glance the keyboard driver crashed. A closer look reveals that the crash was provoked by the user. Let's put this dump aside and first have a look at a “real” blue screen. A blue screen is actually a well defined exit point which drivers trigger intentionally when it is no longer safe to continue. The function that causes the blue screen, and the dump generation when configured, is KeBugCheck. It can only be called by kernel drivers. No, you can't blue screen Windows from a user mode application. I have not tried to send the magic Right Ctrl + Scroll Lock + Scroll Lock combination to Windows from a user mode application, but I do not think it would work since the keyboard driver won't get these events.

Let's have a look at a real crash caused by a driver on a 64 bit machine and analyze it.

7: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 0000000000000000
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

PROCESS_NAME:  xxxxx

FAULTING_IP:
nt!KeBugCheck+0
fffff800`02261620 4883ec28        sub     rsp,28h

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: fffff80002261620 (nt!KeBugCheck)
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000001
NumberParameters: 0

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0x0

CURRENT_IRQL:  0

MANAGED_STACK: !dumpstack -EE
OS Thread Id: 0x0 (7)
Child-SP         RetAddr          Call Site

LAST_CONTROL_TRANSFER:  from fffffa6009a2536b to fffff80002261620

STACK_TEXT: 
fffffa60`0b5e0c68 fffffa60`09a2536b : fffffa60`09a59824 00000000`00000008 00000000`00000001 fffff800`022a27d0 : nt!KeBugCheck
fffffa60`0b5e0c70 fffffa60`09a28da2 : fffffa80`231ffaf0 fffff880`10a9f634 fffffa60`00000007 fffffa80`21ad5b40 : mvfs60x64+0x1936b
fffffa60`0b5e0dc0 fffffa60`09a2e5af : fffffa60`09a65ba8 fffff880`00000017 fffffa60`0b5e0eec fffffa80`4ab8733a : mvfs60x64+0x1cda2
fffffa60`0b5e0e70 fffffa60`09a3edc0 : fffffa80`0f1fb260 fffffa80`2038ef40 fffffa80`00000017 fffffa60`09a296e0 : mvfs60x64+0x225af
fffffa60`0b5e0f40 fffffa60`09a42e34 : fffffa80`00000001 fffffa80`0f1fb260 fffffa80`2038ef40 fffffa60`09a55fc4 : mvfs60x64+0x32dc0
fffffa60`0b5e1000 fffffa60`09a48ba0 : fffffa80`215ff7d0 fffffa80`21ad5b40 fffffa60`0b5e10d0 00000000`5346564d : mvfs60x64+0x36e34
fffffa60`0b5e1090 fffffa60`09a4b52a : fffffa80`215ff7d0 fffffa80`2280cad0 fffffa60`0b5e12e0 fffffa60`0b5e14a0 : mvfs60x64+0x3cba0
fffffa60`0b5e1150 fffffa60`09a4e890 : fffffa80`215ff7d0 fffffa80`2280cad0 fffffa60`0b5e12e0 fffffa60`0b5e14a0 : mvfs60x64+0x3f52a
fffffa60`0b5e11a0 fffffa60`09a2feb3 : fffffa80`215ff7d0 fffffa80`2280cad0 fffffa60`0b5e12e0 fffffa60`0b5e14a0 : mvfs60x64+0x42890
fffffa60`0b5e12a0 fffffa60`09a4cc00 : 00000000`00000000 fffffa60`0b5e14a0 fffffa60`0b5e1400 fffffa60`0b5e13e8 : mvfs60x64+0x23eb3
fffffa60`0b5e13a0 fffffa60`09a4ee4f : fffffa80`1367e7c0 fffffa80`24d75710 fffffa80`24d75710 fffff800`024e58f4 : mvfs60x64+0x40c00
fffffa60`0b5e1550 fffffa60`09a25fc0 : fffffa80`1367e7c0 fffffa80`24d75710 fffffa80`24d759d8 fffffa80`13683010 : mvfs60x64+0x42e4f
fffffa60`0b5e1590 fffffa60`00c08e17 : fffffa80`1367e7c0 fffffa80`24d75710 fffffa80`24d75a20 fffffa80`13895af0 : mvfs60x64+0x19fc0

fffffa60`0b5e15e0 fffffa60`00c2526c : fffffa80`13895af0 fffffa80`13683010 fffffa80`24d75700 fffffa60`0b5e16a0 : fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x227
fffffa60`0b5e1650 fffff800`024e81f3 : 00000000`00000005 fffffa80`1209f010 00000000`00000040 00000000`00000000 : fltmgr!FltpCreate+0x25d
fffffa60`0b5e1700 fffff800`024e1ec9 : fffffa80`1367e7c0 00000000`00000000 fffffa80`2529a010 00000000`00000001 : nt!IopParseDevice+0x5e3
fffffa60`0b5e18a0 fffff800`024e5db4 : 00000000`00000000 fffffa80`252f7701 fffffa80`00000040 00000000`00000000 : nt!ObpLookupObjectName+0x5eb
fffffa60`0b5e19b0 fffff800`024f2360 : 00000000`80100080 00000000`0047ac08 fffffa60`09a42f01 00000000`00000000 : nt!ObOpenObjectByName+0x2f4
fffffa60`0b5e1a80 fffff800`024f2e98 : 00000000`0047ab98 00000000`80100080 fffffa80`00000000 00000000`0047abb8 : nt!IopCreateFile+0x290
fffffa60`0b5e1b20 fffff800`022610f3 : 00000000`000001e4 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtCreateFile+0x78
fffffa60`0b5e1bb0 00000000`77515fca : 00000000`773ccb6c 00000000`02818b90 00000000`00000002 00000000`009e0000 : nt!KiSystemServiceCopyEnd+0x13
00000000`0047ab28 00000000`773ccb6c : 00000000`02818b90 00000000`00000002 00000000`009e0000 00000000`00000002 : ntdll!ZwCreateFile+0xa
00000000`0047ab30 00000005`16f47c4e : 00000000`02818b90 00000000`80000000 00000000`00000005 00000005`16f46f93 : KERNEL32!CreateFileW+0x26c
00000000`0047ac80 00000005`16f49f76 : 00000000`02819a80 00000000`00000000 00000000`00000000 00000000`0047ada0 : diasymreader!IStreamCRTFile::Create+0xba
00000000`0047acf0 00000005`16f4a10c : 00000000`00000000 00000000`02819a80 00000000`00000000 00000005`16f8158b : diasymreader!MSF_HB::internalOpen+0x36
00000000`0047ad30 00000005`16f36782 : 00000000`02818b00 00000000`0047be00 00000000`00000000 00000000`00000400 : diasymreader!MSF::Open+0x5c
00000000`0047ad70 00000005`16f36ef0 : 00000000`02818b90 00000005`16f033e0 00000000`00000000 00000000`00000ef0 : diasymreader!PDB1::OpenEx2W+0xda
00000000`0047adf0 00000005`16f3753b : 00000000`0000001f 00000000`00000000 00000000`02818b90 00000000`0000000c : diasymreader!PDB1::OpenValidate4+0x7c
00000000`0047ae90 00000005`16f4b43f : 00000000`02818db0 00000000`00000ee4 00000000`02817cd0 00000000`00000ee4 : diasymreader!PDB::OpenValidate4+0x47
00000000`0047aef0 00000005`16f4c2db : 00000000`02818b90 00000000`0281854c 00000000`02817c00 00000000`02818bac : diasymreader!LOCATOR::FOpenValidate4+0x73
00000000`0047af50 00000005`16f4c62e : 00000000`0047be00 00000000`000002fb 00000000`02817cd0 00000000`00000094 : diasymreader!LOCATOR::FLocatePdbPathHelper+0x133
00000000`0047afb0 00000005`16f4c961 : 00000000`0047be00 00000000`00000003 00000000`0047be00 00000000`028139cc : diasymreader!LOCATOR::FLocatePdbPath+0x11a
00000000`0047ba40 00000005`16f37ae3 : 00000000`0047cc40 00000005`16f80e44 00000000`0047cc40 00000005`16f80e44 : diasymreader!LOCATOR::FLocatePdb+0x1b5
00000000`0047bde0 00000005`16f3417a : 00000000`028139a0 00000000`009e92d0 00000000`0047cc40 00000000`00000000 : diasymreader!PDBCommon::OpenValidate5+0x9f
00000000`0047cbb0 00000005`16f5fef0 : 00000000`028139a0 00000000`03084b14 00000000`03084b14 00000000`00000000 : diasymreader!PDB::OpenValidate5+0x36
00000000`0047cc00 00000005`16f2b747 : 00000000`00000000 00000000`00000000 00000000`02808010 00000000`009e92d0 : diasymreader!CDiaDataSource::loadDataForExe+0x90
00000000`0047cd30 00000005`16f1ce3b : 00000000`00000000 00000000`11401f6e 00000000`03074348 00000000`028139a0 : diasymreader!CDiaWrapper::Create+0xbb
00000000`0047cd90 00000005`16f1a704 : 00000000`00000079 00000000`0000004a 00000000`00800000 00000000`106ee840 : diasymreader!SymReader::Initialize+0xa7
00000000`0047cdf0 00000000`106adbfc : 00000000`00000014 00000000`0049c0e0 00000000`00000000 00000000`00000000 : diasymreader!SymBinder::GetReaderForFile+0x170
00000000`0047ce70 00000000`00000014 : 00000000`0049c0e0 00000000`00000000 00000000`00000000 00000000`036c1fc0 : CPI!CPI_GetCallbacks+0x1a52c
00000000`0047ce78 00000000`0049c0e0 : 00000000`00000000 00000000`00000000 00000000`036c1fc0 00000000`00000800 : 0x14
00000000`0047ce80 00000000`00000000 : 00000000`00000000 00000000`036c1fc0 00000000`00000800 00000000`03074348 : 0x49c0e0

STACK_COMMAND:  kb

FOLLOWUP_IP:
mvfs60x64+1936b
fffffa60`09a2536b cc              int     3

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  mvfs60x64+1936b

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: mvfs60x64

IMAGE_NAME:  mvfs60x64.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4a37fa38

FAILURE_BUCKET_ID:  X64_0x0_mvfs60x64+1936b

BUCKET_ID:  X64_0x0_mvfs60x64+1936b

Followup: MachineOwner
---------

From this real life crash we get the faulting 64 bit ClearCase driver on a silver platter. We have the call stack, the managed call stack if available, the failing module name and much more, all presented by one command. If you want to find out why your machine blue screens from time to time, the information presented here is sufficient to find the buggy driver and either uninstall the damn thing (let's hope it was not important anyway) or look at the driver vendor's homepage to get an updated version. If you really care you can file a bug report and send them your dump to analyze further.
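When you have to look at many dumps, the handful of lines that name the culprit can be pulled out of a saved !analyze -v log automatically. The following sketch assumes the log was saved to a text file and that the fields appear as KEY: value lines like in the output above; the function name and the default field list are my own choice.

```python
import re

# Upper-case field name, a colon, then an optional value on the same line.
FIELD_RE = re.compile(r"^(?P<key>[A-Z][A-Z0-9_]+):\s*(?P<value>\S.*)?$")

DEFAULT_FIELDS = ("BUGCHECK_STR", "PROCESS_NAME", "MODULE_NAME",
                  "IMAGE_NAME", "FAILURE_BUCKET_ID")


def analyze_fields(log_text, wanted=DEFAULT_FIELDS):
    """Return the first value of each wanted field found in the log text."""
    found = {}
    for line in log_text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match is None:
            continue
        key, value = match.group("key"), match.group("value")
        if key in wanted and value:
            found.setdefault(key, value.strip())  # keep the first occurrence
    return found
```

For the crash above this gives mvfs60x64 as the module and mvfs60x64.sys as the image name, which is all you need to start searching for an updated driver.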

You can convert a full kernel dump into a mini dump by using the .dump outputFileName.dmp command inside Windbg.

 

Now let's see how we can debug a real hang scenario. And I won't get sidetracked by other interesting details.

When you get a user initiated memory dump you need to find out which processes were running and examine the call stacks of the interesting ones. The Windbg command !process 0 0 gives you a complete list of all running processes.

7: kd>!process 0 0

PROCESS fffffa801347a9c0
    SessionId: 1  Cid: 3ac8    Peb: 7fffffde000  ParentCid: 03b4
    DirBase: 19486c000  ObjectTable: fffff8800b78b430  HandleCount: 158.
    Image: mobsync.exe

You can select a specific process by passing its address (the value after PROCESS in the list) to the !process command. This gives you a wealth of information about its current state and all call stacks inside it. That should help to find out where the system was hanging.

7: kd> !process fffffa801347a9c0
PROCESS fffffa801347a9c0
    SessionId: 1  Cid: 3ac8    Peb: 7fffffde000  ParentCid: 03b4
    DirBase: 19486c000  ObjectTable: fffff8800b78b430  HandleCount: 158.
    Image: mobsync.exe
    VadRoot fffffa800f0b1650 Vads 80 Clone 0 Private 1129. Modified 2. Locked 0.
    DeviceMap fffff8800cb88ca0
    Token                             fffff88014d16060
    ElapsedTime                       00:00:57.613
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         155336
    QuotaPoolUsage[NonPagedPool]      7584
    Working Set Sizes (now,min,max)  (2435, 50, 345) (9740KB, 200KB, 1380KB)
    PeakWorkingSetSize                2449
    VirtualSize                       79 Mb
    PeakVirtualSize                   80 Mb
    PageFaultCount                    2545
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      1343

        THREAD fffffa8016695bb0  Cid 3ac8.3c04  Teb: 000007fffffdc000 Win32Thread: fffff900c2c08450 WAIT: (WrUserRequest) UserMode Non-Alertable
            fffffa801253e4b0  SynchronizationEvent
        Not impersonating
        DeviceMap                 fffff8800cb88ca0
        Owning Process            fffffa801347a9c0       Image:         mobsync.exe
        Attached Process          N/A            Image:         N/A
        Wait Start TickCount      3233090        Ticks: 3690 (0:00:00:57.564)
        Context Switch Count      81                 LargeStack
        UserTime                  00:00:00.000
        KernelTime                00:00:00.015
        Win32 Start Address 0x00000000ff685d38
        Stack Init fffffa601008bdb0 Current fffffa601008b720
        Base fffffa601008c000 Limit fffffa6010083000 Call 0
        Priority 10 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
        Child-SP          RetAddr           Call Site
        fffffa60`1008b760 fffff800`0226728a nt!KiSwapContext+0x7f
        fffffa60`1008b8a0 fffff800`0226868a nt!KiSwapThread+0x2fa
        fffffa60`1008b910 fffff960`001bb817 nt!KeWaitForSingleObject+0x2da
        fffffa60`1008b9a0 fffff960`001bb8ae win32k!xxxRealSleepThread+0x25f
        fffffa60`1008ba40 fffff960`001bb1fa win32k!xxxSleepThread+0x56
        fffffa60`1008ba70 fffff960`001bb4a9 win32k!xxxRealInternalGetMessage+0x72e
        fffffa60`1008bb50 fffff960`001bca15 win32k!xxxInternalGetMessage+0x35
        fffffa60`1008bb90 fffff800`022610f3 win32k!NtUserGetMessage+0x79
        fffffa60`1008bc20 00000000`772ed09a nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffffa60`1008bc20)
        00000000`0029f7a8 00000000`00000000 USER32!ZwUserGetMessage+0xa

A hang can be caused by a shared lock that different processes try to acquire. This common deadlock scenario can be checked with the !locks command, which lets you examine held locks that more than one process wants to access:

7: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks..

Resource @ 0xfffffa800f952218    Exclusively owned
     Threads: fffffa800e585040-01<*>
KD: Scanning for held locks...

Another very useful command is !stacks, which shows for all processes the last stack frame where each thread is standing:

7: kd> !stacks


    [fffffa8012d8b7d0 csrss.exe]
2e0.0003e0  fffffa801307b7a0 ffce9c7b Blocked    cdd!PresentWorkerThread+0x479
2e0.0003ec  fffffa80130903d0 ffce9ccd Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.0003fc  fffffa8012ce8ad0 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00046c  fffffa80131619c0 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.000f78  fffffa800f052060 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00133c  fffffa800f1d92e0 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.0011a0  fffffa800f1d6700 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00101c  fffffa800f141800 ffce9c7c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.000dbc  fffffa800f1ffbb0 ffce9c8e Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00038c  fffffa801fd14060 ffce9c7c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.001408  fffffa80207b0360 ffce9ccd Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.00140c  fffffa8012efa060 ffce9c8c Blocked    nt!AlpcpReceiveMessagePort+0x287
2e0.001678  fffffa800f1017c0 ffce9c60 Blocked    win32k!xxxRealSleepThread+0x25f
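On a busy system the !stacks output runs to thousands of lines, so it helps to count how many threads are parked at the same call site. Below is a minimal sketch that assumes thread lines with five whitespace separated columns as in the output above, with the state in the fourth column and the call site in the last one; the function name is made up.

```python
from collections import Counter


def blocked_call_sites(stacks_text):
    """Count blocked threads per call site, most frequent first."""
    counts = Counter()
    for line in stacks_text.splitlines():
        parts = line.split()
        # Thread lines have five columns; process headers such as
        # "[fffffa8012d8b7d0 csrss.exe]" have fewer and are skipped.
        if len(parts) == 5 and parts[3] == "Blocked":
            counts[parts[4]] += 1
    return counts.most_common()
```

A call site inside a third party driver that shows up with a high count is a good candidate for a closer look with !process.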

A more thorough list has been compiled by Dmitry Vostokov on his famous Crash Dump Analysis web site, which gives a good overview of the most used Windbg commands. To dig deeper you will need to buy the Windows Internals book by Mark Russinovich to understand how the Windows kernel and drivers work, and visit the NT Debugging blog, where Microsoft escalation engineers show some advanced kernel debugging techniques.

If you have read this far you (should) have lost your fear of the dreaded blue screen. It's not the end but the beginning of an interesting debugging session. It is a pity that so few people are able to analyze kernel dumps even at the most basic level. In many cases it is possible to find out which device driver is the guilty one. You then have the option to remove the faulting driver entirely or try to get an updated one. At least you know who is to blame, and most of the time it is not the OS.

One last note: if you transfer the dump to another machine you need not only the dump file but also the exact same executable binaries on the analyzing machine in order to load the correct pdbs. You need to set their location under File – Image File Path in Windbg to successfully analyze dumps.

posted on Sunday, October 4, 2009 4:37 PM

Feedback

# re: Why Does My System Hang? Windows Kernel Debugging For Dummies 10/8/2009 5:25 PM Joshka
Great post. A sidenote though, windows 7 does not have an option to create a complete dump (only small dump and kernel dump). There is a registry setting that may be set [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl\CrashDumpEnabled=1] to enable the complete dump. Is this needed or will a kernel dump generally provide enough information to diagnose issues with locked up systems?

# re: Why Does My System Hang? Windows Kernel Debugging For Dummies 10/9/2009 1:48 AM Alois Kraus
Hi Joshka,

I guess you have more than 2 GB RAM. You need to set the page file size big enough to enable a full dump. More infos can be found in the link I already did add to my article:
http://support.microsoft.com/?scid=kb%3Ben-us%3B969028&x=17&y=10

Yours,
Alois Kraus


# re: Why Does My System Hang? Windows Kernel Debugging For Dummies 10/11/2009 6:50 PM Joshka
Thanks. I must have missed that sentence. I'm running win 7 with 6GB.

# re: Why Does My System Hang? Windows Kernel Debugging For Dummies 1/12/2013 5:26 PM me me
explain english
