I recently watched a webcast on Windows crash dumps presented by Mark Russinovich.  Not only is he a great presenter, but his sessions are always jam-packed with useful information.  If you can’t find the webcast, go get Windows Internals, 4th Edition, and peruse Chapter 10.  You should also check out the help file in WinDbg, which is uncharacteristically good.  Finally, there are special knowledge base articles at microsoft.com/ddk/debugging.

There were a lot of great tips in this webcast, but here are some of my favorites:

- First, you can manually crash a system and get a dump by setting the CrashOnCtrlScroll value in the registry (after a reboot, hold the right Ctrl key and press Scroll Lock twice)… but if that doesn’t work for you, you can forcefully crash a system by using the NMI button installed on some servers.  If you don’t have a server with an NMI button, you can create one with a PCI interface.  See this link for the instructions: http://www.microsoft.com/whdc/system/CEC/dmpsw.mspx

- Something I always forget to do: if you are looking at a crash dump from a multiprocessor system, make sure you check all CPUs by using the ~s command to change which CPU you’re looking at.  For example, ~1s switches to the 2nd processor (processors are numbered from 0).

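- Speaking of not forgetting… don’t forget to set your symbol path to the following, which caches downloaded symbols in c:\symbols: srv*c:\symbols*http://msdl.microsoft.com/download/symbols (for example with the .sympath command in WinDbg or the _NT_SYMBOL_PATH environment variable).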

- Use the lm kv command to list the drivers that are loaded, and check the timestamp of each driver.  If you see one that seems old, go find a newer version and replace it.

- Use the !thread command to get detailed info on the current thread.

- Use !analyze -v to get the automated analysis.  If the “Probably caused by” line doesn’t seem right, the real cause may be buffer corruption that happened long before the error appeared.  In particular, if it blames nt!xxxx or another native Microsoft OS file, that file is most likely not what is causing the crash.

- Use !locks to look for deadlocks

- Use a live dump to analyze your system without crashing it.  Get LiveKD from Sysinternals to look at the live system, and use .dump to save a snapshot of it (or .dump /f to generate a full dump).

- Remote kernel debugging (choose Debugging Mode from the F8 boot menu) can be useful, but don’t use it with the default setting over a serial port at 19200 baud… either change it to 115200 or use USB 2.0 (Vista) or FireWire (Windows 2003).  This loads the kernel debugger at boot time and does not affect performance.  If a system is set up for a remote debug session and it BSODs, it will wait until you connect to it with the remote debugger before it does anything else.

- Use Driver Verifier to check 3rd party and unsigned drivers (it enables ‘special pool’).  You won’t find it on the Start menu: run verifier.exe, choose the ‘Create custom settings’ option, and then ‘Select individual settings from a full list’.  Select everything except ‘Low resources simulation’.  Next choose ‘Select driver names from a list’ and select the ones you think are suspicious.  If that doesn’t yield good crash dumps, go back and change the settings to “all 3rd party and unsigned drivers”.  If that still doesn’t yield good crash dumps, you might have to go through several iterations, selecting drivers in groups of 10-20 at a time.

- Don’t forget to run the Windows Memory Diagnostic tool
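For the CrashOnCtrlScroll tip, here is a minimal Python sketch that produces a .reg file instead of editing the registry by hand. It assumes the commonly documented locations, the i8042prt Parameters key for PS/2 keyboards and the kbdhid Parameters key for USB keyboards; the helper name build_crash_on_ctrl_scroll_reg is hypothetical.

```python
# Sketch: build .reg text that enables the keyboard-initiated crash dump.
# Assumption: PS/2 keyboards are handled by i8042prt and USB keyboards by kbdhid,
# so the value is written under each driver's Parameters key.

KEYBOARD_DRIVER_KEYS = {
    "ps2": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters",
    "usb": r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters",
}


def build_crash_on_ctrl_scroll_reg(keyboards=("ps2", "usb")):
    """Return .reg text setting CrashOnCtrlScroll = 1 (REG_DWORD) for each keyboard type."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for kind in keyboards:
        lines.append(f"[{KEYBOARD_DRIVER_KEYS[kind]}]")
        lines.append('"CrashOnCtrlScroll"=dword:00000001')
        lines.append("")
    # .reg files conventionally use CRLF line endings
    return "\r\n".join(lines)
```

Save the returned text to a file with a .reg extension, import it with regedit, and reboot; after that, holding the right Ctrl key and pressing Scroll Lock twice should bug-check the machine and write the dump.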
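To avoid skipping processors in a multiprocessor dump, a small Python helper (hypothetical, named per_cpu_commands) can build one WinDbg command line that switches to each CPU in turn and runs a command there. It assumes the kernel-mode ~&lt;n&gt;s syntax and that semicolons chain commands on one line; you supply the processor count yourself.

```python
# Sketch: build one WinDbg command line that visits every processor.
# Assumption: in kernel mode "~<n>s" switches the current processor and
# ";" separates commands on a single line.


def per_cpu_commands(cpu_count, per_cpu_cmd="k"):
    """Return e.g. '~0s; k; ~1s; k' for cpu_count == 2 (processors are numbered from 0)."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    parts = []
    for cpu in range(cpu_count):
        parts.append(f"~{cpu}s")    # switch to this processor
        parts.append(per_cpu_cmd)   # then run the requested command on it
    return "; ".join(parts)
```

For a four-way box, per_cpu_commands(4) gives a line you can paste at the kd> prompt; passing "!thread" as the second argument dumps the current thread on every CPU instead of just the stack.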
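Checking driver dates by eye gets tedious on a box with dozens of third-party modules. The sketch below (link_time and old_drivers are hypothetical names) flags modules built before a cutoff year. It assumes you have copied each module's hex timestamp, taken here to be the 8-digit value lm kv shows in parentheses on its Timestamp line, into (name, stamp) pairs; that value is the PE link time in seconds since 1970 UTC. Treat it as a rough filter: it reports when a driver was built, not when it was installed.

```python
# Sketch: flag drivers whose PE link timestamp is older than a cutoff year.
# Assumption: each stamp is the hex value lm kv prints in parentheses after
# "Timestamp:", i.e. seconds since 1970-01-01 UTC.
from datetime import datetime, timezone


def link_time(hex_stamp):
    """Convert a hex PE timestamp such as '4CE7951A' to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)


def old_drivers(modules, cutoff_year):
    """Return (name, build year) for modules built before cutoff_year, oldest first."""
    flagged = []
    for name, hex_stamp in modules:
        built = link_time(hex_stamp)
        if built.year < cutoff_year:
            flagged.append((built, name))
    return [(name, built.year) for built, name in sorted(flagged)]
```

For example, old_drivers([("nt", "4CE7951A"), ("oldnic.sys", "3B7D8000")], 2005) flags only oldnic.sys, whose stamp decodes to 2001.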
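For the remote kernel debugging tip, on Windows Vista and later the faster transports can be configured persistently with bcdedit rather than relying on the F8 defaults. This Python sketch (bcdedit_debug_commands is a hypothetical helper) only builds the command lines for serial at 115200, 1394, or USB 2.0; Windows Server 2003 uses boot.ini switches instead, which it does not cover.

```python
# Sketch: build bcdedit command lines that enable kernel debugging.
# Assumption: the target runs Windows Vista or later (bcdedit-managed boot).


def bcdedit_debug_commands(transport, port=1, baud=115200, channel=None, target_name=None):
    """Return bcdedit command lines for 'serial', '1394', or 'usb' debugging."""
    if transport == "serial":
        # COM<port> at the given rate; 115200 instead of the slow 19200
        settings = f"serial debugport:{port} baudrate:{baud}"
    elif transport == "1394":
        if channel is None:
            raise ValueError("1394 debugging needs a channel number")
        settings = f"1394 channel:{channel}"
    elif transport == "usb":
        if not target_name:
            raise ValueError("USB 2.0 debugging needs a target name")
        settings = f"usb targetname:{target_name}"
    else:
        raise ValueError(f"unsupported transport: {transport!r}")
    return ["bcdedit /debug on", f"bcdedit /dbgsettings {settings}"]
```

Run the resulting commands from an elevated prompt on the target and reboot. On the host, the settings must match, for example windbg -k com:port=com1,baud=115200 for the serial case.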
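The last Driver Verifier step, testing suspect drivers in groups of 10-20, is easy to script. This Python sketch (verifier_batches is a hypothetical helper) builds one verifier command line per batch. It uses the /standard flag, which I'm assuming is close to, but not necessarily identical to, the GUI advice of selecting everything except low resources simulation.

```python
# Sketch: split suspect drivers into batches and build one Driver Verifier
# command line per batch, to be tried one reboot at a time.
# Assumption: "/standard" approximates "everything except low resources simulation".


def verifier_batches(drivers, batch_size=15):
    """Return 'verifier /standard /driver ...' commands covering drivers in groups of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    commands = []
    for start in range(0, len(drivers), batch_size):
        batch = drivers[start:start + batch_size]
        commands.append("verifier /standard /driver " + " ".join(batch))
    return commands
```

Apply one batch, reboot, and run the machine until it crashes or you're satisfied; then clear the settings with verifier /reset and reboot before moving on to the next batch.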

© 2010 LANalyze