kb:exceptionhandlingtips [2014/11/10 16:21] (current)
 +{{INLINETOC}}
 +\\
 +
 +====== Exception Handling Tips ======
 +Notes on exception handling.
 +
 +====== Finding Where an Unhandled Exception Was Thrown ======
 +from [[http://​blogs.msdn.com/​jmstall/​archive/​2005/​01/​18/​355697.aspx | Finding where unmanaged exceptions came from]]
 +
 +Sometimes you're looking at the callstack that's in a handler after an exception was thrown. This is very common if you attached at an unhandled exception that popped up a watson dialog.
 +
 +It might look like this:
 +<​code>​
 +kernel32!WaitForSingleObject+0xf
 +devenv!DwCreateProcess+0xbb
 +devenv!fExceptionHandling+0x1cb
 +devenv!DwExceptionFilter+0x8b
 +0x535ef48
 +</​code>​
 +
 +That's not exactly useful. What you really want is to see the callstack at the time the exception was thrown. There'​s a trick for doing this on x86. (This could be adjusted to work on 64-bit platforms.) It works in both live debugging and minidumps, and it even works if you don't have any symbols. I'll first give the quick steps for how to do this, and then I'll explain why it works.
 +
 +**How do I find it?**
 +
 +These instructions use WinDbg, but any native debugger with memory-search ('​s'​) and set-context ('​.cxr'​) commands can do this too.
 +    ​
 +1. Go to the thread of interest.
 +
 +2. Search for the first address on the stack containing the dword 0x1003f. In WinDbg, type "s -d esp L1000 1003f". Exception dispatch effectively pushes 0x1003f onto the stack, so this looks for the context of the exception on the thread's current stack.
 +
 +3. That should show at least one result that looks like: \\
 +<​code>​
 +0535ef48 ​ 0001003f 00000000 00000000 00000000 ​ ?​...............
 +</​code>​
 +The first number (0535ef48) in each row is the address, and the rest of the row is the contents of memory at that address. It turns out that 0x1003f is the first dword of the [[http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/context_str.asp | CONTEXT]] structure of the exception, and so 0535ef48 will be the address of the context. If you get multiple entries, use the first row, since that would correspond to the most recent exception.
 +
 +4. Set the current context to point at the first number in the result from step #3 (that's 0535ef48 in this case). In WinDbg, type ".cxr 0535ef48".
 +
 +Your callstack and registers should now look more sane. In my example, it looks like:
 +<​code>​
 +mscordbi!CordbHashTable::​GetBase
 +mscordbi!CordbThread::​RefreshStack+0x349
 +mscordbi!CordbProcess::​DispatchRCEvent+0x1291
 +mscordbi!CordbRCEventThread::​ThreadProc+0x9
 +</​code>​
 +<​code>​
 +eax=00000000 ebx=04700168 ecx=00000048 edx=0535f230 esi=00000000 edi=0535f254
 +eip=636ac786 esp=0535f214 ebp=0535f228 iopl=0 ​        nv up ei pl zr na po nc
 +cs=001b ​ ss=0023 ​ ds=0023 ​ es=0023 ​ fs=003b ​ gs=0000 ​            ​efl=00210246
 +mscordbi!CordbHashTable::​GetBase:​
 +636ac786 80791400 ​        ​cmp ​    byte ptr [ecx+0x14],​0x0 ds:​0023:​0000005c=??​
 +</​code>​
 +
 +My program was dereferencing 0x5c (=0x48+0x14). No wonder it crashed.
 +
 +Note that we didn't need any symbols for any of the modules on the original stack to find this. Even if we didn't have any symbols at all, we'd still have the disassembly of the code that crashed.
 +
 +**Why does it work?**
 +
 +There are a few key facts here.
 +
 +  - When the OS throws an exception, it pushes the [[http://​msdn.microsoft.com/​library/​default.asp?​url=/​library/​en-us/​debug/​base/​context_str.asp | CONTEXT]] of the original throw site on the stack. (This is part of the [[http://​msdn.microsoft.com/​library/​default.asp?​url=/​library/​en-us/​debug/​base/​exception_pointers_str.asp | EXCEPTION_POINTERS]]).
 +  - On x86, the first field in the [[http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/context_str.asp | CONTEXT]] structure is a flags field which is always set to 0x1003f. It is also very unlikely for this value to randomly appear for some other reason at the top of your stack.
 +  - On x86, stacks grow down. Everything pushed after the exception (the dispatcher and handler frames you see in the original callstack) sits below the saved CONTEXT, so the current stack pointer (esp) is the low end of the range in which the 0x1003f will appear. Thus if we're searching for something that's close to the top of the stack, we can search a small range that starts at esp and moves toward higher addresses.
 +  - Windbg's "s -d" command searches memory for dwords. The format is "s -d <range> <value>". <range> can be of the form "<address> L<count>", which searches <count> dword-sized units starting at <address> and moving toward higher addresses. So "s -d esp L1000 1003f" means "search for the dword 0x1003f in the 0x1000 dwords starting at esp". 0x1000 is an arbitrary number here that seems sufficient.
 +  - A debugger can do a stackwalk from any context. Most debuggers just automatically use a thread's current context (via [[http://msdn.microsoft.com/library/default.asp?url=/library/en-us/debug/base/getthreadcontext.asp | kernel32!GetThreadContext]]), but there's no reason that a debugger couldn't take an arbitrary context. Windbg provides a great command, ".cxr", which lets you do just that: you can set the context that you want to inspect. (VS is adding this command too.)
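The magic value can be derived rather than memorized. The sketch below is only an illustration: the `k...` names and the helper `is_full_x86_context_flags` are made up for this example, and the numeric values are the ones that, to my knowledge, `winnt.h` uses for the x86 `CONTEXT_*` flags. OR-ing all of them together gives the dword that step #2 searches for.

```cpp
#include <cstdint>

// Local stand-ins for the x86 CONTEXT_* flag bits (hypothetical names;
// values assumed to match the x86 definitions in winnt.h).
constexpr std::uint32_t kContextI386              = 0x00010000;
constexpr std::uint32_t kContextControl           = kContextI386 | 0x01; // esp, eip, ebp, eflags, cs, ss
constexpr std::uint32_t kContextInteger           = kContextI386 | 0x02; // eax, ebx, ecx, edx, esi, edi
constexpr std::uint32_t kContextSegments          = kContextI386 | 0x04; // ds, es, fs, gs
constexpr std::uint32_t kContextFloatingPoint     = kContextI386 | 0x08; // x87 state
constexpr std::uint32_t kContextDebugRegisters    = kContextI386 | 0x10; // dr0-dr3, dr6, dr7
constexpr std::uint32_t kContextExtendedRegisters = kContextI386 | 0x20; // extended (SSE) state

// Every register group requested at once.
constexpr std::uint32_t kContextAll =
    kContextControl | kContextInteger | kContextSegments |
    kContextFloatingPoint | kContextDebugRegisters | kContextExtendedRegisters;

static_assert(kContextAll == 0x1003f, "all x86 context flags combine to 0x1003f");

// True if a dword found on the stack could be the ContextFlags field of a
// fully populated x86 context record.
constexpr bool is_full_x86_context_flags(std::uint32_t dword) {
    return dword == kContextAll;
}
```

A context captured with fewer flags (for example only the control and integer groups, 0x10003) would start with a different dword, so the search value would have to be adjusted in that case.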
 +
 +So if you look back over the original steps, you can see that step #2 searches the thread's stack for a context pushed by the exception, and then step #4 tells the debugger to use that context as its current context.
 +  ​
 +If step #3 gives you multiple rows, that may indicate a case of nested exceptions. Or it may be that random case that you have a local variable "int i= 0x1003f"​. In either case, you can try .cxr on all the values (starting with the most recent) to find the callstack that makes sense.
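To make the search step concrete, here is a small model of what a dword search over the stack does. It is a sketch, not WinDbg's implementation: `find_dword` is a hypothetical helper, and the "stack" is just a vector whose element 0 plays the role of the dword at esp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scans a snapshot of stack memory from low to high addresses and returns
// the byte offset (relative to the first element) of every dword equal to
// `value`. Because x86 stacks grow down, the first hit is the one closest
// to esp, i.e. the most recently pushed candidate.
std::vector<std::size_t> find_dword(const std::vector<std::uint32_t>& stack,
                                    std::uint32_t value) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        if (stack[i] == value) {
            hits.push_back(i * sizeof(std::uint32_t));
        }
    }
    return hits;
}
```

With several hits, each offset is a candidate context address to try in turn, starting with the first, which mirrors trying .cxr on each row from step #3.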
 +
 +Just for kicks, inspect the CONTEXT pointer we supplied to .cxr, and you can see for yourself that it matches the output of the register command ("r"):
 +  ​
 +<​code>​
 +0:015> dt _CONTEXT 0535ef48
 +   ​+0x000 ContextFlags ​    : 0x1003f
 +   ​+0x004 Dr0              : 0
 +   ​+0x008 Dr1              : 0
 +   ​+0x00c Dr2              : 0
 +   ​+0x010 Dr3              : 0
 +   ​+0x014 Dr6              : 0
 +   ​+0x018 Dr7              : 0
 +   ​+0x01c FloatSave ​       : _FLOATING_SAVE_AREA
 +   ​+0x08c SegGs            : 0
 +   ​+0x090 SegFs            : 0x3b
 +   ​+0x094 SegEs            : 0x23
 +   ​+0x098 SegDs            : 0x23
 +   ​+0x09c Edi              : 0x535f254
 +   ​+0x0a0 Esi              : 0
 +   ​+0x0a4 Ebx              : 0x4700168
 +   ​+0x0a8 Edx              : 0x535f230
 +   ​+0x0ac Ecx              : 0x48
 +   ​+0x0b0 Eax              : 0
 +   ​+0x0b4 Ebp              : 0x535f228
 +   ​+0x0b8 Eip              : 0x636ac786
 +   ​+0x0bc SegCs            : 0x1b
 +   ​+0x0c0 EFlags ​          : 0x210246
 +   ​+0x0c4 Esp              : 0x535f214
 +   ​+0x0c8 SegSs            : 0x23
 +   ​+0x0cc ExtendedRegisters : [512]  "???"​
 +</​code>​
 +
 +<​code>​
 +0:015> r
 +Last set context:
 +eax=00000000 ebx=04700168 ecx=00000048 edx=0535f230 esi=00000000 edi=0535f254
 +eip=636ac786 esp=0535f214 ebp=0535f228 iopl=0 ​        nv up ei pl zr na po nc
 +cs=001b ​ ss=0023 ​ ds=0023 ​ es=0023 ​ fs=003b ​ gs=0000 ​            ​efl=00210246
 +</​code>​
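The offsets in the dt output can be cross-checked with a plain struct. The sketch below mirrors the field order shown above; the type names `X86Context` and `X86FloatingSaveArea` and the helper `read_context` are made up for this example, and the inner layout of the floating-point save area (seven dwords, an 80-byte register area, one trailing dword) is my assumption about `_FLOATING_SAVE_AREA`, since dt does not expand it here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed layout of the x87 save area (0x70 bytes in total).
struct X86FloatingSaveArea {
    std::uint32_t ControlWord, StatusWord, TagWord;
    std::uint32_t ErrorOffset, ErrorSelector, DataOffset, DataSelector;
    std::uint8_t  RegisterArea[80];
    std::uint32_t Cr0NpxState;
};

// Field order copied from the dt _CONTEXT output.
struct X86Context {
    std::uint32_t ContextFlags;
    std::uint32_t Dr0, Dr1, Dr2, Dr3, Dr6, Dr7;
    X86FloatingSaveArea FloatSave;
    std::uint32_t SegGs, SegFs, SegEs, SegDs;
    std::uint32_t Edi, Esi, Ebx, Edx, Ecx, Eax;
    std::uint32_t Ebp, Eip, SegCs, EFlags, Esp, SegSs;
    std::uint8_t  ExtendedRegisters[512];
};

// These must agree with the offsets dt printed.
static_assert(offsetof(X86Context, FloatSave) == 0x01c, "FloatSave offset");
static_assert(offsetof(X86Context, SegGs) == 0x08c, "SegGs offset");
static_assert(offsetof(X86Context, Eip) == 0x0b8, "Eip offset");
static_assert(offsetof(X86Context, Esp) == 0x0c4, "Esp offset");
static_assert(offsetof(X86Context, ExtendedRegisters) == 0x0cc, "ExtendedRegisters offset");

// Copies raw bytes (for example, read out of a dump at the address found in
// step #3) into the struct so individual registers can be inspected.
X86Context read_context(const unsigned char* bytes) {
    X86Context ctx;
    std::memcpy(&ctx, bytes, sizeof ctx);
    return ctx;
}
```

Reading `Eip` and `Esp` out of such a copy gives the same values that the "r" command prints after .cxr, provided the bytes come from a machine with the same byte order as the one running the code.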
 +
 +This technique is mostly for native exceptions, but it would work with managed exceptions + SOS too.
 +
 +Empirically,​ I've been using this technique for a long time now and it's worked perfectly every single time. And I've never found a stray local with value=0x1003f.
 +
 +
 +====== Report Exceptions Instead of Silently Ignoring Them ======
 +from [[http://​blogs.msdn.com/​greggm/​archive/​2005/​01/​24/​359692.aspx | Report exceptions, don't ignore them!]]
 +
 +Here is a great example of code that you should never write:
 +<code cpp>
 +__try
 +{
 +    RunCode();
 +}
 +__except (EXCEPTION_EXECUTE_HANDLER)
 +{
 +    // Empty handler: every exception, even an access violation, is silently discarded.
 +}
 +</​code>​
 +
 +Or, here is the only slightly less evil managed equivalent:
 +
 +<code csharp>
 +try
 +{
 +    RunCode();
 +}
 +catch
 +{
 +}
 +</​code>​
 +
 +I know why programmers do it:
 +
 +  * '​RunCode'​ isn't really that important. I don't care if it fails.
 +  * I need the system to be really reliable.
 +  * I don't want the user to have data loss
 +  * This code has always been this way. I don't want to change it now.
 +  ​
 +These are all really nice sounding reasons. However, there are a couple of problems with this logic:
 +
 +  - It is impossible to write code that can take an exception at an arbitrary point. Even for managed code, cleanup is a hard problem. After an exception has been ignored, there is usually at least some variable that is in the wrong state. Locks will still be held, variables won't be cleared, persistent data won't be deleted, memory will be leaked.
 +  - The user has no idea that an exception happened. Given #1, when an exception happens, the user is probably going to run into problems. However, since the exception has been ignored, when they do run into problems, they will have no idea why. 
 +  - The product will not improve. These days, Microsoft is big on gathering data from customers on how to improve our products. However, this code just ignored the exception, so this product will never get better. ​
 +  - Bugs cost more to fix. If the code had crashed or reported the exception, the developer would have an easier time identifying the problem. As it is, the tester is going to have to find a good repro, and the developer is going to need to run the repro under the debugger. You better hope that this isn't some timing bug that goes away when the product is run under the debugger. ​
 +  - The product is less secure. Why is the code crashing? Maybe it is crashing because someone malicious has found a buffer overrun. By continuing to execute after the overrun, the likelihood that this overrun can be exploited only increases. ​
 +  - The native code does not properly deal with stack overflow. After a stack overflow occurs, you need to call _resetstkoflw() to reset the page protection attributes on the final stack page. Failure to do so will mean that if your code overflows a second time, the process will just vanish.
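Point 1 is easy to reproduce in portable C++. The following is a contrived sketch (the `Accounts` type and both functions are invented for illustration): an operation fails halfway through, the catch-all handler swallows the failure, and the program carries on with state that no longer adds up.

```cpp
#include <stdexcept>

// Two balances that should always sum to 100.
struct Accounts {
    int from = 100;
    int to = 0;
};

// Hypothetical operation that can fail between its two updates.
void transfer(Accounts& a, int amount, bool inject_fault) {
    a.from -= amount;
    if (inject_fault) {
        throw std::runtime_error("failure between debit and credit");
    }
    a.to += amount;
}

// The anti-pattern: the exception is swallowed, so the caller never learns
// that the debit happened without the matching credit.
void transfer_ignoring_errors(Accounts& a, int amount, bool inject_fault) {
    try {
        transfer(a, amount, inject_fault);
    } catch (...) {
        // Deliberately empty, as in the examples above.
    }
}
```

After `transfer_ignoring_errors(a, 30, true)` the balances are 70 and 0: the invariant is broken and nothing was reported, which is the situation points 1 and 2 describe.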
 +
 +**Better Alternatives**
 +
 +Okay, so hopefully by now I have convinced you that just ignoring all exceptions is the wrong thing to do. So, what should you be doing? You need to come up with a system for reporting unexpected exceptions. For client applications, this means notifying the user. For server applications, this means notifying the administrator. What should be included in this notification? Anything that _you_ will find useful. Include details aimed not at the user who hits the problem but at the support person or developer who will be called upon to solve it.
 +
 +A few more suggestions:​
 +
 +  * Call SetUnhandledExceptionFilter from your native application. ​
 +  * Save a stack trace in your managed application. ​
 +  * If you are going to continue past a stack overflow, call _resetstkoflw(). ​
 +  * Test your 'unexpected exception' code. This code is easy to break since you don't normally see it run. You should test it by injecting a fault and making sure that the fault is properly reported. I learned this lesson the hard way.
 +  * If you have some evil code that is catching all exceptions that you cannot change, consider using a vectored exception handler, or report exceptions from a try/catch or <nowiki>__try/__except</nowiki>.
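As a minimal sketch of "report, don't ignore" (not the author's sample code), the wrapper below catches C++ exceptions at a boundary and hands a description to a reporting callback instead of dropping it. `run_and_report` and `Reporter` are hypothetical names. It only covers C++ exceptions thrown with `throw`; whether structured exceptions such as access violations reach these handlers depends on compiler settings, so it is not a replacement for SetUnhandledExceptionFilter in native Windows code.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Callback that delivers a report to whoever needs to see it: a dialog,
// a log file, an administrator alert, and so on.
using Reporter = std::function<void(const std::string&)>;

// Runs `code`. If it throws, the exception is described to `report` rather
// than silently discarded. Returns true if `code` completed normally.
bool run_and_report(const std::function<void()>& code, const Reporter& report) {
    try {
        code();
        return true;
    } catch (const std::exception& e) {
        report(std::string("unexpected exception: ") + e.what());
    } catch (...) {
        report("unexpected exception of unknown type");
    }
    return false;
}
```

Following the testing advice above, the reporting path can be exercised by injecting a fault: pass a callable that throws `std::runtime_error("injected fault")` and check that the reporter receives a message containing that text.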
 +
 +In a future blog post, I will create some sample code for reporting exceptions.
 +
 +In conclusion, plan for imperfect code. Everyone has bugs. By reporting them instead of ignoring them, you make it easier to find and fix these problems at all stages in the product'​s life cycle. ​
 +
 +----
 +  * see also [[ExceptionHandling]]
  
kb/exceptionhandlingtips.txt · Last modified: 2014/11/10 16:21 (external edit)