Summary
!PrintException -lines intermittently fails to resolve the [file @ N] source-line bracket for individual frames in an exception's StackTrace. The failure is deterministic for any given dump but appears flaky in CI because it depends on whether the affected method is currently on a live thread's stack and on the JIT's native-code offset for the exception throw point.
The sample repro below fails on every run when the throwing method has been unwound off the live stack and is large enough that its return IP lies in a different NibbleMap byte than its entry IP. This affects all CoreCLR runtimes when consuming heap-type (not heap2) minidumps.
Found after investigating CI failure on SOSExceptionTests.TaskNestedException(config: projectk.sdk.prebuilt.9.0.14) (Windows x86 Release leg, internal pipeline).
Reproducer
Debuggee (HeavyExn.csproj, net9.0/win-x86/Release)
using System;
using System.Runtime.CompilerServices;

internal static class Program
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static long HeavyMethod(long start)
    {
        long x = start;
        // 400 lines of trivial arithmetic, expanding to ~30 KB of x86 codegen
        x = x * 31 + 0; if (x == long.MinValue) Environment.Exit(99);
        x = x * 31 + 1; if (x == long.MinValue) Environment.Exit(99);
        // ...
        x = x * 31 + 399; if (x == long.MinValue) Environment.Exit(99);
        throw new InvalidOperationException("thrown from end of HeavyMethod");
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static Exception Capture(long seed)
    {
        try { HeavyMethod(seed); return null; }
        catch (Exception e) { return e; }
    }

    static void Main(string[] args)
    {
        long seed = args.Length > 0 ? long.Parse(args[0]) : 0;
        Exception inner = Capture(seed);
        throw new AggregateException("outer holds the inner", inner);
    }
}
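The elided arithmetic lines in HeavyMethod follow a regular pattern, so they can be generated rather than typed by hand. The helper below is a sketch only: the name GenerateHeavyBody is hypothetical, and it assumes every line matches the three examples shown above, with the constant running from 0 to count - 1.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: emits `count` C# statements of the form
//   x = x * 31 + <i>; if (x == long.MinValue) Environment.Exit(99);
// assuming the elided lines follow the same pattern as the visible ones.
std::string GenerateHeavyBody(int count)
{
    assert(count >= 0);
    std::ostringstream out;
    for (int i = 0; i < count; ++i)
    {
        out << "x = x * 31 + " << i
            << "; if (x == long.MinValue) Environment.Exit(99);\n";
    }
    return out.str();
}
```

Calling GenerateHeavyBody(400) and pasting the result between the declaration of x and the throw statement should give a method body of the shape the debuggee uses.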
Capture script
$env:DOTNET_DbgEnableMiniDump = '1'
$env:DOTNET_DbgMiniDumpType = '2' # Heap (MiniDumpWithPrivateReadWriteMemory)
$env:DOTNET_DbgMiniDumpName = "C:\dumps\heavy.dmp"
.\HeavyExn.exe
Investigation
0:000> !PrintException -lines <innerExceptionAddr>
StackTrace (generated):
SP IP Function
02F7E580 08D8A4FB HeavyExn!HeavyExn.Program.HeavyMethod(Int64)+0x76e3 ← NO bracket
02F7F218 08D82DEE HeavyExn!HeavyExn.Program.Capture(Int64)+0x26
[C:\...\Program.cs @ 418] ← bracket OK
Repro rate (Windows x86 Release)
Ran HeavyExn 30 times, capturing a heap dump and running !PrintException -lines on each via cdb + locally-built SOS:
| Runtime | DOTNET_EnableFastHeapDumps | HeavyMethod bracket present | Failure rate |
|---|---|---|---|
| net9.0.17 | unset | 0 / 30 | 100% |
| net9.0.17 | 1 (env var ignored on 9.0) | 0 / 30 | 100% |
| net8.0.28 | unset | 0 / 30 | 100% |
| net8.0.28 | 1 | 30 / 30 | 0% |
The bug is deterministic for the same target binary. CI flakiness comes from method-size proximity to the NibbleMap-granularity boundary — TaskNestedException's RandomUserTask.InnerException() at +0x54 happens to straddle the boundary, while a larger method like HeavyMethod at +0x76e3 fails every time.
Root cause
On Windows, createdump is spawned and calls MiniDumpWriteDump(..., NULL, NULL, NULL). The OS dbgcore.dll recognizes the loaded coreclr.dll and invokes the CLR DAC's ICLRDataEnumMemoryRegions::EnumMemoryRegions to collect runtime-aware memory regions. On net9 with the heap dump type, this goes through ClrDataAccess::EnumMemoryRegions → EnumMemoryRegionsWorkerHeap(CLRDATA_ENUM_MEM_HEAP) (enummem.cpp).
EnumMemoryRegionsWorkerHeap then calls EnumMemDumpAllThreadsStack(flags), which for each managed thread:
- Walks the current stack and for live frames calls EECodeInfo(actualReturnIP). This touches the NibbleMap byte for the return IP and pulls it into the dump.
- Walks each exception's _stackTrace via DumpManagedExcepObject. For each StackTraceElement, it does:
for (size_t i = 0; i < stackTrace.Size(); i++)
{
    MethodDesc* pMD = stackTrace[i].pFunc;
    if (!DacHasMethodDescBeenEnumerated(pMD) && DacValidateMD(pMD))
    {
        pMD->EnumMemoryRegions(flags);
        FindLoadedMethodRefOrDef(pMD->GetMethodTable()->GetModule(), pMD->GetMemberDef());
        DebugInfoManager::EnumMemoryRegionsForMethodDebugInfo(flags, pMD);
        PCODE addr = pMD->GetNativeCode(); // ← METHOD ENTRY, not stackTrace[i].ip
        if (addr != (PCODE)NULL)
        {
            EECodeInfo codeInfo(addr); // ← touches NibbleMap for ENTRY only
            if (codeInfo.IsValid())
            {
                IJitManager::MethodRegionInfo methodRegionInfo = { 0 };
                codeInfo.GetMethodRegionInfo(&methodRegionInfo);
            }
        }
    }
    DacEnumCodeForStackwalk(stackTrace[i].ip); // small window around IP
}
The EECodeInfo is constructed from pMD->GetNativeCode() (the method entry), not from stackTrace[i].ip (the actual return IP). EECodeInfo::Init walks the RangeSection and reads the NibbleMap byte covering the entry. For methods larger than NIBBLE_GRANULARITY (~16 bytes per byte / 256 bytes per cache line on x86), the byte for entry and the byte for entry+0x54 (or entry+0x76e3) live at different addresses, so only the entry-IP byte is pulled in.
DacEnumCodeForStackwalk(stackTrace[i].ip) does not touch the NibbleMap for the IP; it grabs a small window of executable bytes.
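To make the entry-versus-return-IP gap concrete, here is a minimal sketch that treats the map as one cell per fixed-size window of code. It is a toy model, not the runtime's NibbleMap code. The 256-byte cell size, the heap base addresses, and the names CellIndex and NeedsDifferentCell are assumptions for illustration. The entry addresses in the usage note are derived by subtracting the displayed offsets from the IPs in this report.

```cpp
#include <cassert>
#include <cstdint>

// Assumed granularity: bytes of code covered by one map cell (illustrative only).
const std::uint32_t kBytesPerCell = 256;

// Index of the map cell covering `ip`, relative to a hypothetical code heap base.
std::uint32_t CellIndex(std::uint32_t heapBase, std::uint32_t ip)
{
    assert(ip >= heapBase);
    return (ip - heapBase) / kBytesPerCell;
}

// True when resolving `ip` needs a cell other than the one covering `entry`,
// which is the case the exception stack-trace enumeration does not capture.
bool NeedsDifferentCell(std::uint32_t heapBase, std::uint32_t entry, std::uint32_t ip)
{
    return CellIndex(heapBase, entry) != CellIndex(heapBase, ip);
}
```

Under these assumptions, NeedsDifferentCell(0x09040000, 0x090419C0, 0x09041A14) is true for InnerException() at +0x54, and NeedsDifferentCell(0x08D80000, 0x08D82E18, 0x08D8A4FB) is true for HeavyMethod at +0x76e3. Capture at +0x26 (entry 0x08D82DC8, IP 0x08D82DEE) stays inside its entry cell, which is consistent with its bracket resolving.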
Why live frames are unaffected
EnumMemWalkStackHelper calls EECodeInfo(addr) where addr = GetControlPC(&regDisp), the actual return address. The NibbleMap byte for that exact return IP gets captured.
Why small methods are unaffected
If method size ≤ NibbleMap granularity, the entry NibbleMap byte covers all return IPs within the method too.
What !PrintException -lines does
Per-frame, it calls GetLineByOffset(ste.ip) → ConvertNativeToIlOffset(ip) → GetClrMethodInstance(ip) → DAC IXCLRDataProcess::StartEnumMethodInstancesByAddress(ip) → ExecutionManager::GetCodeMethodDesc(ip) → FindCodeRange(ip) → EEJitManager::FindMethodCode which reads the NibbleMap byte for ip. If that byte isn't in the dump, this returns FALSE → GetClrMethodInstance fails → no IL-offset mapping → no [file @ N] bracket.
Confirming the mechanism with cdb
0:000> dd 0907E910 L8 # RealCodeHeader for InnerException
0907e910 0907e938 00000000 0907e920 090828d4
0907e920 ???????? ???????? ???????? ???????? ← GCInfo blob, NOT captured
0:000> dd 08A3AB1C L1 # NibbleMap byte for return IP 0x09041A14
08a3ab1c ???????? ← NibbleMap entry, NOT captured
0:000> !IP2MD 090419C0 # method entry
MethodDesc: 090828d4
Source file: ...RandomUserTask.cs @ 37 ← entry-NibbleMap-cell IS captured
0:000> !IP2MD 09041A14 # return IP
Failed to request MethodData, not in JIT code range
The DAC has only the method entry's NibbleMap byte — not the one for the return IP.
Fix options
A. SOS-side fallback (preferred)
Thread an optional MethodDesc hint through ConvertNativeToIlOffset. When GetClrMethodInstance(ip) (the IP→RangeSection lookup) fails and the caller supplied a hint, derive an IXCLRDataMethodInstance from the MD instead:
HRESULT
ConvertNativeToIlOffset(ULONG64 nativeOffset, BOOL bAdjust,
                        ULONG64 methodDescHint, // NEW
                        IXCLRDataModule** ppModule,
                        mdMethodDef* methodToken, PULONG32 methodOffs)
{
    ToRelease<IXCLRDataMethodInstance> pMethodInst(NULL);
    HRESULT Status = GetClrMethodInstance(nativeOffset, &pMethodInst);
    if (Status != S_OK && methodDescHint != 0)
    {
        // Bypass the RangeSection walk via MD → Module → MethodDef → EnumInstance
        Status = GetClrMethodInstanceFromMethodDesc(methodDescHint, &pMethodInst);
    }
    if (FAILED(Status)) return Status;
    // ... rest unchanged: GetILOffsetsByAddress / GetTokenAndScope / etc.
}
FormatException passes (ULONG64)ste.pFunc as the hint.
Pros:
- ~30 LOC in SOS only, no DAC/runtime changes
- Works on every existing dump in the wild (no runtime/createdump update needed)
- Backward compatible (existing callers default the param to 0)
- Preserves IP-keyed path as the primary (it's more precise for tiered methods when memory is available)
Cons:
- Resolves @ 37 instead of @ 38 in the failing TaskNestedException case (off by one due to sequence-point boundary differences when computing the IL offset on the MD-derived instance; needs investigation, or accept it and loosen the test regex)
- Only applies when the caller has an MD (i.e. the exception stack-trace path); doesn't help generic GetLineByOffset(ip) callers
Validated locally: 100% → 0% failure on HeavyExn repro; CI dump now resolves [file @ 37].
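The control flow of option A can be exercised outside SOS with a small stand-in. The sketch below is illustrative only: ToyMethodInstance, IpLookup, and ResolveWithHint are hypothetical names, and the injected callback stands in for the IP-keyed DAC lookup, which this toy does not call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the method instance SOS would obtain.
struct ToyMethodInstance
{
    std::uint64_t methodDesc;
    bool viaHint; // true when the MethodDesc hint produced the result
};

// Stand-in for the IP-keyed lookup: returns false when the needed memory is missing.
using IpLookup = std::function<bool(std::uint64_t ip, std::uint64_t* methodDesc)>;

// Try the IP-keyed path first; fall back to the hint only when that path fails
// and the caller supplied a non-zero hint.
bool ResolveWithHint(std::uint64_t ip, std::uint64_t methodDescHint,
                     const IpLookup& lookupByIp, ToyMethodInstance* result)
{
    assert(result != nullptr);
    std::uint64_t md = 0;
    if (lookupByIp && lookupByIp(ip, &md))
    {
        result->methodDesc = md;
        result->viaHint = false;
        return true;
    }
    if (methodDescHint != 0)
    {
        result->methodDesc = methodDescHint;
        result->viaHint = true;
        return true;
    }
    return false;
}
```

Keeping the IP-keyed lookup first mirrors the intent stated above: the hint is consulted only after the primary path has failed, and callers that pass 0 see the existing behavior.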
B. Runtime-side: extra EECodeInfo enumeration in DAC for stack-trace IPs
Modify the DAC's DumpManagedExcepObject loop to also call EECodeInfo(stackTrace[i].ip) (using the actual return IP, not the entry). This touches the correct NibbleMap byte and pulls it into the dump.
for (size_t i = 0; i < stackTrace.Size(); i++)
{
    // ... existing ...
    if (pMD->GetNativeCode() != (PCODE)NULL)
    {
        EECodeInfo entryCodeInfo(pMD->GetNativeCode()); // existing
        if (entryCodeInfo.IsValid()) entryCodeInfo.GetMethodRegionInfo(...);
        // NEW: touch the NibbleMap byte for the actual return IP
        EECodeInfo ipCodeInfo(stackTrace[i].ip);
        if (ipCodeInfo.IsValid()) ipCodeInfo.GetMethodRegionInfo(...);
    }
    DacEnumCodeForStackwalk(stackTrace[i].ip);
}
Pros:
- Fixes the root cause for all consumers, not just !PrintException -lines
- Trivial code change
Cons:
- Runtime fix → requires backport to release/9.0 (likely won't happen) and won't help dumps already produced
- Doesn't help dumps from older runtimes
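The effect of option B on what lands in the dump can be sketched with a toy enumeration pass. This is not DAC code: ToyFrame and CellsToCapture are hypothetical names, the cell size is passed in as an assumption, and each map cell is identified by address divided by cell size.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stack-trace element: method entry plus the recorded IP.
struct ToyFrame
{
    std::uint32_t entry;
    std::uint32_t ip;
};

// Collects the map cells a pass would pull into the dump. With includeReturnIp
// set to false it models the current behavior (entry only); with true it models
// option B (entry plus the recorded IP).
std::set<std::uint32_t> CellsToCapture(const std::vector<ToyFrame>& frames,
                                       std::uint32_t bytesPerCell,
                                       bool includeReturnIp)
{
    assert(bytesPerCell != 0);
    std::set<std::uint32_t> cells;
    for (const ToyFrame& frame : frames)
    {
        cells.insert(frame.entry / bytesPerCell);
        if (includeReturnIp)
        {
            cells.insert(frame.ip / bytesPerCell);
        }
    }
    return cells;
}
```

Feeding it the two frames from the Investigation output, with entries derived from the displayed offsets and an assumed 256-byte cell, the entry-only pass omits the cell covering 0x08D8A4FB, while the pass that includes the recorded IP captures it.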
C. Test/runtime opt-in: use HEAP2 via DOTNET_EnableFastHeapDumps
Set DOTNET_EnableFastHeapDumps=1 on the debuggee for net8 / net10 test configurations. EEJitManager::EnumMemoryRegions under CLRDATA_ENUM_MEM_HEAP2 dumps every code heap's entire NibbleMap (heap->pHdrMap) wholesale, eliminating the gap.
- DOTNET_DbgEnableMiniDump=1, DOTNET_DbgMiniDumpType=2
+ DOTNET_DbgEnableMiniDump=1, DOTNET_DbgMiniDumpType=2, DOTNET_EnableFastHeapDumps=1
Pros:
- Validated: 0% failure on net8 with this env var
- No SOS or DAC change
Cons:
- Doesn't fix net9: the g_EnableFastHeapDumps global was added in 8.0 and 10.0 but never backported to 9.0 (net9 enummem.cpp has no reference to it)
- Doesn't fix dumps already produced by users in the field
- Adds environment-dependent test behavior
- Doesn't help users debugging their own production dumps