Thursday 25 April 2024

Issues Resolving Symbols on Windows 11 on ARM64

This is a short blog post about an issue I encountered during some development work on my OleViewDotNet tool and how I resolved it. It might help others if they come across a similar problem, although I'm not sure if I took the best approach.

OleViewDotNet has the ability to parse the internal COM structures in a process and show important information such as the list of current IPIDs exported by the process and the access security descriptor. 

PS C:\> $p = Get-ComProcess -ProcessId $pid
PS C:\> $p.Ipids
IPID                                 Interface Name  PID    Process Name
----                                 --------------  ---    ------------
00008800-4bd8-0000-c3f9-170a9f197e11 IRundown        19416  powershell.exe
00009401-4bd8-ffff-45b0-a43d5764a731 IRundown        19416  powershell.exe
0000a002-4bd8-5264-7f87-e6cbe82784aa IRundown        19416  powershell.exe

To achieve this task we need access to the symbols of the COMBASE DLL so that we can resolve various root pointers to hash tables and other runtime artifacts. The majority of the code to parse the process information is in the COMProcessParser class, which uses the DBGHELP library to resolve symbols to an address. My code also supports a mechanism to cache the resolved pointers into a text file which can be subsequently used on other systems with the same COMBASE DLL rather than needing to pull down a 30+ MiB symbol file.
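
The post does not describe the cache file format, so the following is only a rough Python sketch of the idea, not OleViewDotNet's actual implementation. It assumes that each resolved pointer is stored as an offset from the module base, because the address a DLL loads at can differ from one system to the next. The function names, the file name, the one-entry-per-line format and all addresses are hypothetical. A real cache would also need to be tied to the exact COMBASE build it was generated from.

```python
import os
import tempfile


def save_symbol_cache(path, module_base, resolved):
    """Write 'name offset' lines, with offsets relative to the module base."""
    with open(path, "w", encoding="ascii") as f:
        for name, address in sorted(resolved.items()):
            f.write(f"{name} {address - module_base:#x}\n")


def load_symbol_cache(path, module_base):
    """Read the cache back and rebase each offset onto this module base."""
    resolved = {}
    with open(path, encoding="ascii") as f:
        for line in f:
            name, offset = line.split()
            resolved[name] = module_base + int(offset, 16)
    return resolved


# Hypothetical values: resolve with one load address, reuse with another.
with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "combase_symbols.txt")
    save_symbol_cache(cache_path, 0x10000000, {"gSecDesc": 0x10001234})
    rebased = load_symbol_cache(cache_path, 0x20000000)
    print(hex(rebased["gSecDesc"]))
```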

This works fine on Windows 11 x64, but I noticed that I would get incorrect results on ARM64. In the past I've encountered similar issues that have been down to changes in the internal structures used during parsing. Microsoft provides private symbols for COMBASE so it's pretty easy to check if the structures were different between the x64 and ARM64 versions of Windows 11. There were no differences that I could see. In any case, I noticed this also impacted trivial values. For example, the symbol gSecDesc contains a pointer to the COM access security descriptor, yet when reading that pointer it was always NULL even though it should have been initialized.

To add to my confusion, when I checked the symbol in WinDBG it showed the pointer was correctly initialized. However, when I searched for the expected symbol using the x command in WinDBG I found something interesting:

0:010> x combase!gSecDesc
00007ffa`d0aecb08 combase!gSecDesc = 0x00000000`00000000
00007ffa`d0aed1c8 combase!gSecDesc = 0x00000180`59fdb750

We can see from the output that there are two symbols for gSecDesc, not one. The first one has a NULL value while the second has the initialized value. When I checked what address my symbol resolver was returning it was the first one, whereas WinDBG knew better and would return the second. What on earth is going on?

This is an artifact of a new feature in Windows 11 on ARM64 to simplify the emulation of x64 executables, ARM64X. This is a clever (or terrible) trick to avoid needing separate ARM64 and x64 binaries on the system. Instead both ARM64 code and x64 compatible code, referred to as ARM64EC (Emulation Compatible), are merged into a single system binary. Presumably in some cases this means that global data structures need to be duplicated, once for the ARM64 code and once for the ARM64EC code. In this case it doesn't seem like there should be two separate global data values as a pointer is a pointer, but I suppose there might be edge cases where that isn't true and it's simpler to just duplicate the values to avoid conflicts. The details are pretty interesting and there are a few places where this has been reverse engineered; I'd at least recommend this blog post.

My code uses the SymFromName API to query the symbol address, and this just returns the first symbol it finds, which in this case was the ARM64EC one that wasn't initialized in an ARM64 process. I don't know if this is a bug in DBGHELP (perhaps it should try to return the symbol which matches the binary's machine type) or if I'm holding it wrong. Regardless, I needed a way of getting the correct symbol, but after going through the DBGHELP library there was no obvious way of disambiguating the two. However, WinDBG can clearly do it, so there must be a way.

After a bit of hunting around I found that the Debug Interface Access (DIA) library has an IDiaSymbol::get_machineType method which returns the machine type for the symbol, either ARM64 (0xAA64) or ARM64EC (0xA641). Unfortunately I'd intentionally used DBGHELP as it's installed by default on Windows, whereas DIA needs to be installed separately. There didn't seem to be an equivalent in the DBGHELP library.
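
To make the ambiguity concrete, here is a small Python model of it. It does not call DBGHELP or DIA. The tuple layout and both helper functions are hypothetical, and the two entries simply pair the addresses from the WinDBG output above with the machine types as described in this post (the first, uninitialized entry being the ARM64EC one).

```python
# Machine type values for the two kinds of code in an ARM64X binary.
MACHINE_NAMES = {0xAA64: "ARM64", 0xA641: "ARM64EC"}

# (name, address, machine type), in the order WinDBG listed them.
SYMBOLS = [
    ("gSecDesc", 0x00007FFAD0AECB08, 0xA641),
    ("gSecDesc", 0x00007FFAD0AED1C8, 0xAA64),
]


def first_match(symbols, name):
    """Model a first-match lookup: return the first address for this name."""
    for sym_name, address, _machine in symbols:
        if sym_name == name:
            return address
    raise KeyError(name)


def candidates(symbols, name):
    """Return every (address, machine name) pair recorded for this name."""
    return [(address, MACHINE_NAMES.get(machine, hex(machine)))
            for sym_name, address, machine in symbols if sym_name == name]


# A first-match lookup lands on the ARM64EC copy of the variable.
print(hex(first_match(SYMBOLS, "gSecDesc")))

# Listing all candidates exposes the duplicate and which machine each is for.
for address, machine in candidates(SYMBOLS, "gSecDesc"):
    print(hex(address), machine)
```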

Fortunately, after poking around the DBGHELP library looking for a solution, an opportunity presented itself. Internally, DBGHELP (at least in recent versions) uses a private copy of the DIA library. That in itself wouldn't be that helpful, except the library exports a couple of private APIs that allow a caller to query the current DIA state. For example, there's the SymGetDiaSession API which returns an instance of the IDiaSession interface. From that interface you can query for an instance of the IDiaSymbol interface and then query the machine type. I'm not sure how compatible the version of DIA inside DBGHELP is relative to the publicly released version, but it's compatible enough for my purposes.

Update 2024/04/26: it was pointed out to me that the machine type is present in the SYMBOL_INFO::Reserved[1] field so you don't need to do this whole approach with the DIA interface. The point still stands that you need to enumerate the symbols on ARM64 platforms as there could be multiple ones and you still need to check the machine type.
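
As a rough illustration of that simpler route, the following Python sketch declares the documented SYMBOL_INFO layout with ctypes and reads the machine type out of Reserved[1]. The use of Reserved[1] is taken from the update above and is not documented behaviour, so treat it as an assumption. The two helper functions are hypothetical names, and the sketch only builds a structure in memory rather than calling DBGHELP.

```python
import ctypes

IMAGE_FILE_MACHINE_ARM64 = 0xAA64
IMAGE_FILE_MACHINE_ARM64EC = 0xA641


class SYMBOL_INFO(ctypes.Structure):
    """Field layout of the DBGHELP SYMBOL_INFO structure.

    Fixed-width types are used because ULONG is 32 bits and ULONG64 is
    64 bits on Windows, whatever host this definition is loaded on.
    """
    _fields_ = [
        ("SizeOfStruct", ctypes.c_uint32),
        ("TypeIndex", ctypes.c_uint32),
        ("Reserved", ctypes.c_uint64 * 2),
        ("Index", ctypes.c_uint32),
        ("Size", ctypes.c_uint32),
        ("ModBase", ctypes.c_uint64),
        ("Flags", ctypes.c_uint32),
        ("Value", ctypes.c_uint64),
        ("Address", ctypes.c_uint64),
        ("Register", ctypes.c_uint32),
        ("Scope", ctypes.c_uint32),
        ("Tag", ctypes.c_uint32),
        ("NameLen", ctypes.c_uint32),
        ("MaxNameLen", ctypes.c_uint32),
        ("Name", ctypes.c_char * 1),
    ]


def symbol_machine_type(info):
    """Return the value of Reserved[1] (assumed to be the machine type)."""
    return int(info.Reserved[1])


def is_for_machine(info, machine):
    """True if the symbol's assumed machine type equals the requested one."""
    return symbol_machine_type(info) == machine


# Simulate a symbol record as an enumeration callback might receive it.
info = SYMBOL_INFO()
info.Reserved[1] = IMAGE_FILE_MACHINE_ARM64EC
print(is_for_machine(info, IMAGE_FILE_MACHINE_ARM64))
```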

To resolve this issue the code in OleViewDotNet takes the following steps on ARM64 systems:

  1. Instead of calling SymFromName the code enumerates all symbols for a name.
  2. The SymGetDiaSession API is called to get an instance of the IDiaSession interface.
  3. The IDiaSession::findSymbolByVA method is called to get an instance of the IDiaSymbol interface for the symbol.
  4. The IDiaSymbol::get_machineType method is called to get the machine type for the symbol.
  5. The symbol is filtered based on the context, e.g. if parsing an ARM64 process it uses the ARM64 symbol.
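
The selection logic in the steps above can be sketched in Python as follows. This is a model rather than the OleViewDotNet code: resolve_symbol is a hypothetical name, the machine type query is passed in as a callable that stands in for the DIA lookup, and the addresses are made up. Raising an error unless exactly one candidate matches is my own choice for the sketch, not something the tool is known to do.

```python
IMAGE_FILE_MACHINE_ARM64 = 0xAA64
IMAGE_FILE_MACHINE_ARM64EC = 0xA641


def resolve_symbol(candidate_addresses, query_machine_type, target_machine):
    """Pick the candidate address whose symbol matches target_machine.

    candidate_addresses: every address enumerated for one symbol name (step 1).
    query_machine_type:  callable mapping an address to a machine type; it
                         stands in for the DIA lookup in steps 2 to 4.
    target_machine:      machine type of the process being parsed (step 5).
    """
    matches = [address for address in candidate_addresses
               if query_machine_type(address) == target_machine]
    if len(matches) != 1:
        raise LookupError(f"expected one match, found {len(matches)}")
    return matches[0]


# Hypothetical data: pretend the machine type query reported these values.
machine_by_address = {
    0x1000: IMAGE_FILE_MACHINE_ARM64EC,
    0x2000: IMAGE_FILE_MACHINE_ARM64,
}

# Parsing an ARM64 process, so ask for the ARM64 copy of the symbol.
address = resolve_symbol(list(machine_by_address), machine_by_address.get,
                         IMAGE_FILE_MACHINE_ARM64)
print(hex(address))
```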

This is much more complicated than I think it needs to be, but I've yet to find an alternative approach. Ideally the SYMBOL_INFO structure in DBGHELP should contain a machine type field, but I guess it's hard to change the interface now. The relatively simple code to do the machine type query is here. If anyone has found a better way of doing it with just the public interface to DBGHELP I'd appreciate the information :)