Skip to content

🍒 [lldb] SymbolLocatorSymStore 2/2 (http lookup)#12911

Open
weliveindetail wants to merge 30 commits intoswiftlang:swift/release/6.4.xfrom
weliveindetail:lldb-symbollocator-symstore-cherry-picks
Open

🍒 [lldb] SymbolLocatorSymStore 2/2 (http lookup)#12911
weliveindetail wants to merge 30 commits intoswiftlang:swift/release/6.4.xfrom
weliveindetail:lldb-symbollocator-symstore-cherry-picks

Conversation

@weliveindetail
Copy link
Copy Markdown
Member

Upstream LLDB gained support for fetching debug info from SymStore with a new symbol locator implementation. This PR aims to bring the feature to Swift ahead of the next rebranch.

There are 3 kinds of cherry-picks here:

  1. SymbolLocatorSymStore logic and tests
  2. WinHTTP support, tests and refactors around Debuginfod in LLVM
  3. Fixes for the NativePDB plugin and tests in LLDB

Commits from category 3 are independent from the others and were picked first. We need them, because the SymbolLocatorSymStore implementation required a change in the SymbolFilePDB/NativePDB plugins: They are loaded through the SymbolVendorPECOFF now and not fall back to the generic SymbolVendor anymore. This was a father simple change upstream, because it had switched the default to SymbolFileNativePDB already and test inconsistencies had been fixed. This change wasn't cherry-picked, so SymbolFilePDB is the default still and required these compatibility patches.

Commits from categories 1 and 2 are interleaved, because they are partially overlapping. Order matches upstream. LLVM test for WinHTTP use the existing Debuginfod infrastructure that was tailored for CURL. Applying the patches went pretty smooth. I saw conflicts due to permanent differences downstream that I could resolve cleanly, but nothing critical. I skipped two commits that are loosely related as they'd have blown up the patch:

cdbe28887bc9 [LLDB] Set and verify paths of properties from tablegen (#179524)
fb36a54ef6bc [lldb] Rename formatv verbose log call, misc log cleanups [NFC] (#186951)

I back-ported the 1-line tablegen diff from cdbe288 in my code. And eventually I didn't need fb36a54, because the symbol locator switched to LLDB_LOG anyway.

DavidSpickett and others added 30 commits May 7, 2026 11:05
…#156306)

This uses split DWARF and from looking at other tests, it should not be
running on Darwin or Windows.

It does pass using the DIA PDB plugin but I think this is misleading
because it's not actually testing the intended feature.

When the native PDB plugin is used it fails because it cannot set a
breakpoint. I don't see a point to running this test on Windows at all.

Native PDB plugin test failures are being tracked in llvm#114906.
link.exe discards DWARF information. Other linkers on non-Windows do
not.

Uses the same solution as TestFrameFunctionInlined.test.

This test was failing with the native PDB plugin but shouldn't have been
using PDB anyway (see llvm#114906). Passes with DWARF and lld.
…lvm#153160)

Relands llvm#152295.

Checking for the overloads did not account for them being out of order.
For example, [the failed
output](llvm#152295 (comment))
contained the overloads, but out of order. The last commit here fixes
that by using `-DAG`.

---------

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
…153382)

Some functions don't have the `FunctionType` set. We can't look these up
and won't be able to call them, so ignore them when caching the function
names.

This does fix the failures caused by
llvm#153160 mentioned in
llvm#153160 (comment).
However, in `lldb-shell::msstl_smoke.cpp` there's another failure not
introduced by llvm#153160 (fixed with llvm#153386).
This PR implements `SymbolFileNativePDB::AddSymbols` which adds public
symbols to the symbol table.

These symbols are found in the publics stream. It contains mangled names
coupled with addresses. Addresses are a pair of (segment, offset).
If I understood correctly, then the segment is the section ID from the
COFF header. Sections are already
[constructed](https://github.com/llvm/llvm-project/blob/c48ec7fb60b5e0b4100731d75f82ea63c0ec7b45/lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp#L1048)
using this 1-based index ([MS
docs](https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#section-table-section-headers)).
This allows us to use `section_list->FindSectionByID`.
In llvm#165604, a test was skipped on Windows, because the native PDB plugin
didn't set sizes on symbols. While the test isn't compiled with debug
info, it's linked with `-gdwarf`, causing a PDB to be created on
Windows. This PDB will only contain the public symbols (written by the
linker) and section information. The symbols themselves don't have a
size, however the DIA SDK sets a size for them.
It seems like, for these data symbols, the size given from DIA is the
distance to the next symbol (or the section end).

This PR implements the naive approach for the native plugin. The main
difference is in function/code symbols. There, DIA searches for a
corresponding `S_GPROC32` which have a "code size" that is sometimes
slightly smaller than the difference to the next symbol.
When a PDB is loaded through `target symbols add <pdb-path>`, its
`m_objectfile_sp` is an `ObjectFilePDB` instead of `ObjectFilePECOFF`
(the debugged module). In both the native and DIA plugin, some paths
assumed that `m_objectfile_sp` is the debugged module. With this PR,
they go through `m_objfile_sp->GetModule()->GetObjectFile()`.

For the DIA plugin, this lead to an assertion failure
(llvm#169628 (comment))
and for both plugins, it meant that the symbol table wasn't loaded.
…vm#185658)

Minimal infrastructure for a the SymbolLocator plugin that fetches debug
info from Microsoft SymStore repositories. This can work cross-platform
and for various debug info formats in principle, but the current plan is
focussed on PE/COFF on Windows with debug info in PDB files. Once we
have a stable first version, we'd like to add features like download,
environment variables, caching and progress feedback for users.

SymbolVendorPECOFF was tailored towards DWARF debug info so far. I added
code to load the PDB path from the executable (it only checked
gnu_debuglink so far) and not bail out if DWARF sections are missing, so
that in the PDB case we still call AddSymbolFileRepresentation() in the
very end of CreateInstance().

The API test in this patch mocks the directory layout from SymStore, so
it doesn't depend on SymStore.exe from the Windows SDK. It runs on all
platforms that link debug info in a PDB file, which is still just
Windows, but it could be cross-platform in principle.

-----

Relands with minor fixes: API tests create mocked SymStore in the test's
build directory. One log instruction was moved. One more object
access goes through module in SymbolFile.
…ib (NFC)" (llvm#186074)

Relocate HTTPClient and HTTPServer from the Debuginfod library to
llvm/Support/HTTP so they can be reused by other components.

---------

Relanding with fixes in CMakeLists.txt to account for dependency to new LLVMSupportHTTP in tools.
Relanding with one more fix in libSupportHTTP that adds it as a component in libLLVM.so

---------

Co-authored-by: Alexandre Ganea <aganea@havenstudios.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Referencing libSupportHTTP under LINK_LIBS of add_lldb_library() pulls
in the archive even in a build configuration with
LLVM_LINK_LLVM_DYLIB=On, where libSupportHTTP is part of libLLVM. This
patch moves it to LINK_COMPONENTS to fix the issue.
Deleting the executable at the end of this API test-case fails with a
permission error, likely because lldb still holds a reference to the
EXE. Exit explicitly to avoid that.
…lvm#187072)

Deleting the inferior binary after an API test-case causes issues on one
of the Windows bots. The previous the fix attempt in ca15db1
didn't succeed. We have to use isolated subfolders for each test-case.
This is achieved easily by disabling SHARED_BUILD_TESTCASE.
…187227)

Deleting anything in the build directory of a test-case is causing an
issue on one of the Windows bots. After the previous attempts in
ca15db1 and
fdd2437 didn't help, we now skip the
file cleanup altogether.
The initial version of SymbolLocatorSymStore supported servers only on
local paths. This patch extends it to HTTP/HTTPS end-points. For that to
work on Windows, we add a WinHTTP-based HTTP client backend in LLVM next
to the existing CURL-based implementation.

We don't add a HTTP server implementation, because there is no use right
now. Test coverage for the LLVM part is built on llvm-debuginfod-find
and works server-less, since it checks textual output of request
headers. The existing CURL-based implementation uses the same approach.
The LLDB API test for the specific SymbolLocatorSymStore feature spawns
a HTTP server from Python.

To keep the size of this patch within reasonable limits, the initial
implementation of the SymbolLocatorSymStore feature is dump: There is no
caching, no verification of downloaded files and no protection against
file corruptions. We use a local implementation of LLVM's
HTTPResponseHandler, but should think about extracting and reusing
Debuginfod's StreamedHTTPResponseHandler in the future.

The goal of this patch is a basic working implementation that is
testable in isolation. We will iterate on it to improve it further. This
should be fine since downloading from SymStores is not default-enabled
yet. Users have to set their server URLs explicitly.

---------

Co-authored-by: Alexandre Ganea <aganea@havenstudios.com>
…case (llvm#187705)

Listening on all interfaces is probably not permitted on the bots and
causes failures of llvm-debuginfod-find/headers-winhttp.test after
39d6bb2. Restricting them to localhost
should fix that.
…lvm#187727)

a3db68a seemed t be the obvious fix for
the winhttp issue from 39d6bb2 in
llvm-debuginfod-find, but there are still bots failing. This patch
disables the test on all bots that cannot spawn an HTTP server in Python
and record request headers. Ideally it turns all affected bots back to
green and gives us an error message to investigate.
…ots temporarily (llvm#187753)

Windows bots are still failing after a3db68a and
d7dbba5. This test is new, let's take it off while
we investigate.
The cache path we get from the `llvm::sys::path::cache_directory()`
library function is not writeable on some Windows bots as they point us
to the Default user's AppData: C:\Users\Default\AppData\Local\lldb.

This patch avoids the problem and downloads into the temporary directory
that we query from `llvm::sys::path::system_temp_directory()`. Once we
introduce real caching, this would have changed anyway.
…g on plain HTTP (llvm#188970)

This patch only adds the condition, so the flags are applied only for
HTTPS URLs. No change in implementation.
Since libSupportHTTP is part of the LLVM dylib, we must link it as a
component now. Fixes llvm#189978
…rSymStore (llvm#187687)

SymbolLocatorSymStore used a simple local implementation of
HTTPResponseHandler so far. That was fine for basic usage, but it would
cause issues down the line. This patch hoists the
StreamedHTTPResponseHandler class from libDebuginfod to SupportHTTP and
integrates it in SymbolLocatorSymStore. PDB file downloads will now be
buffered on disk, which is necessary since they can be huge.

We use the opportunity an stop logging 404 responses (file not found on
server) and print warnings for all other erroneous HTTP responses. It
was more complicated before, because the old response handler created
the underlying file in any case. The new one does that only once the
first content package comes in.
The mechanism to disable `SHARED_BUILD_TESTCASE` for tests
that set `TEST_WITH_PDB_DEBUG_INFO` doesn't work. The property 
was set on the wrong object. This patch fixes it and moves the assignment
after the for-loop, since the respective dict only exists there.
We explicitly link against the winhttp library by declaring it in CMake,
in llvm/lib/Support/HTTP/CMakeLists.txt. We therefore don't need to
specify linking against it through a pragma (which embeds a directive in
the object file).

This fixes warnings when building in mingw mode, where this kind of
pragma isn't supported:

llvm-project/llvm/lib/Support/HTTP/HTTPClient.cpp:149:9: warning:
unknown pragma ignored [-Wunknown-pragmas]
      149 | #pragma comment(lib, "winhttp.lib")
          |         ^
The HTTP implementation depends on CURL and would preferably not be part
of the LLVM dylib. This was not possible as a nested library under
libSupport, because libSupport itself is part of the LLVM dylib. This
patch moves the HTTP code into a separate top-level library that is
independent from libSupport and excluded from the LLVM dylib.
…re (llvm#191782)

The _NT_SYMBOL_PATH environment variable is the idiomatic way to set a
system-wide lookup order of symbol servers and a local cache for
SymStore. It holds a semicolon-separated list of entries in the
following notations:
* srv*[<cache>*]<source> sets a source and an optional explicit cache
* cache*<cache> sets an implicit cache for all subsequent entries
* all other entries are bare local directories

Since symbol paths are closely intertwined with the caching of symbol
files, this patch proposes support in LLDB for both features at once.
ParseEnvSymbolPaths() implements the parsing logic, which processes
entries of the symbol path string from left to right to create a series
of LookupEntry objects that each store a source and a cache location.
The source of a LookupEntry can be a local directory or an HTTP server
address. The cache is a local directory or empty. This representation
unifies the implicit vs. explicit caching options from the SymStore
protocol.

The lookup remains in LocateSymStoreEntry() which we now invoke for
each LookupEntry. Here we distinguish sources between HTTP servers and
local directories. The latter doesn't change. We just moved the logging
to clearly express the new cases. For HTTP downloads we now check the
cache first and add files to it after download. The download itself
keeps targeting a temporary file to avoid corrupt cache entries in case
of interruptions. After all, the SymStore protocol has no check-sums!

We define a default cache path that is used as a fallback, if the symbol
path doesn't specify any. It can be overridden through the
plugin.symbol-locator.symstore.cache property. This is analog to
Debuginfod and very handy since we want to move files out from temp
after download. The default cache path is what we use if there are no
other options.

The symbol path notation in the SymStore protocol carries quite some
legacy. The SymStoreTest unittest checks that we can parse all the
obscure combinations of possible server and cache entries.

---------

Co-authored-by: Nerixyz <nero.9@hotmail.de>
Using self-signed certificates is the only way forward for testing
security features on the HTTPS path. As we don't want to allow
any arbitrary certificate, we add a new property that pins a
fingerprint and any self-signed certificate is only accepted if it
matches this fingerprint.
@weliveindetail
Copy link
Copy Markdown
Member Author

@swift-ci please test

@weliveindetail
Copy link
Copy Markdown
Member Author

weliveindetail commented May 7, 2026

Test failure on macOS seems unrelated to this PR:

15:01:09  Timed Out Tests (1):
15:01:09    lldb-api :: macosx/thread_suspend/TestInternalThreadSuspension.py

@weliveindetail
Copy link
Copy Markdown
Member Author

The Windows bot reports 3 test failures:

16:50:53  Failed Tests (3):
16:50:53    lldb-api :: functionalities/breakpoint/swift_exception/TestExpressionErrorBreakpoint.py
16:50:53    lldb-api :: symstore/TestSymStore.py
16:50:53    lldb-shell :: SymbolFile/PDB/add-symbols.cpp

(1) Tests in TestExpressionErrorBreakpoint.py were enabled only recently. test_swift_error_matching_base_typename passed in dwarf_dwarfimporter mode, but failed in dwarf_clangimporter mode, because the breakpoint didn't hit here

self.assertEqual(len(breakpoint_threads), 1,
"We didn't stop at the error breakpoint")

(2) The SymbolFile/PDB/add-symbols.cpp originates from one of the cherry-picks. It passed for me locally, but it might need further upstream commits in this context?

error: can't find global variable 'a'

(3) Finally, the test for the SymStore feature itself failed: symstore/TestSymStore.py. The affected test cases are exactly the ones that download the PDB from localhost HTTP server. Seems to be due to environment restrictions:

17:27:10  AssertionError: "warning: failed to download from SymStor[102 chars]ls\n" != ''
17:27:10  - warning: failed to download from SymStore 'http://127.0.0.1:59755/a.pdb/1FB7CFE05E55BFCB4C4C44205044422E1/a.pdb': Failed to set secure protocols

The build against stable/21.x doesn't reproduce (1) while it has the test enabled as well.

17:27:34  Failed Tests (2):
17:27:34    lldb-api :: symstore/TestSymStore.py
17:27:34    lldb-shell :: SymbolFile/PDB/add-symbols.cpp

@charles-zablit
Copy link
Copy Markdown

17:27:10 AssertionError: "warning: failed to download from SymStor[102 chars]ls\n" != ''
17:27:10 - warning: failed to download from SymStore 'http://127.0.0.1:59755/a.pdb/1FB7CFE05E55BFCB4C4C44205044422E1/a.pdb': Failed to set secure protocols

@justice-adams-apple are the bots preventing running http requests to localhost?

@weliveindetail
Copy link
Copy Markdown
Member Author

I split off the non-HTTP parts into #12917

@weliveindetail weliveindetail changed the title 🍒 [lldb] SymbolLocatorSymStore cherry picks 🍒 [lldb] SymbolLocatorSymStore 2/2 (http lookup) May 7, 2026
@weliveindetail
Copy link
Copy Markdown
Member Author

@swift-ci please test macOS platform

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants