Skip to content

Identify DLLs from the stage#23

Open
johnwparent wants to merge 11 commits intospack:mainfrom
johnwparent:inject-dll-identifiers
Open

Identify DLLs from the stage#23
johnwparent wants to merge 11 commits intospack:mainfrom
johnwparent:inject-dll-identifiers

Conversation

@johnwparent
Copy link
Copy Markdown
Collaborator

@johnwparent johnwparent commented Nov 20, 2025

Relocating DLL and their corresponding libs from the stage currently has a pitfall, where post installation, the paths inside import libraries to their corresponding DLLs is no longer valid, as it points to the DLL as it existed at link time, which is the stage. We cannot presume where the DLL will end up in the install tree vs the stage, and we need valid "rpaths" at test time, so we are forced to marshall libs and their dlls post install.

Since we cannot 1:1 map a location in an install tree to a location in the stage, we need some other mechanism for aligning these two binaries. Previously we simply used the symbols exported by each, but since import libraries and DLLs can export different sets of symbols, despite the fact they are a import lib/dll pair, we cannot use that approach.

Instead, inject the stage path inside the dll via a resource file, so when we go to do relocation post install, we can simply query the stage dll path from the import library, search for a dll with a corresponding entry in the string table, and know we've identified a pair.

This PR:

Creates a resource descriptor file (res) and injects the current path to the dll (stage time) into the string table with the identifier "59673" (randomly selected number, it just needs to be any consistent identifier) it into the first pass link line that creates the initial dll.
The identifier needs to be a numeric, and this is a random one I chose.

Based on #32

@johnwparent johnwparent force-pushed the inject-dll-identifiers branch from 0f18e5d to 10f0031 Compare November 24, 2025 23:40
Copy link
Copy Markdown
Member

@scheibelp scheibelp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few preliminary questions/requests

std::string const name_obj = this->objs_.front();
// std::string const filename =
// strip(strip(split(name_obj, "\\").back(), ".lib"), ".obj");
// this->output_ = join({GetCWD(), filename}, "\\") + ext;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(question) is this something you want to keep around?

src/ld.cxx Outdated
// This string table ID is completely arbitrary, HOWEVER
// Spack Core relies on this specific value
// If it is changed here, it must be changed in Spack Core
const std::string string_table_id = " 59673 ";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(question) is there a possibility of this ID clashing with some other value?

Is it possible to define an arbitrary type to insert (and then there would be no clash as long as no one else defined something of the same type)?

src/ld.cxx Outdated
rc_executor.Execute();
DWORD const err_code = rc_executor.Join();
if (err_code != 0) {
return std::string();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(question) is it not catastrophic if this fails? Should it raise an exception?

src/ld.cxx Outdated
std::string const rc_file = createRC(link_run.get_out());
// Add produced RC file to linker CLI to inject ID
// This needs to be at the end
this->lib_args.push_back(rc_file);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rc_file can be empty: should it be added to this list if it is?

src/ld.cxx Outdated
return ret_code;
}

std::string LdInvocation::createRC(const std::string& pe_stage_name) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring would be useful like "given a pe file, place an resource script alongside it to insert a stringtable entry into the pe that..."

Adds support for

* Path canonicalization
* Making paths absolute
* Add path padding character to padding signature
* Stripping quotes
* Backslash escaping
* Corrects regex "match" bug
* Adds error handling to SFN name processing
* Adds file creation to sfn processing so sfns can be generated for "non existent" files
* Adds method to validate a path is the proper length
* Updates path padding to leverage the above api additions
* Corrects a missing return statement
* Adds new error types for current and future use

Signed-off-by: John Parent <john.parent@kitware.com>
Signed-off-by: John Parent <john.parent@kitware.com>
Ensures child stderr pipes are handled gracefully

Corrects an issue where the error pipes were not copied on assign or copy or included in the process start info or closed.

Signed-off-by: John Parent <john.parent@kitware.com>
Some packages install themselves as readonly, or with other file permissions that conflict with what we're doing with this compiler wrapper. Thus, we need the ability to override (and restore) those settings.

For this compiler wrapper to perform it's duties, it needs to be able to read and write the various binaries artifacts from a build. We thus need to be able to:
1. Obtain and store current permissions
2. Grant ourselves the required permissions
3. Do our reading/writing
4. Restore previous permissions

On Windows, file access is handled by the read/write bit, as on other platforms, but also by Access Control Lists (ACLs) which are populated by Discretionary Access Control Lists (DACLs) which are the atomic units responsible for granting rights.

This PR adds an interface to inspect, store, update, and restore the read/write bits and DACLs of a given file. This functionality is wrapped up in a RAII based class that essentially provides scoped permissions to a file for scope of the permissions object.

Note: this assumes the user creating the permissions is the user driving this wrapper, or this user has admin rights, otherwise this will fail (gracefully, but still a failure)
Signed-off-by: John Parent <john.parent@kitware.com>
Previously we were searching a set of prefixes for DLLs during relocation. If we found a dll that matched the dll we were looking for (based on file name) we performed relocation based on that path. This is both dangerous and extraneous.

This is dangerous as mutli config layouts may have the same binary with the same name in mutliple different paths for different configs or variations. Since we were previously only checking the filename, this could lead to a false positive detection and bad relocation not detected until runtime.

This is extraneous as we should never need to search. We have the dll locations before and after relocation, whether from the stage to install prefix or from buildcache to buildcache, so rather than a filesystem search, we can have a linear time operation where we search through a list of relocation old->new prefix mappings.

Spack core will set an environment variable of the structure:
old_prefix|new_prefix and the compiler wrapper now composes a map out of that list and then PE files looking to relocate their internal dll references get a constant time lookup.

Signed-off-by: John Parent <john.parent@kitware.com>
    * Adds api calls for obtaining the PE names stored in coff files
      Supports both short and long names, or detecting both. Typically only one is defined in an import library, so whichever is returned first.

    * Adds access guards to coff reading

Signed-off-by: John Parent <john.parent@kitware.com>
We're no longer modifying binaries on BC push

Signed-off-by: John Parent <john.parent@kitware.com>
* Makes all PE paths absolute
* Removes rename logic that dealt with the spack sigil and BC pushes
* Uses new relocation logic to avoid FS search and instead use Spack env variable and util support added in prior PR
* Adds access rights scoping for read and write operations
* General code cleanup

Signed-off-by: John Parent <john.parent@kitware.com>
Reduce the degree to which we decompose an incoming command line in the toolchain. The odering and context of CLI arguments is highly fragile and important to the contents and naming of build artifacts, in particular w.r.t. maintaining user expectations and logic regarding name reasoning.

Instead toolchain is now a thin wrapper to pass the toolchain commands through the compiler wrapper itself, preserving inputs and order and injecting Spack libs, includes, etc safely.

We leave command line parsing to specialized tools designed for the linker/compiler/etc.

Signed-off-by: John Parent <john.parent@kitware.com>
* Stop disabiguating between obj, lib, and lo files, they're treated the same by the linker and we don't need extra logic for each, and we need to be able to process them in order.
* Carefully parses the command line and .def file to determine the intended output name and ensures the internal modeling of that name in the compiler wrapper is consistent
* Adds behavior to allow for extremely long command lines (Paraviews are 75kb) by composing arguments into an RSP file
* Adds logic to expand the contents of an RSP file so we can inspect the full command line to better reason about naming and eventual length
* Small bug fixes of QOL improvements

Signed-off-by: John Parent <john.parent@kitware.com>
…has a pitfall, where post installation, the paths inside import libraries to their corresponding DLLs is no longer valid, as it points to the DLL as it existed at link time, which is the stage. We cannot presume where the DLL will end up in the install tree vs the stage, and we need valid "rpaths" at test time, so we are forced to marshall libs and their dlls post install.

Since we cannot 1:1 map a location in an install tree to a location in the stage, we need some other mechanism for aligning these two binaries. Previously we simply used the symbols exported by each, but since import libraries and DLLs can export different sets of symbols, despite the fact they are a import lib/dll pair, we cannot use that approach.

Instead, inject the stage path inside the dll via a resource file, so when we go to do relocation post install, we can simply query the stage dll path from the import library, search for a dll with a corresponding entry in the string table, and know we've identified a pair.

This PR:

Creates a resource descriptor file (res) and injects the current path to the dll (stage time) into the string table with the identifier "59673" (randomly selected number, it just needs to be any consistent identifier) it into the first pass link line that creates the initial dll.
The identifier needs to be a numeric, and this is a random one I chose.

Signed-off-by: John Parent <john.parent@kitware.com>
@johnwparent johnwparent force-pushed the inject-dll-identifiers branch from 76ba236 to e2c35b1 Compare January 31, 2026 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants