
ld is unable to find libraries in the host_injections #225

@casparvl

Description


So, let's be clear: this is not necessarily a bug, as the runtime linker (ld.so) picks up those libs just fine, and we might consider it intentional that ld doesn't. But... it does raise some issues with patching ctypes (see below).

ld is unable to find libraries in the host_injections:

$ ld -t -lcuda
ld: cannot find -lcuda: No such file or directory
$ ld -t -lz
/cvmfs/software.eessi.io/versions/2023.06/compat/linux/x86_64/usr/lib64/libz.so
ld: warning: cannot find entry symbol _start; not setting start address

Whereas ld.so finds libcuda just fine:

$ ld.so --list $(which osu_bw) | grep cuda
        libcudart.so.12 => /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc80/software/CUDA/12.4.0/lib64/libcudart.so.12 (0x0000149d7e800000)
        libcuda.so.1 => /cvmfs/software.eessi.io/host_injections/2023.06/compat/linux/x86_64/lib/libcuda.so.1 (0x0000149d78f85000)
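The runtime linker's answer can also be extracted programmatically by parsing this output. A minimal Python sketch, assuming the "soname => path (address)" line format shown above; resolved_libraries is a hypothetical helper, not part of the EESSI ctypes patch:

```python
from __future__ import annotations

import re

# Matches lines such as
#   libcuda.so.1 => /path/to/libcuda.so.1 (0x0000149d78f85000)
#   libfoo.so.2 => not found
# Lines without "=>" (e.g. the vDSO) are ignored.
_LIST_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def resolved_libraries(list_output: str) -> dict[str, str]:
    """Map each soname in `ld.so --list` output to the path it resolved to ("not found" if unresolved)."""
    resolved = {}
    for line in list_output.splitlines():
        match = _LIST_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved
```

To use it, capture the output of ld.so --list <binary> (e.g. with subprocess.run(..., capture_output=True, text=True)) and look up the soname of interest; for the output above, the libcuda.so.1 entry maps to the host_injections path.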

The reason is that we inject the host_injections directories into glibc directly (see here and here), so only the runtime linker knows about them.

This becomes an issue when trying to patch ctypes for EESSI, as is implemented here. ctypes normally only searches runtime paths and doesn't handle RPATH-ed libraries. We've patched that in the linked PR, but the one thing we are not able to do is locate libraries in the host_injections using ctypes.util.find_library. The reason is that ctypes relies (among other things) on ld to find libraries, by running ld -t -L <some_path_from_LD_LIBRARY_PATH> -l<lib_to_find>.

A solution on that side could be to query the runtime linker instead, but that requires a binary which is linked against that library. We could make ctypes run gcc to compile & link a small test binary against <lib_to_find>, then invoke ld.so --list <binary>, grep for <lib_to_find> and see how it is resolved. Of course, this is a lot heavier, since it requires a compilation every time find_library is called.
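The ld-based lookup can be sketched in Python as follows. This is modeled loosely on the ld fallback in CPython's ctypes/util.py (which additionally filters out non-ELF matches such as linker scripts, and whose details vary between Python versions); the function names are hypothetical. It shows why host_injections stays invisible: the only directories ld is handed are the LD_LIBRARY_PATH entries plus its own built-in search directories, so a directory that is only known to glibc's runtime linker never enters the search.

```python
from __future__ import annotations

import os
import re
import subprocess


def ld_probe_command(name: str, ld_library_path: str | None) -> list[str]:
    """Build an `ld -t` probe for lib<name>, with one -L per LD_LIBRARY_PATH entry."""
    cmd = ["ld", "-t"]
    if ld_library_path:
        for directory in ld_library_path.split(":"):
            cmd.extend(["-L", directory])
    cmd.extend(["-o", os.devnull, f"-l{name}"])
    return cmd


def first_match_in_trace(name: str, trace: str) -> str | None:
    """Return the first token in the trace that looks like .../lib<name>.<suffix>."""
    pattern = r"[^()\s]*lib%s\.[^()\s]*" % re.escape(name)
    matches = re.findall(pattern, trace)
    return matches[0] if matches else None


def find_library_via_ld(name: str) -> str | None:
    """Run the probe; None if ld is unavailable or does not find lib<name>."""
    cmd = ld_probe_command(name, os.environ.get("LD_LIBRARY_PATH"))
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except OSError:
        return None
    return first_match_in_trace(name, proc.stdout)
```

With the setup from this issue, find_library_via_ld("z") would be expected to return the compat layer's libz.so, while find_library_via_ld("cuda") returns None because ld reports "cannot find -lcuda".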

An alternative might be to put the host_injections dirs into ld.so.conf. The problem there, however, is that the runtime linker (ld.so) only uses that file indirectly: it relies on /etc/ld.so.cache, which ldconfig generates from the directories listed in ld.so.conf. Or at least, that's what the docs suggest (see point 4):

ld.so(8)                System Manager's Manual                ld.so(8)

NAME
       ld.so, ld-linux.so - dynamic linker/loader

...

       If a shared object dependency does not contain a slash, then it is searched for in the following order:

       (1)  Using the directories specified in the DT_RPATH dynamic section attribute of the binary if present and DT_RUNPATH attribute does not exist.  Use of DT_RPATH is deprecated.

       (2)  Using the environment variable LD_LIBRARY_PATH, unless the executable is being run in secure-execution mode (see below), in which case this variable is ignored.

       (3)  Using the directories specified in the DT_RUNPATH dynamic section attribute of the binary if present.  Such directories are searched only to find those objects required by DT_NEEDED (direct dependencies) entries and do not apply to  those  objects'  children,  which
            must themselves have their own DT_RUNPATH entries.  This is unlike DT_RPATH, which is applied to searches for all children in the dependency tree.

       (4)  From  the cache file /etc/ld.so.cache, which contains a compiled list of candidate shared objects previously found in the augmented library path.  If, however, the binary was linked with the -z nodeflib linker option, shared objects in the default paths are skipped.
            Shared objects installed in hardware capability directories (see below) are preferred to other shared objects.

       (5)  In the default path /lib, and then /usr/lib.  (On some 64-bit architectures, the default paths for 64-bit shared objects are /lib64, and then /usr/lib64.)  If the binary was linked with the -z nodeflib linker option, this step is skipped.
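To make the consequence of point 4 concrete, here is a toy Python model of the lookup order quoted above. It is not how glibc implements it (no hwcaps subdirectories, no secure-execution mode, no -z nodeflib, no DT_RPATH inheritance by children, and no EESSI glibc patch); resolve_like_ld_so and the /opt/host_injections path in the usage are hypothetical. The existing set stands in for the filesystem and cache for /etc/ld.so.cache.

```python
from __future__ import annotations

import os


def resolve_like_ld_so(
    soname: str,
    existing: set[str],
    rpath: list[str],
    runpath: list[str],
    ld_library_path: list[str],
    cache: dict[str, str],
    default_dirs: tuple[str, ...] = ("/lib64", "/usr/lib64"),
) -> str | None:
    """Toy model of the documented ld.so search order for a soname without a slash."""
    search: list[str] = []
    if not runpath:
        search += rpath          # (1) DT_RPATH, only when there is no DT_RUNPATH
    search += ld_library_path    # (2) LD_LIBRARY_PATH
    search += runpath            # (3) DT_RUNPATH
    for directory in search:
        candidate = os.path.join(directory, soname)
        if candidate in existing:
            return candidate
    if soname in cache:          # (4) only what ldconfig wrote into the cache
        return cache[soname]
    for directory in default_dirs:  # (5) default paths
        candidate = os.path.join(directory, soname)
        if candidate in existing:
            return candidate
    return None
```

In this model, a library sitting in a directory that is only listed in ld.so.conf is not found until ldconfig has written it into the cache: resolve_like_ld_so("libcuda.so.1", {"/opt/host_injections/lib/libcuda.so.1"}, [], [], [], {}) returns None.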

I am wondering whether point 5 really only searches those paths, or whether /etc/ld.so.conf defines what it considers to be the default paths. It could be worth testing whether putting the host_injections dirs in ld.so.conf also works.
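When testing this, it helps to know exactly which directories an ld.so.conf contributes, including files pulled in via include lines. A simplified Python sketch: it assumes one entry per line, treats # as a comment, skips legacy hwcap lines, and resolves relative include patterns against the including file's directory, which is our understanding of what ldconfig does. read_ld_so_conf is a hypothetical name; pass the compat layer's own ld.so.conf instead of /etc/ld.so.conf to inspect that one.

```python
from __future__ import annotations

import glob
import os


def read_ld_so_conf(path: str = "/etc/ld.so.conf", _seen: set[str] | None = None) -> list[str]:
    """List the directories named in an ld.so.conf, following `include` lines (simplified)."""
    seen = set() if _seen is None else _seen
    real = os.path.realpath(path)
    if real in seen:  # guard against include loops
        return []
    seen.add(real)
    try:
        with open(path) as fh:
            lines = fh.read().splitlines()
    except OSError:
        return []
    dirs: list[str] = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] == "include" and len(parts) == 2:
            pattern = parts[1].strip()
            if not os.path.isabs(pattern):
                # assumption: relative patterns are taken relative to the including file
                pattern = os.path.join(os.path.dirname(path), pattern)
            for included in sorted(glob.glob(pattern)):
                dirs.extend(read_ld_so_conf(included, seen))
        elif parts[0] == "hwcap":
            continue  # legacy directive, not a directory
        else:
            dirs.append(line)
    return dirs
```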

For ld, the docs seem to suggest that the directories listed in /etc/ld.so.conf are read directly at link time (i.e. not through the cache), see point 8:

           The linker uses the following search paths to locate required shared libraries:

           1.  Any directories specified by -rpath-link options.

           2.  Any directories specified by -rpath options.  The difference between -rpath and -rpath-link is that directories specified by -rpath options are included in the executable and used at runtime, whereas the -rpath-link option is only effective at link time.
               Searching -rpath in this way is only supported by native linkers and cross linkers which have been configured with the --with-sysroot option.

           3.  On an ELF system, for native linkers, if the -rpath and -rpath-link options were not used, search the contents of the environment variable "LD_RUN_PATH".

           4.  On SunOS, if the -rpath option was not used, search any directories specified using -L options.

           5.  For a native linker, search the contents of the environment variable "LD_LIBRARY_PATH".

           6.  For a native ELF linker, the directories in "DT_RUNPATH" or "DT_RPATH" of a shared library are searched for shared libraries needed by it. The "DT_RPATH" entries are ignored if "DT_RUNPATH" entries exist.

           7.  The default directories, normally /lib and /usr/lib.

           8.  For a linker for a Linux system, if the file /etc/ld.so.conf exists, the list of directories found in that file.  Note: the path to this file is prefixed with the "sysroot" value, if that is defined, and then any "prefix" string if the linker was configured with
               the --prefix=<path> option.

           9.  For a native linker on a FreeBSD system, any directories specified by the "_PATH_ELF_HINTS" macro defined in the elf-hints.h header file.

           10. Any directories specified by a "SEARCH_DIR" command in the linker script being used.

           If the required shared library is not found, the linker will issue a warning and continue with the link.
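One caveat worth checking before relying on point 8: in the ld manual, the list above appears under -rpath-link and describes how ld locates shared libraries needed by other shared libraries. A plain -l<name> is looked up in the -L directories and the SEARCH_DIR entries of the linker script, which ld --verbose prints. A minimal Python sketch for extracting those entries; search_dirs and default_ld_search_dirs are hypothetical helpers, and the leading "=" is treated as "relative to the sysroot", as ld does.

```python
from __future__ import annotations

import re
import subprocess

# Entries in the default linker script look like
#   SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64");
# where a leading "=" means the path is relative to the sysroot.
_SEARCH_DIR = re.compile(r'SEARCH_DIR\("(=?)([^"]*)"\)')


def search_dirs(verbose_output: str, sysroot: str = "") -> list[str]:
    """Return the SEARCH_DIR paths from `ld --verbose` output, in order."""
    return [sysroot + path if eq else path for eq, path in _SEARCH_DIR.findall(verbose_output)]


def default_ld_search_dirs() -> list[str]:
    """Ask the ld on PATH for its built-in search directories."""
    out = subprocess.run(["ld", "--verbose"], capture_output=True, text=True, check=True).stdout
    return search_dirs(out)
```

If the host_injections directories do not show up here (or via -L), ld -l<name> will not find libraries placed there, regardless of ld.so.conf.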

We could investigate this again for the next compat layer version. One option could also be to do both, i.e. keep injecting the host_injections dirs into glibc and also list them in ld.so.conf. One thing to consider is that this creates a risk that users who build new software (e.g. through EESSI-extend) link against libraries from host_injections as well. Normally, directories from the LIBRARY_PATH come first, so ld should pick up libraries from there first. But if one tries to link against something that is not on the LIBRARY_PATH (e.g. because the right dependency module isn't loaded) and it is present in host_injections, ld will pick up the host_injections copy.
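If the host_injections directories do end up on ld's search path, one way to spot such accidental links is to build with -Wl,-t, which makes ld print every input file it uses, and scan that trace for the host_injections prefix. A minimal Python sketch; the prefix constant is taken from the ld.so --list output above, and host_injection_inputs is a hypothetical helper.

```python
from __future__ import annotations

import re

# Prefix as seen in the ld.so --list output in this issue; adjust as needed.
HOST_INJECTIONS_PREFIX = "/cvmfs/software.eessi.io/host_injections/"


def host_injection_inputs(trace: str, prefix: str = HOST_INJECTIONS_PREFIX) -> list[str]:
    """Return every path in an `ld -t` trace that lies under `prefix`.

    Also tolerates paths wrapped in parentheses, e.g. "-lcuda (/path/libcuda.so)".
    """
    return re.findall(re.escape(prefix) + r"[^\s()]*", trace)
```

Capture both stdout and stderr of the link step when collecting the trace; any non-empty result means the build picked up something from host_injections.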
