Skip to content

Conversation

@letitz
Copy link
Collaborator

@letitz letitz commented Jan 27, 2026

Opening on behalf of @mverde to start reviewing early.

Copy link
Collaborator Author

@letitz letitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round. I'm excited to see this land!

from typing import List
from typing import Optional
from typing import Union
from typing import BinaryIO, Callable, List, Optional, Union
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might run afoul of the linter with this change.

Comment on lines +228 to +230
"""The manifest may not exist for earlier versions of archives. In this
case, default to schema version 0.
"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A docstring here surprises me. This should be a comment.


def __init__(self, reader: archive.ArchiveReader):
super().__init__(reader)
manifest_path = os.path.join(self.root_dir(), 'clusterfuzz_manifest.json')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to handle archives where clusterfuzz_manifest.json is within a nested directory. We can declare that for simplicity, manifests must appear at the root of the zip.

Suggested change
manifest_path = os.path.join(self.root_dir(), 'clusterfuzz_manifest.json')
manifest_path = 'clusterfuzz_manifest.json'

"""
if self.file_exists(manifest_path):
archive_schema_json = json.loads(self.open(manifest_path).read().decode())
self._archive_schema_version = archive_schema_json.get('version', 0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should log a warning when a manifest is present but the version field is missing - that's an unexpected case.

case, default to schema version 0.
"""
if self.file_exists(manifest_path):
archive_schema_json = json.loads(self.open(manifest_path).read().decode())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would just call this manifest.

Comment on lines +259 to +263
"""For newer archive versions, runtime_deps that were formerly stored
under {self.root_dir()}/src_root/ are now stored in the root directory,
while the build artifacts formerly stored in the root directory are now
stored under {self.root_dir()}/out/msan/.
"""
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment re: docstrings vs comments.

"""For newer archive versions, runtime_deps that were formerly stored
under {self.root_dir()}/src_root/ are now stored in the root directory,
while the build artifacts formerly stored in the root directory are now
stored under {self.root_dir()}/out/msan/.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact path under out depends on the build.

Comment on lines +265 to +266
elif path.startswith('./'):
path = path.replace('./', 'out/msan/')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should instead rework this logic to interpret the deps path relative to a .runtime_deps file, which is really its intended meaning.

def get_target_dependencies(
self, fuzz_target: str) -> List[archive.ArchiveMemberInfo]:
target_path = self.to_archive_path(fuzz_target)
target_path = self.to_archive_path(fuzz_target,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using to_archive_path for both this fuzz target name and its dependencies' paths (from .runtime_deps) below, but the behavior we want is really quite different in each case:

  • for fuzz target names (these are not always paths), we really want to use get_path_for_target() (which is as yet conspicuously unused)
  • for dependency paths, we want to do the ../.. conditional remapping, relative to target_path / deps_file

All that to say, I think we want to do:

target_path = self.get_path_for_target(fuzz_target)

And rename to_archive_path() to get_dependency_path().

self, fuzz_target: str) -> List[archive.ArchiveMemberInfo]:
target_path = self.to_archive_path(fuzz_target)
target_path = self.to_archive_path(fuzz_target,
self._archive_schema_version)
Copy link
Collaborator Author

@letitz letitz Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a bug with the current code, if I'm reasoning through it correctly: the fuzz target name passed as an argument to this method comes from list_fuzz_targets(). That method 1 uses normalize_target_name() 2, which keeps only the base name of the fuzz target path to derive a fuzz target name: foo/bar/baz_fuzzer.exe -> baz_fuzzer.

This is compensated for by to_archive_path() which prepends root_dir() to the fuzz target name again to derive the full path. It only works because all fuzz targets are assumed to belong to the root dir. With our new archive structure, the root dir will be . yet the fuzz targets will live under out/$BUILD_DIR. So when we try to derive the path to the fuzzer here, we will fail to find a fuzzer executable and its runtime deps file.

We can fix this by using get_path_for_target() instead, which will do the right thing and recall the original fuzz target path.

Now, this brings me to an additional improvement I would love to see land, and which is unlocked by manifests. If we directly list the fuzz targets in the manifest:

{
  archive_schema_version: 1,
  fuzz_targets: [
    'out/msan/foo_fuzzer',
    'out/msan/bar_fuzzer',
  ]
}

Then we can avoid guessing which files in the archive are fuzz targets using heuristics (see list_fuzz_targets() again - it looks at filenames and even sometimes inspects the symbols defined in the binaries - this has long irked me). Better yet, it would fix crbug.com/445765265.

The manifest provides the perfect place to do this. At construction time we can read the list from JSON and populate self._fuzz_targets directly, or extract some common functionality:

def set_fuzz_target_paths(paths: Sequence[str]):
  self._fuzz_targets = {
    fuzzer_utils.normalize_target_name(path): path
    for path in paths
  }

Then list_fuzz_targets()'s lazy initialization logic will work in our favor, and we will never fall back to heuristics.

It would likely make sense to land this as a separate CL - I initially thought we would have to do this to fix this CL, but we don't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants