-
Notifications
You must be signed in to change notification settings - Fork 600
Add handling for reformatted archives conditional on archive format version #5145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
letitz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round. I'm excited to see this land!
| from typing import List | ||
| from typing import Optional | ||
| from typing import Union | ||
| from typing import BinaryIO, Callable, List, Optional, Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might run afoul of the linter with this change.
| """The manifest may not exist for earlier versions of archives. In this | ||
| case, default to schema version 0. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A docstring here surprises me. This should be a comment.
|
|
||
| def __init__(self, reader: archive.ArchiveReader): | ||
| super().__init__(reader) | ||
| manifest_path = os.path.join(self.root_dir(), 'clusterfuzz_manifest.json') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to handle archives where clusterfuzz_manifest.json is within a nested directory. We can declare that for simplicity, manifests must appear at the root of the zip.
| manifest_path = os.path.join(self.root_dir(), 'clusterfuzz_manifest.json') | |
| manifest_path = 'clusterfuzz_manifest.json' |
| """ | ||
| if self.file_exists(manifest_path): | ||
| archive_schema_json = json.loads(self.open(manifest_path).read().decode()) | ||
| self._archive_schema_version = archive_schema_json.get('version', 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should log a warning when a manifest is present but the version field is missing - that's an unexpected case.
| case, default to schema version 0. | ||
| """ | ||
| if self.file_exists(manifest_path): | ||
| archive_schema_json = json.loads(self.open(manifest_path).read().decode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would just call this manifest.
| """For newer archive versions, runtime_deps that were formerly stored | ||
| under {self.root_dir()}/src_root/ are now stored in the root directory, | ||
| while the build artifacts formerly stored in the root directory are now | ||
| stored under {self.root_dir()}/out/msan/. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment re: docstrings vs comments.
| """For newer archive versions, runtime_deps that were formerly stored | ||
| under {self.root_dir()}/src_root/ are now stored in the root directory, | ||
| while the build artifacts formerly stored in the root directory are now | ||
| stored under {self.root_dir()}/out/msan/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact path under out depends on the build.
| elif path.startswith('./'): | ||
| path = path.replace('./', 'out/msan/') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should instead rework this logic to interpret the deps path relative to a .runtime_deps file, which is really its intended meaning.
| def get_target_dependencies( | ||
| self, fuzz_target: str) -> List[archive.ArchiveMemberInfo]: | ||
| target_path = self.to_archive_path(fuzz_target) | ||
| target_path = self.to_archive_path(fuzz_target, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using to_archive_path for both this fuzz target name and its dependencies' paths (from .runtime_deps) below, but the behavior we want is really quite different in each case:
- for fuzz target names (these are not always paths), we really want to use
get_path_for_target()(which is as yet conspicuously unused) - for dependency paths, we want to do the
../..conditional remapping, relative totarget_path/deps_file
All that to say, I think we want to do:
target_path = self.get_path_for_target(fuzz_target)And rename to_archive_path() to get_dependency_path().
| self, fuzz_target: str) -> List[archive.ArchiveMemberInfo]: | ||
| target_path = self.to_archive_path(fuzz_target) | ||
| target_path = self.to_archive_path(fuzz_target, | ||
| self._archive_schema_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a bug with the current code, if I'm reasoning through it correctly: the fuzz target name passed as an argument to this method comes from list_fuzz_targets(). That method 1 uses normalize_target_name() 2, which keeps only the base name of the fuzz target path to derive a fuzz target name: foo/bar/baz_fuzzer.exe -> baz_fuzzer.
This is compensated for by to_archive_path() which prepends root_dir() to the fuzz target name again to derive the full path. It only works because all fuzz targets are assumed to belong to the root dir. With our new archive structure, the root dir will be . yet the fuzz targets will live under out/$BUILD_DIR. So when we try to derive the path to the fuzzer here, we will fail to find a fuzzer executable and its runtime deps file.
We can fix this by using get_path_for_target() instead, which will do the right thing and recall the original fuzz target path.
Now, this brings me to an additional improvement I would love to see land, and which is unlocked by manifests. If we directly list the fuzz targets in the manifest:
{
archive_schema_version: 1,
fuzz_targets: [
'out/msan/foo_fuzzer',
'out/msan/bar_fuzzer',
]
}Then we can avoid guessing which files in the archive are fuzz targets using heuristics (see list_fuzz_targets() again - it looks at filenames and even sometimes inspects the symbols defined in the binaries - this has long irked me). Better yet, it would fix crbug.com/445765265.
The manifest provides the perfect place to do this. At construction time we can read the list from JSON and populate self._fuzz_targets directly, or extract some common functionality:
def set_fuzz_target_paths(paths: Sequence[str]):
self._fuzz_targets = {
fuzzer_utils.normalize_target_name(path): path
for path in paths
}Then list_fuzz_targets()'s lazy initialization logic will work in our favor, and we will never fall back to heuristics.
It would likely make sense to land this as a separate CL - I initially thought we would have to do this to fix this CL, but we don't.
Opening on behalf of @mverde to start reviewing early.