diff --git a/docs/how-tos/graph-commands/extracting-subsets.rst b/docs/how-tos/graph-commands/extracting-subsets.rst
new file mode 100644
index 00000000..b00f85fb
--- /dev/null
+++ b/docs/how-tos/graph-commands/extracting-subsets.rst
@@ -0,0 +1,123 @@
+Extracting Graph Subsets
+========================
+
+The ``fromager graph subset`` command extracts a focused subgraph containing only the dependencies and dependents of a specific package. This is useful for understanding the impact scope of a particular package, debugging specific dependency issues, or creating smaller, more manageable graphs for analysis.
+
+Basic Usage
+-----------
+
+To extract a subset graph for a specific package:
+
+.. code-block:: bash
+
+   fromager graph subset <graph-file> <package-name>
+
+Example
+-------
+
+Using the example graph file from the e2e tests, let's extract a subset for the ``keyring`` package:
+
+.. code-block:: bash
+
+   fromager graph subset e2e/build-parallel/graph.json keyring
+
+This command will output a JSON graph containing:
+
+- The ``keyring`` package itself
+- All packages that depend on ``keyring``, directly or transitively (dependents)
+- All packages that ``keyring`` depends on, directly or transitively (dependencies)
+- The ROOT node, which is reached by walking up through the dependents
+
+The resulting subset will include packages like:
+
+- ``keyring==25.6.0`` (the target package)
+- ``imapautofiler==1.14.0`` (depends on keyring)
+- ``jaraco-classes==3.4.0`` (keyring dependency)
+- ``jaraco-context==6.0.1`` (keyring dependency)
+- ``jaraco-functools==4.1.0`` (keyring dependency)
+- Any further transitive dependencies of ``keyring``
+
+Version Filtering
+-----------------
+
+You can limit the subset to a specific version of the target package using the ``--version`` flag:
+
+.. code-block:: bash
+
+   fromager graph subset e2e/build-parallel/graph.json setuptools --version 80.8.0
+
+This is particularly useful when the graph contains multiple versions of a package. Without ``--version``, every version of the package found in the graph is used as a target.
+
+File Output
+-----------
+
+Save the subset graph to a file instead of printing to stdout:
+
+.. code-block:: bash
+
+   fromager graph subset e2e/build-parallel/graph.json jinja2 -o jinja2-subset.json
+
+The output file will be in the same JSON format as the original graph file and can be used as input to other ``fromager graph`` commands.
+
+Use Cases
+---------
+
+**Debugging Dependency Issues**
+   When a specific package is causing build problems, extract its subset to focus on just the relevant dependencies without the noise of the full graph.
+
+**Impact Analysis**
+   Before upgrading or removing a package, understand what other packages would be affected by examining its dependents.
+
+**Creating Focused Build Graphs**
+   Generate smaller graphs for specific components of your application, making it easier to understand and manage complex dependency trees.
+
+**Documentation and Communication**
+   Create focused dependency diagrams for specific packages when documenting or explaining system architecture to team members.
+
+**Performance Optimization**
+   When working with very large dependency graphs, extract subsets to improve performance of analysis tools and reduce memory usage.
+
+Example Workflow
+----------------
+
+Here's a typical workflow for investigating a package's dependencies:
+
+.. code-block:: bash
+
+   # Extract subset for a problematic package
+   fromager graph subset my-project-graph.json problematic-package -o debug-subset.json
+
+   # Visualize the subset
+   fromager graph to-dot debug-subset.json -o debug-subset.dot
+   dot -Tpng debug-subset.dot -o debug-subset.png
+
+   # Analyze why specific dependencies appear
+   fromager graph why debug-subset.json some-unexpected-dependency
+
+This workflow helps you quickly isolate and understand issues within a complex dependency tree.
+
+Output Format
+-------------
+
+The subset command preserves the original graph structure and format. The output is a valid dependency graph that:
+
+- Maintains all edge relationships between included nodes
+- Preserves requirement specifications and constraint information
+- Can be used as input to other graph commands
+- Is compatible with existing fromager workflows
+
+Error Handling
+--------------
+
+The command will report an error if:
+
+- The specified package is not found in the graph
+- The specified version of a package is not found
+- The graph file is invalid or corrupted
+
+Example error output:
+
+.. code-block:: console
+
+   $ fromager graph subset e2e/build-parallel/graph.json nonexistent-package
+   Error: Package nonexistent-package not found in graph
diff --git a/docs/how-tos/graph-commands/index.rst b/docs/how-tos/graph-commands/index.rst
index b7557764..eb2626a0 100644
--- a/docs/how-tos/graph-commands/index.rst
+++ b/docs/how-tos/graph-commands/index.rst
@@ -9,17 +9,18 @@ All examples use the sample graph file ``e2e/build-parallel/graph.json`` which c
    :maxdepth: 1
    :glob:
 
-   [uvw]*
+   [euvw]*
 
 Overview of Graph Commands
 --------------------------
 
 The ``fromager graph`` command group provides several subcommands for analyzing dependency graphs:
 
+- ``subset``: Extract a focused subgraph containing only dependencies and dependents of a specific package
 - ``why``: Understand why a package appears in the dependency graph
 - ``to-dot``: Convert graph to DOT format for visualization with Graphviz
 - ``explain-duplicates``: Analyze multiple versions of packages in the graph
 - ``to-constraints``: Convert graph to constraints file format
 - ``migrate-graph``: Convert old graph formats to the current format
 
-These tools help you understand complex dependency relationships, debug unexpected dependencies, and create visual representations of your build requirements.
+These tools help you understand complex dependency relationships, debug unexpected dependencies, create focused subgraphs for analysis, and create visual representations of your build requirements.
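The selection rule that ``extract_package_subset`` implements in the next file can be tried on a toy graph without fromager. This is a hypothetical re-statement, not fromager code. It assumes a plain ``dict`` that maps each parent key to its child keys, with ``""`` standing in for ROOT as in the serialized graph. It walks the graph iteratively with an explicit stack rather than recursing like ``_collect_dependents`` and ``_collect_dependencies`` do. The names ``subset_keys`` and ``subset_edges`` are illustrative only.

```python
# Hypothetical adjacency map: parent key -> list of child keys.
# "" stands in for ROOT, mirroring the key used in the serialized graph.
Graph = dict[str, list[str]]


def subset_keys(children: Graph, targets: set[str]) -> set[str]:
    """Return targets plus all their transitive dependents and dependencies."""
    # Invert the edges once so dependents can be walked like dependencies.
    parents: Graph = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)

    def walk(start: set[str], edges: Graph) -> set[str]:
        # Iterative depth-first walk; the seen set makes cycles terminate
        # and avoids Python's recursion limit on very deep graphs.
        seen = set(start)
        stack = list(start)
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Up from the targets (dependents) plus down from them (dependencies).
    return walk(targets, parents) | walk(targets, children)


def subset_edges(children: Graph, keep: set[str]) -> Graph:
    """Keep only edges whose parent and child are both in the subset."""
    # Only parents present in `children` get an entry; leaf nodes stay implicit.
    return {
        parent: [kid for kid in kids if kid in keep]
        for parent, kids in children.items()
        if parent in keep
    }


if __name__ == "__main__":
    graph: Graph = {
        "": ["app==1.0", "other==2.0"],
        "app==1.0": ["lib==1.0", "tool==3.0"],
        "lib==1.0": ["base==0.5"],
        "other==2.0": ["tool==3.0"],
    }
    keep = subset_keys(graph, {"lib==1.0"})
    print(sorted(keep))
    print(subset_edges(graph, keep))
```

With ``lib==1.0`` as the target, the subset keeps ROOT and ``app==1.0`` (dependents) plus ``base==0.5`` (dependency), and drops ``other==2.0`` and ``tool==3.0``. A dependency of a dependent is kept only if it is also reachable from the target. This matches walking up and down from the target nodes alone.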
diff --git a/src/fromager/commands/graph.py b/src/fromager/commands/graph.py
index 4b350b6c..d249ed13 100644
--- a/src/fromager/commands/graph.py
+++ b/src/fromager/commands/graph.py
@@ -459,6 +459,190 @@ def why(
         find_why(graph, node, depth, 0, requirement_type)
 
 
+@graph.command()
+@click.option(
+    "-o",
+    "--output",
+    type=clickext.ClickPath(),
+    help="Output file path for the subset graph",
+)
+@click.option(
+    "--version",
+    type=clickext.PackageVersion(),
+    help="Limit subset to specific version of the package",
+)
+@click.argument(
+    "graph-file",
+    type=str,
+)
+@click.argument("package-name", type=str)
+@click.pass_obj
+def subset(
+    wkctx: context.WorkContext,
+    graph_file: str,
+    package_name: str,
+    output: pathlib.Path | None,
+    version: Version | None,
+) -> None:
+    """Extract a subset of a build graph related to a specific package.
+
+    Creates a new graph containing only nodes that depend on the specified package
+    and the dependencies of that package. By default includes all versions of the
+    package, but can be limited to a specific version with --version.
+    """
+    try:
+        graph = DependencyGraph.from_file(graph_file)
+        subset_graph = extract_package_subset(graph, package_name, version)
+
+        if output:
+            with open(output, "w") as f:
+                subset_graph.serialize(f)
+        else:
+            subset_graph.serialize(sys.stdout)
+    except ValueError as e:
+        raise click.ClickException(str(e)) from e
+
+
+def extract_package_subset(
+    graph: DependencyGraph,
+    package_name: str,
+    version: Version | None = None,
+) -> DependencyGraph:
+    """Extract a subset of the graph containing nodes related to a specific package.
+
+    Creates a new graph containing:
+    - All nodes matching the package name (optionally filtered by version)
+    - All nodes that depend on the target package (dependents)
+    - All dependencies of the target package
+
+    Args:
+        graph: The source dependency graph
+        package_name: Name of the package to extract subset for
+        version: Optional version to filter target nodes
+
+    Returns:
+        A new DependencyGraph containing only the related nodes
+
+    Raises:
+        ValueError: If package not found in graph
+    """
+    # Find target nodes matching the package name
+    target_nodes = graph.get_nodes_by_name(package_name)
+    if version:
+        target_nodes = [node for node in target_nodes if node.version == version]
+
+    if not target_nodes:
+        version_msg = f" version {version}" if version else ""
+        raise ValueError(f"Package {package_name}{version_msg} not found in graph")
+
+    # Collect all related nodes
+    related_nodes: set[str] = set()
+
+    # Add target nodes
+    for node in target_nodes:
+        related_nodes.add(node.key)
+
+    # Traverse up to find dependents (what depends on our package)
+    visited_up: set[str] = set()
+    for target_node in target_nodes:
+        _collect_dependents(target_node, related_nodes, visited_up)
+
+    # Traverse down to find dependencies (what our package depends on)
+    visited_down: set[str] = set()
+    for target_node in target_nodes:
+        _collect_dependencies(target_node, related_nodes, visited_down)
+
+    # Always include ROOT if any target nodes are top-level dependencies
+    for target_node in target_nodes:
+        for parent_edge in target_node.parents:
+            if parent_edge.destination_node.key == ROOT:
+                related_nodes.add(ROOT)
+                break
+
+    # Create new graph with only related nodes
+    subset_graph = DependencyGraph()
+    _build_subset_graph(graph, subset_graph, related_nodes)
+
+    return subset_graph
+
+
+def _collect_dependents(
+    node: DependencyNode,
+    related_nodes: set[str],
+    visited: set[str],
+) -> None:
+    """Recursively collect all nodes that depend on the given node."""
+    if node.key in visited:
+        return
+    visited.add(node.key)
+
+    for parent_edge in node.parents:
+        parent_node = parent_edge.destination_node
+        related_nodes.add(parent_node.key)
+        _collect_dependents(parent_node, related_nodes, visited)
+
+
+def _collect_dependencies(
+    node: DependencyNode,
+    related_nodes: set[str],
+    visited: set[str],
+) -> None:
+    """Recursively collect all dependencies of the given node."""
+    if node.key in visited:
+        return
+    visited.add(node.key)
+
+    for child_edge in node.children:
+        child_node = child_edge.destination_node
+        related_nodes.add(child_node.key)
+        _collect_dependencies(child_node, related_nodes, visited)
+
+
+def _build_subset_graph(
+    source_graph: DependencyGraph,
+    target_graph: DependencyGraph,
+    included_nodes: set[str],
+) -> None:
+    """Build the subset graph with only the included nodes and their edges."""
+    # First pass: add all included nodes
+    for node_key in included_nodes:
+        source_node = source_graph.nodes[node_key]
+        if node_key == ROOT:
+            continue  # ROOT is already created in the new graph
+
+        # Add the node to target graph
+        target_graph._add_node(
+            req_name=source_node.canonicalized_name,
+            version=source_node.version,
+            download_url=source_node.download_url,
+            pre_built=source_node.pre_built,
+            constraint=source_node.constraint,
+        )
+
+    # Second pass: add edges between included nodes
+    for node_key in included_nodes:
+        source_node = source_graph.nodes[node_key]
+        for child_edge in source_node.children:
+            child_key = child_edge.destination_node.key
+            # Only add edge if both parent and child are in the subset
+            if child_key in included_nodes:
+                child_node = child_edge.destination_node
+                target_graph.add_dependency(
+                    parent_name=source_node.canonicalized_name
+                    if source_node.canonicalized_name
+                    else None,
+                    parent_version=source_node.version
+                    if source_node.canonicalized_name
+                    else None,
+                    req_type=child_edge.req_type,
+                    req=child_edge.req,
+                    req_version=child_node.version,
+                    download_url=child_node.download_url,
+                    pre_built=child_node.pre_built,
+                    constraint=child_node.constraint,
+                )
+
+
 def find_why(
     graph: DependencyGraph,
     node: DependencyNode,
diff --git a/tests/test_commands_graph.py b/tests/test_commands_graph.py
index 5c9c44bd..f2408b38 100644
--- a/tests/test_commands_graph.py
+++ b/tests/test_commands_graph.py
@@ -1,3 +1,4 @@
+import json
 import pathlib
 
 from click.testing import CliRunner
@@ -11,3 +12,112 @@ def test_fromager_version(cli_runner: CliRunner, e2e_path: pathlib.Path) -> None
     assert result.exit_code == 0
     assert "1. flit-core==3.12.0, setuptools==80.8.0" in result.stdout
     assert "Building 16 packages in 4 rounds" in result.stdout
+
+
+def test_graph_subset_basic(cli_runner: CliRunner, e2e_path: pathlib.Path) -> None:
+    """Test basic subset extraction for a package with dependencies."""
+    graph_json = e2e_path / "build-parallel" / "graph.json"
+    result = cli_runner.invoke(
+        fromager, ["graph", "subset", str(graph_json), "keyring"]
+    )
+
+    assert result.exit_code == 0
+    subset_data = json.loads(result.stdout)
+
+    # Should include keyring and its dependencies and dependents
+    assert "keyring==25.6.0" in subset_data
+    assert "jaraco-classes==3.4.0" in subset_data  # keyring dependency
+    assert "imapautofiler==1.14.0" in subset_data  # depends on keyring
+    assert "" in subset_data  # ROOT node should be included
+
+
+def test_graph_subset_with_version(
+    cli_runner: CliRunner, e2e_path: pathlib.Path
+) -> None:
+    """Test subset extraction with specific version filtering."""
+    graph_json = e2e_path / "build-parallel" / "graph.json"
+    result = cli_runner.invoke(
+        fromager,
+        ["graph", "subset", str(graph_json), "setuptools", "--version", "80.8.0"],
+    )
+
+    assert result.exit_code == 0
+    subset_data = json.loads(result.stdout)
+
+    # Should include only the specific version
+    assert "setuptools==80.8.0" in subset_data
+    # Should include packages that depend on setuptools
+    assert "keyring==25.6.0" in subset_data
+    assert "imapautofiler==1.14.0" in subset_data
+
+
+def test_graph_subset_output_to_file(
+    cli_runner: CliRunner, e2e_path: pathlib.Path, tmp_path: pathlib.Path
+) -> None:
+    """Test subset extraction with output to file."""
+    graph_json = e2e_path / "build-parallel" / "graph.json"
+    output_file = tmp_path / "subset.json"
+
+    result = cli_runner.invoke(
+        fromager, ["graph", "subset", str(graph_json), "jinja2", "-o", str(output_file)]
+    )
+
+    assert result.exit_code == 0
+    assert output_file.exists()
+
+    with open(output_file) as f:
+        subset_data = json.load(f)
+
+    assert "jinja2==3.1.6" in subset_data
+    assert "markupsafe==3.0.2" in subset_data  # jinja2 dependency
+    assert "imapautofiler==1.14.0" in subset_data  # depends on jinja2
+
+
+def test_graph_subset_nonexistent_package(
+    cli_runner: CliRunner, e2e_path: pathlib.Path
+) -> None:
+    """Test error handling for non-existent package."""
+    graph_json = e2e_path / "build-parallel" / "graph.json"
+    result = cli_runner.invoke(
+        fromager, ["graph", "subset", str(graph_json), "nonexistent"]
+    )
+
+    assert result.exit_code != 0
+    assert "not found in graph" in result.output
+
+
+def test_graph_subset_nonexistent_version(
+    cli_runner: CliRunner, e2e_path: pathlib.Path
+) -> None:
+    """Test error handling for non-existent version of existing package."""
+    graph_json = e2e_path / "build-parallel" / "graph.json"
+    result = cli_runner.invoke(
+        fromager,
+        ["graph", "subset", str(graph_json), "setuptools", "--version", "999.0.0"],
+    )
+
+    assert result.exit_code != 0
+    assert "not found in graph" in result.output
+
+
+def test_graph_subset_structure_integrity(
+    cli_runner: CliRunner, e2e_path: pathlib.Path
+) -> None:
+    """Test that subset graph maintains proper structure and references."""
+    graph_json = e2e_path / "build-parallel" / "graph.json"
+    result = cli_runner.invoke(fromager, ["graph", "subset", str(graph_json), "pyyaml"])
+
+    assert result.exit_code == 0
+    subset_data = json.loads(result.stdout)
+
+    # Verify all referenced nodes exist
+    for _node_key, node_data in subset_data.items():
+        for edge in node_data.get("edges", []):
+            assert edge["key"] in subset_data, (
+                f"Referenced node {edge['key']} not found in subset"
+            )
+
+    # Verify PyYAML is included
+    assert "pyyaml==6.0.2" in subset_data
+    # Verify its dependent is included
+    assert "imapautofiler==1.14.0" in subset_data
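A saved subset can also be checked outside the test suite, in the spirit of ``test_graph_subset_structure_integrity``. The sketch assumes only the JSON shape that this test relies on: a top-level object keyed by node (``""`` for ROOT), where each value carries an ``edges`` list of ``{"key": ...}`` entries. ``check_subset`` is a hypothetical helper, not part of fromager. Besides dangling edge references, it reports nodes that cannot be reached from ROOT. For a subset built by walking up from the target to ROOT and down from the target, one would expect both results to be empty.

```python
import json
from collections import deque

# Assumed key for ROOT in the serialized graph, as used by the tests.
ROOT_KEY = ""


def check_subset(
    graph_data: dict[str, dict],
) -> tuple[list[tuple[str, str]], set[str]]:
    """Return (dangling edges, nodes unreachable from ROOT) for a graph dict."""
    # Edges that point at a key missing from the file.
    dangling: list[tuple[str, str]] = []
    for node_key, node_data in graph_data.items():
        for edge in node_data.get("edges", []):
            if edge["key"] not in graph_data:
                dangling.append((node_key, edge["key"]))

    # Breadth-first walk from ROOT over edges whose target exists.
    seen: set[str] = set()
    if ROOT_KEY in graph_data:
        seen.add(ROOT_KEY)
        queue = deque([ROOT_KEY])
        while queue:
            current = queue.popleft()
            for edge in graph_data[current].get("edges", []):
                child = edge["key"]
                if child in graph_data and child not in seen:
                    seen.add(child)
                    queue.append(child)

    unreachable = set(graph_data) - seen
    return dangling, unreachable


if __name__ == "__main__":
    # Toy input in the assumed serialized shape; a real file would be
    # loaded with json.load() on the opened output file instead.
    sample = json.loads(
        '{"": {"edges": [{"key": "a==1.0"}]},'
        ' "a==1.0": {"edges": [{"key": "b==2.0"}, {"key": "missing==0.1"}]},'
        ' "b==2.0": {"edges": []},'
        ' "orphan==3.0": {"edges": []}}'
    )
    dangling, unreachable = check_subset(sample)
    print("dangling:", dangling)
    print("unreachable:", sorted(unreachable))
```

For a file written with ``-o``, the same function can be applied to ``json.load(f)`` on the opened file. A non-empty result points at either a hand-edited file or a traversal bug.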