
Conversation

@ChenZiHong-Gavin ChenZiHong-Gavin commented Jan 30, 2026

This pull request introduces optimizations to the module import strategy across the graphgen library. The primary goal is to enhance application performance by minimizing startup time and reducing memory consumption.

@github-actions github-actions bot added the core label Jan 30, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @ChenZiHong-Gavin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces comprehensive optimizations to the module import strategy across the graphgen library. The primary goal is to enhance application performance by minimizing startup time and reducing memory consumption. This is achieved through the strategic implementation of lazy imports for large dependencies, conditional imports for type-checking-only modules, and dynamic loading patterns for submodules. These changes collectively contribute to a more efficient and responsive system, particularly beneficial in environments where resource utilization is critical.

Highlights

  • Lazy and Conditional Imports: Many heavy libraries such as NumPy, Pandas, Ray, iGraph, RDFLib, and HuggingFace Transformers are now imported lazily (within functions/methods) or conditionally using typing.TYPE_CHECKING. This significantly reduces the initial startup time and memory footprint of the application by only loading modules when they are actually needed at runtime.
  • Dynamic Module Loading via getattr: The graphgen/models/__init__.py and graphgen/models/llm/__init__.py files have been refactored to utilize Python's __getattr__ mechanism. This enables dynamic loading of submodules, meaning that modules are imported only when their attributes are accessed, further optimizing import performance.
  • Improved Type Hinting with Forward References: The use of from __future__ import annotations and string literal type hints (e.g., "np.number", "pd.DataFrame", "ray.actor.ActorHandle") has been consistently applied. This ensures that type hints remain valid for static analysis tools without causing circular import issues or runtime dependencies on modules only used for type checking.
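The first and third techniques can be sketched together in a few lines. This is an illustrative, minimal example; the module and function names are stand-ins, not graphgen's actual code, and pandas stands in for any heavy dependency:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by static type checkers (mypy, pyright); never executed,
    # so pandas is not a runtime dependency of this module.
    import pandas as pd


def load_frame(path: str) -> "pd.DataFrame":
    # Lazy import: the heavy dependency is loaded on first call,
    # not when this module is imported.
    import pandas as pd

    return pd.read_csv(path)
```

Importing this module costs nothing pandas-related; the price is paid only when `load_frame` is first called, and the string-literal annotation keeps type checkers happy without a runtime import.
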
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive optimization of module imports across the codebase to improve startup time and reduce memory usage. The changes primarily involve lazy-loading dependencies by moving imports into functions/methods where they are used, leveraging typing.TYPE_CHECKING for type-only imports, and using PEP 562 for lazy-loading submodules in packages.

Overall, the changes are well-executed and follow best practices for import optimization. My review includes a couple of suggestions for improving maintainability in cases where a core dependency is repeatedly imported within multiple methods of the same class. Consolidating these imports would reduce code duplication while still respecting the goal of lazy loading.

        self.actor = actor_handle

    def data(self) -> Dict[str, Any]:
        import ray

medium

Importing ray in every method of RemoteKVStorageProxy and RemoteGraphStorageProxy introduces significant code duplication. Since these classes are tightly coupled with Ray, consider importing ray once at the top of the file. This would simplify the code and improve maintainability. Python's import caching mechanism ensures that subsequent imports are very fast, so there's minimal performance overhead.
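The import-caching behavior the comment relies on can be demonstrated directly. This is a minimal sketch in which `json` stands in for a heavy dependency such as ray:

```python
import sys


def node_count():
    # A repeated `import` inside a method body: after the first call the
    # module is served from the sys.modules cache, so the statement is
    # little more than a dict lookup.
    import json

    return json


first = node_count()
second = node_count()
assert first is second                # the same cached module object
assert sys.modules["json"] is first   # resolved via the import cache
```

So the cost of an in-function import is negligible after the first call; the comment's point is about readability and duplication, not performance.
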

        return self._graph.number_of_edges()

    def get_connected_components(self, undirected: bool = True) -> List[Set[str]]:
        import networkx as nx

medium

Importing networkx in multiple methods within the NetworkXStorage class leads to code duplication. Since this class is a wrapper around a NetworkX graph, networkx is a core dependency. Consider importing it once at the top of the file. This will make the code cleaner and more maintainable without a significant performance penalty, thanks to Python's module caching.

Outdated diff excerpts from several files, showing heavy imports moved behind TYPE_CHECKING guards, e.g.:

    import pandas as pd
    import ray

    if TYPE_CHECKING:
        import numpy as np

    if TYPE_CHECKING:
        import igraph as ig

    if TYPE_CHECKING:
        import ray
        import rdflib

@ChenZiHong-Gavin
Copy link
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request does an excellent job of optimizing module imports throughout the codebase. The use of TYPE_CHECKING blocks, lazy imports within functions, and the PEP 562 __getattr__ pattern are all effective strategies for improving startup performance and managing dependencies. The changes are applied consistently and correctly. My review includes a couple of minor suggestions to address some stylistic formatting issues related to excessive blank lines, which would improve code readability.

        return ray.get(self.actor.get_all_node_degrees.remote())

    def get_node_count(self) -> int:


medium

This blank line is unnecessary and harms readability by making the method less compact. This applies to many other simple proxy methods in this class as well (e.g., get_edge_count, has_node). For simple one-line proxy methods, it's best to keep them compact.

        return self._graph.number_of_edges()

    def get_connected_components(self, undirected: bool = True) -> List[Set[str]]:


medium

This added blank line is unnecessary. Please consider removing it to keep the code compact. This comment also applies to other places in this file where extra blank lines have been added within method bodies (e.g., in load_nx_graph, _stabilize_graph).

@ChenZiHong-Gavin ChenZiHong-Gavin merged commit 33cc281 into main Jan 30, 2026
9 checks passed
@ChenZiHong-Gavin ChenZiHong-Gavin deleted the fix/optimize-import-modules branch January 30, 2026 10:54