From 9a0b59d63f07c903d60f48452856a71c1ba1727d Mon Sep 17 00:00:00 2001 From: Edward Li Date: Mon, 10 Mar 2025 15:12:47 -0700 Subject: [PATCH 1/5] Add initial docs on import and export resolution --- .../2. parsing/B. AST Construction.md | 2 +- .../2. parsing/C. Directory Parsing.md | 48 ++++++++++++ architecture/3. imports-exports/A. Imports.md | 55 +++++++++++++- architecture/3. imports-exports/B. Exports.md | 64 +++++++++++++++- .../3. imports-exports/C. TSConfig.md | 75 ++++++++++++++++++- 5 files changed, 239 insertions(+), 5 deletions(-) create mode 100644 architecture/2. parsing/C. Directory Parsing.md diff --git a/architecture/2. parsing/B. AST Construction.md b/architecture/2. parsing/B. AST Construction.md index c6484aaba..06a1cd48c 100644 --- a/architecture/2. parsing/B. AST Construction.md +++ b/architecture/2. parsing/B. AST Construction.md @@ -74,4 +74,4 @@ Statements have another layer of complexity. They are essentially pattern based ## Next Step -After the AST is constructed, the system moves on to [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files. +After the AST is constructed, the system moves on to [Directory Parsing](./C.%20Directory%20Parsing.md) to build a hierarchical representation of the codebase's directory structure. diff --git a/architecture/2. parsing/C. Directory Parsing.md b/architecture/2. parsing/C. Directory Parsing.md new file mode 100644 index 000000000..55f7e3d36 --- /dev/null +++ b/architecture/2. parsing/C. Directory Parsing.md @@ -0,0 +1,48 @@ +# Directory Parsing + +The Directory Parsing system is responsible for creating and maintaining a hierarchical representation of the codebase's directory structure in memory. Directories do not hold references to the file itself, but instead holds the names to the files and does a dynamic lookup when needed. 
+ +In addition to providing a more cohesive API for listing directory files, the Directory API is also used for [TSConfig](../3.%20imports-exports/C.%20TSConfig.md)-based [Import Resolution](../3.%20imports-exports/A.%20Imports.md). + +## Core Components + +The Directory Tree is constructed during the initial build_graph step in codebase_context.py, and is recreated from scratch on every re-sync. More details are below: + +## Directory Tree Construction + +The directory tree is built through the following process: + +1. The `build_directory_tree` method in `CodebaseContext` is called during graph initialization or when the codebase structure changes. +2. The method iterates through all files in the repository, creating directory objects for each directory path encountered. +3. For each file, it adds the file to its parent directory using the `_add_file` method. +4. Directories are created recursively as needed using the `get_directory` method with create_on_missing=True`. + +## Directory Representation + +The `Directory` class provides a rich interface for working with directories: + +- **Hierarchy Navigation**: Access parent directories and subdirectories +- **File Access**: Retrieve files by name or extension +- **Symbol Access**: Find symbols (classes, functions, etc.) 
within files in the directory +- **Directory Operations**: Rename, remove, or update directories + +Each `Directory` instance maintains: +- A reference to its parent directory +- Lists of files and subdirectories +- Methods to recursively traverse the directory tree + +## File Representation + +Files are represented by the `File` class and its subclasses: + +- `File`: Base class for all files, supporting basic operations like reading and writing content +- `SourceFile`: Specialized class for source code files that can be parsed into an AST + +Files maintain references to: +- Their parent directory +- Their content (loaded dynamically to preserve the source of truth) +- For source files, the parsed AST and symbols + +## Next Step + +After the directory structure is parsed, the system can perform [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files. \ No newline at end of file diff --git a/architecture/3. imports-exports/A. Imports.md b/architecture/3. imports-exports/A. Imports.md index 09d70d902..16d2517fd 100644 --- a/architecture/3. imports-exports/A. Imports.md +++ b/architecture/3. imports-exports/A. Imports.md @@ -1,7 +1,58 @@ # Import Resolution -TODO +Import resolution follows AST construction in the code analysis pipeline. It identifies dependencies between modules and builds a graph of relationships across the codebase. + +> NOTE: This is an actively evolving part of Codegen SDK, so some details here may be incomplete, outdated, or incorrect. + +## Purpose + +The import resolution system serves these purposes: + +1. **Dependency Tracking**: Maps relationships between files by resolving import statements. +2. **Symbol Resolution**: Connects imported symbols to their definitions. +3. **Module Graph Construction**: Builds a directed graph of module dependencies. +4. **(WIP) Cross-Language Support**: Provides implementations for different programming languages. 
+ +## Core Components + +### ImportResolution Class + +The `ImportResolution` class represents the outcome of resolving an import statement. It contains: +- The source file containing the imported symbol +- The specific symbol being imported (if applicable) +- Whether the import references an entire file/module + +### Import Base Class + +The `Import` class is the foundation for language-specific import implementations. It: +- Stores metadata about the import (module path, symbol name, alias) +- Provides the abstract `resolve_import()` method +- Adds symbol resolution edges to the codebase graph + +### Language-Specific Implementations + +#### Python Import Resolution + +The `PyImport` class extends the base `Import` class with Python-specific logic: + +- Handles relative imports +- Supports module imports, named imports, and wildcard imports +- Resolves imports using configurable resolution paths and `sys.path` +- Handles special cases like `__init__.py` files + +#### TypeScript Import Resolution + +The `TSImport` class implements TypeScript-specific resolution: + +- Supports named imports, default imports, and namespace imports +- Handles type imports and dynamic imports +- Resolves imports using TSConfig path mappings +- Supports file extension resolution + +## Implementation + +After file and directory parse, we loop through all import nodes and perform `add_symbol_resolution_edge`. This then invokes the language-specific `resolve_import` method that converts the import statement into a resolvable `ImportResolution` object (or None if the import cannot be resolved). This import symbol and the `ImportResolution` object are then used to add a symbol resolution edge to the graph, where it can then be used in future steps to resolve symbols. ## Next Step -After import resolution, the system analyzes [Export Analysis](./B.%20Exports.md) and handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. 
This is followed by comprehensive [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md). +After import resolution, the system analyzes [Export Analysis](./B.%20Exports.md) and handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. This is followed by [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md). diff --git a/architecture/3. imports-exports/B. Exports.md b/architecture/3. imports-exports/B. Exports.md index 9da67fcb4..58e05ec1b 100644 --- a/architecture/3. imports-exports/B. Exports.md +++ b/architecture/3. imports-exports/B. Exports.md @@ -1,6 +1,68 @@ # Export Analysis -TODO +Some languages contain additional metadata on "exported" symbols, specifying which symbols are made available to other modules. Export analysis follows import resolution in the code analysis pipeline. It identifies and processes exported symbols from modules, enabling the system to track what each module makes available to others. + +## Core Components + +### Export Base Class + +The `Export` class serves as the foundation for language-specific export implementations. It: +- Stores metadata about the export (symbol name, is default, etc.) +- Tracks the relationship between the export and its declared symbol +- Adds export edges to the codebase graph + +### TypeScript Export Implementation + +The `TSExport` class implements TypeScript-specific export handling: + +- Supports various export styles (named exports, default exports, re-exports) +- Handles export declarations with and without values +- Processes wildcard exports (`export * from 'module'`) +- Manages export statements with multiple exports + +#### Export Types and Symbol Resolution + +The TypeScript implementation handles several types of exports: + +1. **Declaration Exports** + - Function declarations (including generators) + - Class declarations + - Interface declarations + - Type alias declarations + - Enum declarations + - Namespace declarations + - Variable/constant declarations + +2. 
**Value Exports** + - Object literals with property exports + - Arrow functions and function expressions + - Classes and class expressions + - Assignment expressions + - Primitive values and expressions + +3. **Special Export Forms** + - Wildcard exports (`export * from 'module'`) + - Named re-exports (`export { name as alias } from 'module'`) + - Default exports with various value types + +#### Symbol Tracking and Dependencies + +The export system: +- Maintains relationships between exported symbols and their declarations +- Validates export names match their declared symbols +- Tracks dependencies through the codebase graph +- Handles complex scenarios like: + - Shorthand property exports in objects + - Nested function and class declarations + - Re-exports from other modules + +#### Integration with Type System + +Exports are tightly integrated with the type system: +- Exported type declarations are properly tracked +- Symbol resolution considers both value and type exports +- Re-exports preserve type information +- Export edges in the codebase graph maintain type relationships ## Next Step diff --git a/architecture/3. imports-exports/C. TSConfig.md b/architecture/3. imports-exports/C. TSConfig.md index e9c77ae0c..56ae037fc 100644 --- a/architecture/3. imports-exports/C. TSConfig.md +++ b/architecture/3. imports-exports/C. TSConfig.md @@ -1,6 +1,79 @@ # TSConfig Support -TODO +TSConfig support is a critical component for TypeScript projects in the import resolution system. It processes TypeScript configuration files (tsconfig.json) to correctly resolve module paths and dependencies. + +## Purpose + +The TSConfig support system serves these purposes: + +1. **Path Mapping**: Resolves custom module path aliases defined in the tsconfig.json file. +2. **Base URL Resolution**: Handles non-relative module imports using the baseUrl configuration. +3. **Project References**: Manages dependencies between TypeScript projects using the references field. +4. 
**Directory Structure**: Respects rootDir and outDir settings for maintaining proper directory structures. + +## Core Components + +### TSConfig Class + +The `TSConfig` class represents a parsed TypeScript configuration file. It: +- Parses and stores the configuration settings from tsconfig.json +- Handles inheritance through the "extends" field +- Provides methods for translating between import paths and absolute file paths +- Caches computed values for performance optimization + +## Configuration Processing + +### Configuration Inheritance + +TSConfig files can extend other configuration files through the "extends" field: + +1. Base configurations are loaded and parsed first +2. Child configurations inherit and can override settings from their parent +3. Path mappings, base URLs, and other settings are merged appropriately + +### Path Mapping Resolution + +The system processes the "paths" field in tsconfig.json to create a mapping between import aliases and file paths: + +1. Path patterns are normalized (removing wildcards, trailing slashes) +2. Relative paths are converted to absolute paths +3. Mappings are stored for efficient lookup during import resolution + +### Project References + +The "references" field defines dependencies between TypeScript projects: + +1. Referenced projects are identified and loaded +2. Their configurations are analyzed to determine import paths +3. Import resolution can cross project boundaries using these references + +## Import Resolution Process + +### Path Translation + +When resolving an import path in TypeScript: + +1. Check if the path matches any path alias in the tsconfig.json +2. If a match is found, translate the path according to the mapping +3. Apply baseUrl resolution for non-relative imports +4. Handle project references for cross-project imports + +### Optimization Techniques + +The system employs several optimizations: + +1. Caching computed values to avoid redundant processing +2. 
Early path checking for common patterns (e.g., paths starting with "@" or "~") +3. Hierarchical resolution that respects the configuration inheritance chain + +## Integration with Import Resolution + +The TSConfig support integrates with the broader import resolution system: + +1. Each TypeScript file is associated with its nearest tsconfig.json +2. Import statements are processed using the file's associated configuration +3. Path mappings are applied during the module resolution process +4. Project references are considered when resolving imports across project boundaries ## Next Step From eea2b8550030debf386e7025cb2b67f6cc53afb7 Mon Sep 17 00:00:00 2001 From: Edward Li Date: Mon, 10 Mar 2025 15:14:01 -0700 Subject: [PATCH 2/5] Add docs on transactionmanager --- .../5. performing-edits/A. Edit Operations.md | 7 -- .../5. performing-edits/A. Transactions.md | 52 +++++++++++ .../B. Transaction Manager.md | 86 ++++++++++++++++++- 3 files changed, 137 insertions(+), 8 deletions(-) delete mode 100644 architecture/5. performing-edits/A. Edit Operations.md create mode 100644 architecture/5. performing-edits/A. Transactions.md diff --git a/architecture/5. performing-edits/A. Edit Operations.md b/architecture/5. performing-edits/A. Edit Operations.md deleted file mode 100644 index 850b8e103..000000000 --- a/architecture/5. performing-edits/A. Edit Operations.md +++ /dev/null @@ -1,7 +0,0 @@ -# Edit Operations - -TODO - -## Next Step - -After preparing edits, they are managed by the [Transaction Manager](./B.%20Transaction%20Manager.md) to ensure consistency and atomicity. diff --git a/architecture/5. performing-edits/A. Transactions.md b/architecture/5. performing-edits/A. Transactions.md new file mode 100644 index 000000000..a901831f0 --- /dev/null +++ b/architecture/5. performing-edits/A. Transactions.md @@ -0,0 +1,52 @@ +# Transactions + +Transactions represent atomic changes to files in the codebase. 
Each transaction defines a specific modification that can be queued, validated, and executed. + +## Transaction Types + +The transaction system is built around a base `Transaction` class with specialized subclasses: + +### Content Transactions +- **RemoveTransaction**: Removes content between specified byte positions +- **InsertTransaction**: Inserts new content at a specified byte position +- **EditTransaction**: Replaces content between specified byte positions + +### File Transactions +- **FileAddTransaction**: Creates a new file +- **FileRenameTransaction**: Renames an existing file +- **FileRemoveTransaction**: Deletes a file + +## Transaction Priority + +Transactions are executed in a specific order defined by the `TransactionPriority` enum: + +1. **Remove** (highest priority) +2. **Edit** +3. **Insert** +4. **FileAdd** +5. **FileRename** +6. **FileRemove** + +This ordering ensures that content is removed before editing or inserting, and that all content operations happen before file operations. + +## Key Concepts + +### Byte-Level Operations + +All content transactions operate at the byte level rather than on lines or characters. This provides precise control over modifications and allows transactions to work with any file type, regardless of encoding or line ending conventions. + +### Content Generation + +Transactions support both static content (direct strings) and dynamic content (generated at execution time). This flexibility allows for complex transformations where the new content depends on the state of the codebase at execution time. + +Most content transactions use static content, but dynamic content is supported for rare cases where the new content depends on the state of other transactions. One common example is handling whitespace during add and remove transactions. + +### File Operations + +File transactions are used to create, rename, and delete files. 
+ +> NOTE: Most file transactions, such as `FileAddTransaction`, are no-ops within the Transaction Manager (they bypass it) and are instead applied immediately once the `create_file` API is called. This makes created files immediately available for edit and use. File operations are still added to the Transaction Manager to help optimize graph re-parsing and diff generation, by keeping track of which files exist and which no longer do. + +## Next Step + +Transactions are managed by the [Transaction Manager](./B.%20Transaction%20Manager.md) to ensure consistency and atomicity. diff --git a/architecture/5. performing-edits/B. Transaction Manager.md b/architecture/5. performing-edits/B. Transaction Manager.md index a41d91270..efb0895f7 100644 --- a/architecture/5. performing-edits/B. Transaction Manager.md +++ b/architecture/5. performing-edits/B. Transaction Manager.md @@ -1,6 +1,90 @@ # Transaction Manager -TODO +The Transaction Manager coordinates the execution of transactions across multiple files, handling conflict resolution and enforcing resource limits. + +## High-level Concept + +Since all node operations are on byte positions of the original file, multiple operations that change the total byte length of the file will result in offset errors and broken code. + +Consider this example: +``` +Original: FooBar +Operations: Remove "Foo" (bytes 0-3), Insert "Hello" (bytes 0-5) + Remove "Bar" (bytes 3-6), Insert "World" (bytes 3-8) +``` + +If these operations were applied in order, the result would be: +``` +Result: FooBar +Operation: Remove "Foo" (bytes 0-3), Insert "Hello" (bytes 0-5) +Result: HelloBar +Operation: Remove "Bar" (bytes 3-6), Insert "World" (bytes 3-8) +Result: HelWorldar +``` + +This produces an invalid output. 
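The offset drift in the example above can be reproduced in a few lines. This is a minimal sketch, assuming each operation is modelled as a hypothetical `(start, end, replacement)` tuple whose offsets refer to the original bytes; it is not the SDK's `Transaction` class. It also shows that applying the same operations from the highest start offset downward avoids the drift.

```python
# Hypothetical model: each edit is (start, end, replacement), with byte
# offsets that refer to the ORIGINAL content, not the partially edited one.
Edit = tuple[int, int, bytes]


def apply_edits(content: bytes, edits: list[Edit]) -> bytes:
    """Apply edits one after another, in exactly the order given."""
    for start, end, replacement in edits:
        content = content[:start] + replacement + content[end:]
    return content


original = b"FooBar"
edits = [(0, 3, b"Hello"), (3, 6, b"World")]

# Forward order: the first edit grows the file by two bytes, so the second
# edit's offsets no longer point at "Bar".
forward = apply_edits(original, edits)

# Backward order: edits with the highest start offset go first, so earlier
# offsets are still valid when their turn comes.
backward = apply_edits(original, sorted(edits, key=lambda e: e[0], reverse=True))

print(forward)   # b'HelWorldar'
print(backward)  # b'HelloWorld'
```

Sorting by start offset is enough here because the two edits do not overlap; overlapping edits would need the conflict handling that the Transaction Manager performs separately.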
+ +⭐ The key with TransactionManager is that it queues up all transactions in a given Codemod run, then applies all of them ***backwards*** from the last byte range to the first. Given the same example as above but applied backwards: + +``` +Result: FooBar +Operation: Remove "Bar" (bytes 3-6), Insert "World" (bytes 3-8) +Result: FooWorld +Operation: Remove "Foo" (bytes 0-3), Insert "Hello" (bytes 0-5) +Result: HelloWorld +``` + +TransactionManager also performs some additional operations, such as detecting conflicts and coordinating some basic conflict resolution. Overall, the core responsibilities are as follows: + +1. **Transaction Queueing**: Maintains a queue of pending transactions organized by file +2. **Conflict Resolution**: Detects and resolves conflicts between transactions +3. **Transaction Execution**: Applies transactions in the correct order +4. **Resource Management**: Enforces limits on transaction count and execution time +5. **Change Tracking**: Generates diffs for applied changes + +## Sorting Transactions + +Before execution, transactions are sorted by the following keys, in priority order: + +1. Position in the file (higher byte positions first) +2. Transaction type (following the priority order) +3. User-defined priority +4. Creation order + +This sorting ensures that transactions are applied in a deterministic order that minimizes conflicts. Edits at higher byte positions are always applied first, removals happen before insertions, and older transactions are applied before newer ones. + +## Conflict Resolution + +### Conflict Types + +The manager identifies several types of conflicts: + +1. **Overlapping Transactions**: Multiple transactions affecting the same byte range +2. **Contained Transactions**: One transaction completely contained within another +3. **Adjacent Transactions**: Transactions affecting adjacent byte ranges + +In its current implementation, TransactionManager only handles Contained Transactions that are trivially solvable. 
(If a remove transaction completely overlaps with another remove transaction, only the larger one is kept.) + +## Resource Management + +The Transaction Manager enforces two types of limits: + +1. **Transaction Count**: Optional maximum number of transactions +2. **Execution Time**: Optional time limit for transaction processing + +These limits prevent excessive resource usage and allow for early termination of long-running operations. + +## Commit Process + +The commit process applies queued transactions to the codebase: + +1. Transactions are sorted according to priority rules +2. Files are processed one by one +3. For each file, transactions are executed in order +4. Diffs are collected for each modified file +5. The queue is cleared after successful commit + +The diffs are later used during re-sync to efficiently update the codebase graph as changes occur. See [Incremental Computation](../6.%20incremental-computation/A.%20Overview.md) for more details. ## Next Step From c00ab32620de1a641535aadedad8c96e492ea923 Mon Sep 17 00:00:00 2001 From: Edward Li Date: Mon, 10 Mar 2025 15:35:04 -0700 Subject: [PATCH 3/5] Add Dependency Manager Docs --- architecture/external/dependency-manager.md | 92 ++++++++++++++++++++- 1 file changed, 91 insertions(+), 1 deletion(-) diff --git a/architecture/external/dependency-manager.md b/architecture/external/dependency-manager.md index 071a10526..9fce83b58 100644 --- a/architecture/external/dependency-manager.md +++ b/architecture/external/dependency-manager.md @@ -1,6 +1,96 @@ # Dependency Manager -TODO +> WARNING: Dependency manager is an experimental feature designed for Codegen Cloud! The current implementation WILL delete any existing `node_modules` folder! + +## Motivation + +A future goal of Codegen is to support resolving symbols directly from dependencies, instead of falling back to `ExternalModule`s. 
(In fact, some experimental Codegen features such as [Type Engine](./type-engine.md) already parse and use 3rd party dependencies from `node_modules`) + +This requires us to pull and install dependencies from a repository's `package.json`. However, simply installing dependencies from `package.json` is not enough, as many projects require internal dependencies that use custom NPM registries. Others require custom post-install scripts that may not run on our codemod environments. + +Dependency Manager is an experimental solution to this problem. It creates a shadow tree of `package.json` files that includes all core dependencies and settings from the repository's original `package.json` without any custom registries or potentially problematic settings. + +## Implementation + +Given this example codebase structure: + +``` +repo/ +├── package.json +├── node_modules/ +├── src/ +│ ├── frontend/ +│ │ └── package.json +│ └── backend/ +│ └── package.json +└── tests/ + └── package.json +``` + +Dependency Manager first deletes any existing `node_modules` folder in the user's repository. After this step, Dependency Manager initializes itself to use the correct version of NPM, Yarn, or PNPM for the user's repository. + +Dependency Manager then creates a "shadow copy" of the repository's original `package.json` file. This shadow copy is used to later revert any changes made by Codegen before running codemods. With these steps, the codebase structure now looks like this: + +``` +repo/ +├── package.json +├── package.json.gs_internal.bak +├── src/ +│ ├── frontend/ +│ │ └── package.json +│ │ └── package.json.gs_internal.bak +│ └── backend/ +│ └── package.json +│ └── package.json.gs_internal.bak +└── tests/ + └── package.json + └── package.json.gs_internal.bak +``` + +Next, Dependency Manager iterates through all the `package.json` files and creates a "clean" version of each file. 
This "clean" version only includes a subset of information from the original, including: +- Name +- Version +- Package Manager Details +- Workspaces + +Most importantly, this step iterates through `dependencies` and `devDependencies` of each `package.json` file and validates them against the npm registry. If a package is not found, it is added to a list of invalid dependencies and removed from the `package.json` file. + +After this step, the codebase structure now looks like this: + +``` +repo/ +├── package.json (modified) +├── package.json.gs_internal.bak +├── src/ +│ ├── frontend/ +│ │ └── package.json (modified) +│ │ └── package.json.gs_internal.bak +│ └── backend/ +│ └── package.json (modified) +│ └── package.json.gs_internal.bak +└── tests/ + └── package.json (modified) + └── package.json.gs_internal.bak +``` + +After the shadow and cleaning steps, Dependency Manager proceeds to install the user's dependencies through NPM, Yarn, or PNPM, depending on the detected installer type. Finally, Dependency Manager restores the original `package.json` files and removes the shadow copies. + +The final codebase structure looks like this: + +``` +repo/ +├── package.json +├── node_modules/ +├── src/ +│ ├── frontend/ +│ │ └── package.json +│ └── backend/ +│ └── package.json +└── tests/ + └── package.json +``` + +If all goes well, Dependency Manager will have successfully installed the user's dependencies and prepared the codebase for codemods. 
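The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the Dependency Manager's actual code: the function name `clean_package_json`, the `KEPT_FIELDS` tuple, and the `exists_in_registry` callback are hypothetical, and mapping "Package Manager Details" to the `packageManager` key is an assumption. The registry check is passed in as a callable so the sketch needs no network access.

```python
import json
from typing import Callable

# Assumed key names for the subset of fields carried over to the clean file.
KEPT_FIELDS = ("name", "version", "packageManager", "workspaces")


def clean_package_json(
    raw: str, exists_in_registry: Callable[[str], bool]
) -> tuple[dict, list[str]]:
    """Return (cleaned package.json contents, names of dropped dependencies)."""
    original = json.loads(raw)

    # Keep only the allow-listed top-level fields; scripts, custom registry
    # settings, and everything else are left out.
    cleaned = {key: original[key] for key in KEPT_FIELDS if key in original}

    invalid: list[str] = []
    for section in ("dependencies", "devDependencies"):
        kept = {}
        for name, version in original.get(section, {}).items():
            if exists_in_registry(name):
                kept[name] = version
            else:
                invalid.append(name)  # not resolvable, so drop it
        if kept:
            cleaned[section] = kept
    return cleaned, invalid


raw = json.dumps(
    {
        "name": "app",
        "version": "1.0.0",
        "scripts": {"postinstall": "./setup.sh"},
        "dependencies": {"react": "^18.0.0", "@corp/internal": "1.0.0"},
    }
)

# Stand-in for a real registry lookup: treat "@corp/" packages as private.
cleaned, invalid = clean_package_json(raw, lambda name: not name.startswith("@corp/"))
print(cleaned)
print(invalid)
```

With the stand-in check, the private `@corp/internal` package ends up in the invalid list and the `postinstall` script is not carried over, which mirrors the two problems named in the Motivation section.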
## Next Step From 9323f46a2fa7f439712d19ecab477ade63da531a Mon Sep 17 00:00:00 2001 From: Edward Li Date: Mon, 10 Mar 2025 15:56:29 -0700 Subject: [PATCH 4/5] Add docs on Type Engine --- architecture/external/dependency-manager.md | 2 ++ architecture/external/type-engine.md | 20 +++++++++++++++++++- 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/architecture/external/dependency-manager.md b/architecture/external/dependency-manager.md index 9fce83b58..e2dfa270b 100644 --- a/architecture/external/dependency-manager.md +++ b/architecture/external/dependency-manager.md @@ -10,6 +10,8 @@ This requires us to pull and install dependencies from a repository's `package.j Dependency Manager is an experimental solution to this problem. It creates a shadow tree of `package.json` files that includes all core dependencies and settings from the repository's original `package.json` without any custom registries or potentially problematic settings. +> NOTE: Currently, this is only implemented for TypeScript projects. + ## Implementation Given this example codebase structure: diff --git a/architecture/external/type-engine.md b/architecture/external/type-engine.md index 54313a82b..42b96f643 100644 --- a/architecture/external/type-engine.md +++ b/architecture/external/type-engine.md @@ -1,6 +1,24 @@ # Type Engine -TODO +Type Engine is an experimental feature of Codegen that leverages the [TypeScript Compiler API](https://github.com/microsoft/TypeScript/wiki/Using-the-Compiler-API) to provide deeper insight into a user's codebase (such as resolving return types). + +> NOTE: Currently, this is only implemented for TypeScript projects. + +There are currently two experimental implementations of TypeScript's Type Engine: an external process-based implementation and a V8-based implementation. 
+ +## Implementation (External Process) + +During codebase parsing, the Type Engine spawns a type inference subprocess (defined in `src/codegen/sdk/typescript/external/typescript_analyzer/run_full.ts`) that concurrently parses the codebase with the TypeScript API to resolve return types. The final analyzer output is placed in `/tmp/typescript-analysis.json` and is read in by Codegen to resolve return types. + +## Implementation (V8) + +The V8-based implementation is much more flexible and powerful in comparison but is currently not as stable. It uses the [PyMiniRacer](https://github.com/sqreen/py_mini_racer) package to spawn a V8-based JavaScript engine that can parse the codebase with the TypeScript API to resolve return types. + +The entirety of `src/codegen/sdk/typescript/external/typescript_analyzer` is compiled down using [Rollup.js](https://rollupjs.org/) into a single `index.js` file. A couple of patches are applied to the engine source to remove `require` and `export` statements, which are not supported by MiniRacer. + +Then, the entire `index.js` file is loaded into the MiniRacer context. To work around file read limitations with V8, an in-memory shadow filesystem is created that mimics the user's repository's filesystem. These are defined in `fsi.ts` (`FileSystemInterface`) and `fs_proxy.ts` (`ProxyFileSystem`). The TypeScript Compiler then uses the custom `ProxyFileSystem.readFile` function instead of the traditional `fs.readFile`. + +Once the analyzer is initialized and the codebase is parsed, the entire TypeScript Compiler API is available in the MiniRacer context. The analyzer can then be used to resolve return types for any function in the codebase or to parse the codebase and generate a full type analysis. ## Next Step From 144b39767f4cfe1d9e042a38ed49d406e184e683 Mon Sep 17 00:00:00 2001 From: EdwardJXLi <20020059+EdwardJXLi@users.noreply.github.com> Date: Mon, 10 Mar 2025 22:57:36 +0000 Subject: [PATCH 5/5] Automated pre-commit update --- .../2. 
parsing/C. Directory Parsing.md | 10 +++--- architecture/3. imports-exports/A. Imports.md | 8 +++-- architecture/3. imports-exports/B. Exports.md | 10 ++++-- .../3. imports-exports/C. TSConfig.md | 35 ++++++++++--------- .../5. performing-edits/A. Transactions.md | 12 ++++--- .../B. Transaction Manager.md | 30 ++++++++-------- architecture/external/dependency-manager.md | 1 + 7 files changed, 61 insertions(+), 45 deletions(-) diff --git a/architecture/2. parsing/C. Directory Parsing.md b/architecture/2. parsing/C. Directory Parsing.md index 55f7e3d36..f25de2e29 100644 --- a/architecture/2. parsing/C. Directory Parsing.md +++ b/architecture/2. parsing/C. Directory Parsing.md @@ -13,9 +13,9 @@ The Directory Tree is constructed during the initial build_graph step in codebas The directory tree is built through the following process: 1. The `build_directory_tree` method in `CodebaseContext` is called during graph initialization or when the codebase structure changes. -2. The method iterates through all files in the repository, creating directory objects for each directory path encountered. -3. For each file, it adds the file to its parent directory using the `_add_file` method. -4. Directories are created recursively as needed using the `get_directory` method with create_on_missing=True`. +1. The method iterates through all files in the repository, creating directory objects for each directory path encountered. +1. For each file, it adds the file to its parent directory using the `_add_file` method. +1. Directories are created recursively as needed using the `get_directory` method with create_on_missing=True\`. 
## Directory Representation @@ -27,6 +27,7 @@ The `Directory` class provides a rich interface for working with directories: - **Directory Operations**: Rename, remove, or update directories Each `Directory` instance maintains: + - A reference to its parent directory - Lists of files and subdirectories - Methods to recursively traverse the directory tree @@ -39,10 +40,11 @@ Files are represented by the `File` class and its subclasses: - `SourceFile`: Specialized class for source code files that can be parsed into an AST Files maintain references to: + - Their parent directory - Their content (loaded dynamically to preserve the source of truth) - For source files, the parsed AST and symbols ## Next Step -After the directory structure is parsed, the system can perform [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files. \ No newline at end of file +After the directory structure is parsed, the system can perform [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files. diff --git a/architecture/3. imports-exports/A. Imports.md b/architecture/3. imports-exports/A. Imports.md index 16d2517fd..cca5951ab 100644 --- a/architecture/3. imports-exports/A. Imports.md +++ b/architecture/3. imports-exports/A. Imports.md @@ -9,15 +9,16 @@ Import resolution follows AST construction in the code analysis pipeline. It ide The import resolution system serves these purposes: 1. **Dependency Tracking**: Maps relationships between files by resolving import statements. -2. **Symbol Resolution**: Connects imported symbols to their definitions. -3. **Module Graph Construction**: Builds a directed graph of module dependencies. -4. **(WIP) Cross-Language Support**: Provides implementations for different programming languages. +1. **Symbol Resolution**: Connects imported symbols to their definitions. +1. 
**Module Graph Construction**: Builds a directed graph of module dependencies. +1. **(WIP) Cross-Language Support**: Provides implementations for different programming languages. ## Core Components ### ImportResolution Class The `ImportResolution` class represents the outcome of resolving an import statement. It contains: + - The source file containing the imported symbol - The specific symbol being imported (if applicable) - Whether the import references an entire file/module @@ -25,6 +26,7 @@ The `ImportResolution` class represents the outcome of resolving an import state ### Import Base Class The `Import` class is the foundation for language-specific import implementations. It: + - Stores metadata about the import (module path, symbol name, alias) - Provides the abstract `resolve_import()` method - Adds symbol resolution edges to the codebase graph diff --git a/architecture/3. imports-exports/B. Exports.md b/architecture/3. imports-exports/B. Exports.md index 58e05ec1b..0e42c98c4 100644 --- a/architecture/3. imports-exports/B. Exports.md +++ b/architecture/3. imports-exports/B. Exports.md @@ -7,6 +7,7 @@ Some languages contain additional metadata on "exported" symbols, specifying whi ### Export Base Class The `Export` class serves as the foundation for language-specific export implementations. It: + - Stores metadata about the export (symbol name, is default, etc.) - Tracks the relationship between the export and its declared symbol - Adds export edges to the codebase graph @@ -25,6 +26,7 @@ The `TSExport` class implements TypeScript-specific export handling: The TypeScript implementation handles several types of exports: 1. **Declaration Exports** + - Function declarations (including generators) - Class declarations - Interface declarations @@ -33,14 +35,16 @@ The TypeScript implementation handles several types of exports: - Namespace declarations - Variable/constant declarations -2. **Value Exports** +1. 
**Value Exports** + - Object literals with property exports - Arrow functions and function expressions - Classes and class expressions - Assignment expressions - Primitive values and expressions -3. **Special Export Forms** +1. **Special Export Forms** + - Wildcard exports (`export * from 'module'`) - Named re-exports (`export { name as alias } from 'module'`) - Default exports with various value types @@ -48,6 +52,7 @@ The TypeScript implementation handles several types of exports: #### Symbol Tracking and Dependencies The export system: + - Maintains relationships between exported symbols and their declarations - Validates export names match their declared symbols - Tracks dependencies through the codebase graph @@ -59,6 +64,7 @@ The export system: #### Integration with Type System Exports are tightly integrated with the type system: + - Exported type declarations are properly tracked - Symbol resolution considers both value and type exports - Re-exports preserve type information diff --git a/architecture/3. imports-exports/C. TSConfig.md b/architecture/3. imports-exports/C. TSConfig.md index 56ae037fc..b2362a7c8 100644 --- a/architecture/3. imports-exports/C. TSConfig.md +++ b/architecture/3. imports-exports/C. TSConfig.md @@ -7,15 +7,16 @@ TSConfig support is a critical component for TypeScript projects in the import r The TSConfig support system serves these purposes: 1. **Path Mapping**: Resolves custom module path aliases defined in the tsconfig.json file. -2. **Base URL Resolution**: Handles non-relative module imports using the baseUrl configuration. -3. **Project References**: Manages dependencies between TypeScript projects using the references field. -4. **Directory Structure**: Respects rootDir and outDir settings for maintaining proper directory structures. +1. **Base URL Resolution**: Handles non-relative module imports using the baseUrl configuration. +1. 
**Project References**: Manages dependencies between TypeScript projects using the references field. +1. **Directory Structure**: Respects rootDir and outDir settings for maintaining proper directory structures. ## Core Components ### TSConfig Class The `TSConfig` class represents a parsed TypeScript configuration file. It: + - Parses and stores the configuration settings from tsconfig.json - Handles inheritance through the "extends" field - Provides methods for translating between import paths and absolute file paths @@ -28,24 +29,24 @@ The `TSConfig` class represents a parsed TypeScript configuration file. It: TSConfig files can extend other configuration files through the "extends" field: 1. Base configurations are loaded and parsed first -2. Child configurations inherit and can override settings from their parent -3. Path mappings, base URLs, and other settings are merged appropriately +1. Child configurations inherit and can override settings from their parent +1. Path mappings, base URLs, and other settings are merged appropriately ### Path Mapping Resolution The system processes the "paths" field in tsconfig.json to create a mapping between import aliases and file paths: 1. Path patterns are normalized (removing wildcards, trailing slashes) -2. Relative paths are converted to absolute paths -3. Mappings are stored for efficient lookup during import resolution +1. Relative paths are converted to absolute paths +1. Mappings are stored for efficient lookup during import resolution ### Project References The "references" field defines dependencies between TypeScript projects: 1. Referenced projects are identified and loaded -2. Their configurations are analyzed to determine import paths -3. Import resolution can cross project boundaries using these references +1. Their configurations are analyzed to determine import paths +1. 
Import resolution can cross project boundaries using these references ## Import Resolution Process @@ -54,26 +55,26 @@ The "references" field defines dependencies between TypeScript projects: When resolving an import path in TypeScript: 1. Check if the path matches any path alias in the tsconfig.json -2. If a match is found, translate the path according to the mapping -3. Apply baseUrl resolution for non-relative imports -4. Handle project references for cross-project imports +1. If a match is found, translate the path according to the mapping +1. Apply baseUrl resolution for non-relative imports +1. Handle project references for cross-project imports ### Optimization Techniques The system employs several optimizations: 1. Caching computed values to avoid redundant processing -2. Early path checking for common patterns (e.g., paths starting with "@" or "~") -3. Hierarchical resolution that respects the configuration inheritance chain +1. Early path checking for common patterns (e.g., paths starting with "@" or "~") +1. Hierarchical resolution that respects the configuration inheritance chain ## Integration with Import Resolution The TSConfig support integrates with the broader import resolution system: 1. Each TypeScript file is associated with its nearest tsconfig.json -2. Import statements are processed using the file's associated configuration -3. Path mappings are applied during the module resolution process -4. Project references are considered when resolving imports across project boundaries +1. Import statements are processed using the file's associated configuration +1. Path mappings are applied during the module resolution process +1. Project references are considered when resolving imports across project boundaries ## Next Step diff --git a/architecture/5. performing-edits/A. Transactions.md b/architecture/5. performing-edits/A. Transactions.md index a901831f0..c27c7e65f 100644 --- a/architecture/5. performing-edits/A. 
Transactions.md +++ b/architecture/5. performing-edits/A. Transactions.md @@ -7,11 +7,13 @@ Transactions represent atomic changes to files in the codebase. Each transaction The transaction system is built around a base `Transaction` class with specialized subclasses: ### Content Transactions + - **RemoveTransaction**: Removes content between specified byte positions - **InsertTransaction**: Inserts new content at a specified byte position - **EditTransaction**: Replaces content between specified byte positions ### File Transactions + - **FileAddTransaction**: Creates a new file - **FileRenameTransaction**: Renames an existing file - **FileRemoveTransaction**: Deletes a file @@ -21,11 +23,11 @@ The transaction system is built around a base `Transaction` class with specializ Transactions are executed in a specific order defined by the `TransactionPriority` enum: 1. **Remove** (highest priority) -2. **Edit** -3. **Insert** -4. **FileAdd** -5. **FileRename** -6. **FileRemove** +1. **Edit** +1. **Insert** +1. **FileAdd** +1. **FileRename** +1. **FileRemove** This ordering ensures that content is removed before editing or inserting, and that all content operations happen before file operations. diff --git a/architecture/5. performing-edits/B. Transaction Manager.md b/architecture/5. performing-edits/B. Transaction Manager.md index efb0895f7..4ed78a750 100644 --- a/architecture/5. performing-edits/B. Transaction Manager.md +++ b/architecture/5. performing-edits/B. Transaction Manager.md @@ -7,6 +7,7 @@ The Transaction Manager coordinates the execution of transactions across multipl Since all node operations are on byte positions of the original file, multiple operations that change the total byte length of the file will result in offset errors and broken code. 
Give this example over here: + ``` Original: FooBar Operations: Remove "Foo" (bytes 0-3), Insert "Hello" (bytes 0-5) @@ -14,6 +15,7 @@ Operations: Remove "Foo" (bytes 0-3), Insert "Hello" (bytes 0-5) ``` If these operations were applied in order, the result would be: + ``` Result: FooBar Operation: Remove "Foo" (bytes 0-3), Insert "Hello" (bytes 0-5) @@ -37,19 +39,19 @@ Result: HelloWorld TransactionManager also performs some additional operations such detecting conflicts and coordinating (some basic) conflict resolutions. Overall, the core responsibilities are as follows: 1. **Transaction Queueing**: Maintains a queue of pending transactions organized by file -2. **Conflict Resolution**: Detects and resolves conflicts between transactions -3. **Transaction Execution**: Applies transactions in the correct order -4. **Resource Management**: Enforces limits on transaction count and execution time -5. **Change Tracking**: Generates diffs for applied changes +1. **Conflict Resolution**: Detects and resolves conflicts between transactions +1. **Transaction Execution**: Applies transactions in the correct order +1. **Resource Management**: Enforces limits on transaction count and execution time +1. **Change Tracking**: Generates diffs for applied changes ## Sorting Transactions Before execution, transactions are sorted based on (in this priority): 1. Position in the file (higher byte positions first) -2. Transaction type (following the priority order) -3. User-defined priority -4. Creation order +1. Transaction type (following the priority order) +1. User-defined priority +1. Creation order This sorting ensures that transactions are applied in a deterministic order that minimizes conflicts. Larger byte ranges are always edited first, removals happen before insertions, and older transactions are applied before newer ones. @@ -60,8 +62,8 @@ This sorting ensures that transactions are applied in a deterministic order that The manager identifies several types of conflicts: 1. 
**Overlapping Transactions**: Multiple transactions affecting the same byte range -2. **Contained Transactions**: One transaction completely contained within another -3. **Adjacent Transactions**: Transactions affecting adjacent byte ranges +1. **Contained Transactions**: One transaction completely contained within another +1. **Adjacent Transactions**: Transactions affecting adjacent byte ranges In it's current implementation, TransactionManager only handles Contained Transactions that are trivially sovable. (If a remove transaction completely overlaps with another remove transaction, only the larger one will be kept) @@ -70,7 +72,7 @@ In it's current implementation, TransactionManager only handles Contained Transa The Transaction Manager enforces two types of limits: 1. **Transaction Count**: Optional maximum number of transactions -2. **Execution Time**: Optional time limit for transaction processing +1. **Execution Time**: Optional time limit for transaction processing These limits prevent excessive resource usage and allow for early termination of long-running operations. @@ -79,10 +81,10 @@ These limits prevent excessive resource usage and allow for early termination of The commit process applies queued transactions to the codebase: 1. Transactions are sorted according to priority rules -2. Files are processed one by one -3. For each file, transactions are executed in order -4. Diffs are collected for each modified file -5. The queue is cleared after successful commit +1. Files are processed one by one +1. For each file, transactions are executed in order +1. Diffs are collected for each modified file +1. The queue is cleared after successful commit The diff's are later used during resyc to efficiently update the codebase graph as changes occur. See [Incremental Computation](../6.%20incremental-computation/A.%20Overview.md) for more details. 
diff --git a/architecture/external/dependency-manager.md b/architecture/external/dependency-manager.md index e2dfa270b..ed8e42a3d 100644 --- a/architecture/external/dependency-manager.md +++ b/architecture/external/dependency-manager.md @@ -50,6 +50,7 @@ repo/ ``` Next, Dependency Manager iterates through all the `package.json` files and creates a "clean" version of each file. This "clean" version only includes a subset of information from the original, including: + - Name - Version - Package Manager Details
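Note on the Transaction Manager doc touched in this patch: it states that all operations refer to byte positions in the original file, and that transactions are sorted with higher byte positions first. The following is a minimal Python sketch of why that ordering matters, loosely modeled on the `FooBar` example in that doc. `ByteEdit`, `apply_edits`, and `apply_edits_in_given_order` are hypothetical names invented for illustration and are not taken from the codebase; the sketch models only the position-based part of the sort (not transaction type, user-defined priority, or creation order) and assumes the edits do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteEdit:
    """Hypothetical stand-in for a content transaction (not the project's class)."""

    start: int          # byte offset in the ORIGINAL content (inclusive)
    end: int            # byte offset in the ORIGINAL content (exclusive)
    new_content: bytes  # b"" models a removal; start == end models an insertion


def apply_edits(original: bytes, edits: list[ByteEdit]) -> bytes:
    """Apply non-overlapping edits, highest byte position first.

    Editing from the back of the file means every edit that is still pending
    lies before the bytes that just changed, so its original offsets stay valid.
    """
    result = original
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        result = result[:edit.start] + edit.new_content + result[edit.end:]
    return result


def apply_edits_in_given_order(original: bytes, edits: list[ByteEdit]) -> bytes:
    """Apply edits front to back, to show the offset drift that sorting avoids."""
    result = original
    for edit in edits:
        result = result[:edit.start] + edit.new_content + result[edit.end:]
    return result


edits = [
    ByteEdit(start=0, end=3, new_content=b"Hello"),  # replace "Foo"
    ByteEdit(start=3, end=6, new_content=b"World"),  # replace "Bar"
]

print(apply_edits(b"FooBar", edits))                 # b'HelloWorld'
print(apply_edits_in_given_order(b"FooBar", edits))  # b'HelWorldar'
```

In the second call, the first edit lengthens the content by two bytes, so the offsets 3-6 of the second edit no longer point at `Bar`. Applying the edit at the higher position first avoids this without recomputing any offsets.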