105 changes: 69 additions & 36 deletions README.md
@@ -2,20 +2,27 @@

# 🐞 EthDebug.py

> A Python library of debugging primitives for Ethereum developer tools — breakpoint debuggers, testing frameworks, and static analyzers.

[![Discord](https://img.shields.io/badge/discord-join-7289da)](https://discord.gg/CurfmXNtbN)
[![License](https://img.shields.io/badge/license-BSD--3-orange)](LICENSE)

</div>

EthDebug.py includes a complete debugger-side implementation of the [EthDebug format](https://ethdebug.github.io/format/). Its core capability is reading Solidity runtime information (such as local variables) from a running Ethereum Virtual Machine.

**What you can do with EthDebug.py:**

- Read the value of a Solidity variable at a paused machine state
- See which variables are in scope at a specific source location
- Replace unreadable EVM details with their human-readable Solidity counterparts in logs and error messages
- Work in progress: Generate EthDebug data for existing compilation units

---

## Architecture

This library is agnostic of any specific virtual machine implementation and compiler. The diagram below shows how the components relate:

```mermaid
flowchart TD
@@ -39,55 +46,81 @@ flowchart TD
Dereference-->View-->Debugger
```

---

## Goals and Non-Goals

| In scope | Description |
|:---:|---|
| ✅ | Improve ecosystem-wide developer experience by providing a rich set of debugging primitives |
| ✅ | Provide feedback on the specification and implementation of the EthDebug format |
| ✅ | Assist compilers when implementing the counterpart of the EthDebug protocol |
| ❌ | Develop a fully-featured stand-alone debugger — see [Simbolik](https://simbolik.runtimeverification.com/) for that (this library was originally extracted from Simbolik) |

---

## API Docs

The [Project Structure](#project-structure) section provides a high-level overview of the provided modules. Inside each module you'll find extensive pydoc comments detailing how it is meant to be used.

For concrete usage examples, the tests are a good starting point.

### Project Structure

| Module | Description |
|---|---|
| `src/ethdebug/format` | Parsers and generators for all EthDebug schemas. Structure mirrors the sub-schema hierarchy. Auto-generated from the spec and kept in sync as it evolves. |
| `src/ethdebug/evaluate.py` | Data structures and algorithms for evaluating pointers in the context of a paused machine state. Note: "evaluating" is distinct from "dereferencing." |
| `src/ethdebug/dereference` | Complete pointer dereferencing algorithm. A Python rewrite of the TypeScript reference implementation, with support for all pointer regions, collections, expressions, and templates. |
| `src/ethdebug/cursor.py` | Defines the result of dereferencing a pointer. |
| `src/ethdebug/data.py` | Low-level primitives for converting between data representations (e.g. raw bytes ↔ unsigned integers). |
| `src/ethdebug/machine.py` | Abstract protocols `Machine`, `MachineTrace`, and `MachineState`. Users of the library implement these to integrate their own EVM. |
| `tests/` | Automated tests. Some are ported from the reference implementation to ensure consistency; others test integration with the Solidity compiler. |
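
As a rough illustration of the kind of conversion `data.py` is responsible for, the sketch below uses only Python's built-in `int.from_bytes` and `int.to_bytes`. It assumes big-endian encoding and 32-byte words, as the EVM uses for stack and storage values. The function names are hypothetical and are not the API exported by `src/ethdebug/data.py`.

```python
WORD_SIZE = 32  # bytes per EVM stack/storage word


def bytes_to_uint(data: bytes) -> int:
    """Interpret big-endian bytes as an unsigned integer."""
    return int.from_bytes(data, byteorder="big", signed=False)


def uint_to_bytes(value: int, length: int = WORD_SIZE) -> bytes:
    """Encode a non-negative integer as `length` big-endian bytes, zero-padded on the left."""
    if value < 0:
        raise ValueError("unsigned conversion requires a non-negative value")
    # int.to_bytes raises OverflowError if the value does not fit in `length` bytes.
    return value.to_bytes(length, byteorder="big", signed=False)


def bytes_to_int(data: bytes) -> int:
    """Interpret big-endian bytes as a two's-complement signed integer (e.g. int256)."""
    return int.from_bytes(data, byteorder="big", signed=True)


if __name__ == "__main__":
    one = uint_to_bytes(1)
    print(one.hex())                   # 31 zero bytes followed by 0x01
    print(bytes_to_uint(one))          # 1
    print(bytes_to_int(b"\xff" * 32))  # -1 (an all-ones word read as int256)
```

The same 32 bytes decode to different values depending on whether the Solidity type is signed, which is why a conversion layer needs the type information and not just the raw bytes.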

---

## EthDebug Annotation Tool

> **Work in Progress:** The annotator is under active development and not yet feature-complete. Expect gaps in coverage, breaking changes, and incomplete output.

The solc compiler does not yet generate EthDebug data, but the annotation tool can add it to existing solc output. This is useful for testing and prototyping while compiler support is in progress. It can also serve as a backwards-compatibility layer for tools that want to support EthDebug but still rely on solc's existing output.
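
The annotator's `--help` text (in `src/annotator/__main__.py`) states that the compilation must have been performed with the optimizer disabled, and lists the output fields needed for full annotation: `ast`, `metadata`, `storageLayout`, and the `object` and `sourceMap` of both `evm.bytecode` and `evm.deployedBytecode`. Here is a minimal sketch of a standard-JSON input that satisfies this, assuming the sources are available as in-memory strings. The helper name `make_standard_json_input` and the `Counter.sol` contract are illustrative only.

```python
import json

# Per-contract output fields listed in `python -m annotator --help`.
CONTRACT_OUTPUTS = [
    "metadata",
    "storageLayout",
    "evm.bytecode.object",
    "evm.bytecode.sourceMap",
    "evm.deployedBytecode.object",
    "evm.deployedBytecode.sourceMap",
]


def make_standard_json_input(sources: dict) -> dict:
    """Build a solc standard-JSON input from a {path: source_text} mapping."""
    return {
        "language": "Solidity",
        "sources": {path: {"content": text} for path, text in sources.items()},
        "settings": {
            # The annotator expects an unoptimized compilation.
            "optimizer": {"enabled": False},
            "outputSelection": {
                "*": {  # every source file
                    "": ["ast"],  # file-level outputs use the empty contract name
                    "*": list(CONTRACT_OUTPUTS),  # per-contract outputs for every contract
                }
            },
        },
    }


if __name__ == "__main__":
    example = {
        "Counter.sol": "// SPDX-License-Identifier: MIT\n"
        "pragma solidity ^0.8.0;\n"
        "contract Counter { uint256 public count; }\n"
    }
    # Print to stdout so the result can be redirected into input.json.
    print(json.dumps(make_standard_json_input(example), indent=2))
```

Saved under a name such as `make_input.py` (hypothetical), `python make_input.py > input.json` produces an input file for the `solc --standard-json` step. `ast` sits under the empty-string key because solc produces it per source file rather than per contract.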

```bash
# Compile
solc --standard-json < input.json > output.json

# Annotate
python -m annotator output.json -o annotated.json

# Pipeline
solc --standard-json < input.json | python -m annotator > annotated.json
```

Run `python -m annotator --help` for full CLI options.
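
`python -m annotator` is a thin wrapper around the two functions exported from the `annotator` package, `check_optimizer_disabled` and `annotate`. The sketch below calls them directly from Python, mirroring what `main()` in `src/annotator/__main__.py` does. `annotate_files` is a hypothetical helper, and it assumes the `annotator` package is importable (for example, with `src/` on the Python path). How `check_optimizer_disabled` reports an optimized build is not documented here.

```python
import json
from pathlib import Path
from typing import Any, List, Optional


def annotate_files(
    output_path: str,
    input_path: Optional[str] = None,
    source_dirs: Optional[List[str]] = None,
) -> Any:
    """Python equivalent of `python -m annotator OUTPUT [-i INPUT] [-s DIR ...]`."""
    # Imported lazily so this module loads even where `annotator` is not installed.
    from annotator import annotate, check_optimizer_disabled

    solc_output = json.loads(Path(output_path).read_text())
    input_json = json.loads(Path(input_path).read_text()) if input_path else None

    # Same order as the CLI: validate the compilation first, then annotate.
    check_optimizer_disabled(solc_output)
    return annotate(solc_output, source_dirs=source_dirs, input_json=input_json)
```

For example, `annotate_files("output.json", input_path="input.json")` returns the object that the CLI would serialize with `json.dumps`.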

---

## For Contributors and Maintainers

### Regenerating the Validators

The data models for parsing and validating the EthDebug format are generated from the JSON schema using `generate_model.py`. Regenerate them whenever the JSON schema files change or `datamodel-code-generator` is updated:

```bash
uv run python ./generate_model.py
```

> **Note:** The `datamodel-code-generator` library has custom patches to work with the EthDebug JSON schema files and is embedded as a subtree in the `datamodel-code-generator/` directory. To update it:
>
> ```bash
> git subtree pull --prefix=datamodel-code-generator git@github.com:koxudaxi/datamodel-code-generator.git main --squash
> ```

### Generating Test Fixtures with `solc`

```bash
pushd tests && solc --standard-json mega_playground/input.json > mega_playground/output.json && popd
pushd tests && solc --standard-json abstract_and_interface/input.json --pretty-json > abstract_and_interface/output.json && popd
pushd tests && solc --standard-json standard_yul_debug_info_ethdebug_compatible_output/input.json > standard_yul_debug_info_ethdebug_compatible_output/output.json --allow-paths . && popd
```
3 changes: 3 additions & 0 deletions src/annotator/__init__.py
@@ -0,0 +1,3 @@
from .annotate import annotate, check_optimizer_disabled

__all__ = ["annotate", "check_optimizer_disabled"]
91 changes: 91 additions & 0 deletions src/annotator/__main__.py
@@ -0,0 +1,91 @@
"""
__main__.py

CLI entry point: python -m annotator
"""

from __future__ import annotations

import argparse
import json
import sys
from typing import Optional

from .annotate import annotate, check_optimizer_disabled


def main() -> None:
    parser = argparse.ArgumentParser(
        description=(
            "Annotate solc standard JSON output with ethdebug format data.\n"
            "\n"
            "The compilation must have been performed with the optimizer disabled.\n"
            "For full annotation, request the following output fields:\n"
            " ast, metadata, storageLayout,\n"
            " evm.bytecode.object, evm.bytecode.sourceMap,\n"
            " evm.deployedBytecode.object, evm.deployedBytecode.sourceMap"
        ),
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument(
        "output_file",
        nargs="?",
        default="-",
        help="Path to solc standard JSON output file, or '-' for stdin (default: stdin)",
    )
    parser.add_argument(
        "-o",
        "--output",
        help="Annotated output file path (default: stdout)",
    )
    parser.add_argument(
        "-i",
        "--input-json",
        dest="input_json",
        help=(
            "Path to the original solc standard JSON *input* file. "
            "Used to include source file contents in the ethdebug Info object."
        ),
    )
    parser.add_argument(
        "-s",
        "--sources-dir",
        dest="sources_dirs",
        action="append",
        default=[],
        metavar="DIR",
        help=(
            "Directory to search for source files (can be repeated). "
            "Used to include source file contents when --input-json is not provided."
        ),
    )
    args = parser.parse_args()

    if args.output_file == "-":
        solc_output = json.load(sys.stdin)
    else:
        with open(args.output_file) as f:
            solc_output = json.load(f)

    input_json: Optional[dict] = None
    if args.input_json:
        with open(args.input_json) as f:
            input_json = json.load(f)

    check_optimizer_disabled(solc_output)
    annotated = annotate(
        solc_output,
        source_dirs=args.sources_dirs or None,
        input_json=input_json,
    )

    text = json.dumps(annotated, indent=2)
    if args.output:
        with open(args.output, "w") as f:
            f.write(text)
    else:
        print(text)


if __name__ == "__main__":
    main()