
spaghettree

Software complexity directly affects the maintainability of modern codebases: most of the software lifecycle is spent maintaining production systems, and high complexity means harder maintenance, slower feature delivery, and longer onboarding for new engineers.

What this tool does

This is a prototype tool for simplifying the structural complexity of a codebase. It works by optimising the call-graph and is intended to run as a CI/CD pipeline stage.

Why bother?

This tool aims to:

  • Help manage and limit complexity growth during development.

  • Complement traditional linters and formatters by addressing architectural issues.

  • In doing so:

    • reduce technical debt
    • lower maintenance costs
    • speed up engineer onboarding

Notes

As this is a prototype and not ready for production use, the defaults are set to simply report the current structure's directed weighted modularity and the current call tree for the repo as it stands.

Installation

pip install spaghettree

Or

uv add spaghettree

Example usage:

uv run -m spaghettree "path/to/src_code"

This will calculate the directed weighted modularity (DWM) of the codebase and make up to 5 suggestions for improving the structure of the code.

The output looks like this:

spaghettree.domain.entities.EntityCST -> spaghettree.domain.processing +0.035

This says: move the EntityCST class from spaghettree.domain.entities to spaghettree.domain.processing for a DWM increase of 0.035.
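Here DWM is assumed to be the standard Leicht-Newman directed modularity, weighted by call counts, computed over the call-graph adjacency matrix described under "How it works" below. A minimal numpy sketch with illustrative names:

import numpy as np

def directed_weighted_modularity(adj: np.ndarray, communities: np.ndarray) -> float:
    """adj[i, j] is the weight of calls from entity i to entity j;
    communities[i] is the module assignment of entity i."""
    m = adj.sum()
    if m == 0:
        return 0.0
    k_out = adj.sum(axis=1)  # total outgoing call weight per entity
    k_in = adj.sum(axis=0)   # total incoming call weight per entity
    expected = np.outer(k_out, k_in) / m  # expected weight under the null model
    same_module = communities[:, None] == communities[None, :]
    return float(((adj - expected) * same_module).sum() / m)

A suggestion's score is then the change in this value when a single entity's module assignment is moved.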

There is an experimental feature (--optimise-src-code) that will automatically refactor the entire codebase using a divide-and-conquer algorithm. Its behaviour is much more aggressive than the suggestions, which aim to keep the developer in the loop before larger restructuring changes.

Lastly, it will print a representation of the call tree to the terminal to allow for any further analysis the user may want to do. Each key is an entity, and each entry in its list is a call that entity makes.

{
    "some_package.mod_a.A": [],
    "some_package.mod_a.func_a": [],
    "some_package.mod_a.func_b": [
        "some_package.mod_a.func_a",
        "some_package.mod_a.func_a",
    ],
    "some_package.mod_b.B": [
        "some_package.mod_b.CONSTANT",
    ],
    "some_package.mod_b.C": [
        "some_package.mod_a.A",
        "some_package.mod_b.B",
    ],
    "some_package.mod_b.CONSTANT": [],
}
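The call tree is also saved as JSON (./call_tree.json by default, see Args below), so it is easy to post-process. A small sketch assuming the shape shown above:

import json
from collections import Counter

with open("call_tree.json") as f:
    call_tree = json.load(f)

# how often each entity is called across the repo
call_counts = Counter(callee for callees in call_tree.values() for callee in callees)
print(call_counts.most_common(5))

# entities that make no calls and receive none: worth a closer look
isolated = [name for name, callees in call_tree.items()
            if not callees and name not in call_counts]
print(isolated)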

Args

| Argument | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| src_root (positional) | str | ✅ | | Path to the root of the repository to scan. |
| --new-root | str | ❌ | '' | Optional new root path for output (default: empty, meaning same as src_root if optimisation is enabled). |
| --call-tree-save-path | str | ❌ | './call_tree.json' | The location to save the generated call tree. Only used if --optimise-src-code isn't used. |
| --optimise-src-code | Flag (no value) | ❌ | | Enable optimisation of the source code. |
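For example, combining the flags above to run the experimental optimiser and write the result under a separate root (the paths here are illustrative):

uv run -m spaghettree "path/to/src_code" --optimise-src-code --new-root "path/to/refactored"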

How it works

  • All .py files in the given directory are read in as strings.
  • Each of those strings is parsed into a libcst CST object.
    • This is so comments and other formatting are retained; otherwise useful information would be lost.
  • A list of locations of each of the entities (name, original module, line no) is collected and stored.
  • The CSTs are transformed into custom objects:
    • ModuleCST
    • ClassCST
    • FuncCST
    • GlobalCST
    • ImportCST
  • A ClassCST can have 0-n FuncCST methods on it, and each FuncCST has a list of fully qualified calls that the function makes.
    • With these structures we can create a call-graph, e.g. ClassA.method_a -> some_func.
  • To ensure any refactoring is possible, a call from a class's methods is counted as a call to that class (so classes aren't split apart method by method).
  • From the call-graph, the non-native calls are filtered out, meaning only entities defined in the repo are considered for moving.
  • An adjacency matrix is created from the call-graph, where the x and y axes are the entities and the entry at (x, y) is the count of calls from x to y.
  • Each entity starts as its own module, so at this point you could have a single constant in a file by itself.
  • Then each pair-wise combination is considered for merging (see the sketch after this section).
    • If merging the entities would result in a gain in the repo's directed weighted modularity, it is added as a possible merge to consider.
  • All the possible merges are sorted by the gain they would bring to the overall system, then each non-overlapping merge is applied.
    • e.g. merge [(mod_a, mod_b), (mod_c, mod_d)]
    • the merge (mod_b, mod_c) is not considered, as the (mod_c, mod_d) merge would result in a higher directed weighted modularity.
  • This is repeated until there are no more valid merges.
  • Once this is done some extra modification is applied; for example, if you were writing a library of validators that didn't call each other but all sat in the same module, they are combined.
  • When writing the entities to their new files, the imports are updated, and each entity is kept as close as possible to where it was before, e.g.:
# some_original_mod
from typing import TypeVar

T = TypeVar("T")

class SomeClass:
    def method(self, item: T) -> T:
        return item

class SomeOtherClass:
    def method(self, item: T) -> T:
        return item
    
SomeType = SomeClass | SomeOtherClass
  • This is to ensure that for an example like the one above the result is still valid. An initial idea was to always write globals first, then classes, then funcs, but that would result in some_broken_mod:
# some_broken_mod
from typing import TypeVar

T = TypeVar("T")
SomeType = SomeClass | SomeOtherClass # BROKEN as the classes aren't defined yet

class SomeClass:
    def method(self, item: T) -> T:
        return item

class SomeOtherClass:
    def method(self, item: T) -> T:
        return item
    
  • Lastly, when the entities are all written to their new module locations, ruff is called on the files to fix any formatting. Because of how ruff is set up, it respects the user's own ruff.toml, so any rules they have included or excluded are honoured.
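As a rough illustration (not spaghettree's actual internals), one round of the merge step described above could look like the following, reusing the directed_weighted_modularity sketch from earlier and mutating communities in place:

from itertools import combinations

import numpy as np

def greedy_merge_round(adj: np.ndarray, communities: np.ndarray) -> bool:
    """One round of the pairwise merge step; returns True if any merge was applied."""
    base = directed_weighted_modularity(adj, communities)
    gains = []
    for a, b in combinations(np.unique(communities).tolist(), 2):
        trial = np.where(communities == b, a, communities)
        gain = directed_weighted_modularity(adj, trial) - base
        if gain > 0:
            gains.append((gain, a, b))
    merged: set = set()
    applied = False
    # apply non-overlapping merges, largest gain first
    for gain, a, b in sorted(gains, reverse=True):
        if a in merged or b in merged:
            continue
        communities[communities == b] = a
        merged.update((a, b))
        applied = True
    return applied

# "repeated until there are no more valid merges":
# while greedy_merge_round(adj, communities):
#     pass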

Repo map

├── .github
│   └── workflows
│       ├── ci_tests.yaml
│       └── publish.yaml
├── src
│   └── spaghettree
│       ├── adapters
│       │   ├── __init__.py
│       │   └── io_wrapper.py
│       ├── core
│       │   ├── __init__.py
│       │   └── logger.py
│       ├── domain
│       │   ├── __init__.py
│       │   ├── entities.py
│       │   ├── optimisation.py
│       │   ├── parsing.py
│       │   ├── processing.py
│       │   └── visitors.py
│       ├── __init__.py
│       └── __main__.py
├── tests
│   ├── adapters
│   │   ├── __init__.py
│   │   └── test_adapter_apis.py
│   ├── core
│   │   └── __init__.py
│   ├── domain
│   │   ├── __init__.py
│   │   ├── test_entities.py
│   │   ├── test_optimisation.py
│   │   └── test_processing.py
│   ├── __init__.py
│   ├── conftest.py
│   └── test_main.py
├── .pre-commit-config.yaml
├── README.md
├── pyproject.toml
├── ruff.toml
└── uv.lock
