Skip to content

Enforce strict mypy type checking and additional ruff rules#492

Open
FabianHofmann wants to merge 24 commits intomasterfrom
refac/type-annotation
Open

Enforce strict mypy type checking and additional ruff rules#492
FabianHofmann wants to merge 24 commits intomasterfrom
refac/type-annotation

Conversation

@FabianHofmann
Copy link
Copy Markdown
Contributor

@FabianHofmann FabianHofmann commented Mar 16, 2026

Changes proposed in this Pull Request

This PR enforces strict static type checking with mypy and enables additional ruff lint rules across the entire codebase (including test/ and doc/).

Docstrings

  • Add docstrings to all functions

Mypy

  • Configured strict mypy settings in pyproject.toml: disallow_incomplete_defs = true, check_untyped_defs = true
  • Removed all directory exclusions from mypy — it now checks atlite/, test/, and doc/
  • Updated .pre-commit-config.yaml to match
  • Leveraged existing atlite/_types.py type aliases and added TypeAlias annotations where needed (e.g. csp.py, utils.py)
  • Added missing type annotations to function signatures (e.g. conversion functions in convert.py)
  • Fixed __all__ in __init__.py to use strings instead of object references

Ruff

Enabled 9 new rule sets and fixed all violations (~260 fixes):

  • B (flake8-bugbear): Added from err/from None to re-raised exceptions, fixed mutable defaults
  • SIM (flake8-simplify): Collapsed nested if statements, simplified dict operations
  • RET (flake8-return): Removed superfluous else after return/raise
  • C4 (flake8-comprehensions): Simplified unnecessary dict()/list() constructors
  • TC (flake8-type-checking): Moved type-only imports into TYPE_CHECKING blocks
  • NPY (numpy): Replaced legacy np.random.seed/np.random.rand with np.random.default_rng
  • G (flake8-logging-format): Replaced f-strings in logging with lazy % formatting
  • PTH (flake8-use-pathlib): Replaced os.path.* and open() with pathlib equivalents
  • RUF100: Cleaned up unused # noqa comments

TODO

  • replace cast() where possible, need bump python version for mypy
  • add ruff check for docstring consistency

Checklist

  • Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • Unit tests for new features were added (if applicable).
  • Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
  • A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • I consent to the release of this PR's code under the MIT license.

FabianHofmann and others added 22 commits March 13, 2026 14:43
…nd removing unnecessary type: ignore comments
…te_time helper, consistent keep_attrs, use fixtures in tests
…ypes across codebase

- Add NumPy-style docstrings to all functions in datasets/era5.py and improve convert.py docstrings
- Define shared Literal type aliases (TrackingType, ClearskyModel, TrigonModel, etc.) in _types.py
- Update ~30 function signatures across convert.py, pv/, and data.py to use Literal types
- Fix CSPConfig TypedDict missing 'technology' key and widen orientation/panel param types
Per-file ignores for datasets modules still lacking docstrings (cordex, ncep, sarah, gebco).
@lkstrp lkstrp self-requested a review April 1, 2026 13:20
Copy link
Copy Markdown
Member

@lkstrp lkstrp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tons of good cleanup changes, thank youuu ! Just a bit too much and mixed which makes it a bit hard to review, since this now includes changed docstrings, ruff, types and tests.

I think some things need another look. I only went through parts for it now:

  • There are many comments removed, and sometimes it looks like Claude chucked them, but some might be removed intentionally? I am not always sure and I would wanna keep hand crafted code explaining comments in general (if useful and up to date). The same goes for docstrings sometimes. They are rewritten, often improved, but sometimes useful old information is removed
  • In _types.py, many types are defined which are then used everywhere. This makes sense for new literals and more complicated types, but some also just define an alias for existing types from the library. I suppose this is also a style question, but since I believe that simplicity is becoming increasingly important, I would try to simplify it. If an argument requires many different types, this is also an indication that the signature should be restructured. Abstracting all those types away just hides that

if TYPE_CHECKING:
from types import ModuleType

modules: dict[str, ModuleType] = {"era5": era5, "sarah": sarah, "gebco": gebco}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is ModuleType needed here?

ds = ds.sel(x=xs, y=ys)

if newname in {"influx", "outflux"}:
# shift averaged data to beginning of bin
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the comments?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't like comments which are describing the code

)
)
elif newname in {"runoff"}:
# shift and fill 6hr average data to beginning of 3hr bins
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the comments?

config.cordex_dir, # noqa: F821
weather_data_config: dict[str, dict[str, Any]] = {}
try:
from atlite import config # type: ignore[attr-defined]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should never be an import error is it?

"prepare_func": prepare_weather_types_cordex,
"oldname": "CWT",
"newname": "CWT",
"template": os.path.join( # noqa: PTH118
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather ignore PTH118 globally for now, so we can add it easily later again. Using pathlib instead of os is very recommended

)
elif clearsky_model == "enhanced":
# Enhanced Reindl model with ambient air temperature and relative
# humidity
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the comment?

influx = direct + diffuse

with np.errstate(divide="ignore", invalid="ignore"):
# brightening factor
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove comments?

NDArrayInt: TypeAlias = np.ndarray[Any, np.dtype[np.signedinteger[Any]]]
NDArrayBool: TypeAlias = np.ndarray[Any, np.dtype[np.bool_]]
DataArray: TypeAlias = xr.DataArray
Dataset: TypeAlias = xr.Dataset
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all those? Some are just aliases. I think it's much cleaner to just keep for example use xr.Dataset than redefining it?

SparseMatrix: TypeAlias = sp.lil_matrix | sp.csr_matrix

TrackingType: TypeAlias = (
Literal["horizontal", "tilted_horizontal", "vertical", "dual"] | None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to define those literals! But I would not put | None in there in general. That information if something is optional should be in the signature. And again, I am not so sure if this needs to be defined in an extra Type? How often is this actually used in the signature? If I wanna see this in the function I need to click once more now. Guess for some it makes sense, but you think for all of them?

DataFormat: TypeAlias = Literal["grib", "netcdf"]


class ERA5RetrievalParams(TypedDict, total=False):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it wise to hardcode that here? Since it's a dict it doesn't give you any benefits, is it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants