Fix ground truth for inheritance/MRO benchmarks (Liskov substitution) by jaltmayerpizzorno · Pull Request #14 · secure-software-engineering/TypeEvalPy

jaltmayerpizzorno · 2026-03-13T18:37:04Z

Hi! Thanks again for creating and maintaining TypeEvalPy — it's been an invaluable resource for our work evaluating type inference tools.

While running the benchmarks, we noticed that the ground truth for 5 inheritance/MRO benchmarks (6 method annotations) gives each method only its own body's return type. It does not account for the Liskov substitution principle, which requires an overriding method's return type to be a subtype of the return type of the method it overrides. When the code is annotated as given, `mypy --strict` reports incompatible-override errors on all of them. Widening each parent method's return type to include the return types of its subclass overrides resolves this and makes the annotations consistent with what a type-safe program requires.

Affected benchmarks

| Benchmark | Function | Before | After |
|---|---|---|---|
| `classes/inheritance_overriding` | `MyClass.func` | `str` | `int\|str` |
| `mro/parents_same_superclass` | `A.func` | `str` | `int\|str` |
| `mro/self_assignment` | `B.func` | `int` | `int\|str` |
| `mro/two_parents` | `B.func` | `str` | `int\|str` |
| `mro/two_parents_method_defined` | `A.func` | `float` | `float\|str` |
| `mro/two_parents_method_defined` | `B.func` | `int` | `float\|int\|str` |

We verified with `mypy --strict` that the original annotations produce override errors and the corrected ones pass cleanly.
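For reviewers less familiar with the override rule, here is a minimal sketch of the pattern behind `classes/inheritance_overriding`. Only `MyClass.func` and the `str` to `int|str` change come from the table. The method bodies, the subclass name `MySubClass`, and the helper `call_func` are hypothetical, since the benchmark's actual source isn't reproduced here. With the parent annotated `-> str`, mypy --strict should flag the `-> int` override as incompatible. Widening the parent to `int | str` turns the override into a valid narrowing.

```python
from __future__ import annotations


class MyClass:
    # Before the fix this was annotated "-> str" (its body's own return type),
    # which made the subclass override below an incompatible override.
    def func(self) -> int | str:
        return "MyClass"


class MySubClass(MyClass):  # hypothetical subclass, for illustration only
    # An override may narrow the return type: int is a subtype of int | str.
    def func(self) -> int:
        return 42


def call_func(obj: MyClass) -> int | str:
    # Code typed against the parent can receive any subclass instance,
    # so it must be prepared for every override's return type.
    return obj.func()
```

Running mypy --strict on this sketch should pass. Changing the parent's annotation back to `-> str` should reproduce the kind of override error described above.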

Thanks for considering this!


The previous ground truth annotated each method with only its body's return type, ignoring that subclass overrides must have compatible return types per the Liskov substitution principle. When the code is annotated as given, mypy --strict reports override errors on every affected benchmark. The corrected annotations widen parent method return types to include subclass override types, so that all affected benchmarks pass mypy --strict.

Affected benchmarks:

- classes/inheritance_overriding: MyClass.func str -> int|str
- mro/parents_same_superclass: A.func str -> int|str
- mro/self_assignment: B.func int -> int|str
- mro/two_parents: B.func str -> int|str
- mro/two_parents_method_defined: A.func float -> float|str, B.func int -> float|int|str
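The widening rule can be sketched in a few lines. This is not the script used to produce these corrections. It is a rough illustration under stated assumptions. The three-class hierarchy is hypothetical, chosen so that it reproduces the `mro/two_parents_method_defined` row, and the real benchmark may be structured differently. The helpers `body_type` and `widened_annotation` are also hypothetical names. Each body's return type is approximated by calling the method once, which only works for trivial no-argument methods like these. For every class that defines the method, the widened annotation is the union of its own body's type with the body types of all subclasses that override it.

```python
from __future__ import annotations


# Hypothetical hierarchy for illustration only; not the benchmark's source.
class B:
    def func(self):
        return 1  # body returns int


class A(B):
    def func(self):
        return 1.5  # body returns float


class C(A, B):  # MRO: C, A, B, object
    def func(self):
        return "c"  # body returns str


def body_type(cls: type, name: str) -> str:
    # Call the method defined directly on cls (not an inherited one) on a fresh
    # instance and record the runtime type name of its result.
    method = vars(cls)[name]
    return type(method(cls())).__name__


def widened_annotation(cls: type, name: str, classes: list[type]) -> str:
    # Start from the body's own return type...
    types = {body_type(cls, name)}
    # ...and add the body type of every proper subclass that overrides `name`,
    # since each override's return type must be a subtype of cls's annotation.
    for other in classes:
        if other is not cls and issubclass(other, cls) and name in vars(other):
            types.add(body_type(other, name))
    return "|".join(sorted(types))


hierarchy = [A, B, C]
for cls in hierarchy:
    print(f"{cls.__name__}.func: {widened_annotation(cls, 'func', hierarchy)}")
```

Under these assumptions it prints `A.func: float|str`, `B.func: float|int|str` and `C.func: str`, which matches the corrected annotations for `A.func` and `B.func`.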

jaltmayerpizzorno added 20 commits October 12, 2025 13:26

- Merge branch '202510-righttyper-support' (`934de26`)
- Merge branch '202510-corrections' (`f35ece8`)
- Merge branch '202510-corrections' (`0ce1258`)
- fixed coordinates; (`35fa775`)
- fixed types not being parametrized; (`e6da004`)
- fixed missing "function" context; (`81f2ce8`)
- fixed coordinate; (`09bdce5`)
- fixed incorrect function context; (`03c842f`)
- fixed type not being parametrized; (`4959d1c`)
- deleted entry from ground truth, as it describes a call, not a variable definition; (`a40d4db`)
- fixed invalid dictionary keys in subscript expressions; (`378745a`)
- fixed type, which wasn't parametrized; (`3991208`)
- fixed index out of range error due to `<value1>` interfering with enumeration; - made test more interesting by substituting `<value1>` with more than just "int"; (`e010ba9`)
- Merge branch 'secure-software-engineering:main' into main (`1c03981`)
- Merge branch 'secure-software-engineering:main' into 202510-corrections-2 (`348f225`)
- Merge branch '202510-corrections-2' (`f167d80`)
- deleted entry from ground truth, as it describes a call, not a variable definition. Corresponds to change a40d4db in the templates; (`00311c4`)
- Merge branch '202510-corrections-2' (`bdd2c78`)
- Merge branch 'secure-software-engineering:main' into main (`10ab41a`)

jaltmayerpizzorno wants to merge 20 commits into secure-software-engineering:main from plasma-umass:fix-inheritance-ground-truth
