Allow verifying statements according to the new problem format #289

ElliotRipa · 2025-03-18T21:02:44Z

Overview

verifyproblem.py now checks the appropriate location for the problem statements based on the version of problem format.
problem2html.py and problem2pdf.py now both take a flag "-F" which specifies the format used, which defaults to legacy. This flag is added because both files take the problem directory as an argument and therefore need to know where to expect the statements.
Also allows the constructor for template.py to specify a version for the same reasons.
The cls templates now also allow for using either version.

simonlindholm · 2025-03-18T21:06:45Z

problem2html.py and problem2pdf.py now both take a flag "-F" which specifies the format used, which defaults to legacy. This flag is added because both files take the problem directory as an argument and therefore need to know where to expect the statements.

Can we read problem.yaml for this? As a problemtools user, having to pass -F every time will be very annoying.

niemela · 2025-03-18T22:57:12Z

Can we read problem.yaml for this? As a problemtools user, having to pass -F every time will be very annoying.

Agreed. If problem_format_version is not in problem.yaml, then the version is legacy; otherwise it is what is given, so this is always well defined. It's fine to provide -F as a way to override the format version, though (i.e., validate this as if it were this version instead).
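
The rule described above could be sketched roughly as follows. This is a minimal illustration rather than the problemtools implementation: resolve_format_version is a hypothetical helper, and it assumes the contents of problem.yaml have already been parsed into a dict, so no YAML library is needed here.

```python
VERSION_LEGACY = "legacy"  # assumed name for the fallback version


def resolve_format_version(config, override=None):
    """Pick the format version for a problem.

    config:   parsed contents of problem.yaml as a dict (hypothetical input shape)
    override: value of a command-line flag such as -F, or None if not given
    """
    if override is not None:
        # Explicit override: validate as if the problem used this version.
        return override
    # A missing key means legacy, so the result is always well defined.
    return str(config.get("problem_format_version", VERSION_LEGACY))


print(resolve_format_version({}))
print(resolve_format_version({"problem_format_version": "2023-07"}))
print(resolve_format_version({"problem_format_version": "2023-07"}, override="legacy"))
```

With this shape, a user never has to pass the flag for a well-formed package, and the flag remains available for "what if" validation.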

Matistjati · 2025-03-20T17:42:37Z

I think it would be nice to change DIR_END to something more clear such as STATEMENT_DIR. Additionally, I dislike having a copy of detect_problem_version in template.py. It would probably be good for the future to extract detect_problem_version so it can be used by multiple files.

Also, it would probably be wise to use constants instead of hard-coding strings such as "2023-07". Perhaps create a file version.py that has these constants, the detect_problem_version function and more? I'm not too familiar with the new code. Opinions, @Zazmuz @square-cylinder?

Finally, I think version "default" is strange. It would probably be better to explicitly set a default problem version that exists, i.e., legacy.

Zazmuz · 2025-03-20T20:31:28Z

@Matistjati I think your input is very valid and think it makes a lot of sense. I am a little unsure what DIR_END stands for or does but STATEMENT_DIR makes more sense.

Having a copy of detect_problem_version should be avoided, either by moving the functionality into config.py, since it could be argued to be 'config-esque', or, as you suggested, into a version.py. Personally I'm leaning towards version.py.
When splitting it out, I think a good idea would be to have a mapping from version name (e.g. "2023-07", "legacy") to its corresponding statement directory as a dict in version.py or config.py, whichever is decided. Is DIR_END still needed then?

Should there be one latex template per format maybe? Instead of checking for the old folder then the new. It feels like it could build up to a mess over time.

Maybe also throwing a VersionError instead of a generic RuntimeError would be a good idea.

Zazmuz · 2025-03-20T20:39:02Z

It could also be argued that when splitting it into version.py that the EXTENSIONS = ['tex'] and EXTENSIONS = ['md', 'tex'] should be moved out and have a similar mapping as I proposed in the last comment therefore allowing problem_statement to be fully generic between the two formats.
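
A rough sketch of the mapping proposed in these two comments might look like this. The key names, the helper describe_expected_statements, and the directory and extension values are assumptions for illustration (based on what is mentioned in this thread and on my understanding of the two formats), not code taken from the PR.

```python
# Hypothetical layout for a version.py-style module. The directory and
# extension values are assumptions, not copied from the merged code.
FORMAT_DATA = {
    "legacy": {
        "statement_directory": "problem_statement",
        "statement_extensions": ["tex"],
    },
    "2023-07": {
        "statement_directory": "statement",
        "statement_extensions": ["md", "tex"],
    },
}


def describe_expected_statements(version):
    """Build the 'expected files' hint for any known format version."""
    data = FORMAT_DATA[version]  # KeyError for an unknown version name
    patterns = ", ".join(
        f"problem.{ext}, problem.[a-z][a-z].{ext}"
        for ext in data["statement_extensions"]
    )
    return f"{data['statement_directory']}/: {patterns}"


for version in FORMAT_DATA:
    print(version, "->", describe_expected_statements(version))
```

Because the function only reads from the mapping, supporting another format version would mean adding one dictionary entry rather than another subclass or branch.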

ElliotRipa · 2025-03-21T16:11:01Z

Alright, this feedback seems reasonable, thanks. I'll try to implement these changes

ElliotRipa · 2025-03-23T22:35:38Z

The code specific to certain format versions has now been moved to a separate file, and the feedback provided here has been implemented. Also, the format-version flag was changed to -v to stay consistent with similar functions in verifyproblem.py, and it is now optional: when it is omitted, the version is detected automatically.

gkreitz · 2025-03-24T09:56:37Z

problemtools/verifyproblem.py

        if not self.statements:
-            allowed_statements = ', '.join(f'problem.{ext}, problem.[a-z][a-z].{ext}' for ext in self.EXTENSIONS)
-            self.error(f'No problem statements found (expected file of one of following forms in folder problem_statement/: {allowed_statements}')
+            allowed_statements = ', '.join(f'problem.{ext}, problem.[a-z][a-z].{ext}' for ext in STATEMENT_DATA.get_statement_extensions)


You probably meant to call STATEMENT_DATA.get_statement_extensions here.

ElliotRipa · 2025-03-24

Yes, you're right. Thank you.
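
To illustrate the difference between referencing a method and calling it, here is a small self-contained sketch. StatementData is a hypothetical stand-in for the object in the diff above; only the method name is taken from the PR.

```python
class StatementData:
    """Hypothetical stand-in for the object used in the diff above."""

    def get_statement_extensions(self):
        return ["md", "tex"]


data = StatementData()

# Calling the method yields the list, which can be iterated.
called = ", ".join(f"problem.{ext}" for ext in data.get_statement_extensions())
print(called)

# Without parentheses the expression is the bound method itself,
# and trying to iterate over it raises TypeError.
try:
    ", ".join(f"problem.{ext}" for ext in data.get_statement_extensions)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)
```

In the PR, the faulty line sits on the "no statements found" error path, so it would only fail when a package has no statements, which makes this kind of slip easy to miss in manual testing.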

Zazmuz · 2025-03-24T13:34:50Z

There were some good changes here! However:

The flag for problem2html is still -F.
I am not a big fan of the abstract class that you use for formatversion.py. I still think you should store the data in dictionaries, which allows easy generic behavior between versions, rather than having scattered ifs in a function to instantiate a class that stores next to no data. Instead, make the problem format names easily accessible in that module and use the name as a key to find the other information. As it stands, you have basically just moved a class from verifyproblem.py, renamed it, and made it inherit from an abstract class. This is not necessarily wrong, but it's excessive and more error-prone.
STATEMENT_EXTENSIONS and STATEMENT_DIRECTORY would then be looked up by that easily accessible name. What is your opinion, @square-cylinder? I'm not sure how much @Matistjati looks at problemtools, but they usually have good feedback.
The change to 'automatic' is good.
I still think we should consider having two separate LaTeX templates. What is everyone's opinion?

Overall good improvements.

ElliotRipa · 2025-03-24T17:42:27Z

The flag has been changed to be -v in both places now. Also the data objects in formatversion.py have been replaced with a dictionary instead.

Matistjati · 2025-03-24T18:45:44Z

I don't see any constants? Concretely, my suggestion is to expose
VERSION_LEGACY = "legacy"
and
VERSION_2023_07 = "2023-07"

This way, you can use
from problem_version import VERSION_LEGACY
in other files. That way, there is no risk of misspelling problem versions, as it will lead to a "compile error". I suggested this earlier but don't see it in the PR, are you against it?
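
The benefit can be shown with a short sketch. Python has no separate compile step, so the "compile error" here is a NameError when the line runs (or an ImportError if a misspelled name is imported), and linters typically flag it earlier. The variable names below are illustrative only.

```python
VERSION_LEGACY = "legacy"
VERSION_2023_07 = "2023-07"

detected = "2023-07"  # e.g. the value read from problem.yaml

# A misspelled string literal is still a valid string, so the comparison
# quietly evaluates to False and the bug goes unnoticed.
typo_in_literal = (detected == "2023_07")

# A misspelled constant name fails loudly the first time the line runs.
try:
    typo_in_name = (detected == VERSION_2023_O7)  # letter O instead of zero
except NameError as exc:
    name_error = exc

print(typo_in_literal)
print(type(name_error).__name__)
```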

Also, didn't we agree to not touch problemset_0.1.cls? Since any packages using that one are likely invalid in the 2023-07 format.

Zazmuz · 2025-03-24T19:04:45Z

I don't fully see the point of get_format_data, when you could just index the dictionary.

As I see it, you would use detect_problem_version to get a name for the format. After which you can use the same string with the same functionality to index the dictionary and it will just work. The functions seem unnecessary. Also having "name": "legacy" seems weird, since you use that value as a key to get the dictionary in the first place.

More importantly, I think the code is suffering a little from all the changes and should be polished and cleaned up.
For example, self.FORMAT_DATA = formatversion.get_format_data(self.problem.probdir) should not be in the init, that is what setup() is for. @square-cylinder pointed out that your code should just crash since you check if FORMAT_DATA is set but you set it after checking.

ElliotRipa · 2025-03-24T21:32:00Z

I don't fully see the point of get_format_data, when you could just index the dictionary.

As I see it, you would use detect_problem_version to get a name for the format. After which you can use the same string with the same functionality to index the dictionary and it will just work. The functions seem unnecessary. Also having "name": "legacy" seems weird, since you use that value as a key to get the dictionary in the first place.

The reason for get_format_data was that, since users will rarely if ever need to handle data specific to multiple versions at once, having a method that simply gives the information relevant to the current format could be desired. This would further mean that the details of implementation aren't as necessary as you can simply make use of the parts of the dictionary currently relevant.

Having "name": "legacy" was done mostly in the case that the dictionary in some later case might be passed to a function. Then this could be used to get the version it applies to. However this was mostly a minor afterthought and not one I feel very strongly about.

As for the second half of the comment, it seemed to work fine for me without crashing, but I see no real reason to keep it in __init__ so I'll move it and check for anything similar in the code.

ElliotRipa · 2025-03-24T21:35:58Z

I don't see any constants? Concretely, my suggestion is to expose VERSION_LEGACY = "legacy" and VERSION_2023_07 = "2023-07"

This way, you can use from problem_version import VERSION_LEGACY in other files. That way, there is no risk of misspelling problem versions, as it will lead to a "compile error". I suggested this earlier but don't see it in the PR, are you against it?

I can add those as well. I hadn't made it a priority because the constants likely won't be used in the statement code, and they aren't specific to this PR beyond adding some extra scaffolding for the format versioning itself.

Also, didn't we agree to not touch problemset_0.1.cls? Since any packages using that one are likely invalid in the 2023-07 format.

Yes, I can revert that back to the previous version

Matistjati · 2025-03-25T15:58:09Z

Nice! Given my (comparatively limited) knowledge, I think we're ready to merge. @Zazmuz @square-cylinder opinions?

Matistjati · 2025-03-25T15:59:16Z

Also @ElliotRipa, for the future, please rebase instead of merging. Currently, ~40% of the commits in this PR are merges, and it makes it harder to follow.

gkreitz

Overall, this looks good. I added some minor comments, somewhat on the picky side.

This will need to be rebased, as there are merge conflicts. When rebasing, please also consider squashing a bit to get a cleaner commit history for this PR.

gkreitz · 2025-03-25T18:03:46Z

problemtools/formatversion.py

+The data specific to any given format version.
+"""
+FORMAT_DATA = {
+    "legacy": {


You probably want to use the VERSION_* constants in this dict rather than copy-pasting the values.

gkreitz · 2025-03-25T18:06:28Z

problemtools/formatversion.py

+"""
+Returns a dictionary containing the necessary data for a file format.
+"""
+def get_format_data(path):


Can't this be implemented as return get_format_data_by_name(detect_problem_version(path)) to avoid code duplication?
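
A minimal sketch of that delegation, which also keys the dict by the VERSION_* constants, could look like the following. The function names come from the PR, but the bodies are assumptions: in particular, detect_problem_version here is a simplified stand-in that scans problem.yaml line by line instead of using a YAML parser, and the dict holds only one illustrative key.

```python
import os
import tempfile

VERSION_LEGACY = "legacy"
VERSION_2023_07 = "2023-07"

# Keyed by the constants so the version strings are written only once.
FORMAT_DATA = {
    VERSION_LEGACY: {"statement_directory": "problem_statement"},
    VERSION_2023_07: {"statement_directory": "statement"},
}


def detect_problem_version(path):
    """Simplified stand-in: scan problem.yaml for problem_format_version."""
    try:
        with open(os.path.join(path, "problem.yaml"), encoding="utf-8") as f:
            for line in f:
                key, sep, value = line.partition(":")
                if sep and key.strip() == "problem_format_version":
                    return value.strip().strip("'\"") or VERSION_LEGACY
    except FileNotFoundError:
        pass
    return VERSION_LEGACY  # no file or no key means legacy


def get_format_data_by_name(name):
    """Look up format data by version name; unknown names raise KeyError."""
    return FORMAT_DATA[name]


def get_format_data(path):
    """Detect the version of the problem at path, then reuse the by-name lookup."""
    return get_format_data_by_name(detect_problem_version(path))


with tempfile.TemporaryDirectory() as probdir:
    print(get_format_data(probdir)["statement_directory"])  # no problem.yaml yet
    with open(os.path.join(probdir, "problem.yaml"), "w", encoding="utf-8") as f:
        f.write("problem_format_version: 2023-07\n")
    print(get_format_data(probdir)["statement_directory"])
```

With this structure the lookup logic lives in one place, and get_format_data is only the composition of detection and lookup.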

gkreitz · 2025-03-25T18:09:26Z

problemtools/formatversion.py

+
+
+"""
+Returns a dictionary containing the necessary data for a file format.


These functions should document what keys the dict has. "The necessary data" is pretty vague. :)

(The best/easiest way to document what keys to expect would be to return a dataclass rather than a dict.)
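
As a sketch of the dataclass suggestion: the class and field names below are illustrative (they mirror keys mentioned in this thread) and are not claimed to match what was eventually merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatData:
    """Data that differs between problem format versions.

    Field names are illustrative; they mirror keys mentioned in this thread.
    """

    name: str
    statement_directory: str
    statement_extensions: tuple[str, ...]


LEGACY = FormatData(
    name="legacy",
    statement_directory="problem_statement",
    statement_extensions=("tex",),
)

print(LEGACY.statement_directory)

# A misspelled field is an AttributeError instead of a silent None.
try:
    LEGACY.statement_dir
except AttributeError as exc:
    print("AttributeError:", exc)
```

The class definition itself documents which fields exist and what types they have, which is the point being made about a plain dict.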

gkreitz · 2025-03-25T18:12:15Z

problemtools/formatversion.py

+VERSION_2023_07 = "2023-07"
+
+
+"""


AFAIK, docstrings should be the first thing in the function, not before?
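
This can be checked directly: a string placed before a def is just an expression statement that gets discarded, whereas a string that is the first statement inside the function body becomes its __doc__. The function names below are made up for the demonstration.

```python
VERSION_2023_07 = "2023-07"

"""
Returns the statement directory name.
"""
def documented_before():
    return "statement"


def documented_inside():
    """Returns the statement directory name."""
    return "statement"


# The string above documented_before is not attached to the function.
print(documented_before.__doc__)
print(documented_inside.__doc__)
```

Tools such as help() and most documentation generators read __doc__, so only the second form is visible to them.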

gkreitz · 2025-03-25T18:14:44Z

problemtools/verifyproblem.py

    PART_NAME = 'statement'
-
-    EXTENSIONS = []
+    FORMAT_DATA = None


Why is this in all caps when it seems to be an instance variable?

Also, why are you initializing it here?

gkreitz · 2025-03-25T18:22:51Z

problemtools/template.py

+        else:
+            version_data = formatversion.get_format_data_by_name(version)
+
+        stmtdir = os.path.join(problemdir, version_data.get('statement_directory'))


Is there a reason you index the version data dict with get? I think the vast majority of the code base indexes dicts using brackets. (This comment applies throughout the PR in a lot of places).
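
The reason given above is consistency with the rest of the code base. A further practical difference, not raised in the thread, is how the two styles behave when a key is missing or misspelled. The sketch below uses a made-up dict and path to show it.

```python
import os

version_data = {"statement_directory": "problem_statement"}
problemdir = "/tmp/example-problem"  # hypothetical path

# Both styles agree when the key exists.
assert version_data.get("statement_directory") == version_data["statement_directory"]

# With a misspelled key, .get() returns None and the failure surfaces
# later, inside os.path.join, as a less direct TypeError.
try:
    os.path.join(problemdir, version_data.get("statement_dir"))
except TypeError as exc:
    get_failure = exc

# Bracket indexing fails at the lookup itself and names the missing key.
try:
    os.path.join(problemdir, version_data["statement_dir"])
except KeyError as exc:
    bracket_failure = exc

print(type(get_failure).__name__, "vs", type(bracket_failure).__name__)
```

For keys that are always expected to exist, bracket indexing therefore tends to give the more useful error.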

ElliotRipa · 2025-03-28T17:12:15Z

All right, the changes have been implemented now. I wound up changing to a dataclass instead of a dict, and I also rebased to get a cleaner commit history

gkreitz reviewed Mar 24, 2025

gkreitz reviewed Mar 25, 2025

ElliotRipa added 13 commits March 28, 2025 17:22

Make cls templates able work with either problem format

122f0bc

Allow problem statement to use either problem format

4738fc1

Make template.py detect format version instead

ab5f2b5

Provisional updates

60c7504

Add formatversion.py

a7676d9

Minor fixes in imports

be21d6f

Move version specific functionality to separate file

9a9cc38

Change to flag '-v' for format-version

c531bc3

Add missing parentheses

d235ba0

Use dictionary instead of data objects for format data

4496918

Make problem2html.py use -v to specify format version

e71dbf9

Add constants for version names

b59ce65

Rollback problemset_0.1.cls

b805d01

ElliotRipa added 4 commits March 28, 2025 17:48

Move initialisation of FORMAT_DATA to setup

2aa9be0

Make formatversion.py use dataobjects instead of dicts

470fbb8

Fix documentation

cba93dc

Remove unnecessary initialisation

f964bc3

gkreitz approved these changes Mar 28, 2025

gkreitz merged commit b85906a into Kattis:develop Mar 28, 2025
3 checks passed


