Skip to content

Conversation

@ElliotRipa
Copy link
Contributor

Overview

  • verifyproblem.py now checks the appropriate location for the problem statements based on the version of problem format.
  • problem2html.py and problem2pdf.py now both take a flag "-F" which specifies the format used, which defaults to legacy. This flag is added because both files take the problem directory as an argument and therefore need to know where to expect the statements.
  • Also allows the constructor for template.py to specify a version for the same reasons.
  • The cls templates now also allow for using either version.

@simonlindholm
Copy link
Member

problem2html.py and problem2pdf.py now both take a flag "-F" which specifies the format used, which defaults to legacy. This flag is added because both files take the problem directory as an argument and therefore need to know where to expect the statements.

Can we read problem.yaml for this? As a problemtools user, having to pass -F every time will be very annoying.

@niemela
Copy link
Member

niemela commented Mar 18, 2025

Can we read problem.yaml for this? As a problemtools user, having to pass -F every time will be very annoying.

Agreed. If problem_format_version is not in problem.yaml, then the version is legacy otherwise it is what is given, so this is always well defined. It's fine to provide -F as a way to override the format version though (I.e. validate this as if it was this version instead).

@Matistjati
Copy link
Contributor

Matistjati commented Mar 20, 2025

I think it would be nice to change DIR_END to something more clear such as STATEMENT_DIR. Additionally, I dislike having a copy of detect_problem_version in template.py. It would probably be good for the future to extract detect_problem_version so it can be used by multiple files.

Also, it would be probably be wise to use constants instead of hard-coding strings such as "2023-07". Perhaps create a file version.py that has these constants, the detect_problem_version function and more? I'm not too familiar with the new code, opinions @Zazmuz @square-cylinder ?

Finally, I think version "default" is strange. It would probably be better to explicitly set a default problem version that exists, i.e., legacy.

@Zazmuz
Copy link
Contributor

Zazmuz commented Mar 20, 2025

@Matistjati I think your input is very valid and think it makes a lot of sense. I am a little unsure what DIR_END stands for or does but STATEMENT_DIR makes more sense.

Having a copy of detect_problem_version should be avoided, either by splitting out the functionality to the config.py since it could be argued to be 'config esq' or as you suggested with a version.py. Personally I'm leaning towards version.py.
When splitting it out I think a good idea would be to have a mapping from version name eg. "2023-07" "legacy" to its corresponding statement directory as a dict in version.py or config.py whichever is decided, is DIR_END still needed then?

Should there be one latex template per format maybe? Instead of checking for the old folder then the new. It feels like it could build up to a mess over time.

Maybe also throwing a VersionError instead of a generic RuntimeError would be a good idea.

@Zazmuz
Copy link
Contributor

Zazmuz commented Mar 20, 2025

It could also be argued that when splitting it into version.py that the EXTENSIONS = ['tex'] and EXTENSIONS = ['md', 'tex'] should be moved out and have a similar mapping as I proposed in the last comment therefore allowing problem_statement to be fully generic between the two formats.

@ElliotRipa
Copy link
Contributor Author

Alright, this feedback seems reasonable, thanks. I'll try to implement these changes

@ElliotRipa
Copy link
Contributor Author

The code specific to certain format versions has now been moved to a separate file, and the feedback provided here has been implemented. Also, the format-version flag was changed to -v to stay consistent with similar functions in verifyproblem.py and made optional to use automatic detection

if not self.statements:
allowed_statements = ', '.join(f'problem.{ext}, problem.[a-z][a-z].{ext}' for ext in self.EXTENSIONS)
self.error(f'No problem statements found (expected file of one of following forms in folder problem_statement/: {allowed_statements}')
allowed_statements = ', '.join(f'problem.{ext}, problem.[a-z][a-z].{ext}' for ext in STATEMENT_DATA.get_statement_extensions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably meant to call STATEMENT_DATA.get_statement_extensions here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. Thank you

@Zazmuz
Copy link
Contributor

Zazmuz commented Mar 24, 2025

There were some good changes here! However:

  • The flag for problem2html is still -F
  • I am not a big fan of the abstract class that you use for formatversion.py. I still stand by that you should save the data in dictionaries to allow for easy generic behavior between different versions not having random if's in a function to instantiate a class that stores next to no data. Instead having the problem format names easily accessible in that module and using the name as a key too find the other information. Now you have basically just moved a class from verifyproblem.py and changed its name and made it inherent from an abstract. This is not necessarily wrong but its excessive and is more error prone.
    The STATEMENT_EXTENSIONS and STATEMENT_DIRECTORY being looked up be the easily accessible name. What is your opinion @square-cylinder, not sure how much @Matistjati looks at problemtools but usually has good feedback.
  • The change to 'automatic' is good
  • Still think we should consider having two separate latex templates, what is everyone's opinion?

Overall good improvements.

@ElliotRipa
Copy link
Contributor Author

The flag has been changed to be -v in both places now. Also the data objects in formatversion.py have been replaced with a dictionary instead.

@Matistjati
Copy link
Contributor

Matistjati commented Mar 24, 2025

I don't see any constants? Concretely, my suggestion is to expose
VERSION_LEGACY = "legacy"
and
VERSION_2023_07 = "2023_07"

This way, you can use
from problem_version import VERSION_LEGACY
in other files. That way, there is no risk of misspelling problem versions, as it will lead to a "compile error". I suggested this earlier but don't see it in the PR, are you against it?

Also, didn't we agree to not touch problemset_0.1.cls? Since any packages using that one are likely invalid in the 2023-07 format.

@Zazmuz
Copy link
Contributor

Zazmuz commented Mar 24, 2025

I don't fully see the point of get_format_data, when you could just index the dictionary.

As I see it, you would use detect_problem_version to get a name for the format. After which you can use the same string with the same functionality to index the dictionary and it will just work. The functions seem unnecessary. Also having "name": "legacy" seems weird, since you use that value as a key to get the dictionary in the first place.

More importantly, I think the code is suffering a little from all the changes and should be polished and cleaned up.
For example, self.FORMAT_DATA = formatversion.get_format_data(self.problem.probdir) should not be in the init, that is what setup() is for. @square-cylinder pointed out that your code should just crash since you check if FORMAT_DATA is set but you set it after checking.

@ElliotRipa
Copy link
Contributor Author

I don't fully see the point of get_format_data, when you could just index the dictionary.

As I see it, you would use detect_problem_version to get a name for the format. After which you can use the same string with the same functionality to index the dictionary and it will just work. The functions seem unnecessary. Also having "name": "legacy" seems weird, since you use that value as a key to get the dictionary in the first place.

The reason for get_format_data was that, since users will rarely if ever need to handle data specific to multiple versions at once, having a method that simply gives the information relevant to the current format could be desired. This would further mean that the details of implementation aren't as necessary as you can simply make use of the parts of the dictionary currently relevant.

Having "name": "legacy" was done mostly in the case that the dictionary in some later case might be passed to a function. Then this could be used to get the version it applies to. However this was mostly a minor afterthought and not one I feel very strongly about.

As for the second half of the comment, it seemed to work fine for me without crashing, but I see no real reason to keep it in __init__ so I'll move it and check for anything similar in the code.

@ElliotRipa
Copy link
Contributor Author

I don't see any constants? Concretely, my suggestion is to expose VERSION_LEGACY = "legacy" and VERSION_2023_07 = "2023_07"

This way, you can use from problem_version import VERSION_LEGACY in other files. That way, there is no risk of misspelling problem versions, as it will lead to a "compile error". I suggested this earlier but don't see it in the PR, are you against it?

I can add those as well. The main reason for not having that as a priority currently was since it likely won't be used in the code for the statements, nor specific to this PR in any greater extent than this provides some extra overhead for the format versioning itself.

Also, didn't we agree to not touch problemset_0.1.cls? Since any packages using that one are likely invalid in the 2023-07 format.

Yes, I can revert that back to the previous version

@Matistjati
Copy link
Contributor

Nice! Given my (comparatively limited) knowledge, I think we're ready to merge. @Zazmuz @square-cylinder opinions?

@Matistjati
Copy link
Contributor

Also @ElliotRipa, for the future, please rebase instead of merging. Currently, ~40% of the commits in this PR are merges, and it makes it harder to follow.

Copy link
Contributor

@gkreitz gkreitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good. I added some minor comments, somewhat on the picky side.

This will need to be rebased, as there are merge conflicts. When rebasing, please also consider squishing a bit to get a cleaner commit history for this PR.

The data specific to any given format version.
"""
FORMAT_DATA = {
"legacy": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to use the VERSION_* constants in this dict rather than copy-pasting the values.

"""
Returns a dictionary containing the necessary data for a file format.
"""
def get_format_data(path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be implemented as return get_format_data_by_name(detect_problem_version(path)) to avoid code duplication?



"""
Returns a dictionary containing the necessary data for a file format.
Copy link
Contributor

@gkreitz gkreitz Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions should document what keys the dict has. "The necessary data" is pretty vague. :)

(The best/easiest way to document what keys to expect would be to return a dataclass rather than a dict.)

VERSION_2023_07 = "2023-07"


"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, docstrings should be the first thing in the function, not before?

PART_NAME = 'statement'

EXTENSIONS = []
FORMAT_DATA = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this in all caps when it seems to be an instance variable?

Also, why are you initializing it here?

else:
version_data = formatversion.get_format_data_by_name(version)

stmtdir = os.path.join(problemdir, version_data.get('statement_directory'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you index the version data dict with get? I think the vast majority of the code base indexes dicts using brackets. (This comment applies throughout the PR in a lot of places).

@ElliotRipa
Copy link
Contributor Author

All right, the changes have been implemented now. I wound up changing to a dataclass instead of a dict, and I also rebased to get a cleaner commit history

@gkreitz gkreitz merged commit b85906a into Kattis:develop Mar 28, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants