# [Issue #412] Adding library for auto-generation and documentation of script #452
```diff
@@ -10,3 +10,5 @@ __pycache__/
 *$py.class
 .pytest_cache/
+# Generated Schema objects
+generated/
```

README (new file, +227 lines):

# Auto-generated Pydantic Schemas

This directory is the current home for auto-generated schemas; this feature is a work in progress. The intent of this work is to move away from manual management of Pydantic schemas and have it handled by the scripts inside this `generated` directory.

These schemas should not be updated by hand, as re-running the generation script will overwrite any manual updates.
## Additional Documentation

The tool used to generate these schemas is [datamodel-code-generator](https://koxudaxi.github.io/datamodel-code-generator/).
## Usage

To generate the Pydantic schemas, call [generate_models.sh](./scripts/generate_models.sh) with two arguments:

1. The input directory containing the JSONSchema YAML files
2. The desired output location for both the copied YAML files and the generated Pydantic schemas

```bash
./generate_models.sh ../../../../website/public/schemas/yaml .
```

The command above places copies of the updated YAML files and the generated Pydantic schemas into the given output directory, under the `schemas` and `pydantic` folders respectively.

## Under the hood

The main entrypoint for creating the schemas is the `./generate_models.sh` command shown above. A breakdown of that script and its steps follows:

- All YAML files from the given input directory are copied to the output directory.
- The copied YAML files are then updated to add a `title` field to each file, with a value that matches the file name. This lets each generated Pydantic schema file contain a model whose name matches the file name.
  - The previous two steps occur within the `rename_and_add_title.sh` script.
- After the rename script runs, the datamodel-code-generator library processes each of the YAML files into a `/pydantic` folder in the given output directory.
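The title-injection step above can be sketched in Python — a minimal illustration of the logic in `rename_and_add_title.sh`, not code from this PR; the helper name `add_title` is hypothetical:

```python
def add_title(yaml_text: str, filename: str) -> str:
    """Insert or update a top-level `title:` entry so it matches the file name.

    Mirrors the shell logic: replace an existing title, otherwise insert one
    after `$id:` (or `$schema:`), otherwise prepend it to the file.
    """
    lines = yaml_text.splitlines()
    title_line = f"title: {filename}"

    # Replace an existing top-level title.
    for i, line in enumerate(lines):
        if line.startswith("title:"):
            lines[i] = title_line
            return "\n".join(lines)

    # Otherwise insert after $id: or, failing that, $schema:.
    for key in ("$id:", "$schema:"):
        for i, line in enumerate(lines):
            if line.startswith(key):
                lines.insert(i + 1, title_line)
                return "\n".join(lines)

    # Fall back to prepending the title.
    return "\n".join([title_line] + lines)
```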

## Issues to be Resolved

The following issues come from comparing the generated schemas against the existing hand-built schemas. Note that this is not an exhaustive list.
**Collaborator:** These are possibly show-stopper issues. The PR should not be merged until it can be proven that spec-compliant schemas can effectively be generated from code. For reference, issues like these were what caused me to abandon the effort to auto-generate marshmallow schemas several months ago. My experience was that tooling and automation could generate 99.x% of the schemas, but the last handful of edge cases were intractable, and it turned into a huge time sink to solve them. YMMV, but let's not commit new scaffolding and dependencies to the repo until it is proven to work.

**Collaborator:** Where are the generated schemas? Do they work with the example API implementations? The example API implementations are the main use case for CommonGrants Pydantic schemas, therefore acceptance criteria for this or any auto-generation scaffolding or libraries should include validation against one or all of those use cases: generate the schemas, import them into the example API, run

**Author:** The thinking was to commit this baseline to the repo in an isolated way that wouldn't impact the existing hand-built schemas and their implementations. This would give us a foundation to build off of, while not having the completion of this feature become an all-consuming task at the expense of other planned work. The generated schemas are not committed to the repo, since that would clog up the codebase with 103 additional files for something that isn't complete and could lead to confusion with the existing schemas. The idea is that, for now, anyone who wants to view the schemas can run the bash script, which creates them in the generated directory that this README lives in.

For more detailed information on the issues below, see [this comment](https://github.com/HHS/simpler-grants-protocol/pull/452#discussion_r2683046281) for a side-by-side comparison.
<details>
<summary>Filters NumberRange Property</summary>

Manual

```python
class NumberRange(CommonGrantsBaseModel):
    """Represents a range between two numeric values."""

    min: Union[int, float] = Field(..., description="The minimum value in the range")
    max: Union[int, float] = Field(..., description="The maximum value in the range")
```

Generated

```python
class Value(CommonGrantsBaseModel):
    min: float
    max: float


class NumberRangeFilter(CommonGrantsBaseModel):
    operator: RangeOperators = Field(
        ..., description='The operator to apply to the filter value'
    )
    value: Value = Field(
        ...,
        description='The value to use for the filter operation',
        examples=[{'min': 1000, 'max': 10000}],
        json_schema_extra={'unevaluatedProperties': {'not': {}}},
    )
```

</details>
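The difference matters for integer inputs: the manual `Union[int, float]` preserves `int` values, while the generated model's plain `float` fields coerce them. A stdlib sketch of the two behaviors (hypothetical helpers that mimic, rather than use, Pydantic):

```python
def coerce_to_float(value):
    """Mimic what a `min: float` field does to an integer input: coerce it."""
    return float(value)


def keep_int_or_float(value):
    """Mimic Union[int, float]: leave ints and floats untouched, reject others."""
    if isinstance(value, (int, float)):
        return value
    raise TypeError(f"expected int or float, got {type(value).__name__}")
```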

<details>
<summary>SingleDate Event String vs Literal (applies to all EventTypes)</summary>

Manual

```python
# Single Date Event
class SingleDateEvent(EventBase):
    """Description of an event that has a date (and possible time) associated with it."""

    event_type: Literal[EventType.SINGLE_DATE] = Field(
        EventType.SINGLE_DATE,
        alias="eventType",
    )
    date: ISODate = Field(
        ...,
        description="Date of the event in ISO 8601 format: YYYY-MM-DD",
    )
    time: Optional[ISOTime] = Field(
        default=None,
        description="Time of the event in ISO 8601 format: HH:MM:SS",
    )
```

Generated

```python
class SingleDateEvent(EventBase):
    event_type: Literal['singleDate'] = Field(
        ..., alias='eventType', description='Type of event'
    )
    date: isoDate.IsoDate = Field(
        ..., description='Date of the event in in ISO 8601 format: YYYY-MM-DD'
    )
    time: Optional[isoTime.IsoTime] = Field(
        default=None, description='Time of the event in ISO 8601 format: HH:MM:SS'
    )
```

</details>

<details>
<summary>Event.py set under __root__ vs Event = Union</summary>

Manual

```python
# Event Union
Event = Union[SingleDateEvent, DateRangeEvent, OtherEvent]
```

Generated

```python
class Event(RootModel[Union[SingleDateEvent, DateRangeEvent, OtherEvent]]):
    root: Union[SingleDateEvent, DateRangeEvent, OtherEvent] = Field(
        ...,
        description='Union of all event types',
        json_schema_extra={'$schema': 'https://json-schema.org/draft/2020-12/schema'},
        title='Event',
    )
```

</details>

<details>
<summary>DefaultFilter value property is Any instead of Union</summary>

Manual

```python
class DefaultFilter(CommonGrantsBaseModel):
    """Base class for all filters that matches Core v0.1.0 DefaultFilter structure."""

    operator: Union[
        EquivalenceOperator,
        ComparisonOperator,
        ArrayOperator,
        StringOperator,
        RangeOperator,
    ] = Field(..., description="The operator to apply to the filter value")
    value: Union[str, int, float, list, dict] = Field(
        ...,
        description="The value to use for the filter operation",
    )

    @field_validator("operator", mode="before")
    @classmethod
    def validate_operator(cls, v):
        """Convert string to enum if needed."""
        if isinstance(v, str):
            # Try to match against each operator type
            for operator_class in [
                EquivalenceOperator,
                ComparisonOperator,
                ArrayOperator,
                StringOperator,
                RangeOperator,
            ]:
                try:
                    return operator_class(v)
                except ValueError:
                    continue
            # If no match found, raise ValueError
            raise ValueError(f"Invalid operator: {v}")
        return v
```

Generated

```python
class DefaultFilter(CommonGrantsBaseModel):
    operator: Union[
        EquivalenceOperators,
        ComparisonOperators,
        ArrayOperators,
        StringOperators,
        RangeOperators,
        AllOperators,
    ] = Field(..., description='The operator to apply to the filter value')
    value: Any = Field(..., description='The value to use for the filter operation')
```

</details>
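The practical consequence of `value: Any` is that inputs the manual `Union[str, int, float, list, dict]` would reject (for example `None`) now pass validation silently. A stdlib sketch of the check the manual Union performs (hypothetical helper, not repo code):

```python
def validate_filter_value(value):
    """Reject values outside the manual schema's Union[str, int, float, list, dict]."""
    allowed = (str, int, float, list, dict)
    # Note: bool is a subclass of int, so booleans pass here, as they would
    # under the manual Union as well.
    if isinstance(value, allowed):
        return value
    raise ValueError(f"Invalid filter value type: {type(value).__name__}")
```

With `value: Any`, no such check exists, so type errors surface later, at use time, instead of at model construction.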

## Future Work / Next Steps

### Manual Verification of Pydantic Schemas

### Steps Taken

1. Update the `common-grants-sdk` setting in `/templates/fast-api/pyproject.toml`: comment out line 10 and uncomment line 11
2. Run the `generate_models.sh` script
3. Update the `__init__.py` inside [`/generated`](../generated/scripts/pydantic/__init__.py) to export the generated models (this assumes you ran the generation script from the scripts directory where this README resides, with an output directory of `.` as the second argument)
4. Update the `__init__.py` inside [`templates/fast-api/src/common_grants/schemas/__init__.py`](../../../templates/fast-api/src/common_grants/schemas/__init__.py) to point to `common_grants_sdk.schemas.pydantic.generated`
5. Run `make check-types`; this will return an error
6. Address errors as needed until the `make` command passes
7. Add the auto-generated schemas to the existing schema objects so that the existing schema objects wrap the auto-generated objects
8. Stand up the fast-api template and verify that things work as expected

### Gaps Identified

1. We are missing the `title` field inside the JSONSchema YAML files. This should be fixed in the TypeSpec generation, but as a workaround for verifying our auto-generation, a temporary script is included to add this field to a separate copy of the JSONSchema YAML files.
2. The manual Pydantic models have small naming variations from the JSONSchema (`ArrayOperator` vs `ArrayOperators`). As a workaround, we will try to alias these to minimize the impact of the changes while we are testing.
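The aliasing workaround for item 2 could be as simple as a module-level assignment; the enum below is illustrative, not the repo's actual definition:

```python
from enum import Enum


# Illustrative generated enum (plural name, as produced by the code generator)
class ArrayOperators(str, Enum):
    IN = "in"
    NOT_IN = "not_in"


# Alias so existing code that imports the singular name keeps working
ArrayOperator = ArrayOperators
```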

### Complete/merge in the property-based testing

- Property-based testing will be merged in a future PR
- The hypothesis library has the ability to fine-tune the specific properties to expand our scope
- Break out the `validate_all` Python script into a pytest-based approach
- Set up the above pytests to be omitted from any automatic test runs until the automated schemas are leveraged by the rest of the client code

### Cutover

When it comes time to transition from the hand-built Pydantic schemas to the auto-generated ones, the existing hand-built Pydantic files should remain in place and be updated to become wrappers for the properties contained within the auto-generated schemas.

This preserves any helper functions we've built and avoids additional work updating wherever the existing schemas are called within the codebase.
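One possible shape for that wrapper, sketched with stdlib dataclasses standing in for Pydantic models (all class names here are hypothetical; composition would work similarly to the inheritance shown):

```python
from dataclasses import dataclass


# Stand-in for an auto-generated model
@dataclass
class GeneratedNumberRange:
    min: float
    max: float


# Hand-built wrapper: inherits the generated fields
# while preserving existing helper methods
class NumberRange(GeneratedNumberRange):
    """Represents a range between two numeric values."""

    def span(self) -> float:
        # Example of a helper that survives the cutover
        return self.max - self.min
```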

`generate_models.sh` (new file, +25 lines):

```bash
#!/bin/bash
# Script to run the end-to-end process for auto-generating Pydantic schemas
# Usage: generate_models.sh <PATH TO JSONSCHEMA YAML> <OUTPUT DIR TO PLACE THE PYDANTIC OBJECTS IN>

# Input variables
YAML_DIR="$1"
OUTPUT_DIR="$2"

./rename_and_add_title.sh "$YAML_DIR" "$OUTPUT_DIR"

echo "Processing all schemas"
poetry run datamodel-codegen --input "${OUTPUT_DIR}/schemas" \
    --output "${OUTPUT_DIR}/pydantic" \
    --output-model-type pydantic_v2.BaseModel \
    --base-class common_grants_sdk.schemas.pydantic.base.CommonGrantsBaseModel \
    --reuse-model \
    --field-constraints \
    --use-standard-collections \
    --capitalise-enum-members \
    --use-specialized-enum \
    --target-python-version 3.13 \
    --snake-case-field \
    --field-include-all-keys \
    --use-default-kwarg
```

`rename_and_add_title.sh` (new file, +64 lines):

```bash
#!/bin/bash

# Script to copy the YAML files and add a 'title' attribute in each YAML schema file
# to match the filename
# Usage: rename_and_add_title <PATH TO JSONSCHEMA YAML DIRECTORY> <OUTPUT DIRECTORY FOR MODIFIED YAMLS>

SCHEMAS_DIR="${1}"
OUTPUT_DIR="${2}"

# Check if the directory exists
if [ ! -d "$SCHEMAS_DIR" ]; then
    echo "Error: Directory not found: $SCHEMAS_DIR"
    exit 1
fi

# Create generated directory alongside yaml directory
GENERATED_DIR="$OUTPUT_DIR/schemas"
mkdir -p "$GENERATED_DIR"
echo "Created directory: $GENERATED_DIR"

# Copy all yaml files to generated directory
echo "Copying YAML files from $SCHEMAS_DIR to $GENERATED_DIR"
cp "$SCHEMAS_DIR"/*.yaml "$GENERATED_DIR/" 2>/dev/null || true
echo "Copied $(ls -1 "$GENERATED_DIR"/*.yaml 2>/dev/null | wc -l | tr -d ' ') YAML files"

echo "Processing YAML files in: $SCHEMAS_DIR"

# Find .yaml files only at the root level (not in subdirectories)
# Note: `sed -i ''` is the BSD/macOS form; GNU sed takes `-i` with no argument
find "$GENERATED_DIR" -maxdepth 1 -name "*.yaml" -type f | while read -r file; do
    # Get the filename without extension
    filename=$(basename "$file" .yaml)

    # Check if file already has a title line
    if grep -q "^title:" "$file"; then
        # Update existing title
        sed -i '' "s/^title:.*$/title: $filename/" "$file"
        echo "Updated title in: $file"
    else
        # Add title after $id line
        if grep -q '^\$id:' "$file"; then
            sed -i '' "/^\$id:/a\\
title: $filename
" "$file"
            echo "Added title to: $file"
        else
            # If no $id line, add title after $schema line
            if grep -q '^\$schema:' "$file"; then
                sed -i '' "/^\$schema:/a\\
title: $filename
" "$file"
                echo "Added title after \$schema in: $file"
            else
                # Prepend title to the file
                sed -i '' "1i\\
title: $filename
" "$file"
                echo "Prepended title to: $file"
            fi
        fi
    fi
done

echo "Done processing all YAML files."
```