StdText is a FastAPI service that rewrites short invoice line texts using a rule-driven pipeline with optional OpenAI post-processing. It normalizes abbreviations, preserves named entities, corrects spelling, formats counts, and can call an OpenAI model to polish the final uppercase output.
- Python 3.11 (use a Python 3.11 virtual environment to match the supported runtime)
- pip
- Java Development Kit (JDK), newer than version 17
- Create and activate a Python 3.11 virtual environment:

      py -3.11 -m venv .venv
      source .venv/bin/activate

- Install DaCy libraries:

      py -m pip install dacy
      py dacy_inst.py

- Install dependencies:

      py -m pip install -r requirements.txt

Newer versions of pip are strict about wheel filenames and will reject DaCy
model wheels that do not include an explicit version number, raising an error
like:

    ERROR: Invalid wheel filename (invalid version): 'da_dacy_small_trf-any-py3-none-any'

If you run into this while installing a DaCy model manually, rename the file to
include a version (for example da_dacy_small_trf-0.1.0-py3-none-any.whl) and
install it with pip:

    mv da_dacy_small_trf-any-py3-none-any.whl da_dacy_small_trf-0.1.0-py3-none-any.whl
    pip install da_dacy_small_trf-0.1.0-py3-none-any.whl

This keeps pip happy while still installing the same model artifact.

Service behavior is configured in stdtext/config.yaml. You can enable optional OpenAI polishing, choose the model, and control whether responses are uppercased. The rule configuration also contains fuzzy action definitions used by the pipeline.
Start the FastAPI application with Uvicorn:

    uvicorn app:app --host 0.0.0.0 --port 8000 --workers 1

Windows users can use run.cmd after activating the virtual environment.

- GET /health: Basic health probe that reports whether OpenAI support is enabled and verifies spell-check initialization.
- POST /rewrite: Runs the rule-based pipeline and, if configured or requested, sends the draft through OpenAI for light polishing.
- POST /debug_rewrite: Executes only the rule-based pipeline and returns all intermediate stages for debugging.
- POST /check_spelling: Performs normalization-aware spell checking while preserving placeholders for abbreviations and entities.
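
A minimal client sketch for the endpoints above, using only the Python standard library. The request schema is not documented here, so the JSON field name "text" is an assumption, as are the helper names build_post and send; check the FastAPI models in the application before relying on it. The base URL follows the Uvicorn command (port 8000).

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # port taken from the Uvicorn command


def build_post(path, text, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for one endpoint."""
    # ASSUMPTION: the payload field is called "text"; the real schema may differ.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, send(urllib.request.Request(BASE_URL + "/health")) would return the health report, and send(build_post("/debug_rewrite", "some line text")) would return the intermediate stages. Building the request separately from sending it makes the payload easy to inspect before any network call is made.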
- Text normalization and domain-specific action expansion.
- Abbreviation and entity placeholder extraction to protect key tokens.
- Spell correction for non-placeholder tokens.
- Structured count extraction and formatting.
- Rewrite pattern application to produce canonical action phrases.
- Reinsertion of entities, abbreviations, and formatted counts.
- Optional OpenAI refinement before returning the final uppercase result.
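
The placeholder steps in the list above (protect key tokens, correct the rest, reinsert, uppercase) can be illustrated with a toy sketch. This is not StdText's implementation: the abbreviation set, the correction table, the placeholder format, and every function name below are hypothetical and chosen only to show why protection has to happen before spell correction.

```python
import re

# Hypothetical data for illustration; not taken from StdText's rules.
ABBREVIATIONS = {"stk", "inkl"}
CORRECTIONS = {"instalation": "installation", "stk": "stik"}


def protect(text, protected):
    """Swap protected tokens for numbered placeholders; return text and mapping."""
    mapping = {}

    def swap(match):
        token = match.group(0)
        if token.lower() in protected:
            key = f"__P{len(mapping)}__"
            mapping[key] = token
            return key
        return token

    return re.sub(r"\w+", swap, text), mapping


def correct(text, corrections):
    """Apply a lookup-table 'spell check' to every non-placeholder token."""

    def fix(match):
        token = match.group(0)
        if token.startswith("__P"):
            return token  # placeholders are left untouched
        return corrections.get(token.lower(), token)

    return re.sub(r"\w+", fix, text)


def restore(text, mapping):
    """Put the original protected tokens back in place of their placeholders."""
    for key, token in mapping.items():
        text = text.replace(key, token)
    return text


def rewrite(text):
    draft, mapping = protect(text, ABBREVIATIONS)
    draft = correct(draft, CORRECTIONS)
    return restore(draft, mapping).upper()


print(correct("instalation 2 stk", CORRECTIONS))  # -> installation 2 stik
print(rewrite("instalation 2 stk"))               # -> INSTALLATION 2 STK
```

Without the protect step, the toy corrector rewrites the abbreviation "stk"; with it, the abbreviation survives while the misspelled word is still fixed.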
- Keep the API keys in stdtext/config.yaml out of version control.
- Use the /debug_rewrite endpoint when adjusting rules to see every intermediate stage.
- The service is designed to run without OpenAI access; OpenAI calls are skipped unless explicitly enabled.