The server is deployed at https://ai-document-search-backend.azurewebsites.net/.
The deployment is automatic on push to the master branch.
The OpenAPI schema is available at https://ai-document-search-backend.azurewebsites.net/docs.
This repository uses the Poetry package manager (see the useful Poetry commands below).
The server is built with the FastAPI framework.
The code uses dependency injection and is tested with pytest.
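To illustrate why dependency injection helps with testing, here is a minimal sketch in plain Python. The class names (`SearchService`, `FakeVectorStore`) are hypothetical placeholders, not the repository's actual classes:

```python
# Minimal sketch of constructor-based dependency injection for testability.
# SearchService and FakeVectorStore are hypothetical names, not the real classes.
class FakeVectorStore:
    def search(self, query: str) -> list:
        return [f"document matching '{query}'"]

class SearchService:
    def __init__(self, vectorstore):
        # The dependency is injected, so tests can pass a fake implementation
        # instead of a real vector database.
        self.vectorstore = vectorstore

    def answer(self, query: str) -> str:
        hits = self.vectorstore.search(query)
        return hits[0] if hits else "no results"

def test_search_service():
    # pytest-style test: inject a fake store, no external services needed
    service = SearchService(FakeVectorStore())
    assert service.answer("bonds") == "document matching 'bonds'"

test_search_service()
```

This is the design choice that lets the test suite run without a live Weaviate or Cosmos instance.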
Start by creating a `.env` file in the project root with the following content:

```
APP_OPENAI_API_KEY=your_openai_api_key
APP_WEAVIATE_API_KEY=api_key_for_weaviate_url_specified_in_config
COSMOS_KEY=key_for_cosmos_url_specified_in_config
AUTH_SECRET_KEY=any_secret_key
AUTH_USERNAME=any_user
AUTH_PASSWORD=any_password
```

Then install the dependencies and start the server:

```shell
poetry install
poetry run uvicorn ai_document_search_backend.application:app --reload
```

The server is then available at http://localhost:8000.
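The `.env` format above is plain `KEY=value` lines. As a rough illustration only (the project presumably loads these values through its own settings machinery, not this code), such a file can be parsed like this:

```python
# Hypothetical sketch of parsing the KEY=value .env format shown above;
# the project itself presumably loads these via its settings/config classes.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """APP_OPENAI_API_KEY=your_openai_api_key
AUTH_USERNAME=any_user
"""
config = parse_env(sample)
print(config["AUTH_USERNAME"])  # → any_user
```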
Alternatively, start the server with Docker Compose:

```shell
docker compose up
```
Run the tests:

```shell
poetry run pytest
```
- Start the server locally (see above).
- Run `poetry run locust`.
- Open http://localhost:8089/ in your browser.
- Enter the number of users, the spawn rate and the host (http://localhost:8000 – without a trailing slash).
- Click "Start swarming".
Format the code and auto-fix lint issues:

```shell
poetry run black --config black.py.toml .
poetry run ruff check . --fix
```

Check formatting and linting without modifying files:

```shell
poetry run black --config black.py.toml . --check
poetry run ruff check .
```
Build the Docker image and push it to the Azure container registry:

```shell
docker build -t ai-document-search-backend -f Dockerfile .
docker tag ai-document-search-backend:latest crdocsearchdev.azurecr.io/crdocsearchdev/ai-document-search-backend:0.0.1
az login
az acr login --name crdocsearchdev
docker push crdocsearchdev.azurecr.io/crdocsearchdev/ai-document-search-backend:0.0.1
```
- Install all dependencies: `poetry install`
- Add a new package at the latest version: `poetry add <package>`, e.g. `poetry add numpy`
- Add a package only for development: `poetry add <package> --group dev`, e.g. `poetry add jupyter --group dev`
- Regenerate the `poetry.lock` file: `poetry lock --no-update`
- Remove a package: `poetry remove <package>`, e.g. `poetry remove numpy`
- Download `NTNU2.xlsx` from the customer and save it to `data/NTNU2.xlsx`. This file is private and is therefore not included in the repository. See `prepare_data.py` for the columns that must be present in the file.
- Run `poetry run python ai_document_search_backend/scripts/prepare_data.py` to pre-process the data.
- Run `poetry run python ai_document_search_backend/scripts/download_documents.py [limit]` to download the PDFs into a local folder. The `limit` argument is optional and specifies the number of documents to download; if it is not specified, all documents are downloaded.
- Run `poetry run python ai_document_search_backend/scripts/fill_vectorstore.py` to store the documents in the vector database.
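The optional `[limit]` argument pattern used by the download step can be sketched as follows. This is a hypothetical illustration; the real `download_documents.py` may handle its arguments differently:

```python
from typing import List, Optional

# Hypothetical sketch of handling an optional [limit] CLI argument;
# the real download_documents.py may parse its arguments differently.
def parse_limit(argv: List[str]) -> Optional[int]:
    # argv[0] is the script name; argv[1], if present, is the limit.
    if len(argv) > 1:
        return int(argv[1])
    return None  # None means "download all documents"

def select_documents(documents: List[str], limit: Optional[int]) -> List[str]:
    return documents if limit is None else documents[:limit]

docs = ["a.pdf", "b.pdf", "c.pdf"]
print(select_documents(docs, parse_limit(["download_documents.py", "2"])))  # → ['a.pdf', 'b.pdf']
```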
For a more detailed description of the project structure, architecture and design, see the project structure document.