The server is deployed at https://ai-document-search-backend.azurewebsites.net/.
The deployment is automatic on push to the master branch.
The OpenAPI schema is available at https://ai-document-search-backend.azurewebsites.net/docs.
This repository uses the Poetry package manager (see the useful Poetry commands below).
The server is built with the FastAPI framework.
The code uses dependency injection and is tested with pytest.
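To illustrate why dependency injection helps with testing, here is a minimal sketch in plain Python. The class names (`SearchService`, `FakeVectorStore`) are hypothetical placeholders, not the repository's actual classes:

```python
# Minimal sketch of constructor-based dependency injection for testability.
# SearchService and FakeVectorStore are hypothetical names, not the real classes.
class FakeVectorStore:
    def search(self, query: str) -> list:
        return [f"document matching '{query}'"]

class SearchService:
    def __init__(self, vectorstore):
        # The dependency is injected, so tests can pass a fake implementation
        # instead of a real vector database.
        self.vectorstore = vectorstore

    def answer(self, query: str) -> str:
        hits = self.vectorstore.search(query)
        return hits[0] if hits else "no results"

def test_search_service():
    # pytest-style test: inject a fake store, no external services needed
    service = SearchService(FakeVectorStore())
    assert service.answer("bonds") == "document matching 'bonds'"

test_search_service()
```

This is the design choice that lets the test suite run without a live Weaviate or Cosmos instance.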
Start by creating a `.env` file in the project root with the following content:

```
APP_OPENAI_API_KEY=your_openai_api_key
APP_WEAVIATE_API_KEY=api_key_for_weaviate_url_specified_in_config
COSMOS_KEY=key_for_cosmos_url_specified_in_config
AUTH_SECRET_KEY=any_secret_key
AUTH_USERNAME=any_user
AUTH_PASSWORD=any_password
```

Then install the dependencies and start the server:

```shell
poetry install
poetry run uvicorn ai_document_search_backend.application:app --reload
```

The server is then available at http://localhost:8000.
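The `.env` format above is plain `KEY=value` lines. As a rough illustration only (the project presumably loads these values through its own settings machinery, not this code), such a file can be parsed like this:

```python
# Hypothetical sketch of parsing the KEY=value .env format shown above;
# the project itself presumably loads these via its settings/config classes.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """APP_OPENAI_API_KEY=your_openai_api_key
AUTH_USERNAME=any_user
"""
config = parse_env(sample)
print(config["AUTH_USERNAME"])  # → any_user
```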
Alternatively, start the server with Docker Compose:

```shell
docker compose up
```
Run the tests:

```shell
poetry run pytest
```
- Start the server locally (see above).
- Run `poetry run locust`.
- Open http://localhost:8089/ in your browser.
- Enter the number of users, the spawn rate and the host (http://localhost:8000 – without a trailing slash).
- Click "Start swarming".
Format the code and auto-fix lint issues:

```shell
poetry run black --config black.py.toml .
poetry run ruff check . --fix
```

Check formatting and linting without modifying files:

```shell
poetry run black --config black.py.toml . --check
poetry run ruff check .
```
Build the Docker image and push it to the Azure container registry:

```shell
docker build -t ai-document-search-backend -f Dockerfile .
docker tag ai-document-search-backend:latest crdocsearchdev.azurecr.io/crdocsearchdev/ai-document-search-backend:0.0.1
az login
az acr login --name crdocsearchdev
docker push crdocsearchdev.azurecr.io/crdocsearchdev/ai-document-search-backend:0.0.1
```
- Install all dependencies: `poetry install`
- Add a new package at the latest version: `poetry add <package>`, e.g. `poetry add numpy`
- Add a package only for development: `poetry add <package> --group dev`, e.g. `poetry add jupyter --group dev`
- Regenerate the `poetry.lock` file: `poetry lock --no-update`
- Remove a package: `poetry remove <package>`, e.g. `poetry remove numpy`
- Download `NTNU2.xlsx` from the customer and save it to `data/NTNU2.xlsx`. This file is private and is therefore not included in the repository. See `prepare_data.py` for the columns that must be present in the file.
- Run `poetry run python ai_document_search_backend/scripts/prepare_data.py` to pre-process the data.
- Run `poetry run python ai_document_search_backend/scripts/download_documents.py [limit]` to download the PDFs into a local folder. The `limit` argument is optional and specifies the number of documents to download; if it is not specified, all documents are downloaded.
- Run `poetry run python ai_document_search_backend/scripts/fill_vectorstore.py` to store the documents in the vector database.
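The optional `[limit]` argument pattern used by the download step can be sketched as follows. This is a hypothetical illustration; the real `download_documents.py` may handle its arguments differently:

```python
from typing import List, Optional

# Hypothetical sketch of handling an optional [limit] CLI argument;
# the real download_documents.py may parse its arguments differently.
def parse_limit(argv: List[str]) -> Optional[int]:
    # argv[0] is the script name; argv[1], if present, is the limit.
    if len(argv) > 1:
        return int(argv[1])
    return None  # None means "download all documents"

def select_documents(documents: List[str], limit: Optional[int]) -> List[str]:
    return documents if limit is None else documents[:limit]

docs = ["a.pdf", "b.pdf", "c.pdf"]
print(select_documents(docs, parse_limit(["download_documents.py", "2"])))  # → ['a.pdf', 'b.pdf']
```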
For a more detailed description of the project structure, architecture and design, see the project structure document.