This project is an image tagging and searching application that leverages open-source local multimodal LLMs, such as Llama 3.2 Vision, and a vector database (ChromaDB) to provide a seamless image management experience.
This project is an extensive rewrite of "llama vision image tagger" by Guodong Zhao, in which I aim to:
- make the LLM backend selectable to take advantage of more recent models like Gemma 3 vision
- improve usability by making the processing queue asynchronous and adding file system navigation and bookmarking
- add the ability to save tags and descriptions to the images' metadata, to supplement local search engines and photo organizers that lack LLM-based tagging
The application provides an intuitive way to organize and search through your image collection using AI-powered tagging and natural language search. When you first open the application, it will:
- Prompt you to choose a folder containing your images
- Scan the folder and subfolders for images (png/jpg/jpeg/webp)
- Initialize an index record (using JSON) to track new/deleted images
- Process images with Llama 3.2 Vision to generate:
  - Tags for elements and styles
  - Short descriptions
  - Text extracted from images
- Store all metadata in a vector database for efficient retrieval
- Enable natural language search using both full-text and vector search
- Provide a modern web interface for browsing and managing images
- On first open, the app prompts you to choose a folder
- It scans the folder and its subfolders for images (png/jpg/jpeg/webp) and initializes an index record (stored as `image_metadata.json` in the selected folder)
- When you click "Start Tagging", it tags the images with Llama 3.2 Vision via Ollama, generating element/style tags, a short description, and any text found in the image. The image path, tags, description, and extracted text are saved to a vector database for easier retrieval later (an illustrative `image_metadata.json` entry is shown below)
- You can then query the images in natural language; queries combine full-text and vector search to find the most relevant images
- You can browse the images in the UI; clicking a thumbnail opens a modal showing the image with its tags, description, and extracted text
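For illustration, an entry in `image_metadata.json` might look roughly like this (the field names and layout are an assumption for illustration, not the project's exact schema):

```json
{
  "vacation/beach_sunset.jpg": {
    "description": "A sunset over a calm ocean with a silhouetted palm tree in the foreground.",
    "tags": ["sunset", "beach", "ocean", "palm tree", "warm colors"],
    "text_content": ""
  }
}
```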
- requirements.txt should ONLY exist in the root directory, not in backend/
- All dependencies should be listed in the root requirements.txt file
Folder Selection and Image Discovery:
- Select any folder on your system
- Recursive scanning of subfolders
- Support for multiple image formats (png, jpg, jpeg, webp)
- Automatic tracking of new and deleted images
Intelligent Tagging:
- AI-powered image analysis using Llama 3.2 Vision
- Generation of descriptive tags
- Extraction of image content and style information
- Text extraction from images
- Interruptible batch processing with progress tracking
- Real-time progress updates for each processing step:
  - Description generation (0-33%)
  - Tag extraction (33-67%)
  - Text content analysis (67-100%)
Vector Database Storage:
- Efficient storage using ChromaDB
- Fast vector-based similarity search
- Persistent storage of metadata
- Automatic synchronization with file system
Natural Language Search:
- Hybrid search combining full-text and vector similarity
- Search by description, tags, or extracted text
- Semantic understanding of search queries
- Ranked results based on relevance
User Interface:
- Modern web interface built with Vue3 and Tailwind CSS
- Responsive image grid layout
- Detailed image modal with metadata
- Real-time processing progress tracking
- Stop/Resume batch processing capability
External Volume Support:
- Handles external drives and network shares
- Graceful handling of permission limitations
- Robust file operations with appropriate error messages
- Support for metadata read/write operations on various filesystem types
Local web page served by the backend, built with HTML, Vue3, and Tailwind CSS.
- Tailwind CSS CDN: `<script src="https://cdn.tailwindcss.com"></script>`
- Vue3 CDN: `<script src="https://unpkg.com/vue@3/dist/vue.global.js"></script>`
- Built with Vue3 and Tailwind CSS
- Responsive and modern UI design
- Real-time updates and progress tracking
- Modal-based image viewing
- Interruptible batch processing controls
- FastAPI server for robust API endpoints
- Ollama integration for running Llama 3.2 Vision model
- ChromaDB for vector storage and similarity search
- File system storage service for robust file operations with retry mechanisms
- Asynchronous image processing with stop/resume capability
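As a rough illustration of the stop/resume idea (a minimal sketch, not the project's actual `processing_queue.py`), an asynchronous worker can check a stop flag between images so a batch can be interrupted cleanly and resumed later:

```python
import asyncio
from typing import Awaitable, Callable

class InterruptibleQueue:
    """Minimal sketch: process queued image paths one at a time, honoring a stop flag."""

    def __init__(self) -> None:
        self.pending: asyncio.Queue[str] = asyncio.Queue()
        self._stop = asyncio.Event()

    def stop(self) -> None:
        # Called by a stop handler; takes effect after the current image finishes.
        self._stop.set()

    async def run(self, process_image: Callable[[str], Awaitable[None]]) -> None:
        self._stop.clear()
        while not self.pending.empty() and not self._stop.is_set():
            path = await self.pending.get()
            try:
                await process_image(path)  # one image at a time keeps progress resumable
            finally:
                self.pending.task_done()
```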
The application includes a dedicated storage service that:
- Handles all file system interactions with robust error handling
- Provides atomic file operations with retry mechanisms
- Validates file permissions before operations
- Handles JSON metadata serialization/deserialization
- Ensures metadata consistency across operations
- Creates parent directories as needed for file operations
- Manages temporary files for atomic writes
- Provides both synchronous and asynchronous interfaces
- Implements proper cleanup for failed operations
- Handles external volumes and their specific requirements
This service isolates file system interactions from business logic, making the code more testable and robust against filesystem errors like permission issues or concurrent access problems.
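A minimal sketch of the atomic-write-with-retry pattern described above (illustrative only; the project's `storage.py` may differ):

```python
import json
import os
import tempfile
import time
from pathlib import Path

def write_json_atomic(path: Path, data: dict, retries: int = 3, delay: float = 0.5) -> None:
    """Write JSON to a temp file in the same directory, then atomically replace the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    for attempt in range(1, retries + 1):
        tmp_path = None
        try:
            fd, tmp_path = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                json.dump(data, tmp, ensure_ascii=False, indent=2)
            os.replace(tmp_path, path)  # atomic on the same filesystem
            return
        except OSError:
            if tmp_path and os.path.exists(tmp_path):
                os.remove(tmp_path)     # clean up the partial write
            if attempt == retries:
                raise
            time.sleep(delay)
```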
The application stores metadata in two primary locations:
- JSON File: An `image_metadata.json` file is created in each image folder, containing all metadata for the images in that folder. This allows the metadata to stay with the images even when they are moved.
- Vector Database: Metadata is also stored in a ChromaDB instance for efficient semantic searching.
The app ensures synchronization between these storage locations, with intelligent matching between filenames and paths to maintain metadata consistency.
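A rough sketch of how metadata could be stored and queried with ChromaDB (the collection name, document layout, and metadata fields here are assumptions, not the project's actual `vector_store.py`):

```python
import chromadb

# Persist embeddings alongside the app's other data (path is illustrative).
client = chromadb.PersistentClient(path="data/chroma")
collection = client.get_or_create_collection("images")

# Store one image's metadata; the document text is what gets embedded for similarity search.
collection.upsert(
    ids=["vacation/beach_sunset.jpg"],
    documents=["A sunset over a calm ocean. Tags: sunset, beach, ocean, palm tree."],
    metadatas=[{"path": "vacation/beach_sunset.jpg", "tags": "sunset,beach,ocean,palm tree"}],
)

# Semantic query: returns the closest matches with their stored metadata.
results = collection.query(query_texts=["photos of the sea at dusk"], n_results=5)
print(results["ids"][0], results["metadatas"][0])
```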
The application processes each image in three distinct steps, providing real-time progress updates throughout:
- Description Generation (0-33%)
  - Sends the image to Ollama with a prompt for a concise description
  - Returns a structured description in JSON format
  - Progress updates reflect both step completion and Ollama's processing status
- Tag Extraction (33-67%)
  - Analyzes the image for relevant tags (objects, styles, colors, etc.)
  - Generates 5-10 descriptive tags in JSON format
  - Progress updates combine step position and Ollama's processing status
- Text Content Analysis (67-100%)
  - Detects and extracts any visible text in the image
  - Returns a structured response with text content and presence flag
  - Final progress update (100%) includes complete metadata
Each step yields progress updates that are scaled within its range, ensuring smooth progress tracking in the UI. The process can be interrupted at any point, and all progress is persisted to enable resuming from the last completed step.
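As an illustration of how per-step progress can be mapped into those ranges (a simplified sketch, not the actual `image_processor.py`; the Ollama call uses the public `/api/generate` endpoint, but the prompts and model name are placeholders):

```python
import base64
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask_ollama(model: str, prompt: str, image_path: str) -> str:
    """Send one image plus a prompt to Ollama and return the model's text response."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "images": [image_b64], "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def process_image(image_path: str, model: str = "gemma3:4b"):
    """Yield (progress_percent, metadata_so_far) as each step of the pipeline completes."""
    steps = [
        ("description", "Describe this image in one or two sentences.", 33),
        ("tags", "List 5-10 descriptive tags for this image.", 67),
        ("text_content", "Transcribe any visible text in this image.", 100),
    ]
    metadata: dict[str, str] = {}
    for key, prompt, end_of_range in steps:
        # A fuller version would also emit intermediate updates within each step's range.
        metadata[key] = ask_ollama(model, prompt, image_path)
        yield end_of_range, dict(metadata)  # 33%, 67%, then 100% with full metadata
```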
- Python 3.11+: Ensure Python 3.11 or newer is installed on your system
- Ollama: Install Ollama to run the Llama model (Ollama website)
- ChromaDB: Will be installed via pip
- Exempi: Required for XMP metadata handling
  - macOS: `brew install exempi`
  - Ubuntu/Debian: `sudo apt-get install libexempi3 libexempi-dev`
  - Fedora/RHEL: `sudo dnf install exempi exempi-devel`
  - Windows: Build from source or use pre-built binaries
- Data Directory: The application requires write access to create and manage a data directory for queue persistence and other operational data. By default, this is created at `$PROJECT_ROOT/data` (where `$PROJECT_ROOT` is the root directory of the project). Ensure your user has write permissions to this location.
```
llm-image-tagger/
├── backend/ # Backend code
│ ├── app/ # Application package
│ │ ├── api/ # API endpoints
│ │ │ ├── routes.py # API route definitions
│ │ │ └── dependencies.py # API dependencies
│ │ ├── core/ # Core functionality
│ │ │ ├── config.py # Configuration settings
│ │ │ └── logging.py # Logging configuration
│ │ ├── models/ # Data models
│ │ │ └── schemas.py # Pydantic models
│ │ ├── services/ # Business logic
│ │ │ ├── image_processor.py # Image processing service
│ │ │ ├── vector_store.py # Vector database service
│ │ │ ├── storage.py # File system storage service
│ │ │ ├── processing_queue.py # Queue management
│ │ │ └── queue_persistence.py # Queue state persistence
│ │ └── utils/ # Utility functions
│ │ └── helpers.py # Utility functions
│ ├── tests/ # Test files
│ ├── main.py # Application entry point
│ └── .env # Environment variables
├── data/ # Application data directory
│ └── queue_state.json # Queue persistence data
├── static/ # Frontend static files
│ └── index.html # Main HTML file
├── run.py # Script to run the application
├── requirements.txt # Python dependencies
└── README.md # Project documentation
```
- Clone the repository:

```bash
git clone https://github.com/CaliLuke/llm-image-tagger
cd llm-image-tagger
```

- Set up a virtual environment and install dependencies:

```bash
python -m venv venv
source venv/bin/activate   # On Unix/macOS
# or
venv\Scripts\activate      # On Windows
pip install -r requirements.txt
```
- Configure XMP support:

The application uses XMP for image metadata handling, which requires the Exempi library.

```bash
# Ensure Exempi is installed on your system:
# macOS: brew install exempi
# Ubuntu/Debian: sudo apt-get install libexempi3 libexempi-dev
# Fedora/RHEL: sudo dnf install exempi exempi-devel

# Make sure your virtual environment is active, then:
source setup_exempi.sh
```
This script will:
- Detect your operating system
- Find the Exempi library on your system
- Configure the necessary environment variables
- Verify the configuration
If you encounter issues with XMP support:
- Ensure Exempi is properly installed
- Make sure you use `source setup_exempi.sh` (not just running the script)
- Check that your virtual environment is activated
- Look at log messages for specific paths that need to be configured
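Once Exempi is configured, writing generated tags into an image's XMP metadata with python-xmp-toolkit looks roughly like the sketch below (assuming Dublin Core `dc:subject` keywords; the project's actual metadata writer may differ):

```python
from libxmp import XMPFiles, XMPMeta, consts

def write_tags_to_xmp(image_path: str, tags: list[str]) -> None:
    """Append keyword tags to the image's dc:subject array and save them into the file."""
    xmpfile = XMPFiles(file_path=image_path, open_forupdate=True)
    xmp = xmpfile.get_xmp() or XMPMeta()  # fall back to a fresh packet if none exists
    for tag in tags:
        xmp.append_array_item(
            consts.XMP_NS_DC, "subject", tag,
            {"prop_array_is_ordered": True, "prop_value_is_array": True},
        )
    if xmpfile.can_put_xmp(xmp):
        xmpfile.put_xmp(xmp)
    xmpfile.close_file()
```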
- Configure environment variables:
  - Copy the `.env.example` file in the backend directory to `.env`
  - Modify the values as needed
- Install Ollama and pull the model:
  - Download and install Ollama from ollama.com
  - Pull the gemma3:4b model: `ollama pull gemma3:4b`
You can run the application using the run.py script, which provides several command-line options:
```bash
# Always ensure your virtual environment is active first
source venv/bin/activate   # On Unix/macOS
# or
venv\Scripts\activate      # On Windows

# Configure Exempi (must be done in every new terminal session)
source setup_exempi.sh

# Now run the application
python run.py [options]
```

| Option | Description |
|---|---|
| `--host HOST` | Host to run the server on (default: 127.0.0.1) |
| `--port PORT` | Port to run the server on (default: 8000) |
| `--no-browser` | Don't open the browser automatically |
| `--debug` | Run in debug mode with tests |
| `--skip-tests` | Skip running tests even in debug mode |
| `--force` | Force start even if port appears to be in use |
```bash
# Run with default settings
python run.py

# Run on a specific port
python run.py --port 8080

# Run in debug mode with automatic reloading
python run.py --debug

# Run without opening the browser
python run.py --no-browser

# Force start even if the port is in use
python run.py --force
```

- Activate the virtual environment and configure Exempi:

```bash
source venv/bin/activate   # On Unix/macOS
# or
venv\Scripts\activate      # On Windows
source setup_exempi.sh     # Configure Exempi paths
```

- Run the application:

```bash
python run.py              # Run with default settings
python run.py --debug      # Run in debug mode with auto-reload
```

Options:

```bash
python run.py --host=0.0.0.0 --port=8080   # Specify host and port
python run.py --no-browser                 # Run without opening browser
python run.py --skip-tests                 # Skip running tests in debug mode
python run.py --force                      # Force start even if port is in use
```
- Select a folder:
  - Enter the path to your image folder (including external drives like `/Volumes/...`)
  - Wait for initial scanning and processing
  - First-time processing may take longer due to model downloads
- Process images:
  - Use "Process All" for batch processing
  - Use "Process Image" for individual images
  - Monitor progress in real-time
- Search images:
  - Enter natural language queries
  - View results in the responsive grid
  - Click images for detailed metadata
- Refresh images:
  - Use "Refresh" to scan for new/removed images
  - Metadata is preserved across sessions
- `GET /directories`: Lists all directories at the specified path or current folder
  - Takes an optional `path` query parameter
  - Returns metadata about directories, including whether they contain images or metadata
  - Handles permission errors gracefully with descriptive error messages
- `POST /search`: Performs hybrid search combining vector similarity and full-text search
  - Searches through descriptions, tags, and extracted text
  - Returns ranked results based on relevance
  - Supports natural language queries
- `GET /queue/status`: Returns current queue status (size, active/completed/failed tasks)
- `GET /queue/tasks`: Lists all tasks in the queue with their status
- `POST /queue/clear`: Clears all tasks from the queue and task history
- `POST /processing/start`: Starts batch image processing
  - Processes images in the queue
  - Provides real-time progress updates
  - Supports background processing
- `POST /processing/stop`: Stops the current processing operation
- `GET /processing/status`: Returns current processing status and progress
- `POST /logging/error`: Logs frontend errors with stack traces
- `POST /logging/info`: Logs frontend information messages
- `POST /logging/debug`: Logs frontend debug information
Each endpoint includes:
- Comprehensive error handling
- Detailed logging
- Progress tracking where applicable
- Type validation
- Authentication (if configured)
For detailed API documentation, visit /docs when the server is running.
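For example, a search request from Python might look like the following (the request payload shown here is illustrative; check `/docs` for the actual schema):

```python
import requests

# Hypothetical payload; consult /docs for the real request model.
resp = requests.post(
    "http://127.0.0.1:8000/search",
    json={"query": "sunset photos with visible text"},
)
resp.raise_for_status()
print(resp.json())
```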
To run in development mode with auto-reload and tests:
```bash
python run.py --debug
```

This will:
- Run the test suite before starting the server
- Enable auto-reload for code changes (no manual restarts needed)
- Start the server at http://127.0.0.1:8000
The application uses uvicorn's auto-reload feature, which automatically detects code changes and restarts the server, making development much faster. You'll see "WatchFiles detected changes" messages in the console when files are modified.
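Under the hood this amounts to passing uvicorn's `reload` flag, roughly as below (a sketch; the import string and exact call in `run.py` may differ):

```python
import uvicorn

# reload=True watches the source tree and restarts the server when files change.
uvicorn.run("backend.main:app", host="127.0.0.1", port=8000, reload=True)
```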
Additional options:
```bash
python run.py --debug --no-browser   # Don't open browser automatically
python run.py --debug --port 8080    # Use a different port
```

The project uses pytest for testing. Tests are organized by component:
```
tests/
├── conftest.py # Shared test fixtures
├── test_data/ # Test data directory
├── test_image_processor.py # Image processor tests
└── test_api.py # API endpoint tests
```
To run tests manually:
```bash
# Run all tests
pytest

# Run tests with output
pytest -v

# Run specific test file
pytest tests/test_image_processor.py

# Run specific test
pytest tests/test_api.py::test_stop_processing
```

- Scan image metadata for context in descriptions/tags
- Hybrid OCR with tesseract and Llama Vision
- Separate frontend into proper Vue.js project
- Add authentication for multi-user support
- Implement WebSockets for real-time updates
Contributions are welcome! Please feel free to submit issues or pull requests.
If you are an AI assistant helping with this project, please read AI_ASSISTANT_GUIDE.md before proceeding.
This project is licensed under the MIT License.