@kouloumos

The latest development has been happening on the `fastapi` branch because of breaking changes; it's time to converge.

- support for fastapi
- first endpoint
Curator commands use the bitcointranscripts
repo/deployment as the source of truth to get the
transcription backlog.
The introduction of the `/transcription` endpoint breaks
the transcription process into three parts:
1. add to transcription queue
2. remove from transcription queue
3. start the transcription

Different endpoints exist for each part.
The cli now calls the respective endpoints during the
`transcribe` command and the `preprocess` command.
The `Transcription` instance and the individual `Transcript`s
now have a `status` that changes throughout the transcription
process and is available at the `/transcription/queue` endpoint.
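To illustrate the three-part flow and the per-transcript `status`, here is a minimal in-memory sketch. The status values and method names are assumptions made for illustration; the PR does not list the real ones, and the actual server exposes these steps through endpoints rather than direct method calls.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(str, Enum):
    # Hypothetical status values; the real ones are not listed in this PR.
    QUEUED = "queued"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


@dataclass
class Transcript:
    source: str
    status: Status = Status.QUEUED


@dataclass
class Transcription:
    queue: list = field(default_factory=list)

    def add_to_queue(self, source: str) -> Transcript:
        """Part 1: add a source to the transcription queue."""
        transcript = Transcript(source)
        self.queue.append(transcript)
        return transcript

    def remove_from_queue(self, source: str) -> bool:
        """Part 2: remove a source; returns True if something was removed."""
        before = len(self.queue)
        self.queue = [t for t in self.queue if t.source != source]
        return len(self.queue) < before

    def start(self) -> None:
        """Part 3: process every queued transcript, updating its status."""
        for transcript in self.queue:
            transcript.status = Status.IN_PROGRESS
            # The real transcription work would happen here.
            transcript.status = Status.COMPLETED

    def queue_view(self) -> list:
        """Roughly what a queue endpoint could return."""
        return [{"source": t.source, "status": t.status.value} for t in self.queue]
```

Keeping the status on each `Transcript` means a queue listing can report progress per item instead of for the whole batch.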
Centralize environment variables within the `Settings` class to ensure consistent configuration access throughout the application.
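A rough sketch of what centralizing environment variables in one class can look like, using only the standard library. The field names, variable names, and defaults below are assumptions for illustration and are not taken from the actual `Settings` class.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # All field and variable names here are hypothetical.
    transcription_server_url: str
    deepgram_api_key: Optional[str]
    log_level: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        """Read every setting in one place so callers never touch os.environ."""
        env = os.environ if env is None else env
        return cls(
            transcription_server_url=env.get(
                "TRANSCRIPTION_SERVER_URL", "http://localhost:8000"
            ),
            deepgram_api_key=env.get("DEEPGRAM_API_KEY"),
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
        )
```

Passing a mapping to `from_env` instead of reading `os.environ` directly makes the configuration easy to override in tests.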
- Create APIClient class to encapsulate all API-related operations
- Move API call methods from main CLI file to APIClient
- Apply api_error_handler decorator to all APIClient methods
- Update CLI commands to use new APIClient
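The decorator-plus-client arrangement described above could look roughly like the following. This is a sketch under assumptions: the `APIError` exception, the `get_queue` method, and the injected `send` transport are hypothetical, and only the `/transcription/queue` path comes from this PR.

```python
import functools


class APIError(Exception):
    """Raised when a call to the transcription server fails (hypothetical)."""


def api_error_handler(func):
    """Convert any unexpected failure inside an API method into APIError."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except APIError:
            raise
        except Exception as exc:
            raise APIError(f"{func.__name__} failed: {exc}") from exc
    return wrapper


class APIClient:
    def __init__(self, base_url, send):
        # `send` is a hypothetical transport: callable(method, url, payload) -> dict.
        # A real client would wrap an HTTP library here.
        self.base_url = base_url.rstrip("/")
        self._send = send

    @api_error_handler
    def get_queue(self):
        return self._send("GET", f"{self.base_url}/transcription/queue", None)
```

With this shape, CLI commands only need to catch one exception type, whatever went wrong underneath.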
`MediaProcessor` now attempts to get the video URL first using
Invidious, and if that fails, falls back to using yt_dlp.
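The try-first-then-fall-back behaviour can be sketched generically as below. The resolver functions are passed in as plain callables because the real Invidious and yt_dlp calls are not shown in this PR; `resolve_media_url` is a hypothetical helper name.

```python
from typing import Callable, Sequence

Resolver = Callable[[str], str]


def resolve_media_url(video_id: str, resolvers: Sequence[Resolver]) -> str:
    """Try each resolver in order and return the first URL that is produced."""
    errors = []
    for resolver in resolvers:
        try:
            return resolver(video_id)
        except Exception as exc:
            # Remember why this resolver failed, then try the next one.
            errors.append(f"{resolver.__name__}: {exc}")
    raise RuntimeError("all resolvers failed: " + "; ".join(errors))
```

In the setup described above, the list would hold an Invidious-based resolver first and a yt_dlp-based one second.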
Refactor handling of the username for transcription attribution
after the cli-to-server refactoring.
Refactor setup to use extras, isolating the Whisper dependency as an optional install.
This change helps streamline the installation process by avoiding unnecessary
dependencies when Whisper is not required.
- Refactor entry points to create separate console scripts for cli and server
- Introduced `transcriber_server.py` to streamline server startup.
- Added `tstbtc-server` command to simplify running the transcription server
 in different modes (`dev` or `prod`) using `uvicorn`.
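One way a `dev`/`prod` switch could map onto uvicorn options is sketched below. The argument layout, defaults, and helper names are assumptions; the real `tstbtc-server` command may be organized differently. The returned keys (`host`, `port`, `reload`, `log_level`) are keyword arguments that `uvicorn.run` accepts.

```python
import argparse


def build_server_options(mode: str, host: str = "0.0.0.0", port: int = 8000) -> dict:
    """Translate a mode name into keyword arguments for uvicorn.run."""
    if mode not in ("dev", "prod"):
        raise ValueError(f"unknown mode: {mode!r}")
    is_dev = mode == "dev"
    return {
        "host": host,
        "port": port,
        "reload": is_dev,  # auto-reload on code changes only in dev
        "log_level": "debug" if is_dev else "info",
    }


def parse_args(argv=None) -> argparse.Namespace:
    # Hypothetical CLI shape: a single optional positional `mode`.
    parser = argparse.ArgumentParser(prog="tstbtc-server")
    parser.add_argument("mode", nargs="?", choices=["dev", "prod"], default="dev")
    return parser.parse_args(argv)
```

The console script would then pass `**build_server_options(args.mode)` along with the application import string to `uvicorn.run`.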
Introduced a docker-compose.yaml file for easier setup and deployment.
Added relevant documentation to guide users through the Docker setup process.
- Implement GitHubAPIHandler class to manage all GitHub interactions
- Remove dependency on local Git repositories and commands
- Add automatic forking of the target repository if a fork does not already exist
- Implement permission checks for GitHub token at initialization
- Create new branches based on latest commit of origin repository
- Add support for creating and updating files via GitHub API
- Implement pull request creation through GitHub API
- Update CLI to use a --github flag for enabling API integration
- Enhance error handling and logging for GitHub API interactions
- Update README with new GitHub integration instructions and requirements

BREAKING CHANGE: This update removes the need for local Git repositories.
The BITCOINTRANSCRIPTS_DIR environment variable is no longer used.
Instead, GITHUB_TOKEN, GITHUB_REPO_OWNER, and GITHUB_REPO_NAME must be set
in the environment or passed explicitly. The --github CLI option is now a
flag instead of a choice.
- Introduce Output type to group all transcript-related outputs
- Update Transcript class to initialize outputs with empty Output structure
- Modify Transcription class to use new Output structure
- Update Deepgram/Whisper class to populate Output fields directly
- Replace transcript.result with transcript.outputs.raw
- Add transcription_service_output_file and dpe_file to Output type
- Update related methods to use new Output structure

This change improves type safety, accessibility of transcript data,
and centralizes all output-related information for each transcript.
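A minimal sketch of grouping outputs in one structure follows. The three field names (`raw`, `transcription_service_output_file`, `dpe_file`) come from the commit message above; modelling `Output` as a dataclass and the shape of `Transcript` are assumptions, and the real type may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Output:
    # Every field starts empty and is filled in by the transcription service.
    raw: Optional[str] = None
    transcription_service_output_file: Optional[str] = None
    dpe_file: Optional[str] = None


@dataclass
class Transcript:
    title: str
    # default_factory gives each Transcript its own empty Output.
    outputs: Output = field(default_factory=Output)
```

Code that previously read `transcript.result` would read `transcript.outputs.raw` instead, and new output kinds become new fields rather than new attributes scattered across the class.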
- Extend GitHubAPIHandler to manage both transcripts and metadata repos
- Implement grouped commits for metadata files per transcript
- Replace Personal Access Token with GitHub App for authentication
- Update GitHubAPIHandler to use GitHub App credentials
- Modify README to reflect new GitHub App integration process
- Remove forking logic as it's no longer needed with GitHub App
- Change multiple `logger.info` calls to `logger.debug` for less verbose default output
- log configuration at server startup
- Replace fixed format "18" with flexible format selection
- Prioritize lowest quality for efficient audio extraction
- Refactor check_md_file function for improved testing
- Install FFmpeg using apt-get
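The move from the fixed format "18" to flexible, lowest-quality-first selection mentioned in the list above can be illustrated with a small helper. This is a hypothetical sketch, not the project's code: it assumes a list of format dictionaries shaped like yt-dlp's `formats` entries, using the `format_id`, `acodec`, and `tbr` keys.

```python
def pick_audio_source(formats):
    """Return the lowest-bitrate format that still contains an audio track."""
    # Video-only formats report acodec == "none"; skip those.
    with_audio = [f for f in formats if f.get("acodec") not in (None, "none")]
    if not with_audio:
        raise ValueError("no format with an audio track is available")
    # Formats without a known bitrate sort last instead of first.
    return min(with_audio, key=lambda f: f.get("tbr") or float("inf"))
```

Since only the audio is needed for transcription, choosing the smallest format that carries audio keeps downloads short.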
This new metadata field wasn't always initialized
correctly, resulting in errors.
…erent formats

- Implemented a factory pattern for creating exporters based on configuration with support for Markdown, JSON, and plain text formats.
- Modified existing methods to utilize new exporters for markdown and JSON outputs.
- Updated CLI and API routes to accommodate new export options.
- Configured pytest to include tests for new exporters.
- Updated README for clearer installation and server management instructions.
- Added server management commands to the CLI for starting, stopping, and checking the status of the transcription server.
- Implemented automatic server start feature for specific CLI commands.
- Enhanced error handling and logging for server operations.
- Updated requirements to include new dependencies for server management.
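As an illustration of the configuration-driven exporter factory mentioned in the list above, here is a minimal sketch. Class names, the config keys, and the output layouts are assumptions; only the three formats (Markdown, JSON, plain text) come from the commit message.

```python
import json
from abc import ABC, abstractmethod


class Exporter(ABC):
    @abstractmethod
    def export(self, title: str, body: str) -> str:
        """Render a transcript in this exporter's format."""


class MarkdownExporter(Exporter):
    def export(self, title, body):
        return f"# {title}\n\n{body}\n"


class JsonExporter(Exporter):
    def export(self, title, body):
        return json.dumps({"title": title, "body": body})


class TextExporter(Exporter):
    def export(self, title, body):
        return body


# Registry mapping a config key to the exporter class it enables.
_EXPORTERS = {
    "markdown": MarkdownExporter,
    "json": JsonExporter,
    "text": TextExporter,
}


def create_exporters(config: dict) -> dict:
    """Instantiate only the exporters that the configuration switches on."""
    return {name: cls() for name, cls in _EXPORTERS.items() if config.get(name)}
```

Adding a new format then means writing one class and registering it, without touching the CLI or API routes that iterate over the enabled exporters.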
…rt functionality

- Deleted the Queuer class and related commands to streamline the transcription process.
- Updated the Transcription class to support JSON output directly, replacing the previous queue mechanism.
- Modified CLI commands and API routes to reflect the removal of Queuer and the addition of JSON export options.
- Adjusted tests to ensure proper functionality of the new JSON export feature.
- Updated documentation to remove references to Queuer and clarify the new export capabilities.
- Introduced a new media command group for handling audio processing tasks.
- Implemented commands to split audio files based on silence and convert media files to MP3 format.
- Enhanced error logging for better debugging during media processing.
- Updated requirements to include the latest version of yt-dlp.
- Changed Python version from 3.9 to 3.11 in the GitHub Actions workflow.
- Simplified the dependency installation command by removing the use of --editable and --use-pep517 flags.
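For the silence-based splitting added with the media command group, the core idea can be sketched on a plain list of amplitude samples. The PR does not say which library or tool performs the detection, so the function below is a hypothetical, dependency-free illustration rather than the project's implementation.

```python
def split_on_silence(samples, threshold, min_silence_len):
    """Return (start, end) index pairs of non-silent chunks; end is exclusive.

    A chunk ends once at least `min_silence_len` consecutive samples have an
    absolute amplitude at or below `threshold`.
    """
    chunks = []
    chunk_start = None  # index where the current chunk began, if any
    silent_run = 0      # consecutive silent samples seen inside the chunk
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            if chunk_start is None:
                chunk_start = i
            silent_run = 0
        elif chunk_start is not None:
            silent_run += 1
            if silent_run >= min_silence_len:
                # Close the chunk just before the silent run started.
                chunks.append((chunk_start, i - silent_run + 1))
                chunk_start = None
                silent_run = 0
    if chunk_start is not None:
        # Trim trailing silence that was too short to trigger a split.
        chunks.append((chunk_start, len(samples) - silent_run))
    return chunks
```

Short pauses below `min_silence_len` stay inside a chunk, which avoids cutting audio in the middle of a sentence.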
@kouloumos kouloumos merged commit 54ec4ac into main Mar 17, 2025
2 checks passed