@benliang99 benliang99 commented May 2, 2025

Release 2.2.11 – I2V Expansion & Task-Specific Prompt Generation Fixes

Release Date: May 2, 2025

Version 2.2.11 adds configuration support for the CogVideoX1.5-5B image-to-video (I2V) model and fixes image and video prompt generation so that annotation enhancement uses the correct task type. It also adds prompt sanitization to work around the built-in NSFW filtering in the DeepFloyd/IF model, which cannot be disabled.

Updates

Model Changes

  • New Model Added:
    • Added THUDM/CogVideoX1.5-5B-I2V model with full pipeline configuration and support for I2V generation
    • Enabled I2V model selection in randomized pipelines
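
A minimal sketch of how an I2V entry might be registered and then picked at random. Only the model ID `THUDM/CogVideoX1.5-5B-I2V` and the `I2V_MODELS` name come from this PR; the dictionary layout, its keys, and `select_random_model` are assumptions for illustration and may not match the repository's actual config.

```python
import random

# Hypothetical registry layout; the real config keys may differ.
I2V_MODELS = {
    "THUDM/CogVideoX1.5-5B-I2V": {
        "task": "i2v",
        # Assumed pipeline class name from diffusers, stored as a string
        # so this sketch runs without diffusers installed.
        "pipeline_cls": "CogVideoXImageToVideoPipeline",
        # Generation arguments intentionally left empty in this sketch.
        "generate_args": {},
    },
}


def select_random_model(models, rng=None):
    """Pick one model ID uniformly at random from a registry dict."""
    rng = rng or random
    # Sort the keys so the choice is reproducible for a seeded rng.
    return rng.choice(sorted(models))


model_id = select_random_model(I2V_MODELS, random.Random(0))
print(model_id, I2V_MODELS[model_id]["task"])
```

With a single registered model the selection is trivially deterministic; adding more entries to the registry is what makes them eligible for randomized pipelines.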

Fixes

  • Prompt Generation:
    • Now derives the task type from the model group rather than from the uninitialized model_name
    • Grouped models by task type to reduce prompt generator reload frequency
    • Added detailed logging for better runtime visibility
  • DeepFloyd/IF NSFW Filtering:
    • We found that this model automatically censors NSFW content due to legal constraints and offers no configuration toggle to disable the filter
    • Automatic Prompt Sanitization:
      • If NSFW content is detected, the prompt is sanitized by the moderation LLM and generation is retried with the safer prompt (up to 3 attempts)
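
The retry behavior described above can be sketched roughly as follows. This is not the PR's code: `generate`, `is_censored`, and `sanitize` are hypothetical callables standing in for the DeepFloyd/IF pipeline call, a censored-output check (for example a black-image test), and the moderation LLM. The sketch also assumes that "up to 3 attempts" means three generation calls in total.

```python
def generate_with_sanitization(prompt, generate, is_censored, sanitize,
                               max_attempts=3):
    """Retry generation with a sanitized prompt while the output is censored.

    Returns (output, final_prompt, attempts_used). The output is None if
    every attempt was censored.
    """
    current = prompt
    for attempt in range(1, max_attempts + 1):
        output = generate(current)
        if not is_censored(output):
            return output, current, attempt
        # Only sanitize if another attempt will actually use the result.
        if attempt < max_attempts:
            current = sanitize(current)
    return None, current, max_attempts


# Stub callables, for illustration only.
def fake_generate(p):
    return "black" if "nsfw" in p else "img:" + p


def fake_is_censored(out):
    return out == "black"


def fake_sanitize(p):
    return p.replace("nsfw", "safe")


result = generate_with_sanitization(
    "nsfw cat", fake_generate, fake_is_censored, fake_sanitize
)
print(result)
```

Passing the three steps in as callables keeps the loop independent of how the LLM is loaded and when GPU memory is cleared, which the caller can handle around this function.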

Technical Details

Key changes include:

  1. Introduced CogVideoX1.5-5B-I2V into the I2V_MODELS pipeline
  2. Reworked task-type inference logic in the prompt generator
  3. Reduced overhead from redundant prompt generator loads
  4. Improved logging granularity for synthetic data processes
  5. Implemented DeepFloyd/IF NSFW detection and prompt sanitization loop
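
Items 2 and 3 can be illustrated with a small sketch, assuming models are registered in groups keyed by task type. `MODEL_GROUPS`, `task_for_model`, `batch_by_task`, and the `example/t2i-model` entry are hypothetical names for illustration, not the repository's identifiers.

```python
from collections import defaultdict

# Hypothetical model groups keyed by task type.
MODEL_GROUPS = {
    "t2i": ["DeepFloyd/IF", "example/t2i-model"],  # second entry is a placeholder
    "i2v": ["THUDM/CogVideoX1.5-5B-I2V"],
}


def task_for_model(model_name, groups):
    """Look up the task type from the group that contains the model."""
    for task, names in groups.items():
        if model_name in names:
            return task
    raise KeyError(f"{model_name!r} is not registered in any model group")


def batch_by_task(model_names, groups):
    """Group model names by task type so each task's prompt generator
    only needs to be loaded once per batch rather than once per model."""
    batches = defaultdict(list)
    for name in model_names:
        batches[task_for_model(name, groups)].append(name)
    return dict(batches)


requested = ["DeepFloyd/IF", "THUDM/CogVideoX1.5-5B-I2V", "example/t2i-model"]
batches = batch_by_task(requested, MODEL_GROUPS)
# One prompt-generator load per task type instead of one per model.
print(len(batches), "loads for", len(requested), "models")
```

Raising on unregistered names makes the failure explicit for custom configurations that previously depended on model_name-based detection.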

Impact

These changes make I2V generation more robust by expanding model coverage and fixing incorrect task typing. Prompt generation is also more efficient, because grouping models by task type reduces how often the prompt generator is reloaded.

Breaking Changes

  • Updated prompt generation to rely on model group task typing
  • Internal logic changes may affect custom model configurations relying on model_name-based task detection

benliang99 and others added 6 commits April 29, 2025 20:37
Add THUDM/CogVideoX1.5-5B-I2V model configuration with pipeline settings and enable I2V in random model selection.
…neration

- Fix task assignment by using model group's task type instead of uninitialized model_name
- Group models by task type to minimize prompt generator reloading
- Add detailed logging for better monitoring
- Remove duplicate random import
… handling

- Refactor PromptGenerator to add load_vlm() and load_llm() for separate model loading
- Use only load_llm() in DeepFloyd/IF NSFW retry loop to reduce memory usage
- Add is_black_image utility to image_utils.py
- Ensure LLM is loaded before prompt sanitization and clear GPU after
- Minor code cleanup and comments
Release 2.2.11: Add CogVideoX1.5-5B I2V Model Configuration
@benliang99 benliang99 requested a review from aliang322 May 2, 2025 19:01

@aliang322 aliang322 left a comment

Looks good!

@benliang99 benliang99 merged commit b2b82de into main May 2, 2025
2 checks passed