Describe the feature you'd like
Currently, customers cannot use GRPO for Qwen model customization with long-context models (more than 2048 tokens).
How would this feature be used? Please describe.
This should work similarly to the JumpStart training implementation, where the context length is configurable via hyperparameters:

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-2-7b",
    hyperparameters={
        "max_input_length": "4096",  # update context length here
        "max_total_tokens": "4096",
    },
)
```
Describe alternatives you've considered
Using SageMaker training jobs for GRPO training with recipes.
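For comparison, the recipe-based path typically exposes the sequence length as a recipe override. A minimal sketch of what such an override dict could look like; the key names (`max_context_length`, `max_steps`) and structure are illustrative assumptions, not a confirmed recipe schema:

```python
def build_recipe_overrides(max_context_length: int) -> dict:
    """Build a recipe-overrides-style dict raising the context length.

    The keys below are hypothetical placeholders; actual recipes may
    name and nest these settings differently.
    """
    return {
        "model": {
            # Hypothetical key: maximum sequence (context) length for training
            "max_context_length": max_context_length,
        },
        "trainer": {
            # Hypothetical key: example training-length override
            "max_steps": 50,
        },
    }

overrides = build_recipe_overrides(4096)
```

The same kind of single-setting override is what the requested serverless GRPO customization would need to expose for long-context Qwen models.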
Additional context
Customer: Intuit (a high-priority issue blocking usage of the SageMaker serverless customization service).