Open
Labels
enhancement (New feature or request), good first issue (Good for newcomers)
Description
Lack of word diversity
- When generating many samples, we observe that words and sentence structures are repeated.
Describe the solution you'd like
- We should explore extracting and counting keywords from the generated samples and adding them back to the prompt so that LLMs do not overuse them, for example by injecting something like "Avoid using the following common words {most common words}". We could also do some deduplication of the keywords, or use something like PageRank to surface truly original words instead of relying on deduplication alone.
- Similarly, we could analyze the structure of the generated user queries to encourage diversity or to rewrite them. Here we could use n-grams and sentence beginnings that often repeat.
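As a starting point for the experiments, the two ideas above could be sketched roughly as follows. This is only a minimal illustration, not a proposed implementation: the function names, the tiny stopword list, and the regex tokenizer are all assumptions made for the example, and a notebook would likely swap in a proper tokenizer and keyword extractor.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for this sketch);
# a real experiment would use a fuller list or a keyword-extraction library.
STOPWORDS = {
    "a", "an", "the", "to", "of", "in", "on", "for", "and", "or",
    "is", "are", "i", "my", "me", "how", "do", "can", "you", "what",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def most_common_keywords(samples, top_k=5):
    """Return the top_k non-stopword tokens, counted once per sample."""
    counts = Counter()
    for sample in samples:
        # Use a set so one long sample cannot dominate the counts.
        counts.update({t for t in tokenize(sample) if t not in STOPWORDS})
    # Sort by descending count, then alphabetically, for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


def most_common_openers(samples, n=2, top_k=3):
    """Count the first n tokens of each sample to spot repeated openings."""
    counts = Counter()
    for sample in samples:
        tokens = tokenize(sample)
        if len(tokens) >= n:
            counts[" ".join(tokens[:n])] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def avoidance_instruction(words):
    """Build the sentence to inject into the generation prompt."""
    if not words:
        return ""
    return "Avoid using the following common words: " + ", ".join(words) + "."


samples = [
    "How do I reset my password?",
    "How do I change my password?",
    "Can you help me reset my email?",
]
overused = most_common_keywords(samples, top_k=2)
print(avoidance_instruction(overused))
print(most_common_openers(samples))
```

The same counting approach extends to longer n-grams anywhere in the sample, not only at the start; graph-based ranking such as PageRank is not covered by this sketch.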
What to do
Before implementing, it would be great to experiment with a few of these approaches in a few simple notebooks.