Hey @Ryzhtus, thanks a lot for the contribution. Since the original issue is quite open about how the integration should be structured, I'd prefer to align on the high-level design before implementing it:
Feel free to share your thoughts in #1635. I haven't reviewed the code, but one early note: if possible, let's avoid introducing new Generators. We are switching to Chat Generators only, which are more flexible, and we may eventually deprecate legacy Generators.
Hey @anakin87, I think I have finished the first PR, and all checks pass. Could you please review it? The other ideas we've discussed will be implemented in follow-up PRs.
anakin87
left a comment
Sharing my initial comments. Feel free to address them.
(I haven't reviewed this PR in depth yet)
integrations/dspy/src/haystack_integrations/components/generators/dspy/__init__.py
integrations/dspy/src/haystack_integrations/components/generators/dspy/chat/__init__.py
integrations/dspy/src/haystack_integrations/components/generators/dspy/chat/chat_generator.py
@anakin87 Next review round please :)
I left more comments.
I'd like to raise a higher-level question.
I'm struggling a bit to see how Haystack/DSPy users would benefit from this bridge module: what's the main value we're giving them with this integration?
If you are a DSPy user, I'd love to hear the bigger picture: what am I missing, and what's a good use case you have in mind?
Or maybe the value will become clearer once we implement the DSPyProgramRunner?
Feel free to share opinions...
integrations/dspy/src/haystack_integrations/components/generators/dspy/chat/chat_generator.py
    print(f"Question: {messages[0].text}")
    print(f"Answer: {result['llm']['replies'][0].text}\n")
I tried to use the ReAct module:
    def basic_qa_example():
        """Simple question-answering with Chain-of-Thought reasoning."""
        generator = DSPyChatGenerator(
            model="openai/gpt-5-mini",
            signature=QASignature,
            module_type="ReAct",
            output_field="answer",
        )
        pipeline = Pipeline()
        pipeline.add_component("llm", generator)
        messages = [ChatMessage.from_user("What causes rainbows to appear?")]
        result = pipeline.run({"llm": {"messages": messages}})
        print(f"Question: {messages[0].text}")
        print(f"Answer: {result['llm']['replies'][0].text}\n")

It fails with

    TypeError: ReAct.__init__() missing 1 required positional argument: 'tools'
Am I doing something wrong? Is ReAct really supported?
There were no kwargs that could be used to pass tools to ReAct, which requires them. Fixed, and I wrote an example: react_agent_example.py
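The fix described here amounts to forwarding extra keyword arguments to the module constructor. Below is a minimal sketch of that pattern, using stand-in classes rather than the real dspy or integration code. `build_module`, `module_kwargs`, `MODULES`, and the stub `Predict`/`ReAct` classes are hypothetical names for illustration, not the PR's actual API.

```python
from typing import Any, Callable, Optional


class Predict:
    """Stand-in for a module that needs only a signature."""

    def __init__(self, signature: str) -> None:
        self.signature = signature


class ReAct:
    """Stand-in for a module whose constructor also requires `tools`."""

    def __init__(self, signature: str, tools: list[Callable[..., Any]]) -> None:
        self.signature = signature
        self.tools = tools


# Hypothetical registry mapping a module_type string to a class.
MODULES: dict[str, type] = {"Predict": Predict, "ReAct": ReAct}


def build_module(
    module_type: str,
    signature: str,
    module_kwargs: Optional[dict[str, Any]] = None,
) -> Any:
    """Instantiate the module, forwarding any extra constructor arguments."""
    module_cls = MODULES[module_type]
    return module_cls(signature, **(module_kwargs or {}))


def get_weather(city: str) -> str:
    """Toy tool used only for illustration."""
    return f"Sunny in {city}"


# Without tools, the ReAct stand-in raises a TypeError about the missing argument.
error_message = ""
try:
    build_module("ReAct", "question -> answer")
except TypeError as exc:
    error_message = str(exc)

# With module_kwargs, the required argument reaches the constructor.
react = build_module("ReAct", "question -> answer", {"tools": [get_weather]})
```

Keeping the extra arguments in a plain dictionary also keeps them easy to include in serialization, as long as the values themselves are serializable.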
integrations/dspy/src/haystack_integrations/components/generators/dspy/chat/chat_generator.py
integrations/dspy/src/haystack_integrations/components/generators/dspy/chat/chat_generator.py
integrations/dspy/src/haystack_integrations/components/generators/dspy/chat/chat_generator.py
    def to_dict(self) -> dict[str, Any]:
        """Serialize this component to a dictionary."""
        kwargs: dict[str, Any] = {
            "signature": self._signature_to_string(),
It's not clear to me if this would preserve the complete signature in case it's not a string
I worked on it; you can check signature_serialization_example.py in examples/.
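The concern in this thread is that a class-based signature carries more than its `inputs -> outputs` string: instructions and per-field descriptions would be lost if only the string form were stored. Below is a minimal sketch of a lossless round trip. `SignatureSpec` is a hypothetical stand-in dataclass, not the dspy API, and this is not necessarily how the PR solves it.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SignatureSpec:
    """Hypothetical stand-in for a class-based signature."""

    instructions: str = ""
    # Each mapping goes from field name to a human-readable description.
    inputs: dict[str, str] = field(default_factory=dict)
    outputs: dict[str, str] = field(default_factory=dict)

    def to_string(self) -> str:
        """Compact form: keeps field names only, drops everything else."""
        return f"{', '.join(self.inputs)} -> {', '.join(self.outputs)}"

    def to_dict(self) -> dict[str, Any]:
        """Full form: keeps instructions and field descriptions."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SignatureSpec":
        return cls(**data)


qa = SignatureSpec(
    instructions="Answer concisely.",
    inputs={"question": "The user's question"},
    outputs={"answer": "A short answer"},
)

compact = qa.to_string()  # instructions and descriptions are not recoverable from this
restored = SignatureSpec.from_dict(qa.to_dict())  # full round trip
```

The string form is convenient for simple signatures, while the dictionary form is what a `to_dict`/`from_dict` pair would need in order to rebuild an equivalent signature.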
@anakin87 I took some time to review my initial submission and ideas, so here is what I came up with from my perspective as a user of both Haystack and DSPy. DSPy excels at prompt optimization and a declarative programming approach to LLMs. Meanwhile, Haystack's main strength lies in data handling and retrieval - DSPy doesn't really compete here. The value of this integration is precisely in bridging those two complementary strengths: you get DSPy's optimizable, declarative LLM logic running inside a Haystack pipeline that handles your data and retrieval. Neither framework alone gives you both. In particular, DSPy has two main ways of declaring how an LLM should handle your I/O:
I can implement a dspy.Signature using structured outputs (SO) in OpenAIChatGenerator, for example, but I can't directly integrate my DSPy program as another component in a Haystack pipeline. And if I want to optimize my prompt, I have to write a DSPy program, optimize it, and then reproduce it in Haystack. So my proposal is to submit the ChatGenerator, because it is intended to work with Signatures in combination with a built-in DSPy module like Predict, CoT, or ReAct, but not a custom-built dspy.Module. P.S. Following this logic, I think it would be better to rename the components in the following way:
I believe this will give users a more intuitive understanding of the possibilities and limitations of each component.
@Ryzhtus thank you for sharing your perspective! I also agree with the renaming proposal.
@anakin87 Happy to hear that. I will rename this component, then work on your comments and do manual testing.
Hey @anakin87, I made the changes. I provided two more examples that address the ReAct and serialization issues, and addressed the other issues you noticed. Please have a look; maybe I missed something.
Related Issues
Proposed Changes:
This DSPy integration provides two generation components:
DSPyGenerator is the internal base class that holds all the core logic: DSPy LM initialization, module creation (Predict/ChainOfThought/ReAct), serialization, and the run(prompt: str) -> {"replies": List[str]} execution path.
DSPyChatGenerator is a Haystack @component for quick prototyping with DSPy signatures. It accepts List[ChatMessage] and returns List[ChatMessage], wrapping DSPy modules (Predict, ChainOfThought, ReAct) behind Haystack's chat generator interface.
DSPyChatGenerator simply extends DSPyGenerator to follow Haystack's ChatGenerator interface.
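The layering described above can be sketched as a base class with a string-in, strings-out `run`, plus a chat subclass that converts messages on the way in and out. This is only an illustration under stated assumptions: `BaseGenerator`, `ChatGenerator`, and the `ChatMessage` dataclass are hypothetical stand-ins, the model call is replaced by a plain callable, and using the last user message as the prompt is an assumption rather than the PR's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChatMessage:
    """Stand-in for Haystack's ChatMessage (role and text only)."""

    role: str
    text: str


class BaseGenerator:
    """Holds the core logic: takes a prompt string, returns reply strings."""

    def __init__(self, predict: Callable[[str], str]) -> None:
        # `predict` stands in for the configured DSPy module call.
        self._predict = predict

    def run(self, prompt: str) -> dict[str, list[str]]:
        return {"replies": [self._predict(prompt)]}


class ChatGenerator(BaseGenerator):
    """Adapts the base class to a messages-in, messages-out interface."""

    def run(self, messages: list[ChatMessage]) -> dict[str, list[ChatMessage]]:
        # Assumption: the most recent user message is used as the prompt.
        prompt = next(m.text for m in reversed(messages) if m.role == "user")
        replies = super().run(prompt)["replies"]
        return {"replies": [ChatMessage("assistant", reply) for reply in replies]}


# Usage with a toy callable in place of a real model.
chat = ChatGenerator(lambda prompt: prompt.upper())
result = chat.run([ChatMessage("user", "what causes rainbows?")])
```

Keeping the conversion logic in the subclass means the base class stays free of any chat-specific types.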
I also want to add, in either this PR or the next one, another component, which I would like to call DSPyProgramRunner. The difference is that the DSPyProgramRunner @component loads and executes pre-compiled/optimized DSPy programs in production pipelines, so it can accept a dspy.Module at initialization.
How did you test it?
Checklist
fix:,feat:,build:,chore:,ci:,docs:,style:,refactor:,perf:,test:.