DSPy Integration#2831

Open
Ryzhtus wants to merge 14 commits into deepset-ai:main from Ryzhtus:integration/dspy

Ryzhtus commented Feb 11, 2026

Related Issues

Proposed Changes:

This DSPy integration provides the following generation components:

DSPyGenerator is the internal base class that holds the core logic: DSPy LM initialization, module creation (Predict/ChainOfThought/ReAct), serialization, and the run(prompt: str) -> {"replies": List[str]} execution path.

DSPyChatGenerator is a Haystack @component for quick prototyping with DSPy signatures. It accepts List[ChatMessage] and returns List[ChatMessage], wrapping DSPy modules (Predict, ChainOfThought, ReAct) behind Haystack's generator and chat generator interfaces.

DSPyChatGenerator simply extends DSPyGenerator to follow Haystack's ChatGenerator interface.
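
To make the layering concrete, here is a minimal sketch of how a chat-facing component can sit on top of a prompt-based base class. It is not the code from this PR: `Message`, `BaseGenerator`, and `ChatGenerator` are hypothetical stand-ins for Haystack's ChatMessage and the two components described above, the `predict` callable stands in for a configured DSPy module, and treating the last user message as the prompt is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    """Hypothetical stand-in for Haystack's ChatMessage."""
    role: str
    text: str


class BaseGenerator:
    """Stand-in for the base class: one string prompt in, string replies out."""

    def __init__(self, predict: Callable[[str], str]) -> None:
        # `predict` stands in for a configured DSPy module call (assumption).
        self._predict = predict

    def run(self, prompt: str) -> Dict[str, List[str]]:
        return {"replies": [self._predict(prompt)]}


class ChatGenerator(BaseGenerator):
    """Stand-in for the chat layer: messages in, messages out."""

    def run(self, messages: List[Message]) -> Dict[str, List[Message]]:  # type: ignore[override]
        # Assumption: the most recent user message is used as the prompt.
        user_texts = [m.text for m in messages if m.role == "user"]
        if not user_texts:
            raise ValueError("expected at least one user message")
        replies = super().run(user_texts[-1])["replies"]
        # Wrap each string reply as an assistant message.
        return {"replies": [Message("assistant", text) for text in replies]}


# Usage with a trivial callable in place of a real LM call.
generator = ChatGenerator(predict=lambda prompt: f"echo: {prompt}")
result = generator.run(
    [Message("system", "Be brief."), Message("user", "What causes rainbows?")]
)
print(result["replies"][0].text)
```

The point of the split is that message handling lives in one thin layer, so the prompt-based logic does not need to know about chat roles.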

In either this PR or the next one, I also want to add another component, which I would like to call DSPyProgramRunner. The difference is that the DSPyProgramRunner @component loads and executes pre-compiled/optimized DSPy programs in production pipelines, so it can accept a dspy.Module at initialization.

How did you test it?

  • unit tests

Checklist

Ryzhtus requested a review from a team as a code owner February 11, 2026 22:19
Ryzhtus requested review from anakin87 and removed request for a team February 11, 2026 22:19
github-actions bot added the type:documentation (Improvements or additions to documentation) label Feb 11, 2026
Ryzhtus mentioned this pull request Feb 11, 2026

anakin87 commented Feb 12, 2026

Hey @Ryzhtus, thanks a lot for the contribution.

Since the original issue is quite open about how the integration should be structured, I'd prefer to align on the high-level design before implementing it:

  • How do you envision the overall integration?
  • Which components to introduce?
  • A draft usage example would also help.

Feel free to share your thoughts in #1635.

I haven't reviewed the code, but one early note: if possible, let's avoid introducing new Generators. We are switching to Chat Generators only, which are more flexible, and we may eventually deprecate legacy Generators.

Ryzhtus commented Feb 12, 2026

@anakin87 Hi, sure, let's move our discussion to #1635. I will post my ideas about the integration, and what I've already submitted, there.

Ryzhtus commented Feb 23, 2026

Hey @anakin87, I think I've finished the first PR, and all checks pass. Could you please review it?

The other ideas we've discussed will be implemented in follow-up PRs.

anakin87 left a comment

Sharing my initial comments. Feel free to address them.

(I haven't reviewed this PR in depth yet)

Ryzhtus commented Feb 24, 2026

@anakin87 Next review round please :)

anakin87 left a comment

I left more comments.

I'd like to raise a higher-level question.
I'm struggling a bit to see how Haystack/DSPy users would benefit from this bridge module: what's the main value we're giving them with this integration?

Maybe if you are a DSPy user, I'd love to hear the bigger picture, what am I missing, or what's a good use case you have in mind?

Or maybe the value will become clearer once we implement the DSPyProgramRunner?

Feel free to share opinions...

print(f"Question: {messages[0].text}")
print(f"Answer: {result['llm']['replies'][0].text}\n")


anakin87 (Member):

I tried to use the ReAct module:

def basic_qa_example():
    """Simple question-answering with Chain-of-Thought reasoning."""

    generator = DSPyChatGenerator(
        model="openai/gpt-5-mini",
        signature=QASignature,
        module_type="ReAct",
        output_field="answer",
    )

    pipeline = Pipeline()
    pipeline.add_component("llm", generator)

    messages = [ChatMessage.from_user("What causes rainbows to appear?")]
    result = pipeline.run({"llm": {"messages": messages}})

    print(f"Question: {messages[0].text}")
    print(f"Answer: {result['llm']['replies'][0].text}\n")

It fails with
TypeError: ReAct.__init__() missing 1 required positional argument: 'tools'

Am I doing something wrong? Is ReAct really supported?

Ryzhtus (Contributor, Author):

There were no kwargs that could be used to pass tools to ReAct, which requires them. Fixed, and I added an example, react_agent_example.py.
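
As a rough illustration of this kind of fix, the sketch below forwards optional constructor arguments to whichever module is selected by name. It is not the code from this PR: `StubPredict`, `StubReAct`, `MODULE_TYPES`, `build_module`, and the `module_kwargs` parameter are hypothetical names, and the stub classes only mimic the fact that the ReAct constructor requires `tools` while Predict does not.

```python
from typing import Any, Callable, Dict, List, Optional


class StubPredict:
    """Stand-in for a module that only needs a signature."""

    def __init__(self, signature: str) -> None:
        self.signature = signature


class StubReAct:
    """Stand-in for a module whose constructor also requires tools."""

    def __init__(self, signature: str, tools: List[Callable[..., Any]]) -> None:
        self.signature = signature
        self.tools = tools


# Hypothetical registry mapping a module_type string to a constructor.
MODULE_TYPES: Dict[str, Callable[..., Any]] = {
    "Predict": StubPredict,
    "ReAct": StubReAct,
}


def build_module(
    module_type: str,
    signature: str,
    module_kwargs: Optional[Dict[str, Any]] = None,
) -> Any:
    """Create the requested module, forwarding any extra constructor arguments."""
    if module_type not in MODULE_TYPES:
        raise ValueError(f"unsupported module_type: {module_type!r}")
    return MODULE_TYPES[module_type](signature, **(module_kwargs or {}))


def search(query: str) -> str:
    """Toy tool used only for this example."""
    return f"results for {query}"


# Without tools, construction fails with a TypeError, as in the report above.
try:
    build_module("ReAct", "question -> answer")
except TypeError as err:
    print(f"TypeError: {err}")

# With tools forwarded, construction succeeds.
agent = build_module("ReAct", "question -> answer", {"tools": [search]})
print([tool.__name__ for tool in agent.tools])
```

Keeping the extra arguments in a single dictionary means the wrapper does not have to grow a new parameter for every module-specific option.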

def to_dict(self) -> dict[str, Any]:
    """Serialize this component to a dictionary."""
    kwargs: dict[str, Any] = {
        "signature": self._signature_to_string(),

anakin87 (Member):

It's not clear to me if this would preserve the complete signature in case it's not a string

Ryzhtus (Contributor, Author):

I've worked on it; you can check signature_serialization_example.py in examples/.
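
To illustrate the concern raised here, the sketch below contrasts a compact string form, which keeps only field names, with a dictionary form that also keeps the instructions and field descriptions and therefore round-trips. It does not use the DSPy API and is not the implementation in this PR: `FieldSpec` and `SignatureSpec` are hypothetical stand-ins for a class-based signature.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class FieldSpec:
    """Hypothetical description of one signature field."""
    name: str
    desc: str = ""


@dataclass
class SignatureSpec:
    """Hypothetical stand-in for a class-based signature."""
    instructions: str
    inputs: List[FieldSpec]
    outputs: List[FieldSpec]

    def to_string(self) -> str:
        """Compact 'a, b -> c' form: field names only, everything else is lost."""
        left = ", ".join(f.name for f in self.inputs)
        right = ", ".join(f.name for f in self.outputs)
        return f"{left} -> {right}"

    def to_dict(self) -> Dict[str, Any]:
        """Full form: instructions and per-field descriptions are kept."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SignatureSpec":
        return cls(
            instructions=data["instructions"],
            inputs=[FieldSpec(**f) for f in data["inputs"]],
            outputs=[FieldSpec(**f) for f in data["outputs"]],
        )


spec = SignatureSpec(
    instructions="Answer the question concisely.",
    inputs=[FieldSpec("question", "The user's question")],
    outputs=[FieldSpec("answer", "A short factual answer")],
)

print(spec.to_string())                                  # names only
print(SignatureSpec.from_dict(spec.to_dict()) == spec)   # full round trip
```

A serializer that only emits the string form would silently drop the instructions and descriptions, which is why a structured form is the safer choice for non-string signatures.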

Ryzhtus commented Mar 1, 2026

> I'd like to raise a higher-level question. I'm struggling a bit to see how Haystack/DSPy users would benefit from this bridge module: what's the main value we're giving them with this integration?
>
> Maybe if you are a DSPy user, I'd love to hear the bigger picture, what am I missing, or what's a good use case you have in mind?

@anakin87 I took some time to review my initial submission and ideas. Here is my perspective as a user of both Haystack and DSPy:

DSPy excels at prompt optimization and offers a declarative programming approach to LLMs. Haystack's main strength, meanwhile, lies in data handling and retrieval, where DSPy doesn't really compete.

The value of this integration is precisely in bridging those two complementary strengths: you get DSPy's optimizable, declarative LLM logic running inside a Haystack pipeline that handles your data and retrieval. Neither framework alone gives you both.

Now, in particular:

DSPy has two main abstractions for declaring how an LLM should handle your IO:

  1. dspy.Signature: declares how an LLM should handle your IO (essentially structured outputs with semantic field descriptions)
  2. dspy.Module: a composable program that can be automatically optimized given a metric and training examples

I can implement a dspy.Signature using structured outputs (SO) in OpenAIChatGenerator, for example, but I can't directly integrate my DSPy program as another component in a Haystack pipeline. And if I want to optimize my prompt, I have to write a DSPy program, optimize it, and then reproduce it in Haystack.

So my proposal is to submit the ChatGenerator first, because it is intended to work with Signatures in combination with built-in DSPy modules such as Predict, CoT, or ReAct, but not with a custom-built dspy.Module.
Then I would implement DSPyProgramRunner, and finish with prompt optimization (I'm still thinking about how best to handle this).

P.S. Following this logic, I think it would be better to rename the components as follows:

  1. DSPyChatGenerator -> DSPySignatureChatGenerator
  2. DSPyProgramRunner -> DSPyProgramChatGenerator

I believe this will give users a more intuitive understanding of the possibilities and limitations of each component.

anakin87 commented Mar 2, 2026

@Ryzhtus thank you for sharing your perspective!

I also agree with the renaming proposal.

Ryzhtus commented Mar 2, 2026

@anakin87 Happy to hear that. I will rename this component, then work on your comments and manual testing.

Ryzhtus commented Mar 4, 2026

Hey @anakin87, I've made the changes. I added two more examples that address the ReAct and serialization issues, and addressed the other issues you noticed. Please have a look; maybe I missed something.
