
feat: Implement optimization code paths and functionality for initial release#140

Open
andrewklatzke wants to merge 45 commits into main from aklatzke/AIC-2263/sdk-dx-improvements

Conversation

Contributor

@andrewklatzke andrewklatzke commented Apr 17, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Related issues

This PR encapsulates all previous changes in the chain of optimization PRs that were broken up into smaller pieces. Consolidating here so that we can have a single commit/release of the package. The PRs were independently reviewed and approved.

Describe the solution you've provided

See:

#116
#117
#119
#122
#127
#128
#130
#135
#139


Note

High Risk
Large new implementation that orchestrates LLM-driven prompt iteration and performs authenticated LaunchDarkly REST API writes (result persistence and variation creation), so failures could impact optimization data and published configurations.

Overview
Implements the initial functional ldai_optimizer package, replacing the prior ldai_optimization scaffold and renaming the distribution/paths accordingly.

Adds a full OptimizationClient that runs iterative optimization loops (agent calls, judge evaluation, variation generation with retries/validation, token/duration tracking) and supports both option-based runs and API config–driven runs with live result persistence.

Introduces an internal LaunchDarkly REST API client (LDApiClient) with retries/redaction plus utilities/prompts/dataclasses to support structured judge/variation I/O, variable placeholder safety, and optional auto-commit of winning variations back to LaunchDarkly.

Reviewed by Cursor Bugbot for commit 92f51fa. Bugbot is set up for automated code reviews on this repo.

…ype, remove required context_choices argument and default to anon
Comment thread packages/optimization/src/ldai_optimizer/util.py

# variation() returns the raw JSON before chevron.render(), so instructions
# still contain {{placeholder}} tokens rather than empty strings.
raw_variation = self._ldClient._client.variation(agent_key, context, {})
Contributor

Not a blocker, but I would be careful relying on the hidden property. Maybe this is something we need to expose in the public API?

Contributor Author

It's not really "hidden" in the sense that the user shouldn't be able to access it if they're determined; it's just a property that's only used internally. We're using this to re-derive the variables that were present in the initial variation so that the LLM has a stable reference when trying to replace them.
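
For reference, a minimal sketch of what that re-derivation could look like, assuming chevron-style `{{placeholder}}` tokens (the helper name and regex are illustrative, not the package's actual code):

```python
import re

# Chevron-style placeholder tokens, e.g. {{user_name}} or {{ topic }}.
PLACEHOLDER_RE = re.compile(r"{{\s*([\w.]+)\s*}}")

def extract_placeholders(raw_instructions: str) -> set:
    """Return the variable names still present in the pre-render JSON."""
    return set(PLACEHOLDER_RE.findall(raw_instructions))

# extract_placeholders("Hello {{user_name}}, today's topic: {{ topic }}")
# -> {"user_name", "topic"}
```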

Comment on lines +331 to +333
Errors are caught and logged rather than raised so that persistence
failures never abort an in-progress optimization run.

Contributor

Will this handle things like backoff/throttle/retry in the event of a 429? Maybe aborting an in-progress optimization is best in cases of permanent errors (e.g., a 401, which indicates an invalid token), or even sporadic errors like a 403, where certain requests fail and others succeed, which would lead to incorrect results?

Contributor Author

@andrewklatzke andrewklatzke Apr 22, 2026

The API key gets checked up-front with the fetch for model configs etc., so it shouldn't error unless it's revoked midway through. Given that, I'd say it makes sense to fail on any 401s here.

I'll add some retry logic for other status codes up to a max, and then fail the optimization if we hit max retries.
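
For illustration, that might take roughly this shape (the function name, constants, and use of `requests` are assumptions, not the committed implementation):

```python
import time
import requests

MAX_RETRIES = 3

def post_result_with_retries(url, payload, headers):
    """Fail fast on 401; back off and retry other failures up to a max."""
    for attempt in range(MAX_RETRIES):
        resp = requests.post(url, json=payload, headers=headers, timeout=10)
        if resp.status_code == 401:
            # Token revoked mid-run: abort the optimization immediately.
            raise RuntimeError("LaunchDarkly API token is no longer valid")
        if resp.ok:
            return
        # Honor Retry-After on 429s; otherwise back off exponentially.
        time.sleep(float(resp.headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError("Max retries exceeded; failing the optimization")
```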

"current_parameters": {
"type": "object",
"description": "The improved agent parameters (e.g., temperature, max_tokens, etc.)",
"additionalProperties": True,
Contributor

What does this do? Does it allow the LLM to emit arbitrary keys, which then get persisted on agent_optimization_result.parameters and, on auto_commit, pushed as the new AI Config variation's parameters? It looks like we only expect certain values here?

Maybe after extract_json_from_response, filter current_parameters to a known allow-list of LLM call parameters — e.g., {temperature, top_p, max_tokens, presence_penalty, frequency_penalty, stop} — and drop unknown keys with a warning log. This also belongs in the server-side validation on POST /ai-configs/{config_key}/variations.
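
Purely as an illustration of that suggestion (the allow-list itself is the sticking point, per the reply below):

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical allow-list of common LLM call parameters.
ALLOWED_PARAMS = {"temperature", "top_p", "max_tokens",
                  "presence_penalty", "frequency_penalty", "stop"}

def filter_current_parameters(current_parameters):
    """Drop keys outside the allow-list, warning for each one dropped."""
    filtered = {}
    for key, value in current_parameters.items():
        if key in ALLOWED_PARAMS:
            filtered[key] = value
        else:
            logger.warning("Dropping unknown parameter from LLM output: %s", key)
    return filtered
```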

Contributor Author

Let me look into this; it may only exist for posterity at this point (we moved away from tool-based validation on these since some providers didn't play nicely with it).

If we need to keep it, we unfortunately cannot know what is valid in this parameters object, given that the user provides their own LLM call. Each provider's parameters differ, so we could only allow-list if we knew up-front which provider they were using.

"Failed to extract JSON from response. "
"Response length: %d, response: %s",
len(response_str),
response_str,
Contributor

This is not really safe to log, as it likely contains sensitive info.
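
One way to keep the failure debuggable without the raw body, as a sketch only (per the follow-up commits below, the response logging was simply removed):

```python
import hashlib
import logging

logger = logging.getLogger(__name__)

def log_extraction_failure(response_str):
    """Log the length plus a short digest so failures can be correlated
    across runs without leaking the response content."""
    digest = hashlib.sha256(response_str.encode("utf-8")).hexdigest()[:12]
    logger.error(
        "Failed to extract JSON from response. Length: %d, sha256 prefix: %s",
        len(response_str),
        digest,
    )
```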

Comment thread packages/optimization/pyproject.toml Outdated
]
dependencies = [
"launchdarkly-server-sdk-ai>=0.16.0",
"coolname>=2.0.0",
Contributor

Do we need this? Why such an old version? Is having cool names for variation keys worth the risk of this dependency? Would the 'cool' names even be useful/meaningful, vs a default date-based name?

Contributor Author

Discussed in review meeting; will just hand-roll this as it's pretty simple to implement.
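
The hand-rolled version could be as small as this (the word lists and key format are assumptions):

```python
import secrets

# Small in-repo word lists are enough for readable variation keys.
ADJECTIVES = ("brisk", "calm", "eager", "lunar", "vivid")
NOUNS = ("otter", "maple", "comet", "harbor", "quartz")

def generate_variation_key():
    """Return a readable, collision-resistant key like 'vivid-comet-7f3a'."""
    suffix = secrets.token_hex(2)  # avoids collisions between runs
    return f"{secrets.choice(ADJECTIVES)}-{secrets.choice(NOUNS)}-{suffix}"
```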

Comment on lines +270 to +271
Attempts direct JSON parsing first, then progressively falls back to
extracting JSON from markdown code blocks and balanced-brace scanning.
Contributor

The flexibility here is neat, but it increases the risk of adversarial JSON smuggling.

Would it be possible to get metrics on how often the fallback methods are used?

Maybe if a fallback method was used, we should check for unexpected keys in the resulting object and WARN-log them, to hopefully alert the customer that something is amiss?

Contributor Author

Yeah, this is basically just fallback parsing for when the LLM decides to respond with something like:

Here is your JSON:
{...}

Unfortunately, since we don't know which provider the user is using for this, we can't enforce any structured output (for those that even support it).

I think it's worth emitting a warning for this.
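
Sketched out, that tiered parsing plus the warning might look like the following (names and the regex are assumptions; the balanced-brace scan ignores braces inside strings, which is acceptable for a sketch):

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

def extract_json_from_response(response_str):
    """Try direct parsing, then markdown code blocks, then balanced braces."""
    # 1. Happy path: the LLM returned bare JSON.
    try:
        return json.loads(response_str)
    except json.JSONDecodeError:
        pass

    # 2. A fenced markdown code block, optionally tagged as json.
    match = re.search(r"```(?:json)?\s*(.*?)```", response_str, re.DOTALL)
    if match:
        try:
            result = json.loads(match.group(1))
            logger.warning("Fell back to code-block JSON extraction")
            return result
        except json.JSONDecodeError:
            pass

    # 3. Balanced-brace scan: take the first complete {...} span.
    depth, start = 0, None
    for i, ch in enumerate(response_str):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                logger.warning("Fell back to balanced-brace JSON extraction")
                return json.loads(response_str[start:i + 1])
    raise ValueError("No JSON object found in response")
```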

Comment thread packages/optimization/src/ldai_optimizer/client.py Outdated
Comment thread packages/optimization/src/ldai_optimizer/util.py Outdated
Comment thread packages/optimization/src/ldai_optimizer/client.py

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 7468372.

Comment thread packages/optimization/src/ldai_optimizer/dataclasses.py
Comment thread packages/optimization/src/ldai_optimizer/util.py
Contributor Author

andrewklatzke commented Apr 22, 2026

New commits added with security review results + Cursor feedback on those changes:

8b3c69f
7468372

This includes:

  • Rename of the package to ldai_optimizer
  • Changes the publishing target to the ldai_optimizer package (sec review; package already published)
  • Adds <untrusted> sentinels around user-supplied or LLM-generated data
  • Fills out the README, adding a note about how API keys are used (sec review)
  • Adds a top-level comment with the same information as the README (sec review)
  • Adds a RedactionFilter that scrubs keys from the loggers (sketch after this list)
  • Removes logging of the response in the failure state
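
The RedactionFilter idea, roughly (the key pattern and logger name are assumptions about the shipped code, and the sketch assumes positional log args):

```python
import logging
import re

class RedactionFilter(logging.Filter):
    """Scrub anything that looks like a LaunchDarkly key from log records."""
    _KEY_RE = re.compile(r"\b(?:api|sdk)-[A-Za-z0-9-]{8,}\b")

    def filter(self, record):
        record.msg = self._KEY_RE.sub("[REDACTED]", str(record.msg))
        if record.args:
            record.args = tuple(
                self._KEY_RE.sub("[REDACTED]", a) if isinstance(a, str) else a
                for a in record.args
            )
        return True  # never drop records, only rewrite them

logging.getLogger("ldai_optimizer").addFilter(RedactionFilter())
```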

312161f

Includes:

  • Retry logic for posting results
  • Structural validation for the variation we assemble from the LLM response

3bce893

Includes:

  • Removed unused function that had additionalProperties: True

@andrewklatzke
Contributor Author

e8c6692

Includes:

  • Adding token limit handling

@@ -1,7 +1,7 @@
 [project]
-name = "launchdarkly-server-sdk-ai-optimization"
+name = "ldai_optimizer"
Contributor

I won't block on this, but all other LaunchDarkly Python packages have the launchdarkly-* name.

**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions

**Describe the solution you've provided**

Implements handling for "inverted" judges.

**Describe alternatives you've considered**

This gets feature parity with our online evals functionality; no
alternatives considered.

**Additional context**

When a metric has `is_inverted` set, it's intended that the evaluation
of the score flips from `>=` to `<=`. This adds a util `_judge_passed`
to handle that logic and implements it throughout. We don't surface the
inverted property in the SDK, so we fetch the judge directly to get this
information.
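
The flip itself is a one-liner; a sketch consistent with those semantics
(the exact signature in the package may differ):

```python
def judge_passed(score, threshold, is_inverted):
    """Inverted metrics treat lower scores as better, so the comparison flips."""
    return score <= threshold if is_inverted else score >= threshold

# judge_passed(0.2, 0.5, is_inverted=True)   -> True  (lower is better)
# judge_passed(0.2, 0.5, is_inverted=False)  -> False (fails the threshold)
```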

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes core judge pass/fail semantics and adds per-judge REST calls
(`get_ai_config`) during config-driven runs, which could affect
optimization outcomes and introduce new failure/performance modes if the
API is unavailable or slow.
> 
> **Overview**
> Adds first-class support for **inverted judges** (where *lower* scores
are better) by introducing a shared `judge_passed` helper and using it
for pass/fail decisions in `OptimizationClient` and in prompt feedback
generation.
> 
> Extends `OptimizationJudge` with an `is_inverted` flag and, for
`optimize_from_config`, fetches each judge’s `isInverted` value via
`api_client.get_ai_config` when building options. Updates logging to
include the inverted status, and adds targeted tests covering the
helper, mixed inverted/standard evaluation, config building behavior,
and `variation_prompt_feedback` output.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
a8f14de. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
**Requirements**

- [x] I have added test coverage for new or changed functionality
- [x] I have followed the repository's [pull request submission
guidelines](../blob/main/CONTRIBUTING.md#submitting-pull-requests)
- [x] I have validated my changes against all supported platform
versions

**Describe the solution you've provided**

Implements better handling for params on the initial variation (folds
model changes in as overwrites rather than completely replacing) and
ensures custom params are persisted unchanged. Additionally makes sure
that tools cannot be changed by the optimization process.

**Describe alternatives you've considered**

This is the result of a bug report so there weren't really alternatives
considered.

**Additional context**

Initially it was assumed that the optimization process would properly
pull params forward (via the LLM), but this doesn't always seem to be
the case. Custom params aren't fed into the LLM calls since they're
user-specified data (not specific to the actual optimization result),
so we now just pull these through as-is. In the case of tools, the
model will be able to optimize the prompt to call a specific tool if
multiple are provided, but we don't want to strip any tool information
from the final result, as it may be necessary for the calls to
function.
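
A minimal sketch of the merge-and-restore behavior described above
(names and dict shapes are assumptions):

```python
import logging

logger = logging.getLogger(__name__)

def apply_llm_variation(existing_params, custom_params, original_tools, llm_result):
    """Merge LLM parameter changes as overwrites; pass custom params and tools through."""
    # Overlay rather than replace, so values the LLM omitted
    # (e.g. max_tokens, response_format) survive the iteration.
    merged = {**existing_params, **llm_result.get("current_parameters", {})}

    # Tools are never optimizable: warn if the LLM changed them, then restore.
    if llm_result.get("tools") not in (None, original_tools):
        logger.warning("LLM returned modified tools; restoring the originals")

    return {
        "parameters": merged,
        "custom": custom_params,   # user-specified data, passed through as-is
        "tools": original_tools,
    }
```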

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Moderate risk because it changes how model parameters and `tools` are
carried forward across optimization iterations and what gets
auto-committed, which can affect runtime agent behavior if
merging/restoration logic is wrong.
> 
> **Overview**
> Improves variation-application logic so LLM-generated
`current_parameters` are **merged** into existing parameters instead of
replacing them, preserving user-specified/custom settings (e.g.
`max_tokens`, `response_format`) when the LLM omits them.
> 
> Prevents tool drift by always restoring the original `tools` list (and
logging when the LLM returns a different one) to avoid silently dropping
user tools or leaking internal framework tools.
> 
> Captures `model.custom` from the initial LaunchDarkly variation and
includes it when auto-committing a winning variation; adds focused test
coverage for parameter persistence, tool restoration/warnings, and
`model.custom` propagation.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
572a2aa. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->