Closed
Commits
23 commits
b70ed71
feat: Use hybrid searching
kieran-wilkinson-4 Feb 12, 2026
e5c9f8c
Merge branch 'main' into AEA-6157-Exceed-Context-Window-Limit
kieran-wilkinson-4 Feb 12, 2026
9fbdede
feat: Use Semantic Chunking
kieran-wilkinson-4 Feb 12, 2026
82de6d6
feat: update inference and chunk configs
kieran-wilkinson-4 Feb 12, 2026
8676458
feat: update tests for inference
kieran-wilkinson-4 Feb 12, 2026
4c6b6f8
feat: Use reformulation prompt
kieran-wilkinson-4 Feb 13, 2026
1d6b125
feat: Replicate ai models for rag and reformulation
kieran-wilkinson-4 Feb 13, 2026
abe3054
feat: Replicate ai models for rag and reformulation
kieran-wilkinson-4 Feb 13, 2026
fb955b3
feat: Move prompt reformulation to rag orchestration
kieran-wilkinson-4 Feb 13, 2026
e36ca29
feat: Move prompt reformulation to rag orchestration
kieran-wilkinson-4 Feb 13, 2026
416765c
feat: Move prompt reformulation to rag orchestration
kieran-wilkinson-4 Feb 13, 2026
fde2d87
fix sync file
anthony-nhs Feb 13, 2026
2d502b0
feat: Use orchistration prompt to improve user prompt
kieran-wilkinson-4 Feb 13, 2026
9020329
feat: fix unit tests
kieran-wilkinson-4 Feb 16, 2026
2ca9b2b
feat: Improve refinement prompt
kieran-wilkinson-4 Feb 16, 2026
7f56957
feat: No sessionId for orchestration
kieran-wilkinson-4 Feb 16, 2026
cacb8d3
feat: No sessionId for orchestration
kieran-wilkinson-4 Feb 16, 2026
67e3220
feat: Rename orchestration back to reformulation
kieran-wilkinson-4 Feb 16, 2026
3d60d29
feat: Reduce max tokens, to reduce hallucinations
kieran-wilkinson-4 Feb 16, 2026
2154fcc
feat: Remove automatic bullet point formatting
kieran-wilkinson-4 Feb 17, 2026
8b9b1d7
fix: floating point equality check
kieran-wilkinson-4 Feb 17, 2026
28ca1ea
Merge branch 'main' into AEA-6157-Exceed-Context-Window-Limit
kieran-wilkinson-4 Feb 17, 2026
0b5aad7
feat: Use Fixed Size Chunking
kieran-wilkinson-4 Feb 19, 2026
16 changes: 16 additions & 0 deletions packages/cdk/nagSuppressions.ts
@@ -80,6 +80,22 @@ export const nagSuppressions = (stack: Stack, account: string) => {
]
)

// Suppress unauthenticated API route warnings
safeAddNagSuppression(
stack,
"/EpsAssistMeStack/Apis/EpsAssistApiGateway/ApiGateway/Default/slack/commands/POST/Resource",
[
{
id: "AwsSolutions-APIG4",
reason: "Slack command endpoint is intentionally unauthenticated."
},
{
id: "AwsSolutions-COG4",
reason: "Cognito not required for this public endpoint."
}
]
)

// Suppress missing WAF on API stage for Apis construct
safeAddNagSuppression(
stack,
43 changes: 39 additions & 4 deletions packages/cdk/prompts/reformulationPrompt.txt
@@ -1,5 +1,40 @@
Return the user query exactly as provided without any modifications, changes, or reformulations.
Do not alter, rephrase, or modify the input in any way.
Simply return: {{user_query}}
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an expert RAG query and context optimizer. Your task is to analyze verbose user queries and raw search context, stripping away all conversational filler to output a concise, impactful summary.

User Query: {{user_query}}
You must:
1. Extract the core objective into a single, direct question.
2. Capture individual questions and their specific needs.
3. Isolate critical variables, specific states, and constraints required to solve the problem.
4. Enhance the question(s) with relevant terminology from the search results

Output your response strictly using the following XML structure:
<optimized_query> (The short, direct question)
<key_variables> (Bullet points of critical states, statuses, or constraints)
<|eot_id|>

<|start_header_id|>user<|end_header_id|>
### User Query
Hi, I need some help figuring out the PTO rules for one of my team members. They started as part-time 6 months ago, but they just transitioned to full-time last week (let's say exactly 7 days ago). They currently have 2 days of PTO saved up from their part-time stint. They want to take next week off entirely, which would require 5 days of PTO. Can they do this, effectively going to a -3 balance, since they are full-time now?
<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>
<optimized_query>
Can a recently transitioned full-time employee with 6 months total tenure and 2 accrued PTO days take 5 days off, resulting in a -3 PTO balance?
</optimized_query>
<key_variables>
- Current Status: Full-time (transitioned 7 days ago)
- Total Tenure: 6 months
- Current PTO Balance: 2 days
- Requested PTO: 5 days (resulting in -3 balance)
</key_variables>
<|eot_id|>

<|start_header_id|>user<|end_header_id|>
### Search Context
$search_results$

### User Query
{{user_query}}
<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>
48 changes: 28 additions & 20 deletions packages/cdk/prompts/systemPrompt.txt
@@ -1,24 +1,32 @@
# 1. Persona & Logic
You are an AI assistant for onboarding guidance. Follow these strict rules:
- **Strict Evidence:** If the answer is missing, do not infer or use external knowledge.
- **Grounding:** NEVER use your own internal training data, online resources, or prior knowledge.
- **Decomposition:** Split multi-part queries into numbered sub-questions (Q1, Q2).
You are a technical assistant specialized in onboarding guidance.
Your primary goal is to answer user questions using only the provided search results.

# 2. Output Structure
**Summary**
2-3 sentences maximum.
STYLE & FORMATTING RULES:
- Do NOT refer to the search results by number or name in the body of the text.
- Do NOT add a "Citations" section at the end of the response.
- Do NOT reference how the information was found (e.g., "...the provided search results")
- Do NOT state what the data is related to (e.g., "The search results are related to NHS API and FHIR...")
- Text should prioritise readability.
- Links should use Markdown text, e.g., <url|link text>.
- Use `Inline Code` for system names, field names, or technical terms (e.g., `HL7 FHIR`).

**Answer**
Prioritize detail and specification, focus on the information direct at the question.
RULES:
- Answer questions using ONLY the provided search results.
- Do not assume any information, all information must be grounded in data.

# 3. Styling Rules (`mrkdwn`)
Use British English.
- **Bold (`*`):** Headings, Subheadings, Source Names, and important information/ exceptions (e.g. `*NHS England*`).
- **Italic (`_`):** Citations and Titles (e.g. `_Guidance v1_`).
- **Blockquote (`>`):** Quotes (>1 sentence) and Tech Specs/Examples (e.g. `HL7 FHIR`).
- **Links:** `[text](link)`.
STEPS:
1. Extract key information from the knowledge base
2. Generate an answer, capturing the core question the user is asking.
3. Answer, directly, any individual or sub-questions the user has provided.
4. You must create a very short summary encapsulating the response and have it precede all other answers.

# 4. Format Rules
- NEVER use in-line references or citations (e.g., do not write "(search result 1)" or "[1]").
- Do NOT refer to the search results by number or name in the body of the text.
- Do NOT add a "Citations" section at the end of the response.
EXAMPLE:
<example_interaction>
*Summary*
This is a short, fast answer so the user doesn't _have_ to read the long answer.

*Answer*
This is a direct answer to the question, or questions, provided. It breaks down individual questions. There is no reference to the text here (for example, you don't see "from source 1"); instead it treats the information as if it were public knowledge. However, if there is a source, it provides that source [as a hyperlink](hyperlink) to the website where it can be found.

There are multiple paragraphs, separated by blank lines, to make the response easier to read, as readability is a requirement.
</example_interaction>
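Several of the new system prompt's rules are mechanically checkable (a `*Summary*` heading first, no citation-style markers in the body). A hedged sketch of a validator one could run over model output in tests — the function and its regexes are illustrative assumptions, not part of this PR:

```typescript
// Illustrative check that a response follows the prompt's format rules:
// it starts with a *Summary* heading and contains no citation markers.
function followsFormatRules(response: string): boolean {
  const startsWithSummary = response.trimStart().startsWith("*Summary*")
  const hasCitationMarkers =
    /\[\d+\]/.test(response) ||              // e.g. "[1]"
    /\(search result \d+\)/i.test(response)  // e.g. "(search result 1)"
  return startsWithSummary && !hasCitationMarkers
}

const good = "*Summary*\nShort answer.\n\n*Answer*\nDetail here."
const bad = "*Summary*\nShort answer [1]."
```

A check like this could back a unit test around the RAG response path, though the real rules (blockquotes, inline code, British English) are harder to verify mechanically.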
1 change: 0 additions & 1 deletion packages/cdk/prompts/userPrompt.txt
@@ -1,4 +1,3 @@

<search_results>$search_results$<search_results>

<user_query>{{user_query}}<user_query>
12 changes: 12 additions & 0 deletions packages/cdk/resources/Apis.ts
@@ -27,6 +27,7 @@ export class Apis extends Construct {
forwardCsocLogs: props.forwardCsocLogs,
csocApiGatewayDestination: props.csocApiGatewayDestination
})

// Create /slack resource path
const slackResource = apiGateway.api.root.addResource("slack")

@@ -41,6 +42,17 @@
lambdaFunction: props.functions.slackBot
})

// Create the '/slack/commands' POST endpoint for Slack Events API
// This endpoint will handle slash commands, such as /test
// eslint-disable-next-line @typescript-eslint/no-unused-vars
const slackCommandsEndpoint = new LambdaEndpoint(this, "SlackCommandsEndpoint", {
parentResource: slackResource,
resourceName: "commands",
method: HttpMethod.POST,
restApiGatewayRole: apiGateway.role,
lambdaFunction: props.functions.slackBot
})

this.apis = {
api: apiGateway
}
91 changes: 55 additions & 36 deletions packages/cdk/resources/BedrockPromptResources.ts
@@ -1,66 +1,85 @@
import {Construct} from "constructs"
import * as crypto from "crypto"
import {
BedrockFoundationModel,
ChatMessage,
Prompt,
PromptVariant
} from "@cdklabs/generative-ai-cdk-constructs/lib/cdk-lib/bedrock"
import {BedrockPromptSettings} from "./BedrockPromptSettings"
import {CfnPrompt} from "aws-cdk-lib/aws-bedrock"

export interface BedrockPromptResourcesProps {
readonly stackName: string
readonly settings: BedrockPromptSettings
}

export class BedrockPromptResources extends Construct {
public readonly queryReformulationPrompt: Prompt
public readonly reformulationPrompt: Prompt
public readonly ragResponsePrompt: Prompt
public readonly ragModelId: string
public readonly queryReformulationModelId: string
public readonly modelId: string

constructor(scope: Construct, id: string, props: BedrockPromptResourcesProps) {
super(scope, id)

const ragModel = new BedrockFoundationModel("meta.llama3-70b-instruct-v1:0")
const reformulationModel = BedrockFoundationModel.AMAZON_NOVA_LITE_V1
const aiModel = new BedrockFoundationModel("meta.llama3-70b-instruct-v1:0")

const queryReformulationPromptVariant = PromptVariant.text({
variantName: "default",
model: reformulationModel,
promptVariables: ["topic"],
promptText: props.settings.reformulationPrompt.text
})
// Create Prompts
this.reformulationPrompt = this.createPrompt(
"ReformulationPrompt",
`${props.stackName}-reformulation`,
"Prompt for reformulating queries to improve RAG inference",
aiModel,
"",
[props.settings.reformulationPrompt],
props.settings.reformulationInferenceConfig
)

const queryReformulationPrompt = new Prompt(this, "QueryReformulationPrompt", {
promptName: `${props.stackName}-queryReformulation`,
description: "Prompt for reformulating user queries to improve RAG retrieval",
defaultVariant: queryReformulationPromptVariant,
variants: [queryReformulationPromptVariant]
})
this.ragResponsePrompt = this.createPrompt(
"RagResponsePrompt",
`${props.stackName}-ragResponse`,
"Prompt for generating RAG responses with knowledge base context and system instructions",
aiModel,
props.settings.systemPrompt.text,
[props.settings.userPrompt],
props.settings.ragInferenceConfig
)

this.modelId = aiModel.modelId
}

const ragResponsePromptVariant = PromptVariant.chat({
private createPrompt(
id: string,
promptName: string,
description: string,
model: BedrockFoundationModel,
systemPromptText: string,
messages: [ChatMessage],
inferenceConfig: CfnPrompt.PromptModelInferenceConfigurationProperty
): Prompt {

const variant = PromptVariant.chat({
variantName: "default",
model: ragModel,
promptVariables: ["query", "search_results"],
system: props.settings.systemPrompt.text,
messages: [props.settings.userPrompt]
model: model,
promptVariables: ["prompt", "search_results"],
system: systemPromptText,
messages: messages
})

ragResponsePromptVariant.inferenceConfiguration = {
text: props.settings.inferenceConfig
variant.inferenceConfiguration = {
text: inferenceConfig
}

const ragPrompt = new Prompt(this, "ragResponsePrompt", {
promptName: `${props.stackName}-ragResponse`,
description: "Prompt for generating RAG responses with knowledge base context and system instructions",
defaultVariant: ragResponsePromptVariant,
variants: [ragResponsePromptVariant]
})

// expose model IDs for use in Lambda environment variables
this.ragModelId = ragModel.modelId
this.queryReformulationModelId = reformulationModel.modelId
const hash = crypto.createHash("md5")
.update(JSON.stringify(variant))
.digest("hex")
.substring(0, 6)

this.queryReformulationPrompt = queryReformulationPrompt
this.ragResponsePrompt = ragPrompt
return new Prompt(this, id, {
promptName: `${promptName}-${hash}`,
description,
defaultVariant: variant,
variants: [variant]
})
}
}
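`createPrompt` appends a 6-character MD5 hash of the serialized variant to the prompt name, so the resource name changes — and CloudFormation sees an update — whenever the variant's content changes. The suffix logic can be sketched in isolation; the helper name here is mine, but the hashing steps mirror the diff:

```typescript
import * as crypto from "crypto"

// Mirrors the suffix logic in createPrompt: hash the serialized variant
// and keep the first 6 hex characters as a content-derived name suffix.
function nameSuffix(variant: unknown): string {
  return crypto.createHash("md5")
    .update(JSON.stringify(variant))
    .digest("hex")
    .substring(0, 6)
}

const a = nameSuffix({promptText: "v1"})
const b = nameSuffix({promptText: "v1"}) // same content, same suffix
const c = nameSuffix({promptText: "v2"}) // changed content, new suffix
```

The suffix is deterministic for identical variants, so repeated deploys of unchanged prompts keep a stable name; any content change produces a new name and forces a replacement.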
14 changes: 9 additions & 5 deletions packages/cdk/resources/BedrockPromptSettings.ts
@@ -3,7 +3,7 @@ import {ChatMessage} from "@cdklabs/generative-ai-cdk-constructs/lib/cdk-lib/bed
import {Construct} from "constructs"
import {CfnPrompt} from "aws-cdk-lib/aws-bedrock"

export type BedrockPromptSettingsType = "system" | "user" | "reformulation"
export type BedrockPromptSettingsType = "system" | "reformulation" | "user"

/** BedrockPromptSettings is responsible for loading and providing
* the system, user, and reformulation prompts along with their
@@ -13,7 +13,8 @@ export class BedrockPromptSettings extends Construct {
public readonly systemPrompt: ChatMessage
public readonly userPrompt: ChatMessage
public readonly reformulationPrompt: ChatMessage
public readonly inferenceConfig: CfnPrompt.PromptModelInferenceConfigurationProperty
public readonly ragInferenceConfig: CfnPrompt.PromptModelInferenceConfigurationProperty
public readonly reformulationInferenceConfig: CfnPrompt.PromptModelInferenceConfigurationProperty

/**
* @param scope The Construct scope
@@ -30,16 +31,19 @@
this.userPrompt = ChatMessage.user(userPromptData.text)

const reformulationPrompt = this.getTypedPrompt("reformulation")
this.reformulationPrompt = ChatMessage.user(reformulationPrompt.text)
this.reformulationPrompt = ChatMessage.assistant(reformulationPrompt.text)

this.inferenceConfig = {
const defaultInferenceConfig = {
temperature: 0,
topP: 0.3,
maxTokens: 1024,
maxTokens: 512,
stopSequences: [
"Human:"
]
}

this.ragInferenceConfig = defaultInferenceConfig
this.reformulationInferenceConfig = defaultInferenceConfig
}

/** Get the latest prompt text from files in the specified directory.
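One subtlety in the diff above: `ragInferenceConfig` and `reformulationInferenceConfig` are both assigned the same `defaultInferenceConfig` object, so they alias one another — a later in-place tweak to one (say, a smaller `maxTokens` for reformulation) would silently change both. A minimal sketch of the hazard and a spread-copy alternative; the field values mirror the diff, the mutation scenario is illustrative:

```typescript
interface InferenceConfig {
  temperature: number
  topP: number
  maxTokens: number
  stopSequences: Array<string>
}

const defaultInferenceConfig: InferenceConfig = {
  temperature: 0,
  topP: 0.3,
  maxTokens: 512,
  stopSequences: ["Human:"]
}

// Aliased: both names point at the same object.
const ragConfig = defaultInferenceConfig
const reformulationConfig = defaultInferenceConfig
reformulationConfig.maxTokens = 256 // also changes ragConfig.maxTokens

// A shallow copy keeps the configs independent.
const ragConfigCopy = {...defaultInferenceConfig, maxTokens: 512}
```

Spreading into fresh objects (or freezing the default) would make it safe to diverge the two configs later without touching the other.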
8 changes: 4 additions & 4 deletions packages/cdk/resources/Functions.ts
@@ -35,7 +35,7 @@ export interface FunctionsProps {
readonly isPullRequest: boolean
readonly mainSlackBotLambdaExecutionRoleArn : string
readonly ragModelId: string
readonly queryReformulationModelId: string
readonly reformulationModelId: string
readonly notifyS3UploadFunctionPolicy: ManagedPolicy
readonly docsBucketName: string
}
@@ -61,17 +61,17 @@ export class Functions extends Construct {
dependencyLocation: ".dependencies/slackBotFunction",
environmentVariables: {
"RAG_MODEL_ID": props.ragModelId,
"QUERY_REFORMULATION_MODEL_ID": props.queryReformulationModelId,
"REFORMULATION_MODEL_ID": props.reformulationModelId,
"KNOWLEDGEBASE_ID": props.knowledgeBaseId,
"LAMBDA_MEMORY_SIZE": LAMBDA_MEMORY_SIZE,
"SLACK_BOT_TOKEN_PARAMETER": props.slackBotTokenParameter.parameterName,
"SLACK_SIGNING_SECRET_PARAMETER": props.slackSigningSecretParameter.parameterName,
"GUARD_RAIL_ID": props.guardrailId,
"GUARD_RAIL_VERSION": props.guardrailVersion,
"SLACK_BOT_STATE_TABLE": props.slackBotStateTable.tableName,
"QUERY_REFORMULATION_PROMPT_NAME": props.reformulationPromptName,
"REFORMULATION_RESPONSE_PROMPT_NAME": props.reformulationPromptName,
"RAG_RESPONSE_PROMPT_NAME": props.ragResponsePromptName,
"QUERY_REFORMULATION_PROMPT_VERSION": props.reformulationPromptVersion,
"REFORMULATION_RESPONSE_PROMPT_VERSION": props.reformulationPromptVersion,
"RAG_RESPONSE_PROMPT_VERSION": props.ragResponsePromptVersion
}
})
4 changes: 2 additions & 2 deletions packages/cdk/resources/RuntimePolicies.ts
@@ -13,7 +13,7 @@ export interface RuntimePoliciesProps {
readonly dataSourceArn: string
readonly promptName: string
readonly ragModelId: string
readonly queryReformulationModelId: string
readonly reformulationModelId: string
readonly docsBucketArn: string
readonly docsBucketKmsKeyArn: string
}
@@ -32,7 +32,7 @@ export class RuntimePolicies extends Construct {
actions: ["bedrock:InvokeModel"],
resources: [
`arn:aws:bedrock:${props.region}::foundation-model/${props.ragModelId}`,
`arn:aws:bedrock:${props.region}::foundation-model/${props.queryReformulationModelId}`
`arn:aws:bedrock:${props.region}::foundation-model/${props.reformulationModelId}`
]
})

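The policy grants `bedrock:InvokeModel` on foundation-model ARNs built from the region and model ID; note the empty account component (`::`), since foundation models are not account-scoped. A sketch of the ARN construction — the helper function is illustrative, the ARN shape matches the diff:

```typescript
// Builds the foundation-model ARN used in the InvokeModel policy statement.
// Foundation-model ARNs have an empty account component, hence the "::".
function foundationModelArn(region: string, modelId: string): string {
  return `arn:aws:bedrock:${region}::foundation-model/${modelId}`
}

const arn = foundationModelArn("eu-west-2", "meta.llama3-70b-instruct-v1:0")
```

Because the PR renames `queryReformulationModelId` to `reformulationModelId`, both the RAG model and the reformulation model ARNs are now generated from the same template with their respective IDs.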
18 changes: 6 additions & 12 deletions packages/cdk/resources/VectorKnowledgeBaseResources.ts
@@ -156,15 +156,12 @@ export class VectorKnowledgeBaseResources extends Construct {
// Create S3 data source for knowledge base documents
// prefix pointed to processed/ to only ingest converted markdown documents

const chunkingConfiguration = {
...ChunkingStrategy.HIERARCHICAL_TITAN.configuration,
hierarchicalChunkingConfiguration: {
overlapTokens: 60,
levelConfigurations: [
{maxTokens: 1000}, // Parent chunk configuration,
{maxTokens: 300} // Child chunk configuration
]
}
const chunkingConfiguration: CfnDataSource.ChunkingConfigurationProperty = {
...ChunkingStrategy.SEMANTIC.configuration,
fixedSizeChunkingConfiguration: {
maxTokens: 512,
overlapPercentage: 25
} satisfies CfnDataSource.FixedSizeChunkingConfigurationProperty
}

const hash = crypto.createHash("md5")
@@ -183,9 +180,6 @@
bucketArn: props.docsBucket.bucketArn,
inclusionPrefixes: ["processed/"]
}
},
vectorIngestionConfiguration: {
chunkingConfiguration: chunkingConfiguration
}
})

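With fixed-size chunking at `maxTokens: 512` and `overlapPercentage: 25`, consecutive chunks share roughly 128 tokens, so each new chunk advances about 384 tokens through the document. A hedged sketch of that arithmetic over a pre-tokenized array — Bedrock's actual tokenization and boundary handling differ; this only illustrates the stride:

```typescript
// Illustrative fixed-size chunker over a pre-tokenized array.
// Real Bedrock chunking operates on text with its own tokenizer; this
// only demonstrates the 512-token / 25%-overlap stride arithmetic.
function chunkTokens(
  tokens: Array<string>,
  maxTokens = 512,
  overlapPercentage = 25
): Array<Array<string>> {
  const overlap = Math.floor(maxTokens * overlapPercentage / 100) // 128
  const stride = maxTokens - overlap                              // 384
  const chunks: Array<Array<string>> = []
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + maxTokens))
    if (start + maxTokens >= tokens.length) break // last chunk reached the end
  }
  return chunks
}

const tokens = Array.from({length: 1000}, (_, i) => `t${i}`)
const chunks = chunkTokens(tokens)
```

For a 1000-token document this yields three chunks starting at tokens 0, 384, and 768, with 128 shared tokens between neighbours — a trade-off between retrieval granularity and preserving context across chunk boundaries.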