301 changes: 119 additions & 182 deletions README.md

Large diffs are not rendered by default.

Binary file added images/eps-assist-me-flowchart.png
35 changes: 35 additions & 0 deletions packages/bedrockLoggingConfigFunction/README.md
@@ -0,0 +1,35 @@
# Bedrock Logging Config Function

CloudFormation custom resource Lambda that configures Bedrock model invocation logging to CloudWatch.

## What This Is

A bridge resource.
AWS CloudFormation currently has no native resource type for configuring Bedrock logging. This Lambda bridges that gap by using the Bedrock API directly during stack deployment/update/deletion.

## What This Is Not

- Not a runtime dependency - it only executes during CDK/CloudFormation deployments
- Not the log consumer - it only tells Bedrock *where* to send logs

## Architecture Overview

```mermaid
flowchart LR
CloudFormation -->|Create/Update/Delete| ConfigLambda[bedrockLoggingConfigFunction]
ConfigLambda -->|Put/Delete Logging Config| BedrockAPI[Bedrock API]
```

## Environment Variables

Configured by CDK based on stack parameters.

| Variable | Purpose |
|---|---|
| `ENABLE_LOGGING` | Toggle for enabling/disabling logs (`true` or `false`) |
| `CLOUDWATCH_LOG_GROUP_NAME` | Destination CloudWatch Log Group |
| `CLOUDWATCH_ROLE_ARN` | IAM Role allowing Bedrock to write to CloudWatch |
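
As a rough illustration of how these three variables could drive the custom resource, here is a minimal Python sketch. It assumes a standard CloudFormation custom resource event (with a `RequestType` of `Create`, `Update`, or `Delete`) and a boto3 Bedrock client passed in by the caller. The function name `apply_logging_config` and the choice to enable only text delivery are hypothetical, not this package's actual code, and sending the result back to CloudFormation is left out.

```python
import os


def apply_logging_config(event, bedrock_client, env=os.environ):
    """Apply or remove Bedrock invocation logging for one custom resource event.

    `bedrock_client` is expected to be `boto3.client("bedrock")`; it is passed
    in so the sketch can be exercised without AWS access.
    """
    request_type = event["RequestType"]  # "Create", "Update" or "Delete"
    enabled = env.get("ENABLE_LOGGING", "false").lower() == "true"

    # Stack deletion, or logging toggled off: remove the account-wide config.
    if request_type == "Delete" or not enabled:
        bedrock_client.delete_model_invocation_logging_configuration()
        return "deleted"

    # Create/Update with logging on: point Bedrock at the log group.
    bedrock_client.put_model_invocation_logging_configuration(
        loggingConfig={
            "cloudWatchConfig": {
                "logGroupName": env["CLOUDWATCH_LOG_GROUP_NAME"],
                "roleArn": env["CLOUDWATCH_ROLE_ARN"],
            },
            # Assumption: only text payloads are delivered in this sketch.
            "textDataDeliveryEnabled": True,
        }
    )
    return "configured"
```

In a real Lambda the caller would pass `boto3.client("bedrock")`; injecting the client keeps the branching logic easy to unit test.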

## Known Constraints

- This function changes the Bedrock logging configuration for the *entire AWS account in the region* where it is deployed. If another stack also manages Bedrock logging in the same account and region, the two will overwrite each other's settings.
49 changes: 49 additions & 0 deletions packages/cdk/README.md
@@ -0,0 +1,49 @@
# CDK Infrastructure

AWS Cloud Development Kit (CDK) application defining the EPS Assist Me infrastructure.

## What This Is

The single source of truth for the project's cloud resources.
Provisions the entire bot ecosystem in one deployable stack.

## Architecture

Provisions:

- **API Gateway** - Receives Slack events
- **Lambda Functions** - `slackBotFunction`, `preprocessingFunction`, `syncKnowledgeBaseFunction`, `notifyS3UploadFunction`, `bedrockLoggingConfigFunction`
- **Amazon Bedrock** - Knowledge Base and Data Source configuration
- **OpenSearch Serverless** - Vector database for RAG document embeddings
- **S3 Buckets** - Raw and processed document storage with event notifications
- **DynamoDB** - Bot session state and feedback storage
- **SQS** - Queue for asynchronous processing of document events
- **IAM Roles** - Least-privilege access across services

## Project Structure

- `bin/` CDK app entry point (`EpsAssistMeApp.ts`)
- `constructs/` Reusable Level 3 (L3) constructs (e.g. `RestApiGateway`, `LambdaFunction`, `DynamoDbTable`)
- `resources/` L2/L1 definitions grouped by domain (e.g. `VectorKnowledgeBaseResources`, `OpenSearchResources`)
- `stacks/` The actual CloudFormation stack definition (`EpsAssistMeStack`)
- `prompts/` Text templates used to construct Bedrock prompts (System, User, Reformulation)

## Environment Variables

These are CDK context values rather than process environment variables. Set them in the stack context (`cdk.json`) or pass them via the CLI.

| Variable | Purpose |
|---|---|
| `accountId` | Target AWS Account ID |
| `stackName` | CloudFormation stack name |
| `versionNumber` | Stack version |
| `commitId` | Hash for tagging |
| `logRetentionInDays` | CloudWatch retention policy |
| `slackBotToken` | The OAuth token from Slack |
| `slackSigningSecret` | The signing secret from Slack |

## Deployment Notes

- Deployment uses context variables passed during synthesis (`cdk synth --context...`)
- OpenSearch Serverless collections can take around 5-10 minutes to provision
- The Bedrock data source ingestion relies on IAM permissions that might occasionally have propagation delays on first deploy
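
One common way to tolerate this kind of propagation delay is to retry the first call with exponential backoff. The sketch below is generic Python and is not taken from this repository: `retry_with_backoff`, the `is_retryable` callback, and the default attempt count and delay are all hypothetical.

```python
import time


def retry_with_backoff(operation, is_retryable, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    Errors that `is_retryable` rejects are re-raised immediately.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:
            last_try = attempt == attempts - 1
            if last_try or not is_retryable(error):
                raise
            sleep(base_delay * (2 ** attempt))
```

A caller would wrap only the call that depends on the freshly created role and treat only access-denied style errors as retryable, so genuine misconfiguration still fails fast.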
59 changes: 59 additions & 0 deletions packages/preprocessingFunction/README.md
@@ -0,0 +1,59 @@
# Preprocessing Function

Lambda that converts raw uploaded documents into Markdown format for Bedrock Knowledge Base ingestion.
Runs sequentially when new documents land in the raw S3 bucket prefix.

## What This Is

A document standardisation step.
Converts `.pdf`, `.doc`, `.docx`, `.xls`, `.xlsx`, and `.csv` files into `.md`.
Passes through `.txt`, `.md`, `.html`, and `.json` files untouched.

Output is written to the processed S3 bucket prefix, ready for ingestion.

## Architecture Overview

```mermaid
flowchart LR
S3Raw["S3 (raw/)"] -->|event| Preprocessing[preprocessingFunction]
Preprocessing -->|convert/copy| S3Processed["S3 (processed/)"]
```

Downloads from `raw/`, converts to Markdown locally (in a secure temp directory), and uploads to `processed/`.
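
The convert-or-copy decision can be sketched in a few lines of Python. This is an illustration only: `plan_for_key` is a hypothetical helper, the prefixes default to the example values `raw/` and `processed/`, and skipping unsupported extensions is an assumption rather than documented behaviour.

```python
from pathlib import PurePosixPath

CONVERTIBLE = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".csv"}
PASS_THROUGH = {".txt", ".md", ".html", ".json"}


def plan_for_key(key, raw_prefix="raw/", processed_prefix="processed/"):
    """Return (action, destination_key) for an object key under the raw prefix.

    action is "convert", "copy", or "skip" (destination_key is None for "skip").
    """
    if not key.startswith(raw_prefix):
        return ("skip", None)

    relative = PurePosixPath(key[len(raw_prefix):])
    extension = relative.suffix.lower()

    if extension in CONVERTIBLE:
        # Converted documents are stored with a .md extension.
        return ("convert", processed_prefix + str(relative.with_suffix(".md")))
    if extension in PASS_THROUGH:
        # Already ingestible: copy under the same relative name.
        return ("copy", processed_prefix + str(relative))
    # Assumption: anything else is ignored in this sketch.
    return ("skip", None)
```

Keeping the routing as a pure function of the object key means it can be tested without touching S3.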

## Project Structure

- `app/handler.py` Lambda entry point. Processes S3 records.
- `app/config/config.py` Configuration and environment variables.
- `app/services/` Conversion logic (`converter.py`) and S3 helpers (`s3_client.py`).
- `app/cli.py` Local CLI wrapper for convert-docs.
- `tests/` Unit tests.

## Environment Variables

Set by CDK.

| Variable | Purpose |
|---|---|
| `DOCS_BUCKET_NAME` | S3 bucket containing the documents |
| `RAW_PREFIX` | Prefix where raw uploads land (e.g. `raw/`) |
| `PROCESSED_PREFIX` | Prefix where Markdown output goes (e.g. `processed/`) |
| `AWS_ACCOUNT_ID` | AWS Account ID |

## Running Tests

```bash
cd packages/preprocessingFunction
PYTHONPATH=. poetry run python -m pytest
```

Or from the repo root:

```bash
make test
```

## Known Constraints

- Complex PDFs (heavy formatting, multi-column layouts) may produce imperfect Markdown
- Runs sequentially per uploaded file - large batch uploads may take time to process
204 changes: 51 additions & 153 deletions packages/slackBotFunction/README.md
@@ -1,179 +1,77 @@
# Slack Bot Function

AWS Lambda function that handles Slack interactions for the EPS Assist Me bot. Provides AI-powered responses to user queries about the NHS EPS API using Amazon Bedrock Knowledge Base.
Lambda that handles all Slack interactions for the EPS Assist Me bot.
Receives events from Slack, queries Bedrock Knowledge Base, returns AI-generated responses.

## Architecture
## What This Is

- **Slack Bolt Framework**: Handles Slack events and interactions
- **Amazon Bedrock**: RAG-based AI responses using knowledge base
- **DynamoDB**: Session management and feedback storage
- **Async Processing**: Self-invoking Lambda for long-running AI queries
The core bot logic. Handles:

## User Interaction Patterns
- `@mentions` in public channels
- direct messages
- thread follow-ups (no re-mention needed)
- feedback (Yes/No buttons and `feedback:` text prefix)

### Starting Conversations
One Lambda. It uses a self-invoking async pattern so that heavy processing runs in the background while the original request is still acknowledged within Slack's 3-second response window.

**Public Channels** - Mention the bot:
```
#general channel:
User: "@eps-bot What is EPS API?"
Bot: "EPS API is the Electronic Prescription Service..."
```
## What This Is Not

**Direct Messages** - Send message directly:
```
DM to @eps-bot:
User: "How do I authenticate with EPS?"
Bot: "Authentication requires..."
```
- Not the infrastructure - that's in `packages/cdk/`
- Not the document ingestion pipeline - that's `preprocessingFunction` and `syncKnowledgeBaseFunction`
- Not the upload notifier - that's `notifyS3UploadFunction`

### Follow-up Questions

**In Channel Threads** - No @mention needed after initial conversation:
```
#general channel thread:
User: "@eps-bot What is EPS API?" ← Initial mention required
Bot: "EPS API is..."
User: "Can you explain more about authentication?" ← No mention needed
Bot: "Authentication works by..."
User: "What about error handling?" ← Still no mention needed
```

**In DMs** - Continue messaging naturally:
```
DM conversation:
User: "How do I authenticate?"
Bot: "Use OAuth 2.0..."
User: "What scopes do I need?" ← Natural follow-up
Bot: "Required scopes are..."
```

### Providing Feedback

**Button Feedback** - Click Yes/No on bot responses:
```
Bot: "EPS API requires OAuth authentication..."
[Yes] [No] ← Click buttons
```
## Architecture Overview

**Text Feedback** - Use "feedback:" prefix anytime (applies to most recent bot response):
```
Bot: "EPS API requires OAuth authentication..."
User: "feedback: This was very helpful, thanks!"
User: "feedback: Could you add more error code examples?"
User: "feedback: The authentication section needs clarification"
```mermaid
flowchart LR
SlackEvent[Slack Event] -->|3s timeout| Handler
Handler -->|async| SelfInvoke[Self-invoke]
SelfInvoke --> Bedrock[Bedrock KB]
Bedrock --> Response[Slack Response]
Handler --> DynamoDB[DynamoDB]
```

## Handler Architecture
- **Slack Bolt** for event handling
- **Bedrock Knowledge Base** for RAG responses with guardrails
- **DynamoDB** for session state and feedback storage

- **`mention_handler`**: Processes @mentions in public channels
- **`dm_message_handler`**: Handles direct messages to the bot
- **`thread_message_handler`**: Manages follow-up replies in existing threads
- **`feedback_handler`**: Processes Yes/No button clicks
## Project Structure

### Conversation Flow
```
Channel:
User: "@eps-bot What is EPS?" ← mention_handler
Bot: "EPS is..." [Yes] [No]

├─ User clicks [Yes] ← feedback_handler
│ Bot: "Thank you for your feedback."
├─ User clicks [No] ← feedback_handler
│ Bot: "Please provide feedback:"
│ User: "feedback: Need more examples" ← thread_message_handler
│ Bot: "Thank you for your feedback."
└─ User: "Tell me more" ← thread_message_handler
Bot: "More details..." [Yes] [No]

DM:
User: "How do I authenticate?" ← dm_message_handler
Bot: "Use OAuth..." [Yes] [No]
User clicks [Yes/No] ← feedback_handler
Bot: "Thank you for your feedback."
User: "feedback: Could be clearer" ← dm_message_handler
Bot: "Thank you for your feedback."
User: "What scopes?" ← dm_message_handler
```
- `app/handler.py` Lambda entry point.
- `app/core/` Configuration and environment variables.
- `app/services/` Business logic - Bedrock client, DynamoDB, Slack client, prompt loading, AI processing.
- `app/slack/` Event handlers - mentions, DMs, threads, feedback.
- `app/utils/` Shared utilities.
- `tests/` Unit tests.

## Conversation Flow Rules
## Environment Variables

1. **Public channels**: Must @mention bot to start conversation
2. **Threads**: After initial @mention, no further mentions needed
3. **DMs**: No @mention required, direct messaging
4. **Feedback restrictions**:
- Only available on most recent bot response
- Cannot vote twice on same message (Yes/No)
- Cannot rate old messages after conversation continues
5. **Text feedback**: Use "feedback:" prefix anytime in conversation (multiple comments allowed)
- Feedback applies to the most recent bot message in the conversation
Set by CDK. Don't hardcode these.

## Technical Implementation
| Variable | Purpose |
|---|---|
| `SLACK_BOT_TOKEN_PARAMETER` | Parameter Store path for bot token |
| `SLACK_SIGNING_SECRET_PARAMETER` | Parameter Store path for signing secret |
| `SLACK_BOT_STATE_TABLE` | DynamoDB table name |
| `KNOWLEDGEBASE_ID` | Bedrock Knowledge Base ID |
| `RAG_MODEL_ID` | Bedrock model ARN |
| `GUARD_RAIL_ID` | Bedrock guardrail ID |

### Event Processing Flow
```
Slack Event → Handler (3s timeout) → Async Lambda → Bedrock → Response
```
## Running Tests

### Data Storage
- **Sessions**: 30-day TTL for conversation continuity
- **Q&A Pairs**: 90-day TTL for feedback correlation
- **Feedback**: 90-day TTL for analytics
- **Event Dedup**: 1-hour TTL for retry handling

### Privacy Features
- **Automatic cleanup**: Q&A pairs without feedback are deleted when new messages arrive (reduces data retention by 70-90%)
- **Data minimisation**: Configurable TTLs automatically expire old data
- **Secure credentials**: Slack tokens stored in AWS Parameter Store

### Feedback Protection
- **Latest message only**: Users can only rate the most recent bot response in each conversation
- **Duplicate prevention**: Users cannot vote twice on the same message (Yes/No buttons)
- **Multiple text feedback**: Users can provide multiple detailed comments using "feedback:" prefix

## Configuration

### Environment Variables
- `SLACK_BOT_TOKEN_PARAMETER`: Parameter Store path for bot token
- `SLACK_SIGNING_SECRET_PARAMETER`: Parameter Store path for signing secret
- `SLACK_BOT_STATE_TABLE`: DynamoDB table name
- `KNOWLEDGEBASE_ID`: Bedrock Knowledge Base ID
- `RAG_MODEL_ID`: Bedrock model ARN
- `GUARD_RAIL_ID`: Bedrock guardrail ID

### DynamoDB Schema
```
Primary Key: pk (partition key), sk (sort key)

Sessions: pk="thread#C123#1234567890", sk="session"
Q&A Pairs: pk="qa#thread#C123#1234567890#1234567891", sk="turn"
Feedback: pk="feedback#thread#C123#1234567890#1234567891", sk="user#U123"
Text Notes: pk="feedback#thread#C123#1234567890#1234567891", sk="user#U123#note#1234567892"
```bash
cd packages/slackBotFunction
PYTHONPATH=. poetry run python -m pytest
```

## Development
Or from the repo root:

### Local Testing
```bash
# Install dependencies
npm install

# Run tests
npm test

# Deploy to dev environment
make cdk-deploy STACK_NAME=your-dev-stack
make test
```

### Debugging
- Check CloudWatch logs for Lambda execution details
- Monitor DynamoDB for session and feedback data

## Monitoring

- **CloudWatch Logs**: `/aws/lambda/{stack-name}-SlackBotFunction`
- **DynamoDB Metrics**: Built-in AWS metrics for table operations
## Known Constraints

**Note**: No automated alerts configured. Uses AWS built-in metrics and manual log review.
- Slack enforces a 3-second response window, so the handler must acknowledge each event quickly. How the follow-up work then runs in the background (here, the async self-invoke pattern) is an architectural choice.
- Bedrock guardrails can block legitimate queries if they hit content filters - check CloudWatch logs
- Session state lives in DynamoDB with TTLs - conversations expire after 30 days
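
The acknowledge-then-self-invoke idea behind the first constraint can be sketched as follows. This is a simplified illustration, not the handler in `app/handler.py`: the `async_processing` flag, the `handle_event` name, and the injected `process` callback are hypothetical, and Slack signature verification is omitted.

```python
import json


def handle_event(event, context, lambda_client, process):
    """Acknowledge Slack quickly, then do the slow work in a second invocation.

    `lambda_client` is expected to be `boto3.client("lambda")` and `process`
    the slow Bedrock query; both are injected so the sketch runs without AWS.
    The "async_processing" flag is a hypothetical marker for this sketch.
    """
    if event.get("async_processing"):
        # Second invocation: no Slack deadline applies here.
        process(event["slack_event"])
        return {"statusCode": 200}

    # First invocation: queue a copy of ourselves and return immediately.
    payload = {"async_processing": True, "slack_event": event}
    lambda_client.invoke(
        FunctionName=context.function_name,
        InvocationType="Event",  # asynchronous, does not wait for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return {"statusCode": 200, "body": ""}
```

Because the first invocation only enqueues work, its latency stays well inside the 3-second window regardless of how long the Bedrock query takes.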