Skip to content

llama stack vector_store operations in lightspeed. Enhancing URL fetc…#2742

Open
JslYoon wants to merge 6 commits intoredhat-developer:mainfrom
JslYoon:JslYoon-notebooksPermissions
Open

llama stack vector_store operations in lightspeed. Enhancing URL fetc…#2742
JslYoon wants to merge 6 commits intoredhat-developer:mainfrom
JslYoon:JslYoon-notebooksPermissions

Conversation

@JslYoon
Copy link
Copy Markdown
Contributor

@JslYoon JslYoon commented Apr 9, 2026

…h securities. Update request/response schemas

Hey, I just made a Pull Request!

✔️ Checklist

  • A changeset describing the change and affected packages. (more info)
  • Added or Updated documentation
  • Tests for new functionality and regression tests for bug fixes
  • Screenshots attached (for UI changes)

…h securities. Update request/response schemas

Signed-off-by: Lucas <lyoon@redhat.com>
@rhdh-gh-app
Copy link
Copy Markdown

rhdh-gh-app bot commented Apr 9, 2026

Missing Changesets

The following package(s) are changed by this PR but do not have a changeset:

  • @red-hat-developer-hub/backstage-plugin-lightspeed-backend

See CONTRIBUTING.md for more information about how to add changesets.

Changed Packages

Package Name Package Path Changeset Bump Current Version
@red-hat-developer-hub/backstage-plugin-lightspeed-backend workspaces/lightspeed/plugins/lightspeed-backend none v1.4.0

@rhdh-qodo-merge
Copy link
Copy Markdown

Review Summary by Qodo

Migrate to lightspeed-core vector store proxy with security enhancements

✨ Enhancement 🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Migrate vector store operations from direct llama-stack to lightspeed-core proxy
• Add security hardening with SSRF protection and prompt injection detection
• Implement VectorStoresOperator HTTP client for lightspeed-core REST API
• Refactor document service to use file upload API with streaming and size limits
• Add comprehensive content sanitization for RAG security
Diagram
flowchart LR
  A["Direct Llama Stack"] -->|"DEPRECATED"| B["Document Service"]
  C["VectorStoresOperator"] -->|"HTTP Proxy"| D["Lightspeed-Core"]
  D -->|"REST API"| E["Vector Stores"]
  B -->|"REFACTORED"| C
  F["Security Layer"] -->|"SSRF Check"| C
  F -->|"Prompt Injection Filter"| C
  G["File Upload"] -->|"Streaming"| C
Loading

Grey Divider

File Changes

1. workspaces/lightspeed/plugins/lightspeed-backend/__fixtures__/lightspeedCoreHandlers.ts 🧪 Tests +237/-0

New MSW handlers for lightspeed-core vector store endpoints

workspaces/lightspeed/plugins/lightspeed-backend/fixtures/lightspeedCoreHandlers.ts


2. workspaces/lightspeed/plugins/lightspeed-backend/__fixtures__/llamaStackHandlers.ts 🧪 Tests +0/-226

Removed deprecated llama stack mock handlers

workspaces/lightspeed/plugins/lightspeed-backend/fixtures/llamaStackHandlers.ts


3. workspaces/lightspeed/plugins/lightspeed-backend/src/service/constant.ts ⚙️ Configuration changes +79/-1

Add security constants and HTTP configuration

workspaces/lightspeed/plugins/lightspeed-backend/src/service/constant.ts


View more (15)
4. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts ✨ Enhancement +446/-0

New HTTP client wrapper for lightspeed-core vector store API

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts


5. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentHelpers.test.ts 🧪 Tests +104/-3

Add security tests for prompt injection and content sanitization

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentHelpers.test.ts


6. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentHelpers.ts Security enhancement +50/-34

Add prompt injection detection and content sanitization functions

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentHelpers.ts


7. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.test.ts 🧪 Tests +178/-99

Update tests to use VectorStoresOperator and lightspeed-core handlers

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.test.ts


8. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.ts ✨ Enhancement +117/-316

Refactor to use VectorStoresOperator instead of direct llama-stack

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.ts


9. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/fileParser.ts ✨ Enhancement +110/-19

Add streaming URL fetch with size limits and content sanitization

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/fileParser.ts


10. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksResponses.ts Miscellaneous +14/-1

Update import paths for response types

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksResponses.ts


11. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouter.test.ts 🧪 Tests +4/-5

Update tests to use lightspeed-core handlers

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouter.test.ts


12. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts ✨ Enhancement +124/-77

Refactor router to use VectorStoresOperator and add query retry logic

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts


13. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/sessions/sessionService.test.ts 🧪 Tests +13/-6

Update tests to use VectorStoresOperator and lightspeed-core

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/sessions/sessionService.test.ts


14. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/sessions/sessionService.ts ✨ Enhancement +98/-26

Refactor to use VectorStoresOperator with document count tracking

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/sessions/sessionService.ts


15. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/types/notebooksTypes.ts ✨ Enhancement +9/-5

Simplify session types and add document count field

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/types/notebooksTypes.ts


16. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/utils.test.ts 🧪 Tests +0/-85

Remove sanitizeTitle tests as function moved to documentHelpers

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/utils.test.ts


17. workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/utils.ts Miscellaneous +18/-18

Remove sanitizeTitle function and update metadata extraction

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/utils.ts


18. workspaces/lightspeed/plugins/lightspeed-backend/app-config.yaml ⚙️ Configuration changes +7/-0

Add configuration options for query defaults and session defaults

workspaces/lightspeed/plugins/lightspeed-backend/app-config.yaml


Grey Divider

Qodo Logo

@rhdh-qodo-merge
Copy link
Copy Markdown

rhdh-qodo-merge bot commented Apr 9, 2026

Code Review by Qodo

🐞 Bugs (4)   📘 Rule violations (0)   📎 Requirement gaps (0)
🐞\ ≡ Correctness (2) ⚙ Maintainability (1) ➹ Performance (1)

Grey Divider


Action required

1. Document ID API inconsistent🐞
Description
Document upload now returns document_id as the raw title, but status/delete endpoints and service
lookups are inconsistent about whether the identifier is a title or a sanitized ID. This will cause
document status/deletes to fail for existing callers/tests that still use slug IDs (e.g. test-doc
/ test-document).
Code

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts[R337-358]

+      // Return 202 Accepted immediately for async processing
+      res.status(HTTP_STATUS_ACCEPTED).json({
        status: 'processing',
-        document_id: finalDocumentId,
+        document_id: newTitle || title,
        session_id: sessionId,
        message: 'Document upload started',
      });

-      // upload document to vector store in background
+      // Upload document to vector store in background
+      logger.info(`Starting background upload for ${newTitle || title}`);
      documentService
-        .upsertDocument(
-          sessionId,
-          title,
-          parsedDocument.content,
-          parsedDocument.metadata,
-          newTitle,
-        )
+        .upsertDocument(sessionId, title, fileType, fileId, newTitle)
+        .then(() => {
+          logger.info(`Background upload succeeded for ${newTitle || title}`);
+        })
        .catch((err: any) => {
          logger.error(
-            `Background document upload failed for ${finalDocumentId}: ${err}`,
+            `Background document upload failed for ${newTitle || title}: ${err.message || err}`,
+            err,
          );
        });
    }),
Evidence
The upload response returns the raw title as document_id, while getFileStatus and
deleteDocument rely on title-based lookup (attributes.title equality). The existing router tests
still call delete using a slug path and assert document_id is a slug, demonstrating an in-repo
contract mismatch that will fail at runtime/tests.

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts[337-376]
workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts[387-404]
workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.ts[83-92]
workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouter.test.ts[200-283]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Endpoints disagree on what `document_id` means (raw title vs sanitized ID), and tests/clients still use the older slug behavior.

### Issue Context
- Upload returns `document_id: newTitle || title`.
- Status route still uses `:documentId` and passes it into `getFileStatus()`, which now searches by `attributes.title`.
- Delete route uses `:documentTitle`.

### Fix Focus Areas
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts[304-404]
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.ts[83-92]
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouter.test.ts[200-283]

### What to change
Choose one:
1) **Keep a stable sanitized `document_id` + separate `title`:** reintroduce/restore sanitization for IDs, store both in vector-store-file attributes, and make status/delete/list routes consistently use `document_id` in the path.
2) **Use title everywhere:** rename status route param to `:documentTitle`, ensure all callers URL-encode titles, and update tests + list response schema accordingly.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Document count UI regression 🐞
Description
Session metadata no longer includes metadata.document_ids, but the Lightspeed UI still uses that
field to display document counts. As a result, the UI will show 0 documents for sessions even when
documents exist.
Code

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/sessions/sessionService.ts[R78-102]

    // Build temporary session object to generate metadata
    const tempSession: NotebookSession = {
-      session_id: 'temp', // Will be replaced with actual ID
+      session_id: 'temp', // Placeholder - will be replaced with vector store ID
      user_id: userId,
      name,
      description: description || '',
      created_at: now,
      updated_at: now,
      metadata: {
        ...metadata,
-        document_ids: [],
        conversation_id: null,
+        provider_id: this.providerId,
+        embedding_model: this.embeddingModel,
+        embedding_dimension: this.embeddingDimension,
      },
    };

-    // Create vector store with embedding config AND metadata in one call
    const vectorStore = await this.client.vectorStores.create({
      name: name || `Session for ${userId}`,
+      provider_id: this.providerId,
      embedding_model: this.embeddingModel,
      embedding_dimension: this.embeddingDimension,
-      provider_id: this.providerId,
      metadata: buildVectorStoreMetadata(tempSession),
    });
Evidence
Backend session creation no longer initializes document_ids, while the frontend renders counts
from notebook.metadata?.document_ids, so it will be undefined/empty for all sessions created by
this backend.

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/sessions/sessionService.ts[78-93]
workspaces/lightspeed/plugins/lightspeed/src/components/notebooks/NotebookCard.tsx[128-133]
workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/types/notebooksTypes.ts[18-41]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
The backend stopped providing `metadata.document_ids`, but the UI still uses it to display document counts.

### Issue Context
Backend now computes `document_count` but does not populate `metadata.document_ids`.

### Fix Focus Areas
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/sessions/sessionService.ts[78-114]
- workspaces/lightspeed/plugins/lightspeed/src/components/notebooks/NotebookCard.tsx[128-133]

### What to change
- Either (a) restore `metadata.document_ids` population (and keep it updated), or (b) change the frontend to use `notebook.document_count` (and update frontend types accordingly).
- Prefer keeping backwards compatibility by providing both during a transition.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. Operator error parsing crashes🐞
Description
VectorStoresOperator builds error messages via error.detail.cause, which can throw a TypeError if
detail is missing or not an object. This masks the real HTTP error and can crash request handling
paths.
Code

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[R95-102]

+      if (!response.ok) {
+        const error = await response.json();
+        this.logger.error('Failed to create vector store:', error);
+        throw mapHttpStatusToError(
+          response.status,
+          `Failed to create vector store: ${error.detail.cause}`,
+        );
+      }
Evidence
The code dereferences error.detail.cause without checking shape; many APIs return detail as a
string/array or omit it, so this code can throw before mapping to a Backstage error type.

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[95-102]
workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[34-52]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`error.detail.cause` access is not null-safe and can throw, hiding the real HTTP failure.

### Issue Context
`mapHttpStatusToError()` already accepts an `errorDetail` argument.

### Fix Focus Areas
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[95-102]
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[126-176]
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[231-241]

### What to change
- Build default messages without unsafe dereferences, e.g. `error?.detail?.cause ?? error?.detail ?? response.statusText`.
- Prefer: `throw mapHttpStatusToError(response.status, 'Failed to ...', error)` and let the mapper pick `error.detail` if present.
- Consider falling back to `await response.text()` when `response.json()` fails.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (1)
4. Hardcoded guest userId🐞
Description
getUserId() always returns user:default/guest, so all sessions/doc operations execute as the
guest user regardless of the authenticated caller. This breaks ownership checks and can mix or block
access across users.
Code

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts[R108-113]

  const getUserId = async (req: any): Promise<string> => {
-    const credentials = await httpAuth.credentials(req);
-    const user = await userInfo.getUserInfo(credentials);
-    return user.userEntityRef;
+    // const credentials = await httpAuth.credentials(req);
+    // const user = await userInfo.getUserInfo(credentials);
+    // return user.userEntityRef;
+    return 'user:default/guest';
  };
Evidence
The router forces every request to use the guest identity, and session ownership enforcement
compares the stored session owner against this derived userId, making all access control decisions
incorrect for real users.

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts[108-138]
workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/sessions/sessionService.ts[140-145]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`getUserId()` is hardcoded to `user:default/guest`, so all requests run under a fake identity.

### Issue Context
Ownership checks rely on `SessionService.readSession()` comparing `session.user_id` to the router-provided `userId`.

### Fix Focus Areas
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts[108-113]
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouters.ts[132-160]

### What to change
- Re-enable `httpAuth.credentials(req)` + `userInfo.getUserInfo(credentials)` and return `user.userEntityRef`.
- If tests require a default user, inject/mocks for `httpAuth/userInfo` in tests rather than hardcoding production code.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

5. source_type values inconsistent 🐞
Description
Document source_type is written from fileType (e.g. 'txt'), but SessionDocument.source_type
and fixtures use 'text', creating inconsistent API values and unsafe casts. This can break
filtering and downstream consumers expecting a stable set of source_type values.
Code

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.ts[R184-189]

        attributes: {
-          document_id: newDocumentId,
          title: newTitle || title,
-          source_type: metadata?.fileType || 'text',
+          source_type: fileType,
          created_at: createdAt,
          updated_at: new Date().toISOString(),
-          ...(metadata || {}),
        },
-        chunking_strategy: this.chunkingStrategy,
Evidence
Supported upload types use 'txt', but the SessionDocument type union and mock fixtures use
'text'; meanwhile upsert stores whatever fileType is passed, so production can emit values
outside the declared type.

workspaces/lightspeed/plugins/lightspeed-backend/src/service/constant.ts[117-126]
workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/types/notebooksTypes.ts[46-51]
workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.ts[179-190]
workspaces/lightspeed/plugins/lightspeed-backend/fixtures/lightspeedCoreHandlers.ts[54-65]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`source_type` uses inconsistent values (`txt` vs `text`) across constants/types/fixtures.

### Issue Context
- Upload accepts `fileType=txt`.
- `upsertDocument()` stores `source_type: fileType`.
- Types/fixtures expect `text`.

### Fix Focus Areas
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/types/notebooksTypes.ts[46-51]
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/constant.ts[117-126]
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.ts[179-190]
- workspaces/lightspeed/plugins/lightspeed-backend/__fixtures__/lightspeedCoreHandlers.ts[54-65]

### What to change
- Pick one canonical value set (recommend: match `SupportedFileType` values like `txt`, `md`, etc), update `SessionDocument.source_type` accordingly, and update fixtures.
- Alternatively, translate `txt -> text` at write time and keep the API’s canonical value as `text`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Multipart upload buffers fully 🐞
Description
File upload builds the entire multipart payload into a Buffer before sending, which increases memory
usage and adds avoidable copies for large/concurrent uploads. This can reduce backend stability
under load.
Code

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[R417-432]

+      // Convert FormData stream to Buffer
+      const formBuffer = await new Promise<Buffer>((resolve, reject) => {
+        const chunks: Buffer[] = [];
+        formData.on('data', (chunk: string | Buffer) => {
+          chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
+        });
+        formData.on('end', () => resolve(Buffer.concat(chunks)));
+        formData.on('error', reject);
+        formData.resume();
+      });
+
+      const response = await fetch(`${this.baseURL}/v1/files`, {
+        method: 'POST',
+        body: formBuffer as unknown as BodyInit,
+        headers: formData.getHeaders(),
+      });
Evidence
The implementation collects all FormData stream chunks into an in-memory Buffer (Buffer.concat)
and then passes that buffer to fetch, guaranteeing a full materialization of the multipart body in
memory.

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[417-432]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Multipart upload is fully buffered in memory.

### Issue Context
Uploads can be up to the configured multer max size; concurrent uploads amplify memory usage.

### Fix Focus Areas
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/VectorStoresOperator.ts[407-432]

### What to change
- Prefer using the built-in Web `FormData` supported by Node’s fetch (undici) with `Blob`/`File`, so you can pass the `FormData` directly as the request body.
- If you must use `form-data`, investigate whether the runtime supports passing its stream/async iterable directly to fetch without buffering, or set a strict content-length and stream.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Advisory comments

7. Debug console.log in test 🐞
Description
A console.log was added to notebooksRouter.test.ts, adding noise to CI output. This makes test
logs harder to interpret.
Code

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouter.test.ts[156]

+        console.log(createResponse.body, 'createResponse');
Evidence
The test contains a raw console.log(...) call that will print during test execution.

workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouter.test.ts[151-160]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
A debug `console.log` statement was left in a unit test.

### Issue Context
This adds noise to automated test output.

### Fix Focus Areas
- workspaces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/notebooksRouter.test.ts[151-160]

### What to change
- Remove the `console.log` line.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

JslYoon added 5 commits April 10, 2026 11:42
Signed-off-by: Lucas <lyoon@redhat.com>
Signed-off-by: Lucas <lyoon@redhat.com>
Signed-off-by: Lucas <lyoon@redhat.com>
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant