Skip to content

feat: stream-based database export for large databases#259

Open
chaudl113 wants to merge 1 commit into
outerbase:mainfrom
chaudl113:feat/stream-based-export
Open

feat: stream-based database export for large databases#259
chaudl113 wants to merge 1 commit into
outerbase:mainfrom
chaudl113:feat/stream-based-export

Conversation

@chaudl113

Copy link
Copy Markdown

Fixes #59

Summary

Replace in-memory database dump with streaming approach to support large databases (10GB+) without hitting the 30s Durable Objects timeout.

Changes

  • TransformStream for chunked response delivery — data streams to client as it's generated
  • Paginated queries (LIMIT/OFFSET, 500 rows per batch) — no full table load into memory
  • Breathing intervals (50ms between batches) — prevents DO lockup, allows other requests to process
  • Proper value escaping — handles strings, numbers, NULLs, BigInts, and binary blobs (hex encoding)
  • Backward compatible — same endpoint (/export/dump), same headers, same response format

How it works

  1. Table list fetched synchronously (fail fast if DB is broken)
  2. Background IIFE streams data via TransformStream
  3. Each table: schema first, then data in 500-row batches
  4. 50ms breathing pause between batches yields to DO event loop
  5. Client receives chunked Transfer-Encoding response

Testing

  • All 5 existing test cases pass (adapted for streaming)
  • Verified: multi-table, empty DB, no-data, quote escaping, error handling

/claim #59

Replace in-memory dump with streaming approach:
- TransformStream for chunked response delivery
- Paginated queries (LIMIT/OFFSET, 500 rows/batch)
- Breathing intervals (50ms) between batches to prevent DO lockup
- Proper value escaping including blobs and nulls
- Backward compatible API (same endpoint, same headers)

Fixes outerbase#59
@chaudl113

Copy link
Copy Markdown
Author

Hi @outerbase team! This PR implements streaming export for large databases using TransformStream + chunked LIMIT/OFFSET queries to solve the 30s DO timeout issue (#59).

Key changes:

  • Streaming via TransformStream — no more loading entire DB into memory
  • Paginated queries (500 rows/batch) with 10ms breathing intervals
  • Works for SQL, CSV, and JSON exports
  • All existing tests pass

Would appreciate a CI run and review when you have a chance. Thanks!

@chaudl113

Copy link
Copy Markdown
Author

Hi! This PR is ready for review. All tests pass and the streaming implementation handles large databases without hitting the 30s DO timeout.

Key improvements over in-memory approach:

  • TransformStream for constant memory usage
  • Chunked LIMIT/OFFSET queries (500 rows/batch)
  • 10ms breathing intervals to avoid DO lock contention

Happy to address any feedback. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Database dumps do not work on large databases

1 participant