
Use chunked file reading to avoid loading entire files into memory #37

Open
JanTvrdik wants to merge 1 commit into main from chunked-file-reading

Conversation

@JanTvrdik
Member

Summary

  • Added BufferedFileParseTrait with shared logic that reads files in 64 KiB chunks via fopen()/fread() instead of file_get_contents(), so memory usage is proportional to the largest single query rather than the entire file size (a rough sketch of such a loop follows this list)
  • Refactored all three parsers (MySqlMultiQueryParser, PostgreSqlMultiQueryParser, SqlServerMultiQueryParser) to use the trait — no changes to regex patterns or public API
  • Includes a safety check to prevent \z regex anchors from falsely matching at chunk boundaries before EOF
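
A rough sketch of the kind of loop described above, assuming hypothetical names (readQueriesChunked, CHUNK_SIZE) and a pattern anchored at the start of the buffer; this illustrates the technique, not the actual BufferedFileParseTrait code:

<?php

const CHUNK_SIZE = 64 * 1024; // 64 KiB

/** @return iterable<string> */
function readQueriesChunked(string $path, string $pattern): iterable
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open $path");
    }

    $buffer = '';
    $eof = false;

    while (true) {
        // Accept a match only if it stops before the end of the buffer or we
        // are at EOF; this is the guard that keeps \z from matching early at
        // a chunk boundary.
        if (preg_match($pattern, $buffer, $match) === 1
            && ($eof || strlen($match[0]) < strlen($buffer))
        ) {
            yield $match[1] ?? $match[0];                 // hand out one query
            $buffer = substr($buffer, strlen($match[0])); // drop consumed input
            continue;
        }

        if ($eof) {
            if (trim($buffer) !== '') {
                throw new \RuntimeException('Unparsable trailing input');
            }
            break;
        }

        // No complete query buffered yet: append another 64 KiB chunk.
        $chunk = fread($handle, CHUNK_SIZE);
        if ($chunk === false) {
            throw new \RuntimeException("Read error on $path");
        }
        $buffer .= $chunk;
        $eof = feof($handle);
    }

    fclose($handle);
}

The buffer only ever holds roughly one unconsumed query plus one chunk, which is where the "proportional to the largest single query" bound comes from.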

Test plan

  • All existing tests pass (composer tests — 4 tests, 0 failures)
  • PHPStan reports no errors (composer phpstan)
  • Verify with a large SQL file (100+ MB) that memory stays bounded (one way to do this by hand is sketched below)
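
For the large-file check, something like the following would do; the MySqlMultiQueryParser class is from this PR, but its constructor and the parse() entry point used here are assumptions for illustration:

<?php

// Generate a SQL file well over 100 MB, stream-parse it, and compare peak
// memory against the file size.
$path = sys_get_temp_dir() . '/large.sql';
$out = fopen($path, 'wb');
for ($i = 0; $i < 1_000_000; $i++) {
    fwrite($out, "INSERT INTO t (id, payload) VALUES ($i, '" . str_repeat('x', 80) . "');\n");
}
fclose($out);

$count = 0;
foreach ((new MySqlMultiQueryParser())->parse($path) as $query) {
    $count++; // consume the generator without retaining queries
}

printf(
    "%d queries, peak memory %.1f MiB, file size %.1f MiB\n",
    $count,
    memory_get_peak_usage(true) / 1048576,
    filesize($path) / 1048576
);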

Copilot AI review requested due to automatic review settings February 18, 2026 15:06
@JanTvrdik force-pushed the chunked-file-reading branch from a7322e5 to 04aafe6 on February 18, 2026 15:15
Replace file_get_contents() with buffered fopen()/fread() via a shared
BufferedFileParseTrait. Memory usage is now proportional to the largest
single query rather than the entire file size, which matters for large
SQL files (100+ MB). The parsers already use generators for output, so
this completes the streaming pipeline on the input side.
@JanTvrdik force-pushed the chunked-file-reading branch from 04aafe6 to d768638 on February 18, 2026 15:19

Copilot AI left a comment


Pull request overview

This PR refactors the multi-query parser to use chunked file reading instead of loading entire files into memory, aiming to reduce memory usage for large SQL files. The implementation introduces a shared BufferedFileParseTrait that reads files in 64 KiB chunks using fopen()/fread() and refactors all three database-specific parsers (MySQL, PostgreSQL, SQL Server) to use this trait. A safety mechanism is included to prevent false \z regex anchor matches at chunk boundaries.

Changes:

  • Introduced BufferedFileParseTrait with chunked file reading logic (64 KiB chunks)
  • Refactored MySQL, PostgreSQL, and SQL Server parsers to use the new trait with callback-based pattern processing (a hypothetical sketch of that hand-off follows this list)
  • Added safety check to handle \z anchor edge cases at chunk boundaries
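
As referenced above, a hypothetical sketch of the callback hand-off; parseBuffered(), its signature, and the placeholder pattern are guesses for illustration, not the API actually introduced in this PR:

<?php

final class PostgreSqlMultiQueryParser
{
    use BufferedFileParseTrait; // provides the chunked-reading loop

    // Placeholder pattern, not the real one: one statement up to a semicolon.
    private const QUERY_PATTERN = '~\G\s*([^;]+);~';

    /** @return iterable<string> */
    public function parseFile(string $path): iterable
    {
        // The trait owns buffering and matching; the callback only decides
        // what to produce for each match (here, the trimmed query text).
        yield from $this->parseBuffered(
            $path,
            self::QUERY_PATTERN,
            static fn (array $match): string => trim($match[1])
        );
    }
}

Under a shape like this, the PostgreSQL and SQL Server parsers can use static callbacks, while the MySQL parser would pass a bound closure so it can react to DELIMITER changes mid-file, matching the per-file notes further down.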

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

  • src/BufferedFileParseTrait.php: New trait implementing chunked file reading with buffering, pattern matching, and memory management logic
  • src/MySqlMultiQueryParser.php: Refactored to use BufferedFileParseTrait with callback handling for dynamic delimiter changes
  • src/PostgreSqlMultiQueryParser.php: Refactored to use BufferedFileParseTrait with a static callback for query extraction
  • src/SqlServerMultiQueryParser.php: Refactored to use BufferedFileParseTrait with a static callback for query extraction


break;
}

if (preg_match($pattern, $buffer, $match, 0, $offset) !== 1) {

Copilot AI Feb 18, 2026


The buffering logic will fail for queries larger than 64 KiB. After reading the first chunk (line 38), if the remaining buffer is >= chunkSize (line 37), the inner loop exits. If preg_match then fails because the query is incomplete (no delimiter found yet), the outer loop breaks (line 52), and a RuntimeException is thrown (line 92).

The fix should ensure that if preg_match fails and we're not at EOF, we continue reading more data instead of breaking. One approach would be to:

  1. Always try to read at least one more chunk after a failed match (if not at EOF)
  2. Or, change the inner loop condition to ensure we have enough buffer data to match complete patterns (a rough sketch of this variant follows the suggested change below)

This is critical because the PR's goal is to handle large files without loading them entirely into memory, but it fails for any individual query exceeding the chunk size.

Suggested change (replace the existing line with the block below):

if (preg_match($pattern, $buffer, $match, 0, $offset) !== 1) {
    // If no match is found but we're not at EOF, try to read more data
    if (!$eof) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk !== false && $chunk !== '') {
            $buffer .= $chunk;
            $eof = feof($handle);
            continue; // retry matching with more data in the buffer
        }
        $eof = feof($handle);
    }
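
For the second approach listed above (ensure the buffer holds enough data before matching), the inner read loop's condition could instead be relaxed along these lines; a sketch against the local names visible in the suggestion, not a tested patch:

// Keep filling the buffer while not at EOF and no match is possible yet,
// instead of stopping as soon as one chunk's worth of data is buffered.
while (!$eof && preg_match($pattern, $buffer, $match, 0, $offset) !== 1) {
    $chunk = fread($handle, $chunkSize);
    if ($chunk === false || $chunk === '') {
        $eof = feof($handle);
        break;
    }
    $buffer .= $chunk;
    $eof = feof($handle);
}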
