Use chunked file reading to avoid loading entire files into memory #37
Replace `file_get_contents()` with buffered `fopen()`/`fread()` via a shared `BufferedFileParseTrait`. Memory usage is now proportional to the largest single query rather than the entire file size, which matters for large SQL files (100+ MB). The parsers already use generators for output, so this completes the streaming pipeline on the input side.
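For illustration, a minimal sketch of what this kind of input-side streaming can look like (not the PR's code: the function name and the naive `;` split are assumptions, and a real parser also has to respect string literals and comments):

```php
<?php

// Sketch only: a generator that reads in fixed-size chunks and yields one
// query at a time, so peak memory tracks the largest query, not the file.
function streamQueries(string $path, int $chunkSize = 64 * 1024): \Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new \RuntimeException("Cannot open {$path}");
    }

    $buffer = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $buffer .= $chunk;

        // Emit every complete query currently in the buffer, then keep the
        // (possibly partial) remainder for the next iteration.
        while (($pos = strpos($buffer, ';')) !== false) {
            $query = trim(substr($buffer, 0, $pos));
            $buffer = substr($buffer, $pos + 1);
            if ($query !== '') {
                yield $query;
            }
        }
    }
    fclose($handle);

    $tail = trim($buffer);
    if ($tail !== '') {
        yield $tail; // final query without a trailing delimiter
    }
}
```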
Pull request overview
This PR refactors the multi-query parser to use chunked file reading instead of loading entire files into memory, aiming to reduce memory usage for large SQL files. The implementation introduces a shared `BufferedFileParseTrait` that reads files in 64 KiB chunks using `fopen()`/`fread()` and refactors all three database-specific parsers (MySQL, PostgreSQL, SQL Server) to use this trait. A safety mechanism is included to prevent false `\z` regex anchor matches at chunk boundaries.
Changes:
- Introduced `BufferedFileParseTrait` with chunked file reading logic (64 KiB chunks)
- Refactored MySQL, PostgreSQL, and SQL Server parsers to use the new trait with callback-based pattern processing
- Added safety check to handle `\z` anchor edge cases at chunk boundaries (see the sketch below)
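To make the `\z` edge case concrete, here is one hypothetical way to express the boundary guard; the trait's actual check is not shown on this page. `\z` matches the end of the subject string, and a partial buffer ends wherever the last chunk happens to, so a match that touches the buffer's end is only trustworthy once the reader has reached EOF:

```php
<?php

// Hypothetical helper: reject a regex match that ends exactly at the end of
// a mid-file buffer, since its `\z` may have matched a chunk boundary rather
// than the real end of input.
function isTrustworthyMatch(string $matched, int $matchOffset, string $buffer, bool $eof): bool
{
    $matchEnd = $matchOffset + strlen($matched);

    return $eof || $matchEnd < strlen($buffer);
}
```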
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/BufferedFileParseTrait.php` | New trait implementing chunked file reading with buffering, pattern matching, and memory management logic |
| `src/MySqlMultiQueryParser.php` | Refactored to use `BufferedFileParseTrait` with callback handling for dynamic delimiter changes |
| `src/PostgreSqlMultiQueryParser.php` | Refactored to use `BufferedFileParseTrait` with static callback for query extraction |
| `src/SqlServerMultiQueryParser.php` | Refactored to use `BufferedFileParseTrait` with static callback for query extraction |
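Based on these descriptions, the trait-plus-callback shape is presumably something like the following sketch; the method name, regex pattern, and callback signature are guesses rather than the PR's actual API:

```php
<?php

// Sketch of the trait-plus-static-callback shape; names and signatures are
// assumed, not taken from the PR.
trait ChunkedParseSketch
{
    /** @return \Generator<string> */
    private function parseChunked(string $path, string $pattern, callable $onMatch): \Generator
    {
        $handle = fopen($path, 'rb');
        if ($handle === false) {
            throw new \RuntimeException("Cannot open {$path}");
        }

        $buffer = '';
        do {
            $chunk = fread($handle, 64 * 1024);
            $buffer .= ($chunk === false) ? '' : $chunk;
            $eof = feof($handle);

            while ($buffer !== '' && preg_match($pattern, $buffer, $m) === 1) {
                // Boundary guard: a match that runs to the end of a mid-file
                // buffer may only "end" because the chunk was truncated.
                if (!$eof && strlen($m[0]) === strlen($buffer)) {
                    break;
                }
                $buffer = substr($buffer, strlen($m[0]));
                $query = trim($onMatch($m));
                if ($query !== '') {
                    yield $query;
                }
            }
        } while (!$eof);

        fclose($handle);
    }
}

final class PostgresStyleParser
{
    use ChunkedParseSketch;

    /** @return \Generator<string> */
    public function parse(string $path): \Generator
    {
        // A static callback keeps per-match extraction out of the trait.
        yield from $this->parseChunked(
            $path,
            '~\A\s*([^;]*?)(?:;|\z)~',
            static fn (array $m): string => $m[1],
        );
    }
}
```

Per the table above, the MySQL variant's callback would not be static, since it has to track `DELIMITER` changes between matches.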
```php
        break;
    }

    if (preg_match($pattern, $buffer, $match, 0, $offset) !== 1) {
```
The buffering logic will fail for queries larger than 64 KiB. After reading the first chunk (line 38), if the remaining buffer is >= `chunkSize` (line 37), the inner loop exits. If `preg_match` then fails because the query is incomplete (no delimiter found yet), the outer loop breaks (line 52), and a `RuntimeException` is thrown (line 92).
The fix should ensure that if preg_match fails and we're not at EOF, we continue reading more data instead of breaking. One approach would be to:
- Always try to read at least one more chunk after a failed match (if not at EOF)
- Or, change the inner loop condition to ensure we have enough buffer data to match complete patterns
This is critical because the PR's goal is to handle large files without loading them entirely into memory, but it fails for any individual query exceeding the chunk size.
Suggested change:

```php
if (preg_match($pattern, $buffer, $match, 0, $offset) !== 1) {
    // If no match is found but we're not at EOF, try to read more data
    if (!$eof) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk !== false && $chunk !== '') {
            $buffer .= $chunk;
            $eof = feof($handle);
            continue; // retry matching with more data in the buffer
        }
        $eof = feof($handle);
    }
```
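A hypothetical regression test for this failure mode (the parser class name comes from the PR's file list, but the `parseFile()` method and its return type are assumptions):

```php
<?php

use PHPUnit\Framework\TestCase;

// Hypothetical regression test: a single query larger than one 64 KiB chunk
// must still come through intact.
final class LargeQueryTest extends TestCase
{
    public function testQueryLargerThanOneChunkIsParsed(): void
    {
        // Build one INSERT statement well past the 64 KiB chunk size.
        $values = implode(",\n", array_fill(0, 5000, "('xxxxxxxxxxxxxxxxxxxx')"));
        $sql = "INSERT INTO t (c) VALUES\n{$values};\n";
        self::assertGreaterThan(64 * 1024, strlen($sql));

        $path = tempnam(sys_get_temp_dir(), 'sql');
        file_put_contents($path, $sql);

        $parser = new PostgreSqlMultiQueryParser();
        $queries = iterator_to_array($parser->parseFile($path), false); // assumed API

        self::assertCount(1, $queries);

        unlink($path);
    }
}
```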
Summary
- Added `BufferedFileParseTrait` with shared logic that reads files in 64 KiB chunks via `fopen()`/`fread()` instead of `file_get_contents()`, so memory usage is proportional to the largest single query rather than the entire file size
- Refactored the three parsers (`MySqlMultiQueryParser`, `PostgreSqlMultiQueryParser`, `SqlServerMultiQueryParser`) to use the trait — no changes to regex patterns or public API
- Added a safety check to prevent `\z` regex anchors from falsely matching at chunk boundaries before EOF

Test plan
- `composer tests` — 4 tests, 0 failures
- `composer phpstan`