Bulk COPY insert - tickets/INSTRM-2821 #97
base: master
Pull request overview
This PR implements bulk insert optimization using PostgreSQL's COPY command, providing significant performance improvements (claimed 10x speedup for 65k rows) over the previous multi-row INSERT approach.
Key Changes:
- Added new `psql_insert_copy` function that uses PostgreSQL's COPY command for bulk data loading
- Added `use_copy` parameter (default `True`) to `insert_dataframe` method to enable/disable COPY optimization
- Added performance timing to track insert operations
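For readers unfamiliar with the pattern, here is a minimal sketch of what a COPY-based insertion callable can look like. It is not the code from this PR: it follows the callable shape that pandas accepts for `to_sql(method=...)`, and it assumes `conn` is a SQLAlchemy connection wrapping a psycopg2 connection (so that `cursor.copy_expert` is available). The helper names `build_copy_sql` and `rows_to_csv_buffer` are hypothetical and exist only for illustration.

```python
import csv
from io import StringIO


def build_copy_sql(table_name, columns):
    """Build a COPY ... FROM STDIN statement (hypothetical helper)."""
    column_list = ", ".join('"{}"'.format(c) for c in columns)
    return "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(
        table_name, column_list
    )


def rows_to_csv_buffer(rows):
    """Serialize row tuples into an in-memory CSV buffer (hypothetical helper).

    csv.writer emits None as an empty field, which COPY in csv format
    reads as NULL by default.
    """
    buf = StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return buf


def psql_insert_copy(table, conn, keys, data_iter):
    """Insertion callable in the shape pandas passes to `method=`.

    Assumption: conn.connection is a psycopg2 connection.
    """
    if table.schema:
        table_name = "{}.{}".format(table.schema, table.name)
    else:
        table_name = table.name
    sql = build_copy_sql(table_name, keys)
    buf = rows_to_csv_buffer(data_iter)
    with conn.connection.cursor() as cur:
        # Stream the whole buffer to the server in one COPY operation.
        cur.copy_expert(sql=sql, file=buf)
```

With pandas, a callable of this shape would typically be passed as `df.to_sql(..., method=psql_insert_copy)`; how `insert_dataframe` wires it up in this PR is not shown here.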
CraigLoomis left a comment
I'd suggest making that `WITH (FORMAT csv, HEADER MATCH)` (and whatever the equivalent of `df.to_csv(...., header=True)` is) to ward against the worst mistakes.
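A rough sketch of that suggestion, under stated assumptions: `HEADER MATCH` is accepted by `COPY ... FROM` in PostgreSQL 15 and later, and it makes the server compare the header line against the target column names and raise an error on a mismatch. The function names below are hypothetical; the standard-library `csv` writer stands in for `df.to_csv(buf, index=False, header=True)`.

```python
import csv
from io import StringIO


def rows_to_csv_with_header(columns, rows):
    """Write a header line followed by the data rows (hypothetical helper)."""
    buf = StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)  # header line that HEADER MATCH will check
    writer.writerows(rows)
    buf.seek(0)
    return buf


def build_copy_sql_header_match(table_name, columns):
    """COPY statement asking the server to validate the header line.

    Assumption: the server is PostgreSQL 15+, where HEADER MATCH exists.
    """
    column_list = ", ".join('"{}"'.format(c) for c in columns)
    return "COPY {} ({}) FROM STDIN WITH (FORMAT csv, HEADER MATCH)".format(
        table_name, column_list
    )
```

The point of the header check is that a reordered or renamed DataFrame column fails loudly at load time instead of silently landing values in the wrong table column.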
c9f780b to
d05bc09
Compare
Since this is coming from the I've added some explicit parameters to the csv writer and some other dataframe scrubbing checks that shouldn't interfere with data. It's actually running even faster with these explicit parameters since I guess it doesn't have to do an initial pass or conversions.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 9 comments.
d05bc09 to
c96de62
Compare
15b0653 to
13579a7
Compare
* Add bulk COPY command for insert dataframe
* Scrub the dataframe before inserting.
13579a7 to
2025a74
Compare
For inserting 65k rows into `cobra_target`, this offers a +10x speedup.