feat(aws_s3 sink): Add Apache Parquet encoder support #24372
7c16cdd to 672999d
…ut defaulted to off
thomasqueirozb left a comment
Hey @rorylshanks, thanks for your contribution! It looks like there are failing checks (run make check-clippy, for example). This also fails to compile after merging master because you removed BatchSerializerConfig::build, which is used by the clickhouse sink. I'll circle back to this PR and review it once I see commits pushed to this branch.
Co-authored-by: Thomas <thomasqueirozb@gmail.com>
Summary
This PR adds Apache Parquet encoding support to the AWS S3 sink, enabling Vector to write columnar Parquet files optimized for analytics workloads.
Parquet is a columnar storage format that provides efficient compression and encoding, making it ideal for long-term storage and query performance with tools like AWS Athena, Apache Spark, and Presto. This implementation allows users to write properly formatted Parquet files with configurable schemas, compression, and row group sizing.
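To make "columnar" and "row group sizing" concrete, here is a minimal std-only sketch of the idea. It is not the encoder from this PR and does not use the parquet crate: the Event and RowGroup types, the to_row_groups function, and the max_rows parameter are all hypothetical names chosen for illustration. It only shows how row-oriented events can be transposed into per-column vectors and split into groups capped at a fixed row count.

```rust
/// Hypothetical row type: one log event with two fields.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    timestamp: i64,
    message: String,
}

/// Hypothetical column-oriented chunk, loosely analogous to a Parquet row group:
/// each field is stored as its own contiguous vector.
#[derive(Debug, Default, PartialEq)]
struct RowGroup {
    timestamps: Vec<i64>,
    messages: Vec<String>,
}

/// Splits `events` into row groups of at most `max_rows` rows each,
/// transposing every chunk from rows into columns.
fn to_row_groups(events: &[Event], max_rows: usize) -> Vec<RowGroup> {
    assert!(max_rows > 0, "max_rows must be positive");
    events
        .chunks(max_rows)
        .map(|chunk| RowGroup {
            timestamps: chunk.iter().map(|e| e.timestamp).collect(),
            messages: chunk.iter().map(|e| e.message.clone()).collect(),
        })
        .collect()
}

fn main() {
    // Five synthetic events, grouped two rows at a time.
    let events: Vec<Event> = (0..5)
        .map(|i| Event {
            timestamp: 1_700_000_000 + i,
            message: format!("event {i}"),
        })
        .collect();

    let groups = to_row_groups(&events, 2);
    for (i, group) in groups.iter().enumerate() {
        println!("row group {i}: {} rows", group.timestamps.len());
    }
}
```

A real Parquet writer additionally encodes and compresses each column chunk and records per-group metadata in the file footer; the sketch leaves all of that out.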
Key features:
Vector configuration
How did you test this PR?
I tested it against production Kafka data, and it produced correctly formatted Parquet files in S3.
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changelog label to this PR.
References
parquet columnar format in the aws_s3 sink #1374
parquet codec #17395