Merged
34 changes: 20 additions & 14 deletions README.md
@@ -68,8 +68,7 @@ LC_ALL=C ./mvnw clean install

## Features

-Parquet is a very active project, and new features are being added quickly. Here are a few features:
-
+Parquet is an active project, and new features are being added quickly. Here are a few features:

* Type-specific encoding
* Hive integration (deprecated)
@@ -96,7 +95,9 @@ Parquet is a very active project, and new features are being added quickly. Here

## Java Vector API support
`The feature is experimental and is currently not part of the parquet distribution`.

Parquet-Java has supported Java Vector API to speed up reading, to enable this feature:

* Java 17+, 64-bit
* Requiring the CPU to support instruction sets:
* avx512vbmi
@@ -116,26 +117,29 @@ Note that to use an Input or Output format, you need to implement a WriteSupport
We've implemented this for 2 popular data formats to provide a clean migration path as well:

### Thrift

Thrift integration is provided by the [parquet-thrift](https://github.com/apache/parquet-java/tree/master/parquet-thrift) sub-project.

### Avro

Avro conversion is implemented via the [parquet-avro](https://github.com/apache/parquet-java/tree/master/parquet-avro) sub-project.

### Protobuf

Protobuf conversion is implemented via the [parquet-protobuf](https://github.com/apache/parquet-java/tree/master/parquet-protobuf) sub-project.

### Create your own objects

* The ParquetOutputFormat can be provided a WriteSupport to write your own objects to an event based RecordConsumer.
* The ParquetInputFormat can be provided a ReadSupport to materialize your own objects by implementing a RecordMaterializer

See the APIs:

* [Record conversion API](https://github.com/apache/parquet-java/tree/master/parquet-column/src/main/java/org/apache/parquet/io/api)
* [Hadoop API](https://github.com/apache/parquet-java/tree/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/api)

## Hive integration

Hive integration is provided via the [parquet-hive](https://github.com/apache/parquet-java/tree/master/parquet-hive) sub-project.

Hive integration is now deprecated within the Parquet project. It is now maintained by Apache Hive.

## Build
@@ -178,22 +182,22 @@ The current release is version `1.15.1`.

### How To Contribute

-We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [parquet-java](https://github.com/apache/parquet-java) Git repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/parquet-java.git
+We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [parquet-java](https://github.com/apache/parquet-java) Git repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to `https://github.com/apache/parquet-java.git`.
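
The remote update this paragraph describes could look roughly like the sketch below. It runs in a throwaway repository so nothing real is touched. The old URL is a placeholder (the README does not name the old location), and `upstream` is only a conventional remote name, not one the README prescribes.

```shell
set -eu

# Throwaway repository standing in for an existing clone (no network access needed).
cd "$(mktemp -d)"
git init -q

# Placeholder for the old location; the README does not say what it was.
git remote add origin https://example.com/old-location/parquet.git

# Option 1: repoint the existing origin at the current repository.
git remote set-url origin https://github.com/apache/parquet-java.git

# Option 2: leave origin alone (for example, when it is your fork) and add the
# Apache repository alongside it. Shown after option 1 here only for brevity.
git remote add upstream https://github.com/apache/parquet-java.git

# Show the configured URLs.
git remote -v
```

In a real clone only the `git remote set-url` or `git remote add` line is needed; the setup lines exist so the sketch runs anywhere.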

-If you are looking for some ideas on what to contribute, check out jira issues for this project labeled ["pick-me-up"](https://issues.apache.org/jira/browse/PARQUET-5?jql=project%20%3D%20PARQUET%20and%20labels%20%3D%20pick-me-up%20and%20status%20%3D%20open).
-Comment on the issue and/or contact [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/) with your questions and ideas.
+If you are looking for some ideas on what to contribute, check out [GitHub issues](https://github.com/apache/parquet-java/issues) for labeled [Good first issue](https://github.com/apache/parquet-java/issues?q=state%3Aopen%20label%3A%22Good%20first%20issue%22). Comment on the issue and/or contact [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org) with your questions and ideas.

-If you’d like to report a bug but don’t have time to fix it, you can still post it to our [issue tracker](https://issues.apache.org/jira/browse/PARQUET), or email the mailing list [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
+If you’d like to report a bug but don’t have time to fix it, you can still raise an [issue on GitHub](https://github.com/apache/parquet-java/issues/new/choose), or email the mailing list [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org).

To contribute a patch:

1. Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
-2. Create a JIRA for your patch on the [Parquet Project JIRA](https://issues.apache.org/jira/browse/PARQUET).
-3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (ex: https://github.com/apache/parquet-java/pull/240).
+2. Create an issue for your patch on the [GitHub issues](https://github.com/apache/parquet-java/issues).
+3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the issue (ex: https://github.com/apache/parquet-java/pull/3260).
4. Make sure that your code passes the unit tests. You can run the tests with `./mvnw test` in the root directory.
5. Add new unit tests for your code.

We tend to do fairly close readings of pull requests, and you may get a lot of comments. Some common issues that are not code structure related, but still important:

* Use 2 spaces for whitespace. Not tabs, not 4 spaces. The number of the spacing shall be 2.
* Give your operators some room. Not `a+b` but `a + b` and not `foo(int a,int b)` but `foo(int a, int b)`.
* Generally speaking, stick to the [Sun Java Code Conventions](http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html)
@@ -204,18 +208,20 @@ Thank you for getting involved!
## Authors and contributors

* [Contributors](https://github.com/apache/parquet-java/graphs/contributors)
-* [Committers](dev/COMMITTERS.md)
+* [Committers](https://projects.apache.org/committee.html?parquet)

## Code of Conduct

We hold ourselves and the Parquet developer community to two codes of conduct:

1. [The Apache Software Foundation Code of Conduct](https://www.apache.org/foundation/policies/conduct.html)
2. [The Twitter OSS Code of Conduct](https://github.com/twitter/code-of-conduct/blob/master/code-of-conduct.md)

## Discussions
-* Mailing list: [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
-* Bug tracker: [jira](https://issues.apache.org/jira/browse/PARQUET)
-* Discussions also take place in github pull requests
+
+* Mailing list: [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org)
+* GitHub issues: [Issues](https://github.com/apache/parquet-java/issues)
+* Discussions also take place in GitHub pull requests

## License

66 changes: 0 additions & 66 deletions dev/COMMITTERS.md

This file was deleted.