diff --git a/README.md b/README.md
index c53ccbba66..ff6f162151 100644
--- a/README.md
+++ b/README.md
@@ -68,8 +68,7 @@ LC_ALL=C ./mvnw clean install
 
 ## Features
 
-Parquet is a very active project, and new features are being added quickly. Here are a few features:
-
+Parquet is an active project, and new features are being added quickly. Here are a few features:
 
 * Type-specific encoding
 * Hive integration (deprecated)
@@ -96,7 +95,9 @@ Parquet is a very active project, and new features are being added quickly. Here
 ## Java Vector API support
 
 `The feature is experimental and is currently not part of the parquet distribution`.
+
 Parquet-Java has supported Java Vector API to speed up reading, to enable this feature:
+
 * Java 17+, 64-bit
 * Requiring the CPU to support instruction sets:
   * avx512vbmi
@@ -116,26 +117,29 @@ Note that to use an Input or Output format, you need to implement a WriteSupport
 We've implemented this for 2 popular data formats to provide a clean migration path as well:
 
 ### Thrift
+
 Thrift integration is provided by the [parquet-thrift](https://github.com/apache/parquet-java/tree/master/parquet-thrift) sub-project.
 
 ### Avro
+
 Avro conversion is implemented via the [parquet-avro](https://github.com/apache/parquet-java/tree/master/parquet-avro) sub-project.
 
 ### Protobuf
+
 Protobuf conversion is implemented via the [parquet-protobuf](https://github.com/apache/parquet-java/tree/master/parquet-protobuf) sub-project.
 
 ### Create your own objects
+
 * The ParquetOutputFormat can be provided a WriteSupport to write your own objects to an event based RecordConsumer.
 * The ParquetInputFormat can be provided a ReadSupport to materialize your own objects by implementing a RecordMaterializer
 
 See the APIs:
+
 * [Record conversion API](https://github.com/apache/parquet-java/tree/master/parquet-column/src/main/java/org/apache/parquet/io/api)
 * [Hadoop API](https://github.com/apache/parquet-java/tree/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/api)
 
 ## Hive integration
 
-Hive integration is provided via the [parquet-hive](https://github.com/apache/parquet-java/tree/master/parquet-hive) sub-project.
-
 Hive integration is now deprecated within the Parquet project. It is now maintained by Apache Hive.
 
 ## Build
@@ -178,22 +182,22 @@ The current release is version `1.15.1`.
 
 ### How To Contribute
 
-We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [parquet-java](https://github.com/apache/parquet-java) Git repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/parquet-java.git
+We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [parquet-java](https://github.com/apache/parquet-java) Git repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to `https://github.com/apache/parquet-java.git`.
 
-If you are looking for some ideas on what to contribute, check out jira issues for this project labeled ["pick-me-up"](https://issues.apache.org/jira/browse/PARQUET-5?jql=project%20%3D%20PARQUET%20and%20labels%20%3D%20pick-me-up%20and%20status%20%3D%20open).
-Comment on the issue and/or contact [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/) with your questions and ideas.
+If you are looking for some ideas on what to contribute, check out [GitHub issues](https://github.com/apache/parquet-java/issues) labeled [Good first issue](https://github.com/apache/parquet-java/issues?q=state%3Aopen%20label%3A%22Good%20first%20issue%22). Comment on the issue and/or contact [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org) with your questions and ideas.
 
-If you’d like to report a bug but don’t have time to fix it, you can still post it to our [issue tracker](https://issues.apache.org/jira/browse/PARQUET), or email the mailing list [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
+If you’d like to report a bug but don’t have time to fix it, you can still raise an [issue on GitHub](https://github.com/apache/parquet-java/issues/new/choose), or email the mailing list [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org).
 
 To contribute a patch:
 
  1. Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
- 2. Create a JIRA for your patch on the [Parquet Project JIRA](https://issues.apache.org/jira/browse/PARQUET).
- 3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (ex: https://github.com/apache/parquet-java/pull/240).
+ 2. Create an issue for your patch in [GitHub issues](https://github.com/apache/parquet-java/issues).
+ 3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request title with the issue number (ex: https://github.com/apache/parquet-java/pull/3260).
  4. Make sure that your code passes the unit tests. You can run the tests with `./mvnw test` in the root directory.
  5. Add new unit tests for your code.
 
 We tend to do fairly close readings of pull requests, and you may get a lot of comments. Some common issues that are not code structure related, but still important:
+
 * Use 2 spaces for whitespace. Not tabs, not 4 spaces. The number of the spacing shall be 2.
 * Give your operators some room. Not `a+b` but `a + b` and not `foo(int a,int b)` but `foo(int a, int b)`.
 * Generally speaking, stick to the [Sun Java Code Conventions](http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html)
@@ -204,18 +208,20 @@ Thank you for getting involved!
 ## Authors and contributors
 
 * [Contributors](https://github.com/apache/parquet-java/graphs/contributors)
-* [Committers](dev/COMMITTERS.md)
+* [Committers](https://projects.apache.org/committee.html?parquet)
 
 ## Code of Conduct
 
 We hold ourselves and the Parquet developer community to two codes of conduct:
+
 1. [The Apache Software Foundation Code of Conduct](https://www.apache.org/foundation/policies/conduct.html)
 2. [The Twitter OSS Code of Conduct](https://github.com/twitter/code-of-conduct/blob/master/code-of-conduct.md)
 
 ## Discussions
-* Mailing list: [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
-* Bug tracker: [jira](https://issues.apache.org/jira/browse/PARQUET)
-* Discussions also take place in github pull requests
+
+* Mailing list: [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org)
+* GitHub issues: [Issues](https://github.com/apache/parquet-java/issues)
+* Discussions also take place in GitHub pull requests
 
 ## License
 
diff --git a/dev/COMMITTERS.md b/dev/COMMITTERS.md
deleted file mode 100644
index f861cda83f..0000000000
--- a/dev/COMMITTERS.md
+++ /dev/null
@@ -1,66 +0,0 @@
-
-
-# Committers (in alphabetical order):
-
-The official list of committers can be found here: [Apache Parquet Committers and PMC](http://people.apache.org/committers-by-project.html#parquet)
-
-Below is more information about each committer (in alphabetical order). If this information becomes out of date, please send a PR to update!
-
-| Name | Apache Id | github id | JIRA id |
-|--------------------|--------------|-----------------|--------------|
-| Alex Levenson | alexlevenson | @isnotinvain | alexlevenson |
-| Aniket Mokashi | aniket486 | @aniket486 | |
-| Brock Noland | brock | @brockn | |
-| Cheng Lian | lian | @liancheng | liancheng |
-| Chris Aniszczyk | caniszczyk | @caniszczyk | |
-| Chris Mattmann | mattmann | @chrismattmann | |
-| Daniel C. Weeks | dweeks | @danielcweeks | |
-| Dmitriy Ryaboy | dvryaboy | @dvryaboy | |
-| Fokko Driesprong | Fokko | @Fokko | fokko |
-| Gang Wu | gangwu | @wgtmac | |
-| Gidon Gershinsky | gershinsky | @ggershinsky | gershinsky |
-| Jake Farrell | jfarrell | | |
-| Jonathan Coveney | jcoveney | @jcoveney | |
-| Julien Le Dem | julien | @julienledem | julienledem |
-| Lukas Nalezenec | lukas | @lukasnalezenec | |
-| Marcel Kornacker | marcel | @mkornacker | |
-| Mickael Lacour | mlacour | @mickaellcr | |
-| Nong Li | nong | @nongli | |
-| Remy Pecqueur | rpecqueur | @Lordshinjo | |
-| Roman Shaposhnik | rvs | @rvs | |
-| Ryan Blue | blue | @rdblue | |
-| Sergio Pena | spena | @spena | spena |
-| Tianshuo Deng | tianshuo | @tsdeng | |
-| Todd Lipcon | todd | @toddlipcon | |
-| Tom White | tomwhite | @tomwhite | |
-| Wes McKinney | wesm | @wesm | |
-| Wesley Graham Peck | wesleypeck | @wesleypeck | |
-| Xinli Shang | shangxinli | @shangxinli | |
-
-
-# Reviewing guidelines:
-Committers have the responsibility to give constructive and timely feedback on the pull requests.
-Anybody can give feedback on a pull request but only committers can merge it.
-
-First things to look at in a Pull Request:
- - Is there a corresponding JIRA, and is it mentioned in the description? If not ask the contributor to make one.
- - If a JIRA is open, make sure it is assigned to the contributor. (they need to have the contributor role here: https://issues.apache.org/jira/plugins/servlet/project-config/PARQUET/roles)
- - Is it an uncontroversial change that looks good (has apropriate tests and the build is succesful)? => merge it
- - Is it something that requires the attention of committers with a specific expertise? => mention those committers by their github id in the pull request.