[Feature Request]: AvroIO write improvement with number of shards #36846

@CherisPatelInfocusp

What would you like to happen?

The current AvroIO implementation in the Go SDK does not support sharded writes: all data is written to a single file, regardless of how much data there is.

Current Go implementation: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/avroio/avroio.go#L116

We can take inspiration from Python’s iobase to split data across the specified number of shards using a round-robin approach.

Reference:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L1488
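
To make the proposal concrete, here is a minimal sketch of the round-robin assignment in plain Go, with no Beam dependency. The names `roundRobin` and `shardName` are hypothetical helpers, not part of the existing avroio package, and the `-SSSSS-of-NNNNN` file name pattern is an assumption borrowed from the default shard template used by the other SDKs. In an actual pipeline the shard index would more likely be attached to each element as a key and followed by a GroupByKey so that each shard is written by one worker; this sketch only shows the distribution logic.

```go
package main

import "fmt"

// shardName is a hypothetical helper that formats a per-shard file name.
// The "-SSSSS-of-NNNNN" layout is an assumption, not the current Go behavior.
func shardName(prefix string, shard, numShards int) string {
	return fmt.Sprintf("%s-%05d-of-%05d.avro", prefix, shard, numShards)
}

// roundRobin distributes records across numShards buckets by assigning
// record i to shard i % numShards. Values below 1 fall back to a single shard,
// which matches today's single-file output.
func roundRobin(records []string, numShards int) [][]string {
	if numShards < 1 {
		numShards = 1
	}
	shards := make([][]string, numShards)
	for i, r := range records {
		s := i % numShards
		shards[s] = append(shards[s], r)
	}
	return shards
}

func main() {
	shards := roundRobin([]string{"a", "b", "c", "d", "e"}, 3)
	for i, recs := range shards {
		fmt.Println(shardName("out", i, len(shards)), recs)
	}
}
```

With five records and three shards, the records land in shards 0, 1, 2, 0, 1 in that order, so shard sizes differ by at most one.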

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
