Closed
Description
What would you like to happen?
The current AvroIO implementation in the Go SDK does not support sharded writes: it writes all data to a single file, regardless of how many shards are wanted.
Current Go implementation: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/avroio/avroio.go#L116
We can take inspiration from Python's iobase and distribute records across the specified number of shards in round-robin fashion, so that each shard is written to its own file.
Reference:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L1488
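To make the round-robin idea concrete, here is a minimal sketch in plain Go. It is not the existing avroio code and does not use the Beam API; `roundRobinAssigner`, `partitionRoundRobin`, and `shardName` are hypothetical names, and the `-SSSSS-of-NNNNN` file naming pattern is an assumption chosen for illustration. In a pipeline, the shard index would presumably serve as the key for a group-by-key step, with each group then written to its own file.

```go
package main

import "fmt"

// roundRobinAssigner hands out shard indices 0..numShards-1 in rotation.
// Hypothetical helper for illustration; not part of the Beam Go SDK.
type roundRobinAssigner struct {
	numShards int
	next      int
}

// newRoundRobinAssigner falls back to a single shard when numShards < 1,
// which matches the current single-file behavior.
func newRoundRobinAssigner(numShards int) *roundRobinAssigner {
	if numShards < 1 {
		numShards = 1
	}
	return &roundRobinAssigner{numShards: numShards}
}

// assign returns the shard for the next record and advances the counter.
func (a *roundRobinAssigner) assign() int {
	shard := a.next
	a.next = (a.next + 1) % a.numShards
	return shard
}

// partitionRoundRobin groups records by their assigned shard index.
// Records are plain strings here; real Avro records would be encoded values.
func partitionRoundRobin(records []string, numShards int) map[int][]string {
	assigner := newRoundRobinAssigner(numShards)
	shards := make(map[int][]string)
	for _, r := range records {
		s := assigner.assign()
		shards[s] = append(shards[s], r)
	}
	return shards
}

// shardName builds a per-shard file name such as "out-00001-of-00003.avro".
// The naming pattern is an assumption, not something the issue specifies.
func shardName(prefix, suffix string, shard, numShards int) string {
	return fmt.Sprintf("%s-%05d-of-%05d%s", prefix, shard, numShards, suffix)
}
```

With five records and two shards, for example, `partitionRoundRobin` places records 0, 2, and 4 in shard 0 and records 1 and 3 in shard 1. The counter is local to one assigner, so in a distributed setting each worker would keep its own; shard sizes are then balanced per worker rather than globally.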
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner