Description
What would you like to happen?
It would be awesome to be able to specify output_type for the ReadFromBigQuery Apache Beam YAML transform when using query.
Currently, an attempt to query a BigQuery table with this transform ends with the following exception: "ValueError: Invalid transform specification at "Read from BigQuery" at line 3: Both a query and an output type of 'BEAM_ROW' were specified. 'BEAM_ROW' is not currently supported with queries."
beam/sdks/python/apache_beam/io/gcp/bigquery.py
Lines 2973 to 2977 in c0a5895
    if self.output_type == 'BEAM_ROW' and self._kwargs.get('query',
                                                            None) is not None:
      raise ValueError(
          "Both a query and an output type of 'BEAM_ROW' were specified. "
          "'BEAM_ROW' is not currently supported with queries.")
The workaround is to use a combination of the table, fields and row_restriction config parameters, but this does not allow for any aggregation, so in some cases users must read a lot of data into memory instead of letting BigQuery do the work.
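To make the difference concrete, here is a minimal standalone sketch of the guard quoted above, applied to two config shapes. It does not import Beam: `check_read_config` is a hypothetical helper that only mirrors the condition in bigquery.py, and the table name, column names, and SQL are made up for illustration. It also assumes the YAML transform requests 'BEAM_ROW', as the error message suggests.

```python
def check_read_config(output_type, **kwargs):
    """Hypothetical mirror of the quoted guard; not Beam's actual class."""
    if output_type == 'BEAM_ROW' and kwargs.get('query', None) is not None:
        raise ValueError(
            "Both a query and an output type of 'BEAM_ROW' were specified. "
            "'BEAM_ROW' is not currently supported with queries.")
    return kwargs


# Workaround: table + fields + row_restriction (names are placeholders).
# Rows can be filtered and columns selected, but nothing can be aggregated.
workaround = {
    'table': 'my-project.my_dataset.events',
    'fields': ['user_id', 'amount'],
    'row_restriction': 'amount > 0',
}

# Desired: let BigQuery aggregate before the rows reach the pipeline.
desired = {
    'query': (
        'SELECT user_id, SUM(amount) AS total '
        'FROM `my-project.my_dataset.events` GROUP BY user_id'
    ),
}

# The workaround config passes the guard.
check_read_config('BEAM_ROW', **workaround)

# The query config is rejected while the output type is 'BEAM_ROW'.
try:
    check_read_config('BEAM_ROW', **desired)
    rejected = False
    message = ''
except ValueError as exc:
    rejected = True
    message = str(exc)

print(rejected)
```

With the workaround shape, the per-user sum would have to be computed inside the pipeline after every matching row has been read, which is the cost this request aims to avoid.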
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner