[Feature Request]: Specify output_type in ReadFromBigQuery Beam YAML transform #36988

@zycietocalareszta

Description

What would you like to happen?

It would be great to be able to specify output_type for the ReadFromBigQuery Apache Beam YAML transform when using query.
Currently, an attempt to query a BigQuery table with this transform fails with the exception "ValueError: Invalid transform specification at "Read from BigQuery" at line 3: Both a query and an output type of 'BEAM_ROW' were specified. 'BEAM_ROW' is not currently supported with queries."

if self.output_type == 'BEAM_ROW' and self._kwargs.get('query',
                                                       None) is not None:
  raise ValueError(
      "Both a query and an output type of 'BEAM_ROW' were specified. "
      "'BEAM_ROW' is not currently supported with queries.")

The workaround is to use a combination of the table, fields and row_restriction config parameters, but this does not allow for any aggregation, so in some cases users must read a large amount of data into memory instead of letting BigQuery do the aggregation.
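
To illustrate the cost of the workaround, here is a rough sketch in plain Python rather than Beam. Only the config keys table, fields and row_restriction come from the transform; the table name, field names, sample rows, the total_per_customer helper and the query text are all hypothetical. It assumes the read yields one element per matching table row, so any aggregation has to happen on the pipeline side.

```python
from collections import defaultdict

# Config keys named in the workaround; every value here is hypothetical.
read_config = {
    "table": "my-project.my_dataset.orders",
    "fields": ["customer_id", "amount"],
    "row_restriction": "amount > 0",
}

# Stand-in for the rows such a read would return: one element per matching
# table row, since the read can only filter and project.
rows = [
    {"customer_id": "a", "amount": 10},
    {"customer_id": "b", "amount": 7},
    {"customer_id": "a", "amount": 5},
]


def total_per_customer(rows):
    """Sum `amount` per `customer_id` on the pipeline side (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


totals = total_per_customer(rows)

# With query support, BigQuery could return the aggregated result directly,
# so only one row per customer would reach the pipeline.
equivalent_query = (
    "SELECT customer_id, SUM(amount) AS total "
    "FROM `my-project.my_dataset.orders` "
    "WHERE amount > 0 GROUP BY customer_id"
)
```

In this toy case the difference is three rows versus two, but on a large table the workaround moves every matching row into the pipeline before it can be reduced.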

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
