Beam AnalyzeAndTransformDataset runs expensive transformation _InstanceDictInputToTFXIOInput Twice #296

@michaelwsherman

Description

AnalyzeAndTransformDataset should not run _InstanceDictInputToTFXIOInput twice.

AnalyzeAndTransformDataset runs AnalyzeDataset and TransformDataset back-to-back. AnalyzeDataset runs _InstanceDictInputToTFXIOInput and TransformDataset also runs _InstanceDictInputToTFXIOInput.

But when running AnalyzeAndTransformDataset, the _InstanceDictInputToTFXIOInput call in TransformDataset is unnecessary, since it was already run in AnalyzeDataset.

The _InstanceDictInputToTFXIOInput transformation is expensive, and this redundant call meaningfully increases runtime and cost.
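
To make the call pattern concrete, here is a minimal plain-Python sketch. It is a toy model, not the tensorflow_transform code: `convert`, `analyze`, `transform`, and the two `analyze_and_transform_*` functions are hypothetical stand-ins, and the optional `converted` argument is only one possible way the already-converted data could be reused.

```python
# Toy model of the call pattern described above.
# All names here are hypothetical stand-ins, not tensorflow_transform APIs.

calls = {"convert": 0}


def convert(instances):
    """Stand-in for the expensive _InstanceDictInputToTFXIOInput step."""
    calls["convert"] += 1
    return [dict(row) for row in instances]


def analyze(instances, converted=None):
    """Stand-in for AnalyzeDataset: converts the input, then analyzes it."""
    data = converted if converted is not None else convert(instances)
    mean_x = sum(row["x"] for row in data) / len(data)  # toy analysis result
    return mean_x, data


def transform(instances, mean_x, converted=None):
    """Stand-in for TransformDataset: converts the input unless given it."""
    data = converted if converted is not None else convert(instances)
    return [{"x_centered": row["x"] - mean_x} for row in data]


def analyze_and_transform_current(instances):
    """Back-to-back calls: the conversion runs once in each step."""
    mean_x, _ = analyze(instances)
    return transform(instances, mean_x)


def analyze_and_transform_proposed(instances):
    """Reuses the conversion from the analyze step in the transform step."""
    mean_x, converted = analyze(instances)
    return transform(instances, mean_x, converted=converted)


rows = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]

calls["convert"] = 0
out_current = analyze_and_transform_current(rows)
calls_current = calls["convert"]

calls["convert"] = 0
out_proposed = analyze_and_transform_proposed(rows)
calls_proposed = calls["convert"]

print(calls_current, calls_proposed)  # prints "2 1"
```

Both variants produce the same output in this sketch; only the number of conversions differs.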
