[spark] Add startup mode for batch read by Yohahaha · Pull Request #2532 · apache/fluss

Yohahaha · 2026-01-30T14:54:12Z

Purpose

Add a new option, start.up.mode, to read from different offsets in Fluss.
This PR changes only the batch-read classes.

Linked issue: close #2549

Brief change log

Tests

API and Format

Documentation

Yohahaha · 2026-02-02T06:37:52Z

@YannByron @wuchong please help take a look, thank you!

wuchong

Thanks, @Yohahaha! It looks like no new tests have been added. Should we include some tests for the new configuration option?

Also, as a best practice, we recommend first creating a dedicated issue to describe the feature and proposed APIs before submitting a pull request. The PR can then be linked to that issue. This helps us better track progress and maintain visibility across all subtasks of the umbrella initiative.

wuchong · 2026-02-02T11:30:29Z

...s-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/SparkConnectorOptions.scala

+    ConfigBuilder
+      .key("scan.startup.mode")
+      .stringType()
+      .defaultValue(StartUpMode.LATEST.toString)


Should we default to FULL mode to stay aligned with the Flink connector?
Defaulting to LATEST may return empty results if the user doesn’t explicitly specify a startup mode for the query.
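
To illustrate the concern above, here is a minimal sketch of how the default matters when the option is unset. It assumes options arrive as a plain Map[String, String]. StartUpModeResolver and resolve are hypothetical names for illustration and are not part of this PR; only the key scan.startup.mode and the four mode names come from the diff. Case-insensitive matching is also an assumption.

```scala
// Hypothetical sketch, not the code from this PR.
object StartUpMode extends Enumeration {
  val FULL, EARLIEST, LATEST, TIMESTAMP = Value
}

object StartUpModeResolver {
  val Key = "scan.startup.mode"

  // Resolve the mode from user-supplied options; fall back to `default` when the key is unset.
  def resolve(options: Map[String, String], default: StartUpMode.Value): StartUpMode.Value =
    options.get(Key) match {
      case Some(raw) =>
        // Assumption: values are matched case-insensitively.
        StartUpMode.values
          .find(_.toString.equalsIgnoreCase(raw.trim))
          .getOrElse(throw new IllegalArgumentException(
            s"Unsupported $Key: '$raw'. Expected one of: ${StartUpMode.values.mkString(", ")}"))
      case None => default
    }
}
```

With no option set, resolve(Map.empty, StartUpMode.FULL) yields FULL, whereas a LATEST default would start a batch read at the end of the log, which is the empty-result case described in the comment.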

...uss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussAppendPartitionReader.scala

fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussBatch.scala

fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussScan.scala

YannByron · 2026-02-03T13:51:33Z

fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/SparkFlussConf.scala

+    val FULL, EARLIEST, LATEST, TIMESTAMP = Value
+  }
+
+  val SCAN_START_UP_MODE: ConfigOption[String] =


I suggest placing the common options used by both Spark and Flink in ConfigOptions, as Paimon does in https://github.com/apache/paimon/blob/a10a44892cd5e9dbac705762ed6774674357692f/paimon-api/src/main/java/org/apache/paimon/CoreOptions.java#L947. Maybe we can do this in a separate PR.

Good advice. Let's move these common configs into the fluss-common module in another PR so that Flink and Spark can share them. cc @wuchong

Good point. We can introduce ConnectorOptions in fluss-common, under the package org.apache.fluss.config, to share the common options across connectors.
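
The sharing idea discussed above could look roughly like the following. This is only a sketch under stated assumptions: Opt is a hypothetical stand-in for a typed option and does not reproduce Fluss's real ConfigOption API, SparkSide and FlinkSide are placeholder names, and the "full" default follows the reviewer's suggestion rather than any confirmed final value.

```scala
// Hypothetical stand-in for a typed option definition.
final case class Opt[T](key: String, default: T)

// Hypothetical shared holder, in the spirit of the ConnectorOptions proposal.
object ConnectorOptions {
  // Assumption: "full" as the default, per the review suggestion.
  val ScanStartupMode: Opt[String] = Opt("scan.startup.mode", "full")
}

// Each connector reads the single shared definition instead of declaring its own copy.
object SparkSide {
  def startupMode(options: Map[String, String]): String =
    options.getOrElse(ConnectorOptions.ScanStartupMode.key, ConnectorOptions.ScanStartupMode.default)
}

object FlinkSide {
  def startupMode(options: Map[String, String]): String =
    options.getOrElse(ConnectorOptions.ScanStartupMode.key, ConnectorOptions.ScanStartupMode.default)
}
```

Because both sides refer to one definition, a change to the key or default cannot drift between connectors.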

fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/read/FlussScan.scala

fluss-spark/fluss-spark-common/src/main/scala/org/apache/fluss/spark/SparkFlussConf.scala

YannByron · 2026-02-03T14:25:14Z

...park/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/SparkPrimaryKeyTableReadTest.scala

    new Configuration()
  }

+  override protected def beforeEach(): Unit = {


All the configs here are default values, so this override has no effect. I suggest removing it.

@YannByron WithFlussAdmin uses a singleton Fluss config, which affects the integration tests. If we don't reset the config here, we need to rethink the scope of the Fluss config.

Unit tests launched by mvn will fail without it.
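
The leak described above can be reproduced in plain Scala without any test framework. In this sketch, SharedConf and SuiteSimulation are hypothetical stand-ins for the singleton config and for a per-test hook such as beforeEach(); the key and the "full" default are illustrative assumptions, not the values used in the PR.

```scala
import scala.collection.mutable

// Hypothetical singleton standing in for the shared config described above.
object SharedConf {
  private val defaults = Map("scan.startup.mode" -> "full")
  private val values = mutable.Map(defaults.toSeq: _*)

  def set(key: String, value: String): Unit = values(key) = value
  def get(key: String): String = values(key)

  // Restore every key to its default value.
  def reset(): Unit = {
    values.clear()
    values ++= defaults
  }
}

object SuiteSimulation {
  // Mimics a beforeEach() hook: restore defaults, then run the test body.
  def runTest[A](body: => A): A = {
    SharedConf.reset()
    body
  }
}
```

If one test sets a non-default value, the value stays in the singleton after that test finishes; the next runTest call sees the default again only because of the reset. Without the reset, the outcome would depend on test execution order.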

Yohahaha · 2026-02-04T09:18:03Z

@YannByron any more comments?

YannByron · 2026-02-05T02:26:51Z

+1. cc @wuchong

wuchong reviewed Feb 2, 2026

wuchong added the priority=critical label Feb 2, 2026

Yohahaha force-pushed the spark-read-log-table branch from 7c323dd to 9f47100 Compare February 3, 2026 06:03

Yohahaha mentioned this pull request Feb 3, 2026

[spark] Add basic streaming read support for sparksql with latest mode #2548

Open

YannByron reviewed Feb 3, 2026

Yohahaha added 9 commits February 4, 2026 14:49

stage

d5fbebe

stage

e2e1914

fix primary key table

f1fe339

fix comments

786b87d

fix comments

77303fa

fix conflict

3624c0c

fixut

10211d1

fix comments

e62156f

fix comments

67105fa

Yohahaha force-pushed the spark-read-log-table branch from 2d08347 to 67105fa Compare February 4, 2026 06:49

wuchong approved these changes Feb 5, 2026

wuchong merged commit 62ada45 into apache:main Feb 5, 2026
6 checks passed

Yohahaha deleted the spark-read-log-table branch February 5, 2026 06:03
