[spark][bug] Batch read of a PK table fails when an arbitrary column is used as the primary key #2986

I found a bug while testing reads of PK tables: the batch read fails when the primary key is defined on the last columns of the table. The current cases in `SparkPrimaryKeyTableReadTest` all use the first column and the partition column as the primary key, so they do not cover this layout.

```scala
test("Spark Read: primary key table with last pk") {
  withTable("t") {
    sql("CREATE TABLE t (id int, name string, pk int, pk2 string) TBLPROPERTIES('primary.key'='pk,pk2')")
    checkAnswer(sql("SELECT * FROM t"), Nil)
    sql("INSERT INTO t VALUES (1, 'a', 10, 'x'), (2, 'b', 20, 'y')")
    checkAnswer(sql("SELECT * FROM t ORDER BY id"), Row(1, "a", 10, "x") :: Row(2, "b", 20, "y") :: Nil)
  }
}
```

The above case fails with:
```
Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 4) (192.168.0.116 executor driver): java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
	at org.apache.fluss.row.ProjectedRow.getInt(ProjectedRow.java:90)
	at org.apache.fluss.row.InternalRow.lambda$createFieldGetter$ff31e09f$6(InternalRow.java:198)
	at org.apache.fluss.row.encode.CompactedKeyEncoder.encodeKey(CompactedKeyEncoder.java:83)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReader$$anon$1.compare(FlussUpsertPartitionReader.scala:113)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReader$$anon$1.compare(FlussUpsertPartitionReader.scala:111)
	at org.apache.fluss.spark.utils.LogChangesIterator.hasSamePrimaryKey(LogChangesIterator.scala:117)
	at org.apache.fluss.spark.utils.LogChangesIterator.hasNext(LogChangesIterator.scala:85)
	at org.apache.fluss.client.table.scanner.SortMergeReader.readBatch(SortMergeReader.java:90)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReader.initialize(FlussUpsertPartitionReader.scala:217)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReader.<init>(FlussUpsertPartitionReader.scala:86)
	at org.apache.fluss.spark.read.FlussUpsertPartitionReaderFactory.createReader(FlussPartitionReaderFactory.scala:61)
```
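The trace shows `ProjectedRow.getInt` being asked for index 2 on a row that exposes only 2 fields. One plausible reading, not confirmed against the Fluss source, is that the key comparator reads a key-only projected row using the primary key's positions in the full table schema (2 and 3 here) instead of positions within the projection (0 and 1). The sketch below reproduces that class of failure in isolation. `ProjectedView`, `keyByProjectedPosition`, and `keyBySchemaPosition` are hypothetical stand-ins written for illustration and are not Fluss APIs.

```scala
import scala.util.Try

// Hypothetical stand-ins for illustration only; these are NOT the Fluss classes.
object ProjectionIndexSketch {

  // Exposes only the selected columns of an underlying full row.
  // `pos` is a position inside the projection, so it must be < indexMapping.length.
  final class ProjectedView(indexMapping: Array[Int], full: IndexedSeq[Any]) {
    def get(pos: Int): Any = full(indexMapping(pos))
  }

  // Reads the key using positions 0 until arity within the projection.
  def keyByProjectedPosition(view: ProjectedView, arity: Int): Seq[Any] =
    (0 until arity).map(view.get)

  // Assumed failure mode: reads the key using positions from the full table schema.
  def keyBySchemaPosition(view: ProjectedView, pkIndexes: Array[Int]): Seq[Any] =
    pkIndexes.toSeq.map(view.get)

  def main(args: Array[String]): Unit = {
    val fullRow = Vector[Any](1, "a", 10, "x") // (id, name, pk, pk2)
    val pkIndexes = Array(2, 3)                // pk and pk2 in the table schema
    val keyRow = new ProjectedView(pkIndexes, fullRow)

    // Works: positions are relative to the 2-field projection.
    println(keyByProjectedPosition(keyRow, pkIndexes.length))

    // Fails: position 2 does not exist in a 2-field projection,
    // so the lookup throws ArrayIndexOutOfBoundsException.
    println(Try(keyBySchemaPosition(keyRow, pkIndexes)))
  }
}
```

Under this assumption, a primary key on the first column would have schema position 0, which is also a valid position inside the projection. That would be consistent with the existing tests passing, but it needs to be checked against `FlussUpsertPartitionReader` and `CompactedKeyEncoder` before treating it as the root cause.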

_Originally posted by @Yohahaha in https://github.com/apache/fluss/issues/2523#issuecomment-3835480927_
            
