Search before asking
Paimon version
I tried using chain group partitions to solve my issue #7503, but some data still cannot be read.
Compute Engine
Spark 3.5.2
Minimal reproduce step
Using this test case:
CREATE TABLE IF NOT EXISTS `default`.`chain_test` (
`t1` BIGINT COMMENT 't1',
`t2` BIGINT COMMENT 't2',
`t3` STRING COMMENT 't3'
) PARTITIONED BY (`region` STRING, `dt` STRING COMMENT 'dt')
TBLPROPERTIES (
'bucket-key' = 't1',
'primary-key' = 'region,dt,t1',
'partition.timestamp-pattern' = '$dt',
'partition.timestamp-formatter' = 'yyyyMMdd',
'chain-table.enabled' = 'true',
'bucket' = '2',
'merge-engine' = 'deduplicate',
'sequence.field' = 't2',
'chain-table.chain-partition-keys' = 'dt'
);
CALL sys.create_branch('default.chain_test', 'snapshot');
CALL sys.create_branch('default.chain_test', 'delta');
ALTER TABLE default.chain_test SET TBLPROPERTIES (
'scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta'
);
ALTER TABLE `default`.`chain_test$branch_snapshot` SET TBLPROPERTIES (
'scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta'
);
ALTER TABLE `default`.`chain_test$branch_delta` SET TBLPROPERTIES (
'scan.fallback-snapshot-branch' = 'snapshot',
'scan.fallback-delta-branch' = 'delta'
);
Insert some data:
-- Write to the main branch (delta)
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 1, '1'), (2, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 1, '1'), (12, 1, '1');
-- Write to the delta branch
SET spark.paimon.branch = delta;
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250809')
VALUES (1, 1, '1'), (2, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 2, '1-1'), (3, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250811')
VALUES (2, 2, '1-1'), (4, 1, '1');
VALUES (5, 2, '1-1'), (6, 2, '1-1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250809')
VALUES (11, 1, '1'), (12, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 2, '1-1'), (13, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250811')
VALUES (12, 2, '1-1'), (14, 1, '1');
-- Write to the snapshot branch
SET spark.paimon.branch = snapshot;
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'CN', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'US', dt = '20250810')
VALUES (11, 2, '1-1'), (12, 1, '1'), (13, 1, '1');
Query:
SELECT * FROM `default`.`chain_test` WHERE dt = '20250811';
Result:
1 2 1-1 CN 20250811
2 2 1-1 CN 20250811
4 1 1 CN 20250811
3 1 1 CN 20250811
12 2 1-1 US 20250811
11 2 1-1 US 20250811
13 1 1 US 20250811
14 1 1 US 20250811
This result is correct, because all partitions exist in all branches.
If I then insert a partition that is not in the delta branch:
SET spark.paimon.branch = snapshot;
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'UK', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');
the same query still returns the same result:
1 2 1-1 CN 20250811
2 2 1-1 CN 20250811
4 1 1 CN 20250811
3 1 1 CN 20250811
12 2 1-1 US 20250811
11 2 1-1 US 20250811
13 1 1 US 20250811
14 1 1 US 20250811
What doesn't meet your expectations?
When I insert a partition that exists in the snapshot branch but not in the delta branch:
INSERT OVERWRITE TABLE `default`.`chain_test` PARTITION (region = 'UK', dt = '20250810')
VALUES (1, 2, '1-1'), (2, 1, '1'), (3, 1, '1');
the query result should be:
1 2 1-1 CN 20250811
2 2 1-1 CN 20250811
4 1 1 CN 20250811
3 1 1 CN 20250811
12 2 1-1 US 20250811
11 2 1-1 US 20250811
13 1 1 US 20250811
14 1 1 US 20250811
1 2 1-1 UK 20250811
2 1 1 UK 20250811
3 1 1 UK 20250811
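To make the expectation concrete, here is a small Python model of the read I would expect for dt = '20250811'. It is only a sketch under assumed semantics (per region, take the latest snapshot partition before the queried dt, then merge in the later delta partitions, with the higher t2 winning per t1). It is not Paimon's implementation, and the dict layout and the name expected_read are made up for illustration.

```python
# Toy model of the expected chain read; NOT Paimon's implementation.
# Rows are (t1, t2, t3). Within a partition, t1 is the key and t2 the sequence field.
# Only the snapshot and delta branch inserts from the steps above are modeled;
# the stray "VALUES (5, ...), (6, ...)" line has no INSERT target and is left out.
snapshot = {
    ("CN", "20250810"): [(1, 2, "1-1"), (2, 1, "1"), (3, 1, "1")],
    ("US", "20250810"): [(11, 2, "1-1"), (12, 1, "1"), (13, 1, "1")],
    ("UK", "20250810"): [(1, 2, "1-1"), (2, 1, "1"), (3, 1, "1")],
}
delta = {
    ("CN", "20250809"): [(1, 1, "1"), (2, 1, "1")],
    ("CN", "20250810"): [(1, 2, "1-1"), (3, 1, "1")],
    ("CN", "20250811"): [(2, 2, "1-1"), (4, 1, "1")],
    ("US", "20250809"): [(11, 1, "1"), (12, 1, "1")],
    ("US", "20250810"): [(11, 2, "1-1"), (13, 1, "1")],
    ("US", "20250811"): [(12, 2, "1-1"), (14, 1, "1")],
}


def expected_read(dt):
    """Return rows (t1, t2, t3, region, dt) that the query is expected to yield."""
    out = []
    regions = sorted({r for r, _ in snapshot} | {r for r, _ in delta})
    for region in regions:
        if (region, dt) in snapshot:
            # The queried partition exists in the snapshot branch: read it directly.
            rows = sorted(snapshot[(region, dt)])
        else:
            # Latest snapshot partition before dt in the same region (may be absent).
            # yyyyMMdd strings compare correctly as plain strings.
            base_dts = [d for (r, d) in snapshot if r == region and d < dt]
            base_dt = max(base_dts, default=None)
            merged = {}
            if base_dt is not None:
                for row in snapshot[(region, base_dt)]:
                    merged[row[0]] = row
            # Apply delta partitions in (base_dt, dt], oldest first.
            for (r, d) in sorted(delta):
                if r == region and d <= dt and (base_dt is None or d > base_dt):
                    for row in delta[(r, d)]:
                        current = merged.get(row[0])
                        if current is None or row[1] >= current[1]:
                            merged[row[0]] = row
            rows = [merged[k] for k in sorted(merged)]
        out.extend((t1, t2, t3, region, dt) for (t1, t2, t3) in rows)
    return out


for row in expected_read("20250811"):
    print(*row)
```

Under these assumptions the model yields the CN and US rows that the query already returns, plus the three UK rows carried forward from the snapshot partition 20250810, which are the rows missing from the actual result.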
Alternatively, I tried changing the partition key order from PARTITIONED BY (`region` STRING, `dt` STRING COMMENT 'dt') to PARTITIONED BY (`dt` STRING COMMENT 'dt', `region` STRING).
When I execute the same SQL as above,
the result is:
2 2 1-1 20250811 US
4 1 1 20250811 US
12 2 1-1 20250811 US
14 1 1 20250811 US
2 2 1-1 20250811 CN
4 1 1 20250811 CN
Anything else?
No response
Are you willing to submit a PR?