Skip to content

HIVE-28676: Support custom partition patterns in MSCK repair table#6271

Open
VenkatSNarayanan wants to merge 2 commits intoapache:masterfrom
VenkatSNarayanan:vnarayanan/HIVE-28676
Open

HIVE-28676: Support custom partition patterns in MSCK repair table#6271
VenkatSNarayanan wants to merge 2 commits intoapache:masterfrom
VenkatSNarayanan:vnarayanan/HIVE-28676

Conversation

@VenkatSNarayanan
Copy link
Contributor

Adds support for custom partitioning patterns to MSCK repair table.

What changes were proposed in this pull request?
Adds support for custom partitioning patterns to MSCK repair table.

Why are the changes needed?
HCatStorer supports custom partitioning patterns when using dynamic partitioning, but Hive itself does not support this. This change adds support for non-pathological cases to Hive.

Does this PR introduce any user-facing change?
MSCK repair table with a configured custom partition pattern would previously ignore that pattern and error on finding nonstandard paths. With the code from this PR, it will respect the defined custom pattern when it extracts partition key values from the paths.

Is the change a dependency upgrade?
No

How was this patch tested?
A test was added to TestHiveMetastoreChecker to test the common kinds of custom patterns supported.

Adds support for custom partitioning patterns to MSCK repair table.

Adds support for custom partitioning patterns to MSCK
repair table.
@sonarqubecloud
Copy link

@VenkatSNarayanan
Copy link
Contributor Author

@deniskuzZ Addressed all the review comments from the old PR. The hcat.dynamic.partitioning.custom.pattern setting needs to be used in hive-standalone-metastore-server in one place, so I moved the string with the name of the setting there(hcatalog-core and hive-exec depend on that module). I've rebased it on current master and rerun the MSCK unit tests, which all pass.

@deniskuzZ
Copy link
Member

deniskuzZ commented Feb 2, 2026

hi @VenkatSNarayanan, could you please share old PR to get the context. #5591?
does the new PR only handle the review comments?

@VenkatSNarayanan
Copy link
Contributor Author

Old PR: #5591

@deniskuzZ

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds support for custom partitioning patterns to the MSCK (Metastore Check) repair table functionality in Hive. Previously, only HCatStorer supported custom partitioning patterns for dynamic partitioning, but the MSCK repair table command would fail when encountering non-standard partition directory layouts. This change enables MSCK to recognize and extract partition values from custom path patterns like const/${state}/const2/${id}-${city} instead of only the standard key=value format.

Changes:

  • Introduced DynamicPartitioningCustomPattern class to parse custom partition patterns and generate regex matchers
  • Modified MetaStoreServerUtils.getPartitionName() to support custom pattern-based partition name extraction
  • Updated HiveMetaStoreChecker to validate paths against custom patterns during directory traversal
  • Refactored the HCAT_CUSTOM_DYNAMIC_PATTERN constant from HCatConstants to MetaStoreServerUtils for broader accessibility
  • Enhanced AlterTableAddPartitionDesc and related classes to propagate custom patterns through the partition creation pipeline
  • Added test coverage in TestHiveMetaStoreChecker for custom partition pattern validation

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
DynamicPartitioningCustomPattern.java New class that parses custom patterns, extracts partition column names, and builds regex matchers for path validation
MetaStoreServerUtils.java Added custom pattern handling to getPartitionName() method and defined the HCAT_CUSTOM_DYNAMIC_PATTERN constant
HiveMetaStoreChecker.java Modified PathDepthInfoCallable to support custom pattern depth calculation and path validation
TestHiveMetaStoreChecker.java Added test case for custom partition pattern with non-standard directory structure
Partition.java Enhanced createMetaPartitionObject() to construct partition paths from custom patterns
AlterTableAddPartitionDesc.java Added custom pattern field and constructor overload for backward compatibility
AlterTableAddPartitionOperation.java Updated to pass custom pattern to partition creation
AbstractAddPartitionAnalyzer.java Modified to extract and propagate custom pattern from configuration
ImportSemanticAnalyzer.java Updated to include custom pattern when creating partition descriptors
FSTableEvent.java Modified replication flow to propagate custom patterns
HCatOutputFormat.java Updated import to use new constant location
HCatFileUtil.java Updated to import shared customPathPattern from DynamicPartitioningCustomPattern
HCatConstants.java Removed HCAT_DYNAMIC_CUSTOM_PATTERN constant (moved to MetaStoreServerUtils)
TestHCatDynamicPartitioned.java Updated to use new constant location

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

boolean didMatch = pathMatcher.matches();
if (!didMatch) { //partition path doesn't match the pattern, should have been detected at an earlier step
throw new MetastoreException("Path " + relPath + "doesn't match custom partition pattern "
+ customPathPattern + "partitionPathFull: " + partitionPath);
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is missing proper spacing. It should include a space before "partitionPathFull:" for better readability.

Suggested change
+ customPathPattern + "partitionPathFull: " + partitionPath);
+ customPathPattern + " partitionPathFull: " + partitionPath);

Copilot uses AI. Check for mistakes.
Comment on lines +1637 to +1641
if (!patternPartCols.isEmpty()) {
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The partition name is constructed using the order of columns as they appear in the custom pattern, but Hive expects partition names to follow the order defined in the table's partition columns. For example, if the table has partition columns (partcity, partstate, partid) but the custom pattern is "const/${partstate}/const2/${partid}-${partcity}", the code creates "partstate=CO/partid=1/partcity=denver" instead of the required "partcity=denver/partstate=CO/partid=1". The extracted values should be reordered to match the table's partition column order before constructing the partition name.

Suggested change
if (!patternPartCols.isEmpty()) {
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));
// Map captured values by column name as they appear in the pattern.
Map<String, String> capturedValues = new HashMap<>();
for (int i = 0; i < patternPartCols.size(); i++) {
String colName = patternPartCols.get(i);
String value = pathMatcher.group(i + 1);
capturedValues.put(colName, value);
}
// Rebuild the partition name in the order of the table's partition columns.
for (String colName : partCols) {
String value = capturedValues.get(colName);
if (value == null) {
continue;
}
if (result == null) {
result = new StringBuilder(colName + "=" + value);
} else {
result.append(Path.SEPARATOR).append(colName).append("=").append(value);
}

Copilot uses AI. Check for mistakes.
Comment on lines +80 to +88
StringBuffer sb = new StringBuffer();
Matcher m = customPathPattern.matcher(customPattern);
List<String> partColumns = new ArrayList<>();
while (m.find()) {
m.appendReplacement(sb, "(.*)");
partColumns.add(m.group(2));
}
m.appendTail(sb);
return new DynamicPartitioningCustomPattern(customPattern, Pattern.compile(sb.toString()), partColumns);
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The custom pattern is compiled as a regex without escaping special regex characters. If the pattern contains characters like ., *, +, ?, etc., they will be interpreted as regex metacharacters rather than literals. For example, const./${partstate} would match constX/CO where X is any character, not just const./CO. Consider using Pattern.quote() to escape the literal portions of the pattern between the ${} placeholders, or document that the pattern supports regex metacharacters.

Suggested change
StringBuffer sb = new StringBuffer();
Matcher m = customPathPattern.matcher(customPattern);
List<String> partColumns = new ArrayList<>();
while (m.find()) {
m.appendReplacement(sb, "(.*)");
partColumns.add(m.group(2));
}
m.appendTail(sb);
return new DynamicPartitioningCustomPattern(customPattern, Pattern.compile(sb.toString()), partColumns);
StringBuilder sb = new StringBuilder();
Matcher m = customPathPattern.matcher(customPattern);
List<String> partColumns = new ArrayList<>();
int lastEnd = 0;
while (m.find()) {
// Append the literal text before this placeholder, escaped so regex metacharacters are treated literally.
if (m.start() > lastEnd) {
sb.append(Pattern.quote(customPattern.substring(lastEnd, m.start())));
}
// Replace the placeholder itself with a capturing group to extract the partition value.
sb.append("(.*)");
partColumns.add(m.group(2));
lastEnd = m.end();
}
// Append any remaining literal text after the last placeholder.
if (lastEnd < customPattern.length()) {
sb.append(Pattern.quote(customPattern.substring(lastEnd)));
}
Pattern partitionCapturePattern = Pattern.compile(sb.toString());
return new DynamicPartitioningCustomPattern(customPattern, partitionCapturePattern, partColumns);

Copilot uses AI. Check for mistakes.
Comment on lines +1637 to +1641
if (!patternPartCols.isEmpty()) {
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the custom pattern contains no partition column placeholders (empty patternPartCols), the result will be null, which is returned at line 1676. However, this case should be validated earlier and an appropriate error should be thrown, as a partition pattern without any partition columns doesn't make sense. Consider adding validation in the DynamicPartitioningCustomPattern.Builder.build() method to ensure at least one partition column is present.

Suggested change
if (!patternPartCols.isEmpty()) {
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));
if (patternPartCols.isEmpty()) {
throw new MetastoreException("Custom dynamic partition pattern must contain at least one partition column: "
+ customPattern);
}
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i + 1));

Copilot uses AI. Check for mistakes.
Comment on lines +1638 to +1641
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The custom pattern path does not call getNormalisedPartitionValue for the extracted partition values, unlike the standard path (lines 1660-1661). This means numeric partition values won't be normalized (e.g., "01" vs "1") when using custom patterns, leading to inconsistent behavior. The extracted partition values should be normalized using getNormalisedPartitionValue(pathMatcher.group(i+1), partitionColToTypeMap.get(patternPartCols.get(i)), conf) before being appended to the result.

Suggested change
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));
String firstColName = patternPartCols.get(0);
String firstRawValue = pathMatcher.group(1);
String firstNormalisedValue = getNormalisedPartitionValue(
firstRawValue, partitionColToTypeMap.get(firstColName), conf);
if (firstNormalisedValue == null) {
return null;
}
result = new StringBuilder(firstColName + "=" + firstNormalisedValue);
}
for (int i = 1; i < patternPartCols.size(); i++) {
String colName = patternPartCols.get(i);
String rawValue = pathMatcher.group(i + 1);
String normalisedValue = getNormalisedPartitionValue(
rawValue, partitionColToTypeMap.get(colName), conf);
if (normalisedValue == null) {
return null;
}
result.append(Path.SEPARATOR).append(colName).append("=").append(normalisedValue);

Copilot uses AI. Check for mistakes.
Comment on lines +1638 to +1641
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partition column names extracted from the custom pattern are not converted to lowercase, unlike the standard path (line 1656). Hive stores partition keys in lowercase, so the extracted column names from patternPartCols.get(i) should be lowercased before being used in the result to maintain consistency with the standard partition name handling.

Suggested change
result = new StringBuilder(patternPartCols.get(0) + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
result.append(Path.SEPARATOR).append(patternPartCols.get(i)).append("=").append(pathMatcher.group(i+1));
String partitionName = patternPartCols.get(0).toLowerCase();
result = new StringBuilder(partitionName + "=" + pathMatcher.group(1));
}
for (int i = 1; i < patternPartCols.size(); i++) {
String partitionName = patternPartCols.get(i).toLowerCase();
result.append(Path.SEPARATOR).append(partitionName).append("=").append(pathMatcher.group(i + 1));

Copilot uses AI. Check for mistakes.
Matcher m = customPathPattern.matcher(customPattern);
List<String> partColumns = new ArrayList<>();
while (m.find()) {
m.appendReplacement(sb, "(.*)");
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The greedy regex pattern (.*) can lead to performance issues and unexpected matching behavior. Consider using a non-greedy pattern (.*?) or, better yet, ([^/]+) to match partition values that don't contain path separators. The greedy pattern requires excessive backtracking when the regex engine tries to match subsequent literals in the pattern, and could incorrectly match values that span multiple path segments.

Suggested change
m.appendReplacement(sb, "(.*)");
m.appendReplacement(sb, "([^/]+)");

Copilot uses AI. Check for mistakes.
Matcher m = customPathPattern.matcher(customPattern);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, partSpec.get(m.group(2)));
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code partSpec.get(m.group(2)) can return null if the partition spec doesn't contain a value for the column name extracted from the pattern. This would result in appendReplacement inserting "null" as a string into the path. Add a null check and throw an appropriate exception if a required partition column is missing from the spec.

Suggested change
m.appendReplacement(sb, partSpec.get(m.group(2)));
String partColName = m.group(2);
String partValue = partSpec.get(partColName);
if (partValue == null) {
throw new HiveException("partition spec is invalid; required partition column '"
+ partColName + "' is missing for custom pattern: " + customPattern);
}
m.appendReplacement(sb, partValue);

Copilot uses AI. Check for mistakes.
Comment on lines +1622 to +1628
if (customPattern != null) {
DynamicPartitioningCustomPattern compiledCustomPattern = new DynamicPartitioningCustomPattern.Builder()
.setCustomPattern(customPattern)
.build();
Pattern customPathPattern = compiledCustomPattern.getPartitionCapturePattern();
//partition columns in order that they appear in the pattern
List<String> patternPartCols = compiledCustomPattern.getPartitionColumns();
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no validation that the partition columns extracted from the custom pattern match the table's partition columns (provided in partCols). The code should verify that: (1) all columns in patternPartCols exist in the table's partition columns, and (2) the number of columns matches. Without this validation, the code could create invalid partition names with columns that don't exist in the table definition or with mismatched column counts.

Copilot uses AI. Check for mistakes.
String relPath = String.join(Path.SEPARATOR,
Arrays.copyOfRange(parts, parts.length - maxDepth, parts.length));
if (!fullPattern.matcher(relPath).matches()) {
logOrThrowExceptionWithMsg("File path " + currentPath + " does not match custom partitioning pattern"
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is missing a space between the pattern and "partitionPathFull". It should be "does not match custom partitioning pattern " instead of "does not match custom partitioning pattern" to improve readability.

Suggested change
logOrThrowExceptionWithMsg("File path " + currentPath + " does not match custom partitioning pattern"
logOrThrowExceptionWithMsg("File path " + currentPath + " does not match custom partitioning pattern "

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants