Skip to content

Conversation

@Kota-SH
Copy link
Contributor

@Kota-SH Kota-SH commented Dec 16, 2025

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

Copy link
Contributor

@Apache9 Apache9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is the correct way, at least you should not let clients config meta table name right? They should ask the meta table name from bootstrap node.

Theoretically, we can even support changing meta table name at runtime, clients and region servers can ask meta table name from master when they hit table not found exception when accessing meta.

And if we think this is a bit difficult to implement, and maybe there is no need to change meta table name after initialization, then maybe we can introduce a config to set meta table name when bootstraping a cluster, then we record this name in master local region or on HDFS directly. But anyway, clients and region servers still need to ask for the meta table name from master, not from configuration. But they just do not need to deal with meta table name change when running.

Thanks.

if (instance == null) {
synchronized (MetaTableName.class) {
if (instance == null) {
instance = initializeHbaseMetaTableName(HBaseConfiguration.create());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is a good practise, as this is still static, what if a client wants to connect different read replica clusters? And it will difficult to write UTs when you want to set up two clusters with different meta table names...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is a good practise, as this is still static

I agree, this does not change the static referring.

what if a client wants to connect different read replica clusters?

Could clients use different configurations for each cluster being accessed?

@wchevreuil
Copy link
Contributor

And if we think this is a bit difficult to implement, and maybe there is no need to change meta table name after initialization, then maybe we can introduce a config to set meta table name when bootstraping a cluster, then we record this name in master local region or on HDFS directly. But anyway, clients and region servers still need to ask for the meta table name from master, not from configuration. But they just do not need to deal with meta table name change when running.

Could be as simple as the hbase.id approach (a single file under the root dir containing the meta suffix generated during bootstrap)? Then all processes would load that file when initialising?

Comment on lines 54 to 58
private static TableName initializeHbaseMetaTableName(Configuration conf) {
TableName metaTableName =
TableName.valueOf(NamespaceDescriptor.SYSTEM_NAMESPACE_NAME_STR, "meta");
LOG.info("Meta table suffix value: {}", metaTableName);
return metaTableName;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the conf being used here? And the suffix?

if (instance == null) {
synchronized (MetaTableName.class) {
if (instance == null) {
instance = initializeHbaseMetaTableName(HBaseConfiguration.create());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is a good practise, as this is still static

I agree, this does not change the static referring.

what if a client wants to connect different read replica clusters?

Could clients use different configurations for each cluster being accessed?

@anmolnar
Copy link
Contributor

Shall we add meta table name to ConnectionRegistry?
ConnectionFactory will fetch it from active master or get it from ZooKeeper and set it in the Connection object where clients will find it easily.

ZooKeeper quorums will be distinct between replica clusters, so it could be a good place to store meta table name during the bootstrap process. We don't need to support changing it in runtime.

For instance, in cluster bootstrap process:

  • read configuration for suffix / replica id setting "replica1"
<property>
    <name>hbase.meta.table.suffix</name>
    <value>replica1</value>
 </property>
  • generate meta table name based on the above setting:
    • if suffix is not set: "hbase:meta"
    • if suffix is set: "hbase:meta_386330f"
  • store generated table name in ZooKeeper and retrieve it via ConnectionRegistry

wdyt?

@Apache9
Copy link
Contributor

Apache9 commented Dec 19, 2025

Let's not introduce new persistent data on zookeeper, we could store it on HDFS on in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster, the primary cluster will generate the meta name suffix file or record for read replica clusters.

@anmolnar
Copy link
Contributor

Let's not introduce new persistent data on zookeeper, we could store it on HDFS on in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster, the primary cluster will generate the meta name suffix file or record for read replica clusters.

HDFS won't be sufficient, because it's shared between the clusters. Could be in master local region, but that doesn't contradict with making it accessible via ConnectionRegistry. Is that correct?

@Apache9
Copy link
Contributor

Apache9 commented Dec 19, 2025

Let's not introduce new persistent data on zookeeper, we could store it on HDFS on in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster, the primary cluster will generate the meta name suffix file or record for read replica clusters.

HDFS won't be sufficient, because it's shared between the clusters. Could be in master local region, but that doesn't contradict with making it accessible via ConnectionRegistry. Is that correct?

We coud use different file names for different read replica clusters?

@anmolnar
Copy link
Contributor

anmolnar commented Dec 19, 2025

Let's not introduce new persistent data on zookeeper, we could store it on HDFS on in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster, the primary cluster will generate the meta name suffix file or record for read replica clusters.

HDFS won't be sufficient, because it's shared between the clusters. Could be in master local region, but that doesn't contradict with making it accessible via ConnectionRegistry. Is that correct?

We coud use different file names for different read replica clusters?

So basically, we read from the config:

<property>
    <name>hbase.read.replica.id</name>
    <value>replica1</value>
</property>

and create a file on HDFS: hbase.replica1

meta_table_name=hbase:meta_3a63c0f

?

@Apache9
Copy link
Contributor

Apache9 commented Dec 20, 2025

Let's not introduce new persistent data on zookeeper, we could store it on HDFS on in the master local region. Maybe we should introduce a shell command like 'add_read_replica_cluster xxx' and execute it at the primary cluster, the primary cluster will generate the meta name suffix file or record for read replica clusters.

HDFS won't be sufficient, because it's shared between the clusters. Could be in master local region, but that doesn't contradict with making it accessible via ConnectionRegistry. Is that correct?

We coud use different file names for different read replica clusters?

So basically, we read from the config:

<property>
    <name>hbase.read.replica.id</name>
    <value>replica1</value>
</property>

and create a file on HDFS: hbase.replica1

meta_table_name=hbase:meta_3a63c0f

?

Was thinking of this approach. But after considering, if we could use different master local region for different HBase read replicas, we could just store this value in master local region? In this way the primary cluster does not need to know the read replicas. Maybe this is better?

@anmolnar
Copy link
Contributor

anmolnar commented Jan 9, 2026

Was thinking of this approach. But after considering, if we could use different master local region for different HBase read replicas, we could just store this value in master local region? In this way the primary cluster does not need to know the read replicas. Maybe this is better?

We can certainly do that, but in that case I'm not sure how to generate meta table names for the replica clusters. They have to be unique at the same storage location and with files we would able to scan existing files for conflicts. By storing it in master local region I'm not sure how to do it, unless we use UUIDs as suffixes.

Same issue arises with the directory name of Master local region of a replica cluster. Shall we use the configured replica name or the generated suffix?

Actually this might be the right way:

  1. Generate suffix for replica: "3a63c0f"
  2. Try to create master local region directory: "MasterData_3a63c0f"
  3. If succeeds, use this suffix for meta table name: "meta_3a63c0f"
  4. If not, go back to step 1.

Question: how would the replica cluster know about its suffix?

@Apache9
Copy link
Contributor

Apache9 commented Jan 10, 2026

Was thinking of this approach. But after considering, if we could use different master local region for different HBase read replicas, we could just store this value in master local region? In this way the primary cluster does not need to know the read replicas. Maybe this is better?

We can certainly do that, but in that case I'm not sure how to generate meta table names for the replica clusters. They have to be unique at the same storage location and with files we would able to scan existing files for conflicts. By storing it in master local region I'm not sure how to do it, unless we use UUIDs as suffixes.

Same issue arises with the directory name of Master local region of a replica cluster. Shall we use the configured replica name or the generated suffix?

Actually this might be the right way:

  1. Generate suffix for replica: "3a63c0f"
  2. Try to create master local region directory: "MasterData_3a63c0f"
  3. If succeeds, use this suffix for meta table name: "meta_3a63c0f"
  4. If not, go back to step 1.

Question: how would the replica cluster know about its suffix?

Anyway you need something in the configuration to tell a cluster that you are a read replica cluster or not, so you can just add a configuration named replica_suffix?

@anmolnar
Copy link
Contributor

Anyway you need something in the configuration to tell a cluster that you are a read replica cluster or not, so you can just add a configuration named replica_suffix?

That's where we started. It's already there:

<property>
    <name>hbase.read.replica.id</name>
    <value>replica1</value>
</property>

I think we started the conversation about whether we should use this replica.id in the meta table and master region directory name or generate a random value for each replica?

@Apache9
Copy link
Contributor

Apache9 commented Jan 13, 2026

Anyway you need something in the configuration to tell a cluster that you are a read replica cluster or not, so you can just add a configuration named replica_suffix?

That's where we started. It's already there:

<property>
    <name>hbase.read.replica.id</name>
    <value>replica1</value>
</property>

I think we started the conversation about whether we should use this replica.id in the meta table and master region directory name or generate a random value for each replica?

As long as the replica id must be unique, I think use it as master local region's suffix is enough? For meta table, I'm nature on whether to reuse this id too, or we could generate one and store it in master local region, seems the former one is enough but the latter is more flexible, and as said above, theoretically we could even change the meta table name.

Thanks.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Apache9

I don't think we're getting anywhere with this patch and I tend to stop wasting more efforts here. We might be able to find an easier approach for integration testing with multiple clusters.

What do you think about this new approach? Is it the same what was on your mind?

Comment on lines 73 to 74
@Deprecated
public static TableName META_TABLE_NAME =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still being used somewhere? Shouldn't we just remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There no usages for this in the current patch. I have changed all the references to either use connection or hard coded "hbase:meta" (in just a couple of places but majorly in tests). I marked it Deprecated for backward compatibility.
We can still use this deprecated var in all those above mentioned references to make this patch smaller but needs some rework later and also here. We would need another PR later, maybe after read replica, to delete this from TableName and move this into TEST_UTILS.

// For now, always return the default meta table name.
// Future implementations may support custom meta table names from configuration or storage.
return org.apache.hadoop.hbase.TableName
.valueOf(org.apache.hadoop.hbase.NamespaceDescriptor.SYSTEM_NAMESPACE_NAME_STR, "meta");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it still hardcoded?
You're already initializing it in HMaster properly and just need to return it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed updating this one. Will make this abstract and implement this method in HRegionServer as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As RegionServer does not have access to Master Region, I left this as default here and override the method in HRegionServer to use Connection. However during bootstrap, we would need meta table name for creating connection object, so only during bootstrap it falls back to default (here).

Further, when we implement the config for creating the meta table name suffix, we can add that calculation logic here and pass the name along to HMaster to store it in Master Region during bootstrap. Please let me know your thoughts.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 2s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 buf 0m 0s buf was not available.
+0 🆗 buf 0m 0s buf was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for branch
+1 💚 mvninstall 2m 54s master passed
+1 💚 compile 10m 26s master passed
+1 💚 checkstyle 4m 37s master passed
+1 💚 spotbugs 10m 7s master passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 12s Maven dependency ordering for patch
+1 💚 mvninstall 2m 52s the patch passed
+1 💚 compile 10m 20s the patch passed
+1 💚 cc 10m 20s the patch passed
-0 ⚠️ javac 0m 48s /results-compile-javac-hbase-client.txt hbase-client generated 1 new + 127 unchanged - 0 fixed = 128 total (was 127)
-0 ⚠️ javac 3m 16s /results-compile-javac-hbase-server.txt hbase-server generated 1 new + 192 unchanged - 1 fixed = 193 total (was 193)
+1 💚 blanks 0m 1s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 15s /results-checkstyle-hbase-common.txt hbase-common: The patch generated 3 new + 2 unchanged - 0 fixed = 5 total (was 2)
+1 💚 rubocop 0m 3s No new issues.
+1 💚 spotbugs 11m 49s the patch passed
+1 💚 hadoopcheck 11m 2s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 hbaseprotoc 4m 45s the patch passed
+1 💚 spotless 0m 44s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 2m 11s The patch does not generate ASF License warnings.
88m 37s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7558/10/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7558
JIRA Issue HBASE-29691
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless cc buflint bufcompat hbaseprotoc rubocop
uname Linux a9e80cb6a1a8 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / fe30eec
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-common hbase-hadoop-compat hbase-client hbase-zookeeper hbase-balancer hbase-server hbase-mapreduce hbase-diagnostics hbase-testing-util hbase-thrift hbase-shell hbase-backup hbase-it hbase-rest U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7558/10/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3 rubocop=1.37.1
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants