Fix initial format namenode corruption: Disable restart-controller #750

@maltesander

Description

Since the restart controller was enabled in #743, we have seen more and more test failures caused by namenodes not coming up.

The problem is the format-namenode init container, which checks whether the file /stackable/data/namenode/current/VERSION exists.
If it does, we assume that everything is formatted properly and do not format the node.

In the failed tests, the /stackable/data/namenode/current/VERSION file was there, but the directory was missing other important files like fsimage_xxx that would be there with a correct format. This leads to the fsImage not found error in the tests.

We think that the increased number of restarts, partially introduced with the restart-controller, is one of the things that can interfere with proper namenode formatting, or interrupt an ongoing format and leave a corrupt data directory. The restart-controller only makes the problem appear more often; it existed before.

We identified one restart that is certain to happen: the HDFS ZNode ConfigMap (!) is applied AFTER the HDFS cluster. This is the case in most product tests but, for example, not in the HDFS smoke test itself (previously it did not use any ZNode). It results in a restart of the journalnodes and namenodes.

We then tried checking in the format-namenodes script whether any "fsimage_xxx" files exist at all, and skipping the formatting only in that case.
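To make the three cases concrete, here is a minimal sketch of such a check in Python. It is not the actual init container script; the names `NameNodeDirState` and `classify_namenode_dir` are hypothetical, and it assumes that any file matching `fsimage_*` under `current/` counts as an fsimage.

```python
from enum import Enum
from pathlib import Path


class NameNodeDirState(Enum):
    """Hypothetical states of a namenode data directory."""
    UNFORMATTED = "unformatted"  # no VERSION file
    FORMATTED = "formatted"      # VERSION and at least one fsimage_* file
    INCOMPLETE = "incomplete"    # VERSION exists, but no fsimage_* file


def classify_namenode_dir(namenode_dir: Path) -> NameNodeDirState:
    """Classify the directory by looking at <namenode_dir>/current."""
    current = namenode_dir / "current"
    if not (current / "VERSION").is_file():
        return NameNodeDirState.UNFORMATTED
    # Assumption: any regular file named fsimage_* counts as an fsimage.
    has_fsimage = any(p.is_file() for p in current.glob("fsimage_*"))
    if has_fsimage:
        return NameNodeDirState.FORMATTED
    return NameNodeDirState.INCOMPLETE
```

With the path from this issue, the call would be `classify_namenode_dir(Path("/stackable/data/namenode"))`. The `INCOMPLETE` state is the one seen in the failed tests.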

But the Hadoop format command itself checks whether the directory exists and / or is empty, and refuses to format:

Running in non-interactive mode, and data appears to exist in Storage Directory root= /stackable/data/namenode; location= null. Not formatting.

https://github.com/apache/hadoop/blob/fd18f0647dc1cae191925ac0f593b5962effc583/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java#L1191

TLDR: The problem exists only for fresh clusters (which is good), and we have a concept of a plan to solve it!

We thought about a couple of options for the case: VERSION file exists, fsimage does not.

  1. Error out, never start the namenode, and let administrators clean up that PVC / folder? This blocks the cluster.
  2. Move the /stackable/data/namenode/current/ directory to something like /stackable/data/namenode/current_bk/ and restart the formatting? This does not lose any existing data and does not block the cluster, but it clutters up the PVC.
  3. Force format (CLI option), since there are no fsimages and so we cannot delete any data (not too sure about this one)?
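For illustration only, the "move aside" step of option 2 could look roughly like the sketch below. The function name `move_current_aside` is hypothetical, and `current_bk` is just the example name from the list above. The sketch refuses to overwrite an existing backup, which is an assumption about the desired behaviour rather than something decided here.

```python
from pathlib import Path


def move_current_aside(namenode_dir: Path, backup_name: str = "current_bk") -> Path:
    """Rename <namenode_dir>/current to <namenode_dir>/<backup_name>.

    Hypothetical helper: nothing is deleted, and an existing backup
    is never overwritten.
    """
    current = namenode_dir / "current"
    backup = namenode_dir / backup_name
    if not current.is_dir():
        raise FileNotFoundError(f"{current} does not exist")
    if backup.exists():
        raise FileExistsError(f"{backup} already exists, refusing to overwrite")
    # Source and target share a parent directory, so this is a plain rename.
    current.rename(backup)
    return backup
```

One caveat: the "Not formatting" message quoted above refers to the storage directory root, so a backup kept inside /stackable/data/namenode might still be treated as existing data by a non-interactive format. We have not verified this.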

Tasks

We decided on Option 1.

  • Disable restart-controller
  • Emit a warning if the VERSION file exists but e.g. no fsimage_xxx file does
  • Document the behavior and add a troubleshoot guide
