Write a classloader that replaces VFS functionality. #27

@ddanielr

This ticket contains a potential solution for NationalSecurityAgency/datawave-accumulo-plugins#2

I did not add these details directly to that ticket, to avoid conflating specific implementation requirements with the currently known requirements.

Stage 1. Define possible components

  • Top Level SimpleHDFSClassLoaderFactory class
  • HDFS file fetcher (pluggable for testing)
  • ContextPath structure
  • Manifest File structure
  • Context cleanup thread

Context path structure

The context path should be similar to the following:
hdfs://test:8020/contexts/contextA/manifest.json
The manifest file format should be machine-readable. (JSON is not required, but it is used for this example.)
Contexts are re-loadable due to limitations in client code.

Directory and Manifest file structure

The directory should contain a manifest file and jars.

/tmp/local-contexts/contextA/manifest.json
/tmp/local-contexts/contextA/Iterators.jar
/tmp/local-contexts/contextA/IteratorsV2.jar

The manifest file should consist of jar names and checksum values.

{
  "context": "contextA",
  "jars": [
    {
      "name": "Iterators.jar",
      "checksum": "f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2"
    },
    {
      "name": "IteratorsV2.jar",
      "checksum": "934ee77f70dc82403618c154aa63af6f4ebbe3ac1eaf22df21e7e64f0fb6643d"
    }
  ]
}
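
As a rough illustration of validating a jar against its manifest entry, here is a minimal sketch. It assumes the checksums are hex-encoded SHA-256 digests (the example values are 64 hex characters, which is consistent with SHA-256, but the ticket does not name an algorithm). The `JarChecksum` class and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; SHA-256 is an assumption, not something the ticket specifies. */
final class JarChecksum {
  private JarChecksum() {}

  /** Returns the lowercase hex SHA-256 digest of the given bytes. */
  static String sha256Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        // Mask to an unsigned value so negative bytes format as two hex digits.
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-256.
      throw new IllegalStateException(e);
    }
  }

  /** Returns true if the jar's digest equals the checksum listed in the manifest. */
  static boolean matches(Path jar, String expectedHex) {
    try {
      // Reads the whole jar into memory; a streaming digest may suit large jars better.
      return sha256Hex(Files.readAllBytes(jar)).equalsIgnoreCase(expectedHex);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A caller would run `JarChecksum.matches(localDir.resolve(name), checksum)` for each manifest entry and reject the context if any jar fails.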

Stage 2. Create Factory

Create a SimpleHDFSClassLoaderFactory that implements the ContextClassLoaderFactory interface.

This class should use a cache that quickly returns classloaders for already defined context names.
This cache should store the classloader and the local directory used for the contextPath file cache.

The class should perform a property lookup to get the corresponding contextPath for a given context name. See the ContextManager class.

It should resolve contextPaths to local directories and attempt to load classes from there.

Local File Cache Directory Resolution

This class should resolve context paths to a local directory location based on the name of the immediate parent directory of the manifest file.

The local directory location should be a user-defined directory. (Similar to the VFS_CACHE_DIR property)

As an example:
ContextPath: hdfs://test:8020/contexts/contextA/manifest.json
User-defined dir: /tmp/local-contexts
Resolved dir: /tmp/local-contexts/contextA

The class should throw an error if that directory doesn't exist.
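
The resolution rule above could be sketched as follows. The `LocalContextDirs` class and its method names are hypothetical, and the sketch assumes the context path is a hierarchical URI whose last two path segments are the context directory and the manifest file.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that maps a manifest URI to a local context directory. */
final class LocalContextDirs {
  private LocalContextDirs() {}

  /** Resolves baseDir/<name of the manifest's parent directory>. */
  static Path resolve(Path baseDir, URI manifestUri) {
    String uriPath = manifestUri.getPath();
    if (uriPath == null) {
      throw new IllegalArgumentException("Not a hierarchical URI: " + manifestUri);
    }
    Path parent = Paths.get(uriPath).getParent();
    if (parent == null || parent.getFileName() == null) {
      throw new IllegalArgumentException("Manifest has no parent directory: " + manifestUri);
    }
    return baseDir.resolve(parent.getFileName().toString());
  }

  /** Throws if the resolved directory does not exist, as the ticket requires. */
  static Path requireExisting(Path dir) {
    if (!Files.isDirectory(dir)) {
      throw new IllegalStateException("Local context directory does not exist: " + dir);
    }
    return dir;
  }
}
```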

Once the directory is confirmed to exist, the class should use the manifest file to validate jars and generate a new list of jar URLs. This list will then be used to create a new URLClassLoader.

This new classloader should be cached, along with the file cache dir, and then returned to the ClassLoaderUtil.
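
One possible shape for the cache and classloader creation is sketched below. `ContextCache`, `Entry`, and the method names are hypothetical, jar validation is assumed to have happened already, and the parent classloader choice is an assumption rather than a requirement from this ticket.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache keyed by context name. */
final class ContextCache {
  /** Holds the classloader together with the local directory it was built from. */
  static final class Entry {
    final ClassLoader loader;
    final Path localDir;

    Entry(ClassLoader loader, Path localDir) {
      this.loader = loader;
      this.localDir = localDir;
    }
  }

  private final Map<String, Entry> entries = new ConcurrentHashMap<>();

  /** Returns the cached loader, creating it at most once per context name. */
  ClassLoader getOrCreate(String contextName, Path localDir, List<String> jarNames) {
    return entries.computeIfAbsent(contextName,
        name -> new Entry(newLoader(localDir, jarNames), localDir)).loader;
  }

  /** Builds a URLClassLoader over the (already validated) jars in localDir. */
  static URLClassLoader newLoader(Path localDir, List<String> jarNames) {
    URL[] urls = new URL[jarNames.size()];
    for (int i = 0; i < urls.length; i++) {
      try {
        urls[i] = localDir.resolve(jarNames.get(i)).toUri().toURL();
      } catch (MalformedURLException e) {
        throw new IllegalArgumentException(e);
      }
    }
    // Assumption: delegate to the loader that loaded this class.
    return new URLClassLoader(urls, ContextCache.class.getClassLoader());
  }
}
```

Using `computeIfAbsent` keeps the fast path to a single map lookup, while concurrent first requests for the same context still create only one classloader.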

Write a test that can stage directory creation and a jar file, then successfully load a class from that jar using the SimpleHDFSClassLoaderFactory.

Stage 3. Fetch Files from HDFS

Create a class that will perform the following steps when given a manifest file location:

  1. Create a lock file in the user-defined directory.
    /tmp/local-contexts/contextA.lock
  2. Create a unique temp directory for the context
    /tmp/local-contexts/tmp-contextA-<uuid>
  3. Download the manifest file to the temp dir and use the contents to copy and validate defined jars from the source HDFS location to the tmp dir.
  4. Perform a rename operation on the directory to promote it to the new context name.
    /tmp/local-contexts/tmp-contextA-<uuid> -> /tmp/local-contexts/contextA
  5. Delete the lock file.
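
The steps above can be sketched with `java.nio.file` as follows. The `ContextStager` class and the `Fetcher` hook are hypothetical; the hook stands in for the pluggable HDFS file fetcher and is expected to download the manifest and jars and validate them. The sketch assumes the temp directory and the target are on the same filesystem so the move is a plain rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

/** Hypothetical staging helper for the lock / temp dir / rename sequence. */
final class ContextStager {
  private ContextStager() {}

  /** Hypothetical hook: downloads and validates the manifest and jars into tempDir. */
  interface Fetcher {
    void fetchInto(Path tempDir) throws IOException;
  }

  static Path stage(Path baseDir, String contextName, Fetcher fetcher) throws IOException {
    Path lock = baseDir.resolve(contextName + ".lock");
    Path target = baseDir.resolve(contextName);
    // Step 1: createFile fails with FileAlreadyExistsException if the lock is already held.
    Files.createFile(lock);
    try {
      // Step 2: unique temp directory for this attempt.
      Path temp = baseDir.resolve("tmp-" + contextName + "-" + UUID.randomUUID());
      Files.createDirectory(temp);
      // Step 3: fetch and validate into the temp directory.
      fetcher.fetchInto(temp);
      // Step 4: promote the temp directory with a rename.
      Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
      return target;
    } finally {
      // Step 5: always release the lock, even if fetching failed.
      Files.deleteIfExists(lock);
    }
  }
}
```

This sketch leaves a failed attempt's temp directory behind; whether to clean it up immediately or leave it for later inspection is a design choice the ticket does not settle.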

Modify the SimpleHDFSClassLoaderFactory to use this class when the local context directory doesn't exist and the lock file also doesn't exist.

Write an IT for testing loading classes from HDFS using the SimpleHDFSClassLoaderFactory in a single Tserver.

Stage 4. Support multiple processes

Modify the SimpleHDFSClassLoaderFactory to do the following:

  1. Check if the lock file exists and, if so, wait to load classes until a user-defined period of time has passed since the lock file was last modified.
  2. Once the wait period has elapsed, the class should touch the lock file to reset its modification time and proceed with fetching files from HDFS.
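
A minimal sketch of the staleness check and the touch is shown below. `LockFiles` and its method names are hypothetical. Note that this check-then-touch sequence is not atomic: two processes could both see a stale lock and both proceed, so a real implementation may need additional coordination.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for taking over a lock file after a wait period. */
final class LockFiles {
  private LockFiles() {}

  /** True once at least maxWait has passed since the lock was last modified. */
  static boolean isStale(Instant lastModified, Instant now, Duration maxWait) {
    return Duration.between(lastModified, now).compareTo(maxWait) >= 0;
  }

  /** Touches the lock and returns true if it was stale; returns false otherwise. */
  static boolean takeOverIfStale(Path lock, Duration maxWait) throws IOException {
    Instant now = Instant.now();
    if (!isStale(Files.getLastModifiedTime(lock).toInstant(), now, maxWait)) {
      return false; // Another process may still be fetching; keep waiting.
    }
    // Reset the modification time so other waiters restart their wait period.
    Files.setLastModifiedTime(lock, FileTime.from(now));
    return true;
  }
}
```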

Stage 5. Cleanup old contexts

Start a thread that checks the property definitions every minute and removes from the cache any context that is no longer defined.
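
A cleanup thread along these lines could be built on a `ScheduledExecutorService`. In this sketch, `ContextCleaner` is a hypothetical name, the cache is assumed to be a map keyed by context name, and the supplier of currently defined context names stands in for the property lookup.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical periodic cleanup of cached contexts that are no longer defined. */
final class ContextCleaner {
  private ContextCleaner() {}

  /** Removes every cache entry whose context name is not in definedContexts. */
  static void evictUndefined(Map<String, ?> cache, Set<String> definedContexts) {
    cache.keySet().retainAll(definedContexts);
  }

  /** Starts a daemon thread that runs the eviction once per minute. */
  static ScheduledExecutorService start(Map<String, ?> cache,
      Supplier<Set<String>> definedContexts) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "context-cleanup");
      t.setDaemon(true);
      return t;
    });
    // Note: an uncaught exception in the task cancels all later runs.
    executor.scheduleWithFixedDelay(
        () -> evictUndefined(cache, definedContexts.get()), 1, 1, TimeUnit.MINUTES);
    return executor;
  }
}
```

The sketch only drops cache entries; closing the evicted classloaders or deleting their local directories would be a separate decision.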


Labels: enhancement (New feature or request)
