Skip to content

Comments

task(34647) ContentletFactory migration to OS#34691

Open
fabrizzio-dotCMS wants to merge 24 commits intomainfrom
issue-34647-ContentletFactory-OS-Migration
Open

task(34647) ContentletFactory migration to OS#34691
fabrizzio-dotCMS wants to merge 24 commits intomainfrom
issue-34647-ContentletFactory-OS-Migration

Conversation

@fabrizzio-dotCMS
Copy link
Contributor

@fabrizzio-dotCMS fabrizzio-dotCMS commented Feb 18, 2026

Proposed Changes


🔧 Refactor: Separate Index Operations and Add OpenSearch Migration Foundation

Summary

This PR introduces a major architectural refactoring to separate index operations from the main content factory implementation and establishes the foundation for OpenSearch migration. This represents the beginning of a clean separation
of concerns between content management and search engine operations.

🚀 Key Changes

  1. Index Operations Extraction & Abstraction
  • Extracted index operations from ESContentFactoryImpl into dedicated, specialized classes
  • Created ContentFactoryIndexOperations interface - pure contract for search engine agnostic operations
  • Implemented ContentFactoryIndexOperationsES - Elasticsearch-specific implementation
  • Implemented ContentFactoryIndexOperationsOS - OpenSearch-specific implementation (partial)
  1. Search Engine Agnostic Domain Objects
  • Created immutable domain objects to eliminate direct library dependencies:
    • SearchHit - unified representation of individual search results
    • SearchHits - collection of search results with metadata
    • TotalHits - search count information with relation metadata
    • Relation - enum for count precision (exact vs estimate)
  • Factory methods for seamless conversion between Elasticsearch/OpenSearch types and domain objects
  • JSON serialization support for REST APIs and caching
  1. Migration Documentation & Analysis Annotations
  • @IndexMetadata - Documents current migration state and index interaction patterns
  • @IndexLibraryIndependent - Enforces architectural purity for search engine agnostic APIs
  • ArchUnit tests to prevent library-specific dependencies in abstraction layers
  1. Improved Code Quality
  • Comprehensive JavaDoc documentation with usage examples and architectural explanations
  • @inheritdoc annotations for implementation classes to maintain documentation consistency
  • Type-safe immutable objects using Immutables library

🔄 Architecture Benefits

Before:

ESContentFactoryImpl ──> Elasticsearch APIs (tightly coupled)

After:

ContentFactory ──> ContentFactoryIndexOperations (interface)
├── ContentFactoryIndexOperationsES
└── ContentFactoryIndexOperationsOS

⚠️ Current Limitations (OpenSearch Implementation)

Incomplete Features:

  • Scroll API implementation - Currently throws UnsupportedOperationException
  • Missing test coverage for OpenSearch implementation
  • Basic indexSearchScroll() - Uses simple pagination instead of true scroll functionality

Planned Next Steps:

  1. Complete OpenSearch Scroll API implementation
  2. Add comprehensive integration tests for OpenSearch operations
  3. Implement dynamic selection mechanism between ES/OS implementations
  4. Performance benchmarking and optimization

🏗️ Architectural Impact

Separation of Concerns:

  • Content operations remain in content factory classes
  • Search operations isolated in specialized index operation classes
  • Domain objects provide clean abstraction without vendor lock-in

Future Migration Benefits:

  • Zero-downtime migration capability between search engines
  • A/B testing different search engine implementations
  • Unified API surface regardless of underlying search technology

📋 Testing Status

  • ✅ Elasticsearch operations - Fully functional with existing test coverage
  • ⚠️ OpenSearch operations - Basic functionality implemented, tests needed
  • ✅ Domain object conversions - Factory methods tested
  • ✅ ArchUnit rules - Architectural constraints enforced

⚠️ Note: This is a foundational PR. OpenSearch implementation requires additional work for production readiness, particularly Scroll API functionality and comprehensive testing.

This PR fixes: #34647

@github-actions github-actions bot added the Area : Backend PR changes Java/Maven backend code label Feb 18, 2026
@fabrizzio-dotCMS fabrizzio-dotCMS marked this pull request as ready for review February 21, 2026 02:47
}

// Use a large limit to simulate scroll behavior
final int maxResults = 10000; // Consider making this configurable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire point of scroll is to handle datasets larger than the result window. Loading 10k results in one shot defeats this. This isn't a stub it's a working-ish
implementation that will fail silently for datasets > 10k. Should either truly implement OS Scroll API (using scroll_id / PIT) or throw UnsupportedOperationException
like createScrollQuery does, so callers know it's not ready. I am OK if we intend to come back to this shortly, just concerned leaving it like this for production and potentially causing memory issues down the line.

Copy link
Contributor Author

@fabrizzio-dotCMS fabrizzio-dotCMS Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already have another PR with the scroll API implementation. I did not include it here to keep it from being too large.


// Set timeout
//Time.of(t -> t.time(String.valueOf(INDEX_OPERATIONS_TIMEOUT_IN_MS) + "ms"))
searchRequestBuilder.timeout("");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to break, or have no timout are we sure this does not error, or if not set will it leave all OS queries unprotected against runaway searches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I generated these, and the moment I was going to write a test, I realized I couldn't because I didn't have any mapping in place. The OS index at this point isn't writable. I have to come back full circle on these, adding some testing (which I totally plan to do), but as an initial PR, the refactoring served its purpose.
Nice catch!

import org.elasticsearch.search.sort.SortOrder;
import org.jetbrains.annotations.NotNull;

public class ContentFactoryIndexOperationsES implements ContentFactoryIndexOperations {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some javadoc would be helpful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[TASK] Migrate ESContentFactoryImpl

3 participants