@scott-parillo (Collaborator) commented Dec 6, 2025

Summary

  • The environment sizes were updated and now include X-Large. Multiple tables were collapsed into a single Elastic Stack infrastructure table that describes the infrastructure recommendations, based on the Server 2025 GA EW production certification results.
  • Retention Policy Guidelines page added.
  • Environment Watch Performance Impact page added.

Test Evidence for Retention Policy Guidelines - SQE: #44 (review)
Test Evidence for Retention Policy Guidelines - DEV: 12_16_2025_Retention_Guideline_Dev_Evidence.docx

| Kibana | 1 | 4 |
| APM Server | 1 | 4 |

| Environment Size | Web Servers | Agent Servers | Worker Servers | SQL Distributed Servers |

@dinesh1010101 commented Dec 8, 2025:

@scott-parillo These numbers do not match the ones mentioned in the Slack thread (https://kcura-pd.slack.com/archives/C0616SVFYBU/p1764962920595329?thread_ts=1764959199.337929&cid=C0616SVFYBU).

[Image]

They also do not match our initial table here.

May I know whether these numbers are intentional, or do I need to make some changes?

**A few other key notes and reminders:**

- **Tuning for speed** – Review Elastic’s guidance on how to tune the environment for speed [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html).
- **Hosting Elastic** – While the guidance below recommends installing the Elastic components on many dedicated servers, there are no hard requirements to isolate Elasticsearch, Kibana, or APM Server on dedicated hosts. As evident with the Development environment specifications, the full Elastic stack can be deployed on a single host if that server can meet the storage needs.

@dinesh1010101 changed the title from "REL-1207540-telemetry-volume-reduction" to "REL-1207540-telemetry-volume-reduction & REL-1224050-Retension-Policy-Guidelines" on Dec 15, 2025

| **Processing** | **+450% faster** | Processing performance has improved dramatically, delivering a 450% speed increase that will noticeably accelerate end-to-end workflows. |
| **Review (Conversion)** | **+5% faster** | Review operations saw a modest 5% improvement, providing slightly faster document conversion without any workflow disruption. |
| **Imaging & Production** | **Stable (±4%)** | Imaging and production performance remained stable, with changes within a ±4% range, resulting in no meaningful impact to customer workflows. |
| **Data Transfer** | **Mixed results** | Native file operations improved by 4–38%, offering smoother import/export performance. Image-based workflows saw some declines—most notably a 157% slowdown in RIP image export—which may impact image-heavy projects. |

Note: Here we are disclosing a 157% slowdown, which might give clients a negative impression.

Collaborator (author) commented:

I believe that such large performance increases and decreases have yet to be vetted by the teams. Honestly, they defy logic, but I don't want to discount them completely. Until this has been thoroughly investigated and a clear conclusion drawn, I would advise removing any such results (positive or negative). As currently worded, this sounds concerning, but the results indicate otherwise.

Removed this information for now, until the other team reaches a conclusion.
966d848
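
Part of what makes figures such as "+450% faster" and "157% slowdown" hard to judge is that the thread does not say which ratio they are based on. The following is a minimal sketch, assuming the figures come from comparing a baseline duration with the duration measured while Environment Watch is enabled. The function names and the durations are hypothetical and are not test results.

```python
def duration_change_pct(baseline_s: float, new_s: float) -> float:
    """Percent change in elapsed time; positive means the run took longer."""
    return (new_s - baseline_s) / baseline_s * 100.0


def throughput_change_pct(baseline_s: float, new_s: float) -> float:
    """Percent change in work done per unit time; positive means faster."""
    return (baseline_s / new_s - 1.0) * 100.0


# Hypothetical durations chosen only to reproduce the percentages quoted in the thread.
baseline = 100.0
slower_run = 257.0           # 157% longer than the baseline
faster_run = baseline / 5.5  # 5.5x the baseline throughput

print(f"slower run: duration {duration_change_pct(baseline, slower_run):+.1f}%, "
      f"throughput {throughput_change_pct(baseline, slower_run):+.1f}%")
print(f"faster run: duration {duration_change_pct(baseline, faster_run):+.1f}%, "
      f"throughput {throughput_change_pct(baseline, faster_run):+.1f}%")
```

Under this assumption, a run that takes 157% longer has roughly 61% lower throughput, and a run with 450% higher throughput takes roughly 82% less time, so the same measurement can read very differently depending on which ratio is reported.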


## Conclusion

Environment Watch delivers significant performance improvements for processing workloads while maintaining stable performance for most other Relativity operations. Organizations with heavy image-based data transfer workflows should evaluate their specific use cases to ensure alignment with their performance requirements.

Note: Depending on the decision about including the RIP image slowdown information, the last line needs to be modified or removed.

Removed this information for now, until the other team reaches a conclusion.
966d848

@dinesh1010101 marked this pull request as ready for review on December 15, 2025, 10:28
@dinesh1010101 requested a review from a team as a code owner on December 15, 2025, 10:28

@Rahiman-Nadaf (Collaborator) left a comment:

Reviewed the retention policy guidelines; they look good to me.

@DaRealRahul1 left a comment:

I have finished reviewing the document. Could you please check the document below and make the necessary changes?
https://jira.kcura.com/secure/attachment/758132/EW%20Review.docx

@dinesh1010101 commented:

Thanks @DaRealRahul1, I have addressed all your feedback. Each item is listed below with the action taken.

  1. Mention exactly where to navigate for the Kibana Dev Console - Added the navigation steps to the first step; later steps do not repeat them, since the first step already covers this.
  2. The output is different: the doc above shows an extra entry "mode": "standard" that I can't see on my test VM, and the entries in "composed_of" are not in the same order. If this is expected, please mention it - The output is already labeled as a sample, which covers this.
  3. Better to mention the GET and PUT commands for metrics-apm.app@template and traces-apm@template - Corrected, as I felt the same from a user perspective.
  4. The step above does not say to change the highlighted template name to match the template being updated for metrics-apm.app@template and traces-apm@template - Not applicable, as I have added separate steps for metrics and traces.
  5. No need to update, as the lifecycle is already 90 - Content updated.
  6. Same here: mention the navigation as in step 1 (data stream) - The steps to navigate to the Dev Console are covered at the start.
  7. The outputs of all the commands below differ from the document above; please correct them - Corrected.
  8. Where can we check that data lifecycle management is active? Mention the steps for this as well, if possible - Reworded.

Requesting you to review and approve the PR.

@dinesh1010101 removed their request for review on December 18, 2025, 04:21
@dinesh1010101 changed the title from "REL-1207540-telemetry-volume-reduction & REL-1224050-Retension-Policy-Guidelines" to "(DO NOT MERGE) - REL-1207540-telemetry-volume-reduction & REL-1224050-Retension-Policy-Guidelines" on Dec 19, 2025

@dinesh1010101 commented:

As discussed with @KarunaDhawan, I removed the performance impact information. This PR will remain blocked until the teams reach a conclusion on that information.

CC: @Phani61284 @anujavinash @satishdayala-relativity

@anujavinash changed the base branch from main to REL-1238635-cumulative-cum-folder-hierarchy-change on December 22, 2025, 10:33

These guidelines define retention policies for logs, metrics, and traces collected in Elasticsearch and viewed through Kibana. Proper retention management is critical for:

- **Storage Optimization** – Prevents excessive disk usage by automatically removing outdated data

Contributor commented:

Storage and cost are tied together, so it would be best to combine them here:
Storage Optimization & Cost Control – Prevents excessive disk usage and reduces infrastructure costs by automatically removing outdated data

Updated.

```
Docs/Day (Daily Documents) = 6M + (Web_Server_Count × 2M) + (Agent_Server_Count × 2M) + (Worker_Server_Count × 400k) + (SQL_Distributed_Server_Count × 500k)
GiB/Day (Daily Storage) = Docs/Day × 380 / 1024³
```

Contributor commented:

Where does this 380 come from? Perhaps we should explain what that number is.

Sure, updated the same.
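
For readers following this thread, here is a minimal sketch of the sizing formula quoted above. It assumes that 380 is an average document size in bytes, which is what dividing by 1024³ to reach GiB suggests; the thread does not confirm this. The function names and the server counts in the example are hypothetical.

```python
GIB = 1024 ** 3
AVG_BYTES_PER_DOC = 380  # assumption: the "380" in the formula is bytes per document


def docs_per_day(web: int, agent: int, worker: int, sql_distributed: int) -> int:
    """Daily document count, following the Docs/Day formula from the diff."""
    return (
        6_000_000
        + web * 2_000_000
        + agent * 2_000_000
        + worker * 400_000
        + sql_distributed * 500_000
    )


def gib_per_day(docs: int, bytes_per_doc: int = AVG_BYTES_PER_DOC) -> float:
    """Daily storage in GiB, following the GiB/Day formula from the diff."""
    return docs * bytes_per_doc / GIB


# Hypothetical environment: 2 web, 2 agent, 4 worker, and 1 SQL distributed server.
docs = docs_per_day(web=2, agent=2, worker=4, sql_distributed=1)
print(f"{docs:,} docs/day, {gib_per_day(docs):.2f} GiB/day")
```

With these hypothetical counts the formula gives 16.1M documents per day, or about 5.7 GiB per day.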

### Step 3: Delete Existing Data Streams (Setup Time Only)

> [!WARNING]
> This step should only be performed once during initial setup. Deleting data streams will permanently remove all data and indices under those data streams.

Contributor commented:

The data stream deletion step is destructive. Can we make the warning more prominent or clarify when it is safe to run?

Sure, updated the same.

**Sample Request:**

```
# Here logs-apm.app@template is the name of the index template
```

Contributor commented:

The Logs, Metrics, and Traces sections repeat the same workflow. Can we describe the pattern once and only show full JSON once?

Initially, I had it that way. However, during SQE verification I received feedback that testers felt stuck or confused, and they asked for all three to be shown separately so users can follow along more smoothly. That said, please let me know if you still prefer combining them into a single section.

Contributor commented:

@dinesh1010101 If compressing into one JSON causes confusion, perhaps we can keep it how it is set up currently, thanks
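
The shared pattern behind the three sections can be sketched as follows. This is only an illustration: it assumes the retention value lives under `template.lifecycle.data_retention`, as in Elasticsearch's data stream lifecycle settings for index templates, and the template bodies are hypothetical, heavily trimmed stand-ins for what a GET on each template would return. The JSON in the guideline itself is not shown in this thread.

```python
import copy


def with_retention(template_body: dict, days: int) -> dict:
    """Return a copy of an index template body with its lifecycle retention set."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    updated = copy.deepcopy(template_body)  # leave the fetched body untouched
    lifecycle = updated.setdefault("template", {}).setdefault("lifecycle", {})
    lifecycle["data_retention"] = f"{days}d"
    return updated


# Template names come from the thread; the bodies are hypothetical placeholders.
fetched = {
    "logs-apm.app@template": {"composed_of": [], "template": {}},
    "metrics-apm.app@template": {"composed_of": [], "template": {}},
    "traces-apm@template": {"composed_of": [], "template": {}},
}

for name, body in fetched.items():
    new_body = with_retention(body, days=90)
    # Each new_body would be sent back with a PUT for the matching template name.
    print(name, new_body["template"]["lifecycle"])
```

Keeping the three sections separate in the document, as agreed above, does not conflict with this: the sketch only shows that the same GET, edit, PUT sequence applies to each template name.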

@dinesh1010101 changed the title from "(DO NOT MERGE) - REL-1207540-telemetry-volume-reduction & REL-1224050-Retension-Policy-Guidelines" to "REL-1207540-telemetry-volume-reduction & REL-1224050-Retension-Policy-Guidelines" on Jan 23, 2026

### Purpose

These guidelines define retention policies for logs, metrics, and traces collected in Elasticsearch and viewed through Kibana. Proper retention management is critical for:

Contributor commented:

@dinesh1010101 Please add EW context clarifying that configuring Elasticsearch retention policies is optional. Environment Watch operates correctly using default retention settings, and this configuration should be applied only when customers need to customize retention behavior. This helps prevent readers from assuming the configuration is required.
Note text could be something like this: Configuring Elasticsearch retention policies is optional. Environment Watch works out of the box using default retention settings. The configurations described here should be applied only if you need to customize how long data is retained to align with your organization’s storage, performance, or compliance requirements.

> - You are performing **initial setup** and no production data exists yet
>
> **Do NOT run this on production systems with active data.**

Contributor commented:

Thanks for adding the note, but it still reads as though users are required to perform this step. Please update it to explicitly mark the step as optional and intended only for initial setup or controlled scenarios. For example:

Step 3: Delete Existing Data Streams (Setup Time Only)

⚠️ DESTRUCTIVE OPERATION – PERMANENT DATA LOSS

This step is optional and is not required for most Environment Watch deployments. It should only be performed during initial setup or in controlled, non-production scenarios.

This step will permanently delete all data and indices in the specified data streams. There is no recovery. Only proceed if:

  • You are in a development or non-production environment, OR
  • You have backed up all critical data from these data streams, OR
  • You are performing initial setup and no production data exists yet

Do NOT run this on production systems with active data.

```
}
```

> [!IMPORTANT]

Contributor commented:

This note is great, but it would be more helpful to place it right before the index template update steps.


## Overview

This document provides transparent information about the performance overhead Environment Watch introduces to standard Relativity workloads, based on comprehensive testing in a production-like environment.

Contributor commented:

Please update this to: This document provides transparent information about the performance overhead Environment Watch introduces to standard Relativity workloads, based on testing conducted in a controlled, production-like environment. Actual performance may vary depending on workload characteristics, environment size, infrastructure configuration, and usage patterns.

| **Review (Conversion)** | **+5% faster** | Review operations saw a modest 5% improvement, providing slightly faster document conversion without any workflow disruption. |
| **Imaging & Production** | **Stable (±4%)** | Imaging and production performance remained stable, with changes within a ±4% range, resulting in no meaningful impact to customer workflows. |
| **Data Transfer** | **~5-6% faster on average** | Data transfer operations showed performance improvements with Imports demonstrating ~10% faster performance on average, while Exports (excluding RIP Images Export) were ~1% faster on average, resulting in an approximate 5-6% overall improvement. |

Contributor commented:

Please add a note before the table: The results below reflect observed outcomes from internal testing and are provided for transparency. These results should not be interpreted as guaranteed performance improvements for all Environment Watch deployments.


## Conclusion

Environment Watch has demonstrated minimal to positive impact on Relativity workloads across comprehensive testing. Most operations showed performance improvements, with Processing, Data Transfer, and Review all performing faster. Imaging and Production workflows remained stable. These results confirm that Environment Watch provides valuable observability and monitoring capabilities without compromising your Relativity system's performance.

Contributor commented:

Please update this to something like the following, since results may vary across environments: Environment Watch has demonstrated minimal to positive impact on Relativity workloads based on comprehensive testing in a controlled, production-like environment. Most operations showed performance improvements, with Processing, Data Transfer, and Review performing faster, while Imaging and Production workflows remained stable. Environment Watch is designed to deliver observability and monitoring capabilities with minimal overhead; however, actual performance results may vary based on customer-specific configurations, environment size, and workload characteristics.
