Skip to content

Conversation

@abdulraqeeb33
Copy link
Contributor

@abdulraqeeb33 abdulraqeeb33 commented Dec 28, 2025

Description

One Line Summary

Logging, Crash handling using Otel

Details

Motivation

We want to be able to capture logs, crashes and ANRs - report them to our servers so that we can inspect and fix them on devices.

Scope

Major Architectural Changes and Improvements from this implementation PR

1. Platform-Agnostic Design & Dependency Reduction

  • Created OtelIdResolver - Reads OneSignal IDs directly from SharedPreferences (no service dependencies)
  • Refactored OtelPlatformProvider - Uses OtelIdResolver internally, removing all service dependencies
  • Created factory function createAndroidOtelPlatformProvider(context) - Single parameter, no services needed
  • Removed service dependencies from initializeOtelLogging() - No longer needs IInstallIdService, ConfigModelStore, IdentityModelStore

2. Initialization Consolidation

  • Converted OneSignalCrashLogInit from object to class - Manages platform provider lifecycle as instance variable
  • Platform provider created once and reused - Single instance used for both crash handler and logging initialization
  • Early crash handler initialization - Initialized before other services to capture all crashes

3. Component Simplification

  • OneSignalCrashHandlerFactory - Simplified to only need Context and IOtelLogger
  • OtelFactory.createCrashHandler - Refactored to compose other factory methods, reducing duplication

Code Organization & Centralization

4. Constants & Utilities

  • Created AnrConstants.kt - Centralized DEFAULT_ANR_THRESHOLD_MS and DEFAULT_CHECK_INTERVAL_MS
  • Centralized isOneSignalAtFault() logic - Moved to OtelCrashHandler.kt (shared by crash handler and ANR detector)
  • Renamed constants - CONFIG_NAME_SPACE, IDENTITY_NAME_SPACE to match naming conventions (updated 77 usages across 8 files)

5. Code Quality Improvements

  • Extracted helper methods in OtelAnrDetector - Reduced complexity (LongMethod, NestedBlockDepth)
  • Fixed all detekt issues - Added appropriate @Suppress annotations where needed
  • Improved exception handling - More specific exception types where appropriate

Test Coverage & Reliability

6. Comprehensive Test Suite

  • Created OtelIdResolverTest.kt - Comprehensive tests for all ID resolution methods with fallback scenarios
  • Created OtelPlatformProviderTest.kt - Tests for platform provider properties and factory function
  • Created OneSignalCrashUploaderWrapperTest.kt - Tests for wrapper initialization and error handling
  • Updated integration tests - OtelIntegrationTest, CrashReportUploadTest now use Context + SharedPreferences

OPTIONAL - Other

OPTIONAL - Feel free to add any other sections or sub-sections that can explain your PR better.

Testing

Unit testing

Lot of unit testing

Manual testing

Tested by manually crashing the app

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
    • If it is hard to explain how any codes changes are related to each other then it most likely needs to be more than one PR
  • Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible
  • I have personally tested this on my device, or explained why that is not possible

Final pass

  • Code is as readable as possible.
    • Simplify with less code, followed by splitting up code into well named functions and variables, followed by adding comments to the code.
  • I have reviewed this PR myself, ensuring it meets each checklist item
    • WIP (Work In Progress) is ok, but explain what is still in progress and what you would like feedback on. Start the PR title with "WIP" to indicate this.

This change is Reviewable

jkasten2 and others added 13 commits December 8, 2025 18:39
Adding disk-buffering to otel logging required bumping to Kotlin 2.2.
Adding the code to implement this in a follow up commit.
When the app is started again we send any pending crash reports.
Also refactored crash handle classes into their own namespace.

Refactored IOneSignalOpenTelemetry properties into suspend functions,
as some values we want to add to all Otel requests were suspend.
Also made all otel types internal and a few other misc cleanup.
We need to start it sooner to catch crashes in things like
other IStartableService's.
Models can throw if something goes wrong initializing them.
@abdulraqeeb33 abdulraqeeb33 changed the base branch from otel-crash-reporting to main December 28, 2025 20:21
@github-actions
Copy link
Contributor

github-actions bot commented Jan 6, 2026

📊 Diff Coverage Report

Diff Coverage Report

Threshold: 80%

Changed Files Coverage

  • ⚠️ MainApplicationKT.kt: Not in coverage report (may not be compiled/tested)
  • ⚠️ CoreModule.kt: Not in coverage report (may not be compiled/tested)
  • IParamsBackendService.kt: 0/35 lines (0.0%)
    • ⚠️ Below threshold: 35 uncovered lines
  • ParamsBackendService.kt: 0/95 lines (0.0%)
    • ⚠️ Below threshold: 95 uncovered lines
  • ConfigModel.kt: 46/143 lines (32.2%)
    • ⚠️ Below threshold: 97 uncovered lines
  • ConfigModelStore.kt: 0/2 lines (0.0%)
    • ⚠️ Below threshold: 2 uncovered lines
  • ConfigModelStoreListener.kt: 0/67 lines (0.0%)
    • ⚠️ Below threshold: 67 uncovered lines
  • HttpClient.kt: 131/151 lines (86.8%)
  • Time.kt: 2/3 lines (66.7%)
    • ⚠️ Below threshold: 1 uncovered lines
  • OneSignalCrashHandlerFactory.kt: 0/9 lines (0.0%)
    • ⚠️ Below threshold: 9 uncovered lines
  • OneSignalCrashUploaderWrapper.kt: 0/11 lines (0.0%)
    • ⚠️ Below threshold: 11 uncovered lines
  • OtelAnrDetector.kt: 0/99 lines (0.0%)
    • ⚠️ Below threshold: 99 uncovered lines
  • Logging.kt: 52/102 lines (51.0%)
    • ⚠️ Below threshold: 50 uncovered lines
  • AndroidOtelLogger.kt: 0/9 lines (0.0%)
    • ⚠️ Below threshold: 9 uncovered lines
  • OtelIdResolver.kt: 0/95 lines (0.0%)
    • ⚠️ Below threshold: 95 uncovered lines
  • OtelPlatformProvider.kt: 0/76 lines (0.0%)
    • ⚠️ Below threshold: 76 uncovered lines
  • OneSignalCrashLogInit.kt: 0/52 lines (0.0%)
    • ⚠️ Below threshold: 52 uncovered lines
  • OneSignalImp.kt: 60/262 lines (22.9%)
    • ⚠️ Below threshold: 202 uncovered lines
  • IdentityModelStore.kt: 1/3 lines (33.3%)
    • ⚠️ Below threshold: 2 uncovered lines
  • NotificationRestoreWorkManager.kt: 0/31 lines (0.0%)
    • ⚠️ Below threshold: 31 uncovered lines
  • OneSignalOpenTelemetry.kt: 0/66 lines (0.0%)
    • ⚠️ Below threshold: 66 uncovered lines
  • OtelFactory.kt: 0/23 lines (0.0%)
    • ⚠️ Below threshold: 23 uncovered lines
  • OtelLoggingHelper.kt: 0/31 lines (0.0%)
    • ⚠️ Below threshold: 31 uncovered lines
  • OtelFieldsPerEvent.kt: 0/13 lines (0.0%)
    • ⚠️ Below threshold: 13 uncovered lines
  • OtelFieldsTopLevel.kt: 0/21 lines (0.0%)
    • ⚠️ Below threshold: 21 uncovered lines
  • OtelConfigCrashFile.kt: 0/19 lines (0.0%)
    • ⚠️ Below threshold: 19 uncovered lines
  • OtelConfigRemoteOneSignal.kt: 0/18 lines (0.0%)
    • ⚠️ Below threshold: 18 uncovered lines
  • OtelConfigShared.kt: 0/18 lines (0.0%)
    • ⚠️ Below threshold: 18 uncovered lines
  • OtelCrashHandler.kt: 0/42 lines (0.0%)
    • ⚠️ Below threshold: 42 uncovered lines
  • OtelCrashReporter.kt: 0/33 lines (0.0%)
    • ⚠️ Below threshold: 33 uncovered lines
  • OtelCrashUploader.kt: 0/29 lines (0.0%)
    • ⚠️ Below threshold: 29 uncovered lines

Overall Coverage

292/1558 lines covered (18.7%)

❌ Coverage Check Failed

Files below 80% threshold:

  • IParamsBackendService.kt: 0.0% (35 uncovered lines)

  • ParamsBackendService.kt: 0.0% (95 uncovered lines)

  • ConfigModel.kt: 32.2% (97 uncovered lines)

  • ConfigModelStore.kt: 0.0% (2 uncovered lines)

  • ConfigModelStoreListener.kt: 0.0% (67 uncovered lines)

  • Time.kt: 66.7% (1 uncovered lines)

  • OneSignalCrashHandlerFactory.kt: 0.0% (9 uncovered lines)

  • OneSignalCrashUploaderWrapper.kt: 0.0% (11 uncovered lines)

  • OtelAnrDetector.kt: 0.0% (99 uncovered lines)

  • Logging.kt: 51.0% (50 uncovered lines)

  • AndroidOtelLogger.kt: 0.0% (9 uncovered lines)

  • OtelIdResolver.kt: 0.0% (95 uncovered lines)

  • OtelPlatformProvider.kt: 0.0% (76 uncovered lines)

  • OneSignalCrashLogInit.kt: 0.0% (52 uncovered lines)

  • OneSignalImp.kt: 22.9% (202 uncovered lines)

  • IdentityModelStore.kt: 33.3% (2 uncovered lines)

  • NotificationRestoreWorkManager.kt: 0.0% (31 uncovered lines)

  • OneSignalOpenTelemetry.kt: 0.0% (66 uncovered lines)

  • OtelFactory.kt: 0.0% (23 uncovered lines)

  • OtelLoggingHelper.kt: 0.0% (31 uncovered lines)

  • OtelFieldsPerEvent.kt: 0.0% (13 uncovered lines)

  • OtelFieldsTopLevel.kt: 0.0% (21 uncovered lines)

  • OtelConfigCrashFile.kt: 0.0% (19 uncovered lines)

  • OtelConfigRemoteOneSignal.kt: 0.0% (18 uncovered lines)

  • OtelConfigShared.kt: 0.0% (18 uncovered lines)

  • OtelCrashHandler.kt: 0.0% (42 uncovered lines)

  • OtelCrashReporter.kt: 0.0% (33 uncovered lines)

  • OtelCrashUploader.kt: 0.0% (29 uncovered lines)

📥 View workflow run

@abdulraqeeb33 abdulraqeeb33 marked this pull request as ready for review January 6, 2026 22:44
@abdulraqeeb33 abdulraqeeb33 changed the title Ar otel crash reporting feat: tel crash reporting Jan 6, 2026

object SdkLoggerProviderConfig {
// NOTE: Switch to https://sdklogs.onesignal.com:443/sdk/otel when ready
const val BASE_URL = "https://api.staging.onesignal.com/sdk/otel"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkasten2 we need to externalize this.

/**
* Hardcoded error appId prefix when appId cannot be resolved.
*/
private const val ERROR_APP_ID_RESOLVE = "00000000-0000-4000-a000-000000000000"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkasten2 @jlbelmonte please check this out.
we will be adding this in the ConfigCat

@claude
Copy link

claude bot commented Jan 28, 2026

Issue 1: Infinite recursion in crashApp()

File: Examples/OneSignalDemo/app/src/main/java/com/onesignal/sdktest/application/MainApplicationKT.kt:116

This recursive call to crashApp() will cause infinite recursion and a StackOverflowError before the intended RuntimeException can be thrown.

The function calls itself immediately, so the code below (creating the date and throwing the RuntimeException) will never execute. This line should be removed.

See:

}
private fun crashApp() {
val sdf = SimpleDateFormat(
"MMM dd, yyyy HH:mm:ss",
Locale.getDefault()
)
crashApp()
val currentTimeMillis = System.currentTimeMillis()
val date = Date(currentTimeMillis)
val formattedDate = sdf.format(date)
throw RuntimeException("test crash from AR $formattedDate")


Issue 2: Test expects wrong attribute key

File: OneSignalSDK/onesignal/otel/src/test/java/com/onesignal/otel/attributes/OtelFieldsPerEventTest.kt:32

The test expects the attribute key "android.app.state" but the implementation in OtelFieldsPerEvent.kt uses "app.state". This mismatch will cause the test to fail.

Implementation (OtelFieldsPerEvent.kt:38):

// Use platform-agnostic attribute name (works for both Android and iOS)
attributes["app.state"] = platformProvider.appState

Test (OtelFieldsPerEventTest.kt:32 and 49):

attributes["android.app.state"] shouldBe "foreground"
attributes["android.app.state"] shouldBe "background"

The test should use "app.state" to match the implementation.

See:


Issue 3: Missing removeCallbacks in ANR detector stop()

File: OneSignalSDK/onesignal/core/src/main/java/com/onesignal/debug/internal/crash/OtelAnrDetector.kt:148

The stop() method sets mainThreadRunnable = null but doesn't remove pending callbacks from the handler. If stop() is called while a callback is pending in the handler's message queue, it will still execute after monitoring has stopped.

Before setting mainThreadRunnable = null, you should call:

mainHandler.removeCallbacks(mainThreadRunnable)

This ensures no callbacks execute after the ANR detector has been stopped.

See:

logger.info("$TAG: Stopping ANR detection...")
// Interrupt the watchdog thread to stop it
watchdogThread?.interrupt()
watchdogThread = null
watchdogRunnable = null
mainThreadRunnable = null
logger.info("$TAG: ✅ ANR detection stopped")
}
private fun reportAnr(unresponsiveDurationMs: Long) {
try {
logger.info("$TAG: Checking if ANR is OneSignal-related (unresponsive for ${unresponsiveDurationMs}ms)")


Issue 4: API compatibility issue - crashes on API 21-23

File: OneSignalSDK/onesignal/core/src/main/java/com/onesignal/core/internal/time/impl/Time.kt:13

The processUptimeMillis property uses android.os.Process.getStartUptimeMillis() which requires API 24, but the SDK's minSdkVersion is 21. This will crash on devices running API 21-23 with a NoSuchMethodError.

The @RequiresApi annotation is only a compile-time lint hint - it doesn't provide runtime protection. You need an explicit version check:

override val processUptimeMillis: Long
    get() = if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
        SystemClock.uptimeMillis() - android.os.Process.getStartUptimeMillis()
    } else {
        // Fallback for API < 24
        // Consider using an alternative approach or returning -1 to indicate unavailable
        SystemClock.uptimeMillis()
    }

See:

get() = System.currentTimeMillis()
override val processUptimeMillis: Long
@RequiresApi(Build.VERSION_CODES.N)
get() = SystemClock.uptimeMillis() - android.os.Process.getStartUptimeMillis()
}

@abdulraqeeb33
Copy link
Contributor Author

@jkasten2 lets try to add units tests for the newly added files so as to cover at least 80% of the code. This is a new feature that we are launching after we have added the code coverage check. Lets set a standard here.

logger.info("OtelCrashHandler: Uncaught exception detected - ${throwable.javaClass.simpleName}: ${throwable.message}")

// Check if this is an ANR exception (though standalone ANR detector already handles ANRs)
// This would only catch ANRs if they're thrown as exceptions, which is rare
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abdulraqeeb33 I am not familiar with the cases where an ANR throws an exception. Can you provides details or a link to a page explaining this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants