Handle exceptions in data reload loop to prevent silent data staleness by bzantium · Pull Request #7087 · tensorflow/tensorboard

bzantium · 2026-04-04T15:49:05Z

Summary

The _reload function in LocalDataIngester has no exception handling, so any transient error (e.g., network timeout when reading from GCS) kills the Reloader thread permanently. TensorBoard then silently serves stale data with no way to recover short of a restart.

This PR wraps the reload loop body in try/except Exception so that:

- Transient errors are logged with full traceback via logger.error
- The reload loop continues to the next cycle instead of crashing
- TensorBoard automatically recovers once the transient issue resolves
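
A minimal sketch of the pattern described above, not the actual diff. The names `reload_forever`, `load_once`, `interval_secs`, `max_cycles`, and the injectable `sleep` are hypothetical, introduced only so the loop can be bounded and exercised in isolation. The real loop is `_reload` in `LocalDataIngester`.

```python
import logging
import time

logger = logging.getLogger(__name__)


def reload_forever(load_once, interval_secs=5.0, max_cycles=None, sleep=time.sleep):
    """Call `load_once` repeatedly, surviving any Exception it raises.

    `max_cycles` and `sleep` are hypothetical hooks for testing; with
    `max_cycles=None` the loop never exits on its own.
    Returns the number of cycles that failed.
    """
    failures = 0
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        cycle += 1
        try:
            load_once()
        except Exception:
            # Log the full traceback, then fall through to the next cycle
            # instead of letting the exception end the loop.
            failures += 1
            logger.error("Error during data reload; retrying next cycle", exc_info=True)
        sleep(interval_secs)
    return failures
```

Catching `Exception` rather than using a bare `except` means `KeyboardInterrupt` and `SystemExit` still propagate, so the loop can still be shut down deliberately.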

Changes

- data_ingester.py: Wrap reload loop body in try/except, log errors with exc_info=True
- data_ingester_test.py: Add tests verifying the reload loop survives exceptions from both AddRunsFromDirectory and Reload
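
The tests could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in data_ingester_test.py: `run_reload_cycles` and `check_loop_survives` are hypothetical helpers, the multiplexer is a plain `mock.Mock`, and the argument passed to `AddRunsFromDirectory` is a placeholder.

```python
import logging
from unittest import mock

logger = logging.getLogger(__name__)


def run_reload_cycles(multiplexer, logdir, cycles):
    """Hypothetical stand-in for the guarded reload loop, bounded for tests."""
    for _ in range(cycles):
        try:
            multiplexer.AddRunsFromDirectory(logdir)
            multiplexer.Reload()
        except Exception:
            logger.error("Error during data reload", exc_info=True)


def check_loop_survives(failing_method):
    """Make one method raise on its first call only, then run three cycles."""
    multiplexer = mock.Mock()
    # A list side_effect yields one item per call; exception instances are raised.
    getattr(multiplexer, failing_method).side_effect = [
        OSError("simulated timeout"),
        None,
        None,
    ]
    with mock.patch.object(logger, "error") as log_error:
        run_reload_cycles(multiplexer, "/tmp/logdir", cycles=3)
    return multiplexer, log_error


# The loop keeps calling AddRunsFromDirectory after its first call fails.
m, log_error = check_loop_survives("AddRunsFromDirectory")
assert m.AddRunsFromDirectory.call_count == 3
assert log_error.call_count == 1
```

Calling `check_loop_survives("Reload")` covers the second failure point in the same way.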

Test plan

- Added unit tests for exception handling in reload loop
- Verified manually with GCS logdir + simulated network interruption

The Reloader thread/process in LocalDataIngester crashes on any unhandled exception (e.g. transient network errors when reading from remote filesystems like GCS). Once the reload loop dies, TensorBoard continues serving stale data with no indication to the user. Wrap the reload loop body in a try/except so that transient errors are logged and the next reload cycle proceeds normally.

bzantium force-pushed the fix/reload-error-handling branch from f814a01 to 2c78489 Compare April 4, 2026 16:12

bzantium wants to merge 1 commit into tensorflow:master from bzantium:fix/reload-error-handling
