
[SPARK-38101] Make metadata fetch failure repeat the task, not restart stages#55347

Open
EnricoMi wants to merge 2 commits intoapache:masterfrom
G-Research-Forks:metadata-fetch-failure

Conversation

EnricoMi (Contributor) commented Apr 15, 2026

What changes were proposed in this pull request?

Changes the handling of MetadataFetchFailedException from an immediate stage retry to an immediate task retry, so that transient task metadata failures can be recovered. The task is retried at most spark.task.maxFailures (default: 4) times before the stage and the job are aborted.

This fix handles INTERNAL_ERROR_BROADCAST errors that occur while fetching task metadata more gracefully:

  1. failing to retrieve map statuses via a broadcast variable already throws MetadataFetchFailedException: [SPARK-38101] execuors fail fetching map statuses with INTERNAL_ERROR_BROADCAST #54723
  2. failing to retrieve the task binary via a broadcast variable now also throws MetadataFetchFailedException.
  3. such a MetadataFetchFailedException now retries the task, not the full stage.

Adds the config option spark.task.maxFailures.countsMetadataFetchFailures (default: true). If false, task re-attempts due to MetadataFetchFailedException do not count towards spark.task.maxFailures and therefore cannot cause stage failures. Tasks failing with MetadataFetchFailedException are then retried until they succeed or fail for a different reason.

Fixes #54723.
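The accounting behind the new option can be illustrated with a small, self-contained sketch (names and logic are illustrative assumptions, not Spark's actual scheduler code):

```scala
// Illustrative sketch, NOT Spark's scheduler implementation: whether a failed
// task attempt counts towards spark.task.maxFailures under the proposed
// spark.task.maxFailures.countsMetadataFetchFailures option.
def countsTowardsMaxFailures(
    isMetadataFetchFailure: Boolean,
    countsMetadataFetchFailures: Boolean): Boolean = {
  // Ordinary failures always count; metadata fetch failures only count
  // when the option is left at its default (true).
  !isMetadataFetchFailure || countsMetadataFetchFailures
}
```

With the option set to false, a metadata fetch failure never pushes a task over the spark.task.maxFailures threshold, matching the description above.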

Why are the changes needed?

Currently, a MetadataFetchFailedException is handled like a FetchFailedException, causing an immediate retry of the affected stage and its mapping stage (the stage that produced the shuffle input). After spark.stage.maxConsecutiveAttempts (default: 4) attempts, the respective job is aborted. This is expensive, while an immediate retry of the failed task would fix the problem.
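As a rough model of the proposed task-level retry (an illustrative sketch, not the DAGScheduler implementation):

```scala
// Illustrative model, not Spark code: a task is retried on metadata fetch
// failures, and the stage is aborted only after maxFailures failed attempts.
sealed trait Attempt
case object Succeeded extends Attempt
case object MetadataFetchFailed extends Attempt

// Returns true if the task eventually succeeds within maxFailures failures.
def runWithTaskRetries(attempts: Iterator[Attempt], maxFailures: Int): Boolean = {
  var failures = 0
  while (failures < maxFailures && attempts.hasNext) {
    attempts.next() match {
      case Succeeded => return true
      case MetadataFetchFailed => failures += 1 // retry the task, not the stage
    }
  }
  false // maxFailures reached: stage and job are aborted
}
```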

The MetadataFetchFailedException is thrown when

  1. map statuses cannot be retrieved from the driver: [SPARK-38101] execuors fail fetching map statuses with INTERNAL_ERROR_BROADCAST #54723
  2. the task binary cannot be retrieved from the driver
  3. an output location is not known:
    def validateStatus(status: ShuffleOutputStatus, shuffleId: Int, partition: Int): Unit = {
      if (status == null) {
        // scalastyle:off line.size.limit
        val errorMessage = log"Missing an output location for shuffle ${MDC(SHUFFLE_ID, shuffleId)} partition ${MDC(PARTITION_ID, partition)}"
        // scalastyle:on
        logError(errorMessage)
        throw new MetadataFetchFailedException(shuffleId, partition, errorMessage.message)
      }
    }

While 1. and 2. can be recovered from by simply retrying the failed task, 3. indicates an issue with the mapping stage (the stage that produced the shuffle input). Therefore, 3. should throw a FetchFailedException to repeat the mapping stage, as it currently does.
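The distinction between the three cases can be sketched as follows (illustrative names, not the PR's actual code paths):

```scala
// Illustrative classification, not the PR's actual code: cases 1 and 2 are
// transient driver-side fetch problems and should only retry the task;
// case 3 means shuffle output is genuinely missing and must re-run the
// mapping stage (i.e. behave like a FetchFailedException).
sealed trait MetadataFailureCause
case object MapStatusesUnreachable extends MetadataFailureCause // case 1
case object TaskBinaryUnreachable  extends MetadataFailureCause // case 2
case object MissingOutputLocation  extends MetadataFailureCause // case 3

def retriesTaskOnly(cause: MetadataFailureCause): Boolean = cause match {
  case MapStatusesUnreachable | TaskBinaryUnreachable => true
  case MissingOutputLocation => false // re-run the mapping stage instead
}
```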

Does this PR introduce any user-facing change?

Adds the config option spark.task.maxFailures.countsMetadataFetchFailures.
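If this PR is merged, the new option could be set alongside the existing spark.task.maxFailures, for example (a config fragment; the option name is taken from this PR):

```scala
// Config fragment: opt out of counting metadata fetch failures towards
// the task failure limit (option proposed by this PR, default: true).
val conf = new org.apache.spark.SparkConf()
  .set("spark.task.maxFailures", "4")
  .set("spark.task.maxFailures.countsMetadataFetchFailures", "false")
```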

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No.
