secret-operator pods get OOM-killed when multiple heavy Spark applications start at the same time #666

@maxgruber19

Description

Affected Stackable version

25.3.0

Current and expected behavior

When multiple Spark applications are started at the same time, each with 300 executors, a large number of PVCs are submitted that the secret-operator has to satisfy. In the default configuration the secret-operator container has 128 MB of memory, which does not seem to be enough: in that case all the secret-operator pods get OOM-killed.

Possible solution

Increase the memory limit of the secret-operator container from 128 MB to 1 GB.
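
As a rough sketch of what such a change could look like when applied by hand, the snippet below builds a Kubernetes strategic-merge patch that raises one container's memory limit. The container name `secret-operator` and the value `1Gi` (used here for the "1 GB" above) are assumptions, not taken from the actual manifests, and the helper `memory_limit_patch` is purely illustrative.

```python
import json


def memory_limit_patch(container_name: str, memory: str) -> dict:
    """Build a strategic-merge patch that sets one container's memory limit.

    Containers in a pod template are merged by their "name" key, so only
    the named container's limit is touched.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container_name,
                            "resources": {"limits": {"memory": memory}},
                        }
                    ]
                }
            }
        }
    }


# Assumption: the container is called "secret-operator"; check the real
# DaemonSet before using this.
patch = memory_limit_patch("secret-operator", "1Gi")
print(json.dumps(patch))
```

The printed JSON could be passed to `kubectl patch daemonset <name> -n <namespace> --patch '<json>'`. If the operator is managed by a chart or another controller, a manual patch like this may be reverted on the next upgrade, so it is only suitable as a temporary workaround.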

Additional context

@soenkeliebau As mentioned today, I'll try to get further details. I think this is reproducible by submitting an application with 3000 executors and keeping an eye on the secret-operator DaemonSet.
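
To make "keeping an eye on" the pods a bit more concrete, here is a minimal sketch that scans the JSON produced by `kubectl get pods -o json` for containers whose current or last state is a termination with reason `OOMKilled`. The function name and the sample pod and container names are hypothetical and only serve as an illustration.

```python
def oom_killed_containers(pod_list: dict) -> list[tuple[str, str]]:
    """Return (pod name, container name) pairs that were OOM-killed.

    Expects the structure of `kubectl get pods -o json`, i.e. a dict
    with an "items" list of pod objects.
    """
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "")
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # A restarted container reports the kill under "lastState";
            # a container that has not restarted yet reports it under "state".
            for state_key in ("state", "lastState"):
                terminated = (cs.get(state_key) or {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    hits.append((pod_name, cs.get("name", "")))
                    break
    return hits


# Hypothetical sample data, not output from a real cluster.
sample = {
    "items": [
        {
            "metadata": {"name": "secret-operator-daemonset-abcde"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "secret-operator",
                        "state": {"running": {}},
                        "lastState": {
                            "terminated": {"reason": "OOMKilled", "exitCode": 137}
                        },
                    }
                ]
            },
        },
        {
            "metadata": {"name": "secret-operator-daemonset-fghij"},
            "status": {
                "containerStatuses": [
                    {"name": "secret-operator", "state": {"running": {}}, "lastState": {}}
                ]
            },
        },
    ]
}

print(oom_killed_containers(sample))
```

In practice the input would come from something like `json.load(sys.stdin)` fed by `kubectl get pods -n <namespace> -o json`, run repeatedly while the Spark applications start.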

Environment

No response

Would you like to work on fixing this bug?

None
