Currently, when the iWF server restarts, the state API call will fail, and the next attempt only happens after the startToClose timeout plus the backoff retry interval.
If the startToClose timeout is very large (e.g. >10 mins), that wait is long. To avoid the unnecessary waiting, Temporal/Cadence has the concept of an "activity heartbeat", which tells the Temporal/Cadence server that the worker is still alive. If no heartbeat is received within the heartbeat timeout, Temporal/Cadence reschedules the next activity attempt right away, based on the backoff retry policy.
Note: this is also because Temporal/Cadence activity tasks/workers are "polling based". iWF tasks/workers are "pushing", so they don't have this issue.
Need to add a side thread (goroutine) in the activity code:
go func() {
    for {
        time.Sleep(10 * time.Minute)
        activity.RecordHeartbeat(ctx)
    }
}()
^^ is simplified code. We also need to stop the goroutine when the activity finishes (using a Go channel and a timer) to avoid goroutine leaks.
Maybe make the 10 mins interval configurable.