Skip to content

Simplify thread clean / exit / terminate logic#26857

Open
kleisauke wants to merge 17 commits intoemscripten-core:mainfrom
kleisauke:simplify-thread-exit
Open

Simplify thread clean / exit / terminate logic#26857
kleisauke wants to merge 17 commits intoemscripten-core:mainfrom
kleisauke:simplify-thread-exit

Conversation

@kleisauke
Copy link
Copy Markdown
Collaborator

Mostly inspired by musl.

Split out from #13007.

kleisauke added 2 commits May 5, 2026 10:52
This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (2) test expectation files were updated by
running the tests with `--rebaseline`:

```
codesize/test_codesize_minimal_pthreads.json: 26370 => 26360 [-10 bytes / -0.04%]
codesize/test_codesize_minimal_pthreads_memgrowth.json: 26774 => 26764 [-10 bytes / -0.04%]

Average change: -0.04% (-0.04% - -0.04%)
```

int _emscripten_thread_is_valid(pthread_t thread) {
return thread->self == thread;
return thread->tid;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems more likely to return false positives if called on, e.g. an uninitialized region of memory?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would align with musl, which uses the ->tid field to determine whether a thread is valid/available.

/* After the kernel thread exits, its tid may be reused. Clear it
* to prevent inadvertent use and inform functions that would use
* it that it's no longer available. At this point the killlock
* may be released, since functions that use it will consistently
* see the thread as having exited. Release it now so that no
* remaining locks (except thread list) are held if we end up
* resetting need_locks below. */
self->tid = 0;

https://github.com/search?q=repo%3Aemscripten-core%2Fmusl+-%3Etid+ESRCH&type=code

In any case, passing an uninitialized pthread_t to functions relying on _emscripten_thread_is_valid() is already undefined behavior.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, isn't the whole point of _emscripten_thread_is_valid to detect invalid/uninitialized pointers to pthreads?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it only makes sense to catch threads that have already terminated/joined. This also aligns with glibc (according to https://stackoverflow.com/a/78311551).

Commit 16c33d2 inlines _emscripten_thread_is_valid() to check for t->tid instead.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see... but does isn't the memory pointed to by already terminated/joined memory that has been free'd? I.e. it contains basically undefined contents doesn;'t it?

Copy link
Copy Markdown
Collaborator Author

@kleisauke kleisauke May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I think I understand what you mean. In this code:

// Free the entire thread block (called map_base because
// musl normally allocates this using mmap). This region
// includes the pthread structure itself.
unsigned char* block = t->map_base;
dbg("_emscripten_thread_free_data thread=%p map_base=%p", t, block);
#ifndef NDEBUG
// To aid in debugging, set the entire region to zero.
memset(block, 0, sizeof(struct pthread));
#endif
emscripten_builtin_free(block);

->tid can end up in an undefined (dangling pointer) state because the entire pthread structure is free()'d. IIUC, this was already an issue before as well, since accessing ->self is also undefined behavior.

In any case, calling pthread_detach() or pthread_join() on a thread that has already been successfully joined is already UB, so it's unclear how much value there is in trying to make this logic "smarter".

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I think I just wanted to make it smart enough to satisfy the posixtest suite. In practice its undefined behaviour I think.

Comment thread system/lib/pthread/pthread_create.c Outdated
unsigned char* block = t->map_base;
dbg("_emscripten_thread_free_data thread=%p map_base=%p", t, block);
// To aid in debugging, set the entire region to zero.
memset(block, 0, sizeof(struct pthread));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not useful for debugging to reset this memory to something ? Maybe 0xfe ? Maybe only in debug builds?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit e206e45 does this only for debug builds.


ret = pthread_join(self, NULL);
printf("pthread_join -> %s\n", strerror(ret));
assert(ret == EINVAL);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why we were testing this behaviour? Is this something that glibc does? Or maybe that the posixtests expect?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was only needed for this test, according to this TODO comment:

// TODO: The detached check here is just to satisfy the
// `other.test_{proxy,main}_pthread_join_detach` tests.
if (t->detach_state != DT_DETACHED && __pthread_self() == t) return EDEADLK;

AFAIK, the posixtests didn't require this. I'm not entirely sure why we were testing this, I'll investigate.

@kleisauke kleisauke changed the title Simplify thread clean / exit / terminate logic. NFC Simplify thread clean / exit / terminate logic May 6, 2026
// `other.test_{proxy,main}_pthread_join_detach` tests.
if (t->detach_state != DT_DETACHED && __pthread_self() == t) return EDEADLK;
// XXX Emscripten: Add check for invalid (already joined) thread.
if (!t->tid) return ESRCH;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like the explicitness of having a _emscripten_thread_is_valid function.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-added with commit b19bce9.

// The pthread struct has a field that points to itself.
new->self = new;

// Thread ID, this becomes zero when the thread is no longer available.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. is no longer valid? Or .. is destroyed?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was based on:

/* After the kernel thread exits, its tid may be reused. Clear it
* to prevent inadvertent use and inform functions that would use
* it that it's no longer available. At this point the killlock
* may be released, since functions that use it will consistently
* see the thread as having exited. Release it now so that no
* remaining locks (except thread list) are held if we end up
* resetting need_locks below. */
self->tid = 0;

#endif
// The tid may be reused. Clear it to prevent inadvertent use
// and inform functions that would use it that it's no longer
// available.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/available/valid/?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, this comment was synced with musl.

dbg("_emscripten_thread_free_data thread=%p map_base=%p", t, block);
#ifndef NDEBUG
// To aid in debugging, set the entire region to zero.
memset(block, 0, sizeof(struct pthread));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we actually use 0xdd (or some other recognizable value) rather than zero?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I think that would break the ->tid check in _emscripten_thread_is_valid().

if (!t->tid) return ESRCH;

// Note that we don't use pthread_join here, to avoid
// returning EDEADLK when attempting to detach itself.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had a self-check here could we share more of the code with musl?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted this with commit b19bce9.

kleisauke added 10 commits May 7, 2026 10:18
This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (7) test expectation files were updated by
running the tests with `--rebaseline`:

```
codesize/test_codesize_cxx_ctors1.json: 151887 => 151887 [+0 bytes / +0.00%]
codesize/test_codesize_cxx_ctors2.json: 151292 => 151290 [-2 bytes / -0.00%]
codesize/test_codesize_cxx_except_wasm.json: 167018 => 167018 [+0 bytes / +0.00%]
codesize/test_codesize_cxx_lto.json: 120793 => 120793 [+0 bytes / +0.00%]
codesize/test_codesize_cxx_noexcept.json: 153917 => 153917 [+0 bytes / +0.00%]
codesize/test_codesize_cxx_wasmfs.json: 179773 => 179773 [+0 bytes / +0.00%]
codesize/test_codesize_hello_dylink.json: 43922 => 43922 [+0 bytes / +0.00%]

Average change: -0.00% (-0.00% - +0.00%)
```
This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (2) test expectation files were updated by
running the tests with `--rebaseline`:

```
codesize/test_codesize_minimal_pthreads.json: 26409 => 26427 [+18 bytes / +0.07%]
codesize/test_codesize_minimal_pthreads_memgrowth.json: 26819 => 26837 [+18 bytes / +0.07%]

Average change: +0.07% (+0.07% - +0.07%)
```
if (!_emscripten_thread_is_valid(t))
return ESRCH;
if (!_emscripten_thread_is_valid(t)) return ESRCH;
#endif
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you may as well just revert this file now? Unless you think the formatting is important here?


int _emscripten_thread_is_valid(pthread_t thread) {
return thread->self == thread;
return thread->tid;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why would want to change this. The old code will catch more cases of invalid thread data won't it?

// The tid may be reused. Clear it to prevent inadvertent use
// and inform functions that would use it that it's no longer
// available.
t->tid = 0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t->self = NULL too and revert _emscripten_thread_is_valid?


if (!--libc.threads_minus_1) libc.need_locks = 0;
__do_orphaned_stdio_locks();
__dl_thread_cleanup();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like added these two function, and moving the emscripten_is_main_runtime_thread up are the two substantive remaining change.

Can you explain why each of those is good thing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants