Simplify thread clean / exit / terminate logic#26857
Simplify thread clean / exit / terminate logic#26857kleisauke wants to merge 17 commits intoemscripten-core:mainfrom
Conversation
Mostly inspired by musl.
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (2) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_minimal_pthreads.json: 26370 => 26360 [-10 bytes / -0.04%] codesize/test_codesize_minimal_pthreads_memgrowth.json: 26774 => 26764 [-10 bytes / -0.04%] Average change: -0.04% (-0.04% - -0.04%) ```
|
|
||
| int _emscripten_thread_is_valid(pthread_t thread) { | ||
| return thread->self == thread; | ||
| return thread->tid; |
There was a problem hiding this comment.
This seems more likely to return false positives if called on, e.g. an uninitialized region of memory?
There was a problem hiding this comment.
This would align with musl, which uses the ->tid field to determine whether a thread is valid/available.
emscripten/system/lib/libc/musl/src/thread/pthread_create.c
Lines 110 to 117 in 4ceac4e
https://github.com/search?q=repo%3Aemscripten-core%2Fmusl+-%3Etid+ESRCH&type=code
In any case, passing an uninitialized pthread_t to functions relying on _emscripten_thread_is_valid() is already undefined behavior.
There was a problem hiding this comment.
Hmm, isn't the whole point of _emscripten_thread_is_valid to detect invalid/uninitialized pointers to pthreads?
There was a problem hiding this comment.
I think it only makes sense to catch threads that have already terminated/joined. This also aligns with glibc (according to https://stackoverflow.com/a/78311551).
Commit 16c33d2 inlines _emscripten_thread_is_valid() to check for t->tid instead.
There was a problem hiding this comment.
I see... but does isn't the memory pointed to by already terminated/joined memory that has been free'd? I.e. it contains basically undefined contents doesn;'t it?
There was a problem hiding this comment.
Oh, I think I understand what you mean. In this code:
emscripten/system/lib/pthread/pthread_create.c
Lines 289 to 298 in 16c33d2
->tid can end up in an undefined (dangling pointer) state because the entire pthread structure is free()'d. IIUC, this was already an issue before as well, since accessing ->self is also undefined behavior.
In any case, calling pthread_detach() or pthread_join() on a thread that has already been successfully joined is already UB, so it's unclear how much value there is in trying to make this logic "smarter".
There was a problem hiding this comment.
Indeed, I think I just wanted to make it smart enough to satisfy the posixtest suite. In practice its undefined behaviour I think.
| unsigned char* block = t->map_base; | ||
| dbg("_emscripten_thread_free_data thread=%p map_base=%p", t, block); | ||
| // To aid in debugging, set the entire region to zero. | ||
| memset(block, 0, sizeof(struct pthread)); |
There was a problem hiding this comment.
Is it not useful for debugging to reset this memory to something ? Maybe 0xfe ? Maybe only in debug builds?
There was a problem hiding this comment.
Commit e206e45 does this only for debug builds.
|
|
||
| ret = pthread_join(self, NULL); | ||
| printf("pthread_join -> %s\n", strerror(ret)); | ||
| assert(ret == EINVAL); |
There was a problem hiding this comment.
Do you know why we were testing this behaviour? Is this something that glibc does? Or maybe that the posixtests expect?
There was a problem hiding this comment.
It was only needed for this test, according to this TODO comment:
emscripten/system/lib/libc/musl/src/thread/pthread_join.c
Lines 18 to 20 in 4ceac4e
AFAIK, the posixtests didn't require this. I'm not entirely sure why we were testing this, I'll investigate.
| // `other.test_{proxy,main}_pthread_join_detach` tests. | ||
| if (t->detach_state != DT_DETACHED && __pthread_self() == t) return EDEADLK; | ||
| // XXX Emscripten: Add check for invalid (already joined) thread. | ||
| if (!t->tid) return ESRCH; |
There was a problem hiding this comment.
I kind of like the explicitness of having a _emscripten_thread_is_valid function.
| // The pthread struct has a field that points to itself. | ||
| new->self = new; | ||
|
|
||
| // Thread ID, this becomes zero when the thread is no longer available. |
There was a problem hiding this comment.
.. is no longer valid? Or .. is destroyed?
There was a problem hiding this comment.
This comment was based on:
emscripten/system/lib/libc/musl/src/thread/pthread_create.c
Lines 110 to 117 in d137bef
| #endif | ||
| // The tid may be reused. Clear it to prevent inadvertent use | ||
| // and inform functions that would use it that it's no longer | ||
| // available. |
There was a problem hiding this comment.
Same as above, this comment was synced with musl.
| dbg("_emscripten_thread_free_data thread=%p map_base=%p", t, block); | ||
| #ifndef NDEBUG | ||
| // To aid in debugging, set the entire region to zero. | ||
| memset(block, 0, sizeof(struct pthread)); |
There was a problem hiding this comment.
Should we actually use 0xdd (or some other recognizable value) rather than zero?
There was a problem hiding this comment.
I'm not sure, but I think that would break the ->tid check in _emscripten_thread_is_valid().
| if (!t->tid) return ESRCH; | ||
|
|
||
| // Note that we don't use pthread_join here, to avoid | ||
| // returning EDEADLK when attempting to detach itself. |
There was a problem hiding this comment.
If we had a self-check here could we share more of the code with musl?
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (7) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_cxx_ctors1.json: 151887 => 151887 [+0 bytes / +0.00%] codesize/test_codesize_cxx_ctors2.json: 151292 => 151290 [-2 bytes / -0.00%] codesize/test_codesize_cxx_except_wasm.json: 167018 => 167018 [+0 bytes / +0.00%] codesize/test_codesize_cxx_lto.json: 120793 => 120793 [+0 bytes / +0.00%] codesize/test_codesize_cxx_noexcept.json: 153917 => 153917 [+0 bytes / +0.00%] codesize/test_codesize_cxx_wasmfs.json: 179773 => 179773 [+0 bytes / +0.00%] codesize/test_codesize_hello_dylink.json: 43922 => 43922 [+0 bytes / +0.00%] Average change: -0.00% (-0.00% - +0.00%) ```
This is an automatic change generated by tools/maint/rebaseline_tests.py. The following (2) test expectation files were updated by running the tests with `--rebaseline`: ``` codesize/test_codesize_minimal_pthreads.json: 26409 => 26427 [+18 bytes / +0.07%] codesize/test_codesize_minimal_pthreads_memgrowth.json: 26819 => 26837 [+18 bytes / +0.07%] Average change: +0.07% (+0.07% - +0.07%) ```
This reverts commit a919f40.
| if (!_emscripten_thread_is_valid(t)) | ||
| return ESRCH; | ||
| if (!_emscripten_thread_is_valid(t)) return ESRCH; | ||
| #endif |
There was a problem hiding this comment.
Seems like you may as well just revert this file now? Unless you think the formatting is important here?
|
|
||
| int _emscripten_thread_is_valid(pthread_t thread) { | ||
| return thread->self == thread; | ||
| return thread->tid; |
There was a problem hiding this comment.
I still don't understand why would want to change this. The old code will catch more cases of invalid thread data won't it?
| // The tid may be reused. Clear it to prevent inadvertent use | ||
| // and inform functions that would use it that it's no longer | ||
| // available. | ||
| t->tid = 0; |
There was a problem hiding this comment.
t->self = NULL too and revert _emscripten_thread_is_valid?
|
|
||
| if (!--libc.threads_minus_1) libc.need_locks = 0; | ||
| __do_orphaned_stdio_locks(); | ||
| __dl_thread_cleanup(); |
There was a problem hiding this comment.
It seems like added these two function, and moving the emscripten_is_main_runtime_thread up are the two substantive remaining change.
Can you explain why each of those is good thing?
Mostly inspired by musl.
Split out from #13007.