replace the exec spawning implementation; add compat zygote instance which uses scudo instead of hardened_malloc#364
Conversation
This reverts commit 97f2e33.
…d GC" This reverts commit 4a2b6c9.
… VA space" This reverts commit 3e721e6.
… crashes" This reverts commit 4d61cd0.
This reverts commit 6716125
…ess init Run the onBind() hook at an earlier point by removing its dependency on the app context object. This is required by the next commit.
Use AppBindArgs to pass systemIdmapPaths to the app instead of performing an extra Binder IPC call to ActivityManagerService during app process launch.
For background, see https://grapheneos.org/usage#exec-spawning New app processes are created by forking zygote, then immediately exec()-ing a new zygote and replaying commands that were sent to the original zygote in the new zygote. Commands are passed to the new zygote through a memfd. Exec spawning can be disabled on a per-app basis. App processes that are spawned through secondary zygotes (WebView processes and service processes which use an app zygote) are instead spawned through the primary zygote. Boot-time WebView zygote startup is disabled to reduce memory usage. The WebView zygote is started on-demand to support apps that are opted out of exec spawning. Test: atest GosCompatSecureSpawnTests
This zygote instance is used when both exec spawning and hardened_malloc are disabled.
…s.Global ConnChecksSetting.get() is used by Vanadium. Accessing it worked without the UnsupportedAppUsage annotation due to broken hidden API access enforcement with the previous exec spawning implementation.
Remove leftover useFifoUi change.
| if (type == ZygoteType.Compat) { | ||
| // compat zygote is started on-demand, it might not be running when bootCompleted() | ||
| // is dispatched to other zygotes | ||
| bootCompleted(Build.SUPPORTED_64_BIT_ABIS[0], ZygoteSelectionMode.PreferCompatZygote); | ||
| } |
There was a problem hiding this comment.
Could this be called before bootCompleted() is dispatched?
There was a problem hiding this comment.
No, since the core system processes don't use the compat zygote. The bootCompleted() callback is used by ART JIT compiler for deferred boot-profile-related tasks which aren't relevant on GrapheneOS.
| if (!r.definingPackageName.equals(r.appInfo.packageName)) { | ||
| new SystemErrorNotification("exec spawning error", "mismatch between definingPackageName (" + r.definingPackageName + ") and appInfo package name (" + r.appInfo.packageName + ") for service " + r.name).show(mAm.mContext); | ||
| } | ||
| if (!AswUseExecSpawning.I.get(mAm.mContext, r.userId, r.appInfo, GosPackageState.get(r.definingPackageName, r.userId))) { | ||
| // app zygotes are pointless when exec spawning is used | ||
| hostingRecord = HostingRecord.byAppZygote_(r.instanceName, r.definingPackageName, | ||
| r.definingUid, r.serviceInfo.processName); | ||
| } |
There was a problem hiding this comment.
With secure app spawning enabled by default, atest CtsExternalServiceTestCases:android.externalservice.cts.ExternalServiceTest has test failures:
[5/13] android.externalservice.cts.ExternalServiceTest#testBindExternalServiceWithZygote: FAILED (529ms)
STACKTRACE:
junit.framework.AssertionFailedError: Expected <-1> should not be equal to actual <-1>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertFalse(Assert.java:34)
at android.externalservice.cts.ExternalServiceTest.assertNotEquals(ExternalServiceTest.java:481)
at android.externalservice.cts.ExternalServiceTest.testBindExternalServiceWithZygote(ExternalServiceTest.java:315)
[6/13] android.externalservice.cts.ExternalServiceTest#testBindExternalServiceWithZygoteWithRunningOwn: FAILED (1.105s)
STACKTRACE:
junit.framework.AssertionFailedError: Expected <-1> should not be equal to actual <-1>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertFalse(Assert.java:34)
at android.externalservice.cts.ExternalServiceTest.assertNotEquals(ExternalServiceTest.java:481)
at android.externalservice.cts.ExternalServiceTest.testBindExternalServiceWithZygoteWithRunningOwn(ExternalServiceTest.java:380)
When secure app spawning is disabled, those 13 tests pass but still has these error notifications:
E SystemErrorNotification: exec spawning error, title: exec spawning error, message: mismatch between definingPackageName (android.externalservice.service) and appInfo package name (android.externalservice.cts) for service ComponentInfo{android.externalservice.cts/android.externalservice.service.ExternalServiceWithZygote}
https://github.com/GrapheneOS/platform_frameworks_base/blob/2026060600/services/core/java/com/android/server/am/ProcessList.java#L2098-L2101 might explain this mismatch
There was a problem hiding this comment.
How should we handle apps that specify a ZygotePreload (android:zygotePreloadName) class with android:useAppZygote="true" services in application manifest, e.g.
<application android:label="External Service Host"
android:zygotePreloadName="android.externalservice.service.ZygotePreload">
...
<service android:name=".ExternalServiceWithZygote"
android:isolatedProcess="true"
android:externalService="true"
android:exported="true"
android:useAppZygote="true"
android:allowSharedIsolatedProcess="true"/>
Some other CTS tests also rely on android:zygotePreloadName such as CtsSeccompHostTestCases and CtsAppServiceTestCases, so those tests are currently failing with secure app spawning on as well
There was a problem hiding this comment.
App zygote preloading wasn't done in exec-spawned children of app zygotes in the previous implementation. The purpose of preloading is to create memory regions that are shared among zygote children, which is fundamentally incompatible with exec spawning. AFAIK Chromium-based browsers are the only users of the app zygote functionality. I'm not aware of any reports of app compatibility issues that are caused by ZygotePreload being ignored.
There was a problem hiding this comment.
Also updated CTS tests: GrapheneOS-Archive/platform_cts#3
| private void startCompatZygote() throws IOException { | ||
| SystemProperties.set("sys.start_compat_zygote", "1"); | ||
| boolean started = false; | ||
| LocalSocketAddress zygoteSocketAddress = mZygoteSocketAddresses[ZygoteType.Compat.ordinal()]; | ||
|
|
||
| try (var zygoteSocket = new LocalSocket()) { | ||
| for (int i = 0; i < 2000; ++i) { | ||
| try { | ||
| zygoteSocket.connect(zygoteSocketAddress); | ||
| started = true; | ||
| break; | ||
| } catch (IOException e) { | ||
| if ((i % 20) == 0) { | ||
| Log.d(LOG_TAG, "waiting for compat zygote to start"); | ||
| } | ||
| SystemClock.sleep(10); | ||
| } | ||
| } | ||
| } | ||
| if (!started) { | ||
| throw new RuntimeException("timed out while waiting for compat zygote"); | ||
| } | ||
| } |
There was a problem hiding this comment.
startCompatZygote can do a blocking wait of up to 20s while holding mLock, as this is called from attemptConnectionToZygote which is guarded by mLock. The lock might not be needed (mZygoteSocketAddresses looks like it's already initialized in constructor with immutable LocalSocketAddresses) and could be avoided by just reacquiring lock after and checking if mZygoteStates was already initialized after the call
There was a problem hiding this comment.
App process launches are guarded by the global ActivityManagerService lock.
There was a problem hiding this comment.
Can avoid this and other lazy-init-related issues by starting the compat zygote at boot, but it uses ~200 MiB of RAM.
There was a problem hiding this comment.
The upstream ZygoteProcess.waitForConnectionToZygote() method has a timeout of 60 seconds and holds the same locks.
|
Tesco app (com.tesco.grocery.view) no longer crashes a few seconds after launch with secure spawning disabled (it does still run into memory issues that hardened_malloc detects, so users should probably still turn on the broad exploit compatibility mode anyway) |
Simplify and speed up handling of parent file descriptors in the child process by using the close_range(CLOSE_RANGE_CLOEXEC) + exec() pattern. This change also fixes a potential issue of a Java daemon thread opening a file descriptor between GetOpenFdsIgnoring() and fork() which would have caused file descriptor table check to fail in the child process.
strerror() is not guaranteed to be async-safe. Use "%#m" instead, which outputs the name (not description) of errno value, e.g. EPERM.
Add handling of app zygote preloading.
Add missing handling for external services which are spawned from an app zygote.
| } | ||
| } | ||
|
|
||
| pid_t pid = fork(); |
There was a problem hiding this comment.
Remark: bionic fork() does extra function calls and hooks: https://github.com/GrapheneOS/platform_bionic/blob/16-qpr2/libc/bionic/fork.cpp, https://github.com/GrapheneOS/platform_bionic/blob/16-qpr2/libc/bionic/pthread_atfork.cpp#L139, https://github.com/GrapheneOS/platform_bionic/blob/16-qpr2/libc/include/unistd.h#L93-L98
This might be unnecessary if child intended to only mark fds CLOEXEC and immediately exec (vs the normal zygote spawning path staying in the process). Child looks async-signal-safe anyway for _Fork
| if (disable_hardened_malloc) { | ||
| if (setenv("DISABLE_HARDENED_MALLOC", "1", 1) != 0) { | ||
| ALOGE("setenv failed: %s", strerror(errno)); | ||
| close(cmd_fd); | ||
| return -1; | ||
| } | ||
| } |
There was a problem hiding this comment.
Parent process is multithreaded but this is technically modifying shared global state for parent process. bionic setenv doesn't seem thread-safe. Probably could snapshot an envp, though execv() fallback in child is also just using global environ so that'd also need updating
Replace non-thread-safe usages of setenv() and unsetenv().
2926e43 to
43a668e
Compare
Ensure that no new file descriptors are racily opened by signal handlers in the child process.
43a668e to
3063d1a
Compare
New app processes are created by forking zygote, then immediately exec()-ing a new zygote and
replaying commands that were sent to the original zygote in the new zygote. Commands are passed to
the new zygote through a memfd.
Exec spawning can be disabled on a per-app basis.
App processes that are spawned through secondary zygotes (WebView processes and service processes
which use an app zygote) are instead spawned through the primary zygote. Boot-time WebView zygote
startup is disabled to reduce memory usage. The WebView zygote is started on-demand to support apps
that are opted out of exec spawning.
Part of a changelist:
GrapheneOS/platform_art#37
GrapheneOS/platform_packages_apps_AppCompatConfig#15
GrapheneOS/platform_packages_apps_Settings#432
GrapheneOS/platform_packages_modules_ConfigInfrastructure#8
GrapheneOS/platform_system_core#40
GrapheneOS/platform_system_sepolicy#82
GrapheneOS-Archive/platform_cts#3
GrapheneOS/platform_manifest#96
GrapheneOS/script#112
Includes #358
Obsoletes #359
Closes: GrapheneOS/os-issue-tracker#413
Closes: GrapheneOS/os-issue-tracker#611
Closes: GrapheneOS/os-issue-tracker#2170
Closes: GrapheneOS/os-issue-tracker#2531
Closes: GrapheneOS/os-issue-tracker#2886
Closes: GrapheneOS/os-issue-tracker#6349
Closes: GrapheneOS/os-issue-tracker#7877
Closes: GrapheneOS/os-issue-tracker#7899