Skip to content

replace the exec spawning implementation; add compat zygote instance which uses scudo instead of hardened_malloc#364

Open
muhomorr wants to merge 35 commits into
GrapheneOS:16-qpr2from
muhomorr:16-qpr2_06-12_base
Open

replace the exec spawning implementation; add compat zygote instance which uses scudo instead of hardened_malloc#364
muhomorr wants to merge 35 commits into
GrapheneOS:16-qpr2from
muhomorr:16-qpr2_06-12_base

Conversation

@muhomorr

@muhomorr muhomorr commented Jun 12, 2026

Copy link
Copy Markdown
Member

New app processes are created by forking zygote, then immediately exec()-ing a new zygote and
replaying commands that were sent to the original zygote in the new zygote. Commands are passed to
the new zygote through a memfd.

Exec spawning can be disabled on a per-app basis.

App processes that are spawned through secondary zygotes (WebView processes and service processes
which use an app zygote) are instead spawned through the primary zygote. Boot-time WebView zygote
startup is disabled to reduce memory usage. The WebView zygote is started on-demand to support apps
that are opted out of exec spawning.

Part of a changelist:
GrapheneOS/platform_art#37
GrapheneOS/platform_packages_apps_AppCompatConfig#15
GrapheneOS/platform_packages_apps_Settings#432
GrapheneOS/platform_packages_modules_ConfigInfrastructure#8
GrapheneOS/platform_system_core#40
GrapheneOS/platform_system_sepolicy#82

GrapheneOS-Archive/platform_cts#3
GrapheneOS/platform_manifest#96
GrapheneOS/script#112

Includes #358
Obsoletes #359

Closes: GrapheneOS/os-issue-tracker#413
Closes: GrapheneOS/os-issue-tracker#611
Closes: GrapheneOS/os-issue-tracker#2170
Closes: GrapheneOS/os-issue-tracker#2531
Closes: GrapheneOS/os-issue-tracker#2886
Closes: GrapheneOS/os-issue-tracker#6349
Closes: GrapheneOS/os-issue-tracker#7877
Closes: GrapheneOS/os-issue-tracker#7899

muhomorr and others added 19 commits June 12, 2026 17:13
…ess init

Run the onBind() hook at an earlier point by removing its dependency on the app context object.
This is required by the next commit.
Use AppBindArgs to pass systemIdmapPaths to the app instead of performing an extra Binder IPC call
to ActivityManagerService during app process launch.
For background, see https://grapheneos.org/usage#exec-spawning

New app processes are created by forking zygote, then immediately exec()-ing a new zygote and
replaying commands that were sent to the original zygote in the new zygote. Commands are passed to
the new zygote through a memfd.

Exec spawning can be disabled on a per-app basis.

App processes that are spawned through secondary zygotes (WebView processes and service processes
which use an app zygote) are instead spawned through the primary zygote. Boot-time WebView zygote
startup is disabled to reduce memory usage. The WebView zygote is started on-demand to support apps
that are opted out of exec spawning.

Test: atest GosCompatSecureSpawnTests
This zygote instance is used when both exec spawning and hardened_malloc are disabled.
…s.Global

ConnChecksSetting.get() is used by Vanadium. Accessing it worked without the UnsupportedAppUsage
annotation due to broken hidden API access enforcement with the previous exec spawning
implementation.
Remove leftover useFifoUi change.
Comment thread core/java/android/content/pm/GosPackageStateFlag.java
Comment on lines +1070 to +1074
if (type == ZygoteType.Compat) {
// compat zygote is started on-demand, it might not be running when bootCompleted()
// is dispatched to other zygotes
bootCompleted(Build.SUPPORTED_64_BIT_ABIS[0], ZygoteSelectionMode.PreferCompatZygote);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be called before bootCompleted() is dispatched?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, since the core system processes don't use the compat zygote. The bootCompleted() callback is used by ART JIT compiler for deferred boot-profile-related tasks which aren't relevant on GrapheneOS.

Comment on lines +6093 to +6100
if (!r.definingPackageName.equals(r.appInfo.packageName)) {
new SystemErrorNotification("exec spawning error", "mismatch between definingPackageName (" + r.definingPackageName + ") and appInfo package name (" + r.appInfo.packageName + ") for service " + r.name).show(mAm.mContext);
}
if (!AswUseExecSpawning.I.get(mAm.mContext, r.userId, r.appInfo, GosPackageState.get(r.definingPackageName, r.userId))) {
// app zygotes are pointless when exec spawning is used
hostingRecord = HostingRecord.byAppZygote_(r.instanceName, r.definingPackageName,
r.definingUid, r.serviceInfo.processName);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With secure app spawning enabled by default, atest CtsExternalServiceTestCases:android.externalservice.cts.ExternalServiceTest has test failures:

[5/13] android.externalservice.cts.ExternalServiceTest#testBindExternalServiceWithZygote: FAILED (529ms)

STACKTRACE:
junit.framework.AssertionFailedError: Expected <-1> should not be equal to actual <-1>
	at junit.framework.Assert.fail(Assert.java:50)
	at junit.framework.Assert.assertTrue(Assert.java:20)
	at junit.framework.Assert.assertFalse(Assert.java:34)
	at android.externalservice.cts.ExternalServiceTest.assertNotEquals(ExternalServiceTest.java:481)
	at android.externalservice.cts.ExternalServiceTest.testBindExternalServiceWithZygote(ExternalServiceTest.java:315)

[6/13] android.externalservice.cts.ExternalServiceTest#testBindExternalServiceWithZygoteWithRunningOwn: FAILED (1.105s)

STACKTRACE:
junit.framework.AssertionFailedError: Expected <-1> should not be equal to actual <-1>
	at junit.framework.Assert.fail(Assert.java:50)
	at junit.framework.Assert.assertTrue(Assert.java:20)
	at junit.framework.Assert.assertFalse(Assert.java:34)
	at android.externalservice.cts.ExternalServiceTest.assertNotEquals(ExternalServiceTest.java:481)
	at android.externalservice.cts.ExternalServiceTest.testBindExternalServiceWithZygoteWithRunningOwn(ExternalServiceTest.java:380)

When secure app spawning is disabled, those 13 tests pass but still has these error notifications:

E SystemErrorNotification: exec spawning error, title: exec spawning error, message: mismatch between definingPackageName (android.externalservice.service) and appInfo package name (android.externalservice.cts) for service ComponentInfo{android.externalservice.cts/android.externalservice.service.ExternalServiceWithZygote}

https://github.com/GrapheneOS/platform_frameworks_base/blob/2026060600/services/core/java/com/android/server/am/ProcessList.java#L2098-L2101 might explain this mismatch

@inthewaves inthewaves Jun 13, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should we handle apps that specify a ZygotePreload (android:zygotePreloadName) class with android:useAppZygote="true" services in application manifest, e.g.

https://cs.android.com/android/platform/superproject/+/android-latest-release:cts/tests/tests/externalservice/service/AndroidManifest.xml

    <application android:label="External Service Host"
                 android:zygotePreloadName="android.externalservice.service.ZygotePreload">
                 
        ...
        
        <service android:name=".ExternalServiceWithZygote"
                 android:isolatedProcess="true"
                 android:externalService="true"
                 android:exported="true"
                 android:useAppZygote="true"
                 android:allowSharedIsolatedProcess="true"/>

Some other CTS tests also rely on android:zygotePreloadName such as CtsSeccompHostTestCases and CtsAppServiceTestCases, so those tests are currently failing with secure app spawning on as well

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

App zygote preloading wasn't done in exec-spawned children of app zygotes in the previous implementation. The purpose of preloading is to create memory regions that are shared among zygote children, which is fundamentally incompatible with exec spawning. AFAIK Chromium-based browsers are the only users of the app zygote functionality. I'm not aware of any reports of app compatibility issues that are caused by ZygotePreload being ignored.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 837ec57 and aa8c224

Also updated CTS tests: GrapheneOS-Archive/platform_cts#3

Comment thread core/java/android/os/ZygoteProcess.java
Comment thread core/java/android/os/ZygoteProcess.java
Comment on lines +1079 to +1101
private void startCompatZygote() throws IOException {
SystemProperties.set("sys.start_compat_zygote", "1");
boolean started = false;
LocalSocketAddress zygoteSocketAddress = mZygoteSocketAddresses[ZygoteType.Compat.ordinal()];

try (var zygoteSocket = new LocalSocket()) {
for (int i = 0; i < 2000; ++i) {
try {
zygoteSocket.connect(zygoteSocketAddress);
started = true;
break;
} catch (IOException e) {
if ((i % 20) == 0) {
Log.d(LOG_TAG, "waiting for compat zygote to start");
}
SystemClock.sleep(10);
}
}
}
if (!started) {
throw new RuntimeException("timed out while waiting for compat zygote");
}
}

@inthewaves inthewaves Jun 13, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startCompatZygote can do a blocking wait of up to 20s while holding mLock, as this is called from attemptConnectionToZygote which is guarded by mLock. The lock might not be needed (mZygoteSocketAddresses looks like it's already initialized in constructor with immutable LocalSocketAddresses) and could be avoided by just reacquiring lock after and checking if mZygoteStates was already initialized after the call

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

App process launches are guarded by the global ActivityManagerService lock.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can avoid this and other lazy-init-related issues by starting the compat zygote at boot, but it uses ~200 MiB of RAM.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upstream ZygoteProcess.waitForConnectionToZygote() method has a timeout of 60 seconds and holds the same locks.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments: f5ac3d7

@inthewaves

Copy link
Copy Markdown
Member

atest CtsHiddenApiBlocklistCurrentApiTestCases CtsHiddenApiBlocklistTestApiTestCases passing with Settings > Security & privacy > Exploit protection > Secure app spawning set to both on and then retested with it off

Tesco app (com.tesco.grocery.view) no longer crashes a few seconds after launch with secure spawning disabled (it does still run into memory issues that hardened_malloc detects, so users should probably still turn on the broad exploit compatibility mode anyway)

muhomorr added 6 commits June 13, 2026 05:11
Simplify and speed up handling of parent file descriptors in the child process by using the
close_range(CLOSE_RANGE_CLOEXEC) + exec() pattern.

This change also fixes a potential issue of a Java daemon thread opening a file descriptor between
GetOpenFdsIgnoring() and fork() which would have caused file descriptor table check to fail in the
child process.
muhomorr added 5 commits June 13, 2026 07:33
strerror() is not guaranteed to be async-safe. Use "%#m" instead, which outputs the name (not
description) of errno value, e.g. EPERM.
Add handling of app zygote preloading.
Add missing handling for external services which are spawned from an app zygote.
Comment thread core/jni/com_android_internal_os_Zygote.cpp Outdated
}
}

pid_t pid = fork();

@inthewaves inthewaves Jun 13, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remark: bionic fork() does extra function calls and hooks: https://github.com/GrapheneOS/platform_bionic/blob/16-qpr2/libc/bionic/fork.cpp, https://github.com/GrapheneOS/platform_bionic/blob/16-qpr2/libc/bionic/pthread_atfork.cpp#L139, https://github.com/GrapheneOS/platform_bionic/blob/16-qpr2/libc/include/unistd.h#L93-L98

This might be unnecessary if child intended to only mark fds CLOEXEC and immediately exec (vs the normal zygote spawning path staying in the process). Child looks async-signal-safe anyway for _Fork

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +3086 to +3092
if (disable_hardened_malloc) {
if (setenv("DISABLE_HARDENED_MALLOC", "1", 1) != 0) {
ALOGE("setenv failed: %s", strerror(errno));
close(cmd_fd);
return -1;
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parent process is multithreaded but this is technically modifying shared global state for parent process. bionic setenv doesn't seem thread-safe. Probably could snapshot an envp, though execv() fallback in child is also just using global environ so that'd also need updating

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@muhomorr muhomorr force-pushed the 16-qpr2_06-12_base branch 2 times, most recently from 2926e43 to 43a668e Compare June 13, 2026 15:14
Ensure that no new file descriptors are racily opened by signal handlers in the child process.
@muhomorr muhomorr force-pushed the 16-qpr2_06-12_base branch from 43a668e to 3063d1a Compare June 13, 2026 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment