
Auto-generate session names for serverless SSH connect #4701

Open

anton-107 wants to merge 2 commits into ssh-connect-elapsed-time from antonnek/auto-session-names

Conversation

anton-107 (Contributor) commented Mar 11, 2026

Summary

  • Remove the requirement for --name in serverless ssh connect; --accelerator alone is now sufficient
  • Auto-generate human-readable session names (e.g. databricks-gpu-a10-20260310-a1b2c3)
  • Track sessions in ~/.databricks/ssh-tunnel-sessions.json and offer reconnection on subsequent runs
  • Clean up stale sessions (local keys, SSH config, secret scopes, workspace content) automatically
  • Sessions expire after 24 hours
  • Fix known_hosts key mismatches for serverless by using StrictHostKeyChecking=no + UserKnownHostsFile=/dev/null

Stacked on #4697.

Test plan

  • Unit tests for session store (load/save/add/remove/find/expiry)
  • Unit tests for name generation (format, uniqueness, regex validity)
  • Updated validation and proxy command tests
  • Manual test: databricks ssh connect --accelerator GPU_1xA10 creates new session
  • Manual test: subsequent run detects existing session and prompts to reconnect
  • Manual test: stale sessions are cleaned up when server is no longer running
  • Manual test: known_hosts errors no longer occur on serverless reconnects

🤖 Generated with Claude Code

eng-dev-ecosystem-bot (Collaborator) commented Mar 11, 2026

Commit: a177bea

Run: 22953709117

Env                    🔄 flaky   💚 RECOVERED   🙈 SKIP   ✅ pass   🙈 skip   Time
💚 aws linux           -          8              7         268       781       5:31
💚 aws windows         -          8              7         270       779       5:46
🔄 aws-ucws linux      2          7              7         364       696       6:57
🔄 aws-ucws windows    2          7              7         366       694       6:56
💚 azure linux         -          2              9         271       779       4:51
💚 azure windows       -          2              9         273       777       5:07
🔄 azure-ucws linux    2          1              9         369       692       6:40
🔄 azure-ucws windows  2          1              9         371       690       6:12
💚 gcp linux           -          2              9         267       782       5:26
💚 gcp windows         -          2              9         269       780       5:39
16 interesting tests: 7 SKIP, 6 RECOVERED, 3 flaky
Test Name aws linux aws windows aws-ucws linux aws-ucws windows azure linux azure windows azure-ucws linux azure-ucws windows gcp linux gcp windows
🔄​ TestAccept 💚​R 💚​R 💚​R 🔄​f 💚​R 💚​R 💚​R 🔄​f 💚​R 💚​R
🙈​ TestAccept/bundle/resources/permissions 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions 💚​R 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions 💚​R 💚​R 💚​R 💚​R 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct 💚​R 💚​R 💚​R 💚​R
💚​ TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform 💚​R 💚​R 💚​R 💚​R
🙈​ TestAccept/bundle/resources/postgres_branches/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/update_protected 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_branches/without_branch_id 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/postgres_endpoints/recreate 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🙈​ TestAccept/bundle/resources/synced_database_tables/basic 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S 🙈​S
🔄​ TestAccept/ssh/connect-serverless-gpu 🙈​s 🙈​s 🔄​f 🔄​f 🙈​s 🙈​s 🔄​f 🔄​f 🙈​s 🙈​s
🔄​ TestAccept/ssh/connection 💚​R 💚​R 🔄​f 💚​R 💚​R 💚​R 🔄​f 💚​R 💚​R 💚​R
Top 20 slowest tests (at least 2 minutes):
duration env testname
3:55 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:46 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:43 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:42 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:39 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:36 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:23 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:19 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:06 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:00 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:56 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:49 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:42 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:41 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:19 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:18 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:13 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:10 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:08 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:08 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform

…upport

Remove the requirement for --name in serverless SSH connect. Sessions are
now auto-generated with human-readable names (e.g. databricks-gpu-a10-20260310-a1b2c3),
tracked in ~/.databricks/ssh-tunnel-sessions.json, and offered for reconnection
on subsequent runs. Stale sessions are cleaned up automatically. Sessions expire
after 24 hours. Also fixes known_hosts key mismatches for serverless by disabling
strict host key checking (identity verified via Databricks auth).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
anton-107 force-pushed the antonnek/auto-session-names branch from d678d8d to 857404f on March 11, 2026 at 11:53
anton-107 temporarily deployed to test-trigger-is on March 11, 2026 at 11:53 via GitHub Actions
simonfaltum (Member) left a comment

[Agent Swarm Review] Verdict: Not ready yet

  • 1 Critical
  • 3 Major
  • 1 Gap
  • 2 Nit
  • 2 Suggestion

The feature design (auto-generate session names, track sessions, offer reconnection) is good UX. However, there are several correctness issues that need addressing before this is safe to ship:

  1. Critical: Any probe failure triggers irreversible remote resource cleanup, even for transient network errors
  2. Major: No identity tracking means profile switches can cause cross-identity destructive cleanup
  3. Major: Expired sessions accumulate forever (remote resources leaked)
  4. Major: CLI version changes break reconnection within the 24h window

See inline comments for details.

	alive = append(alive, s)
} else {
	cleanupStaleSession(ctx, client, s, version)
}

[Agent Swarm Review] [Critical]

Any probe error is treated as proof that the session is stale.

resolveServerlessSession() calls cleanupStaleSession() for every getServerMetadata() failure. That probe can fail for transient auth, network, workspace API, or version-mismatch reasons. In those cases the CLI will delete local SSH config, remove the session from state, and best-effort delete secret scopes and workspace content for a session that may still be alive.

Both reviewers flagged this. Isaac confirmed Critical in cross-review due to irreversible blast radius.

Suggestion: Only run destructive cleanup on definitive stale signals (e.g., 404/not-found). For transient errors, keep the session and surface a warning.

Accelerator   string    `json:"accelerator"`
WorkspaceHost string    `json:"workspace_host"`
CreatedAt     time.Time `json:"created_at"`
ClusterID     string    `json:"cluster_id,omitempty"`

[Agent Swarm Review] [Major]

Session matching ignores Databricks identity. No username/profile is stored, so switching profiles on the same workspace causes cross-identity session mixup. Combined with the probe-failure cleanup issue, probing another identity's session fails and triggers cleanup of their remote resources.

Suggestion: Persist an identity key (e.g., workspace username) in the Session struct and include it in FindMatching.

		result = append(result, s)
	}
}
return result, nil

[Agent Swarm Review] [Major]

Expired sessions accumulate in state file indefinitely. FindMatching filters out expired sessions from results but never removes them from the store on disk or triggers cleanup of their remote resources (secret scopes, workspace content). Over time the state file grows unboundedly and remote resources are leaked.

Suggestion: Prune expired sessions during Load or FindMatching, saving the cleaned store back.


date := time.Now().Format("20060102")
b := make([]byte, 3)
_, _ = rand.Read(b)

[Agent Swarm Review] [Nit]

_, _ = rand.Read(b) discards the error. If crypto/rand.Read fails, b is all zeros, producing predictable session names. Consider checking the error or adding a comment explaining why it's intentionally ignored.
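A fail-fast variant could look like this (`randomSuffix` is an illustrative name, not the PR's):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomSuffix returns a 6-hex-character suffix, propagating any failure
// from crypto/rand instead of silently falling back to all zeros.
func randomSuffix() (string, error) {
	b := make([]byte, 3)
	if _, err := rand.Read(b); err != nil {
		return "", fmt.Errorf("generating session name suffix: %w", err)
	}
	return hex.EncodeToString(b), nil
}

func main() {
	suffix, err := randomSuffix()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(suffix)) // 6
}
```

Alternatively, since recent Go releases document `crypto/rand.Read` as never returning an error on supported platforms, a short comment to that effect on the `_, _ =` line would also address the nit.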

Comment on lines +578 to +593
hostKeyChecking := "StrictHostKeyChecking=accept-new"
if opts.IsServerlessMode() {
	hostKeyChecking = "StrictHostKeyChecking=no"
}

sshArgs := []string{
	"-l", userName,
	"-i", privateKeyPath,
	"-o", "IdentitiesOnly=yes",
	"-o", hostKeyChecking,
	"-o", "ConnectTimeout=360",
	"-o", "ProxyCommand=" + proxyCommand,
}
if opts.IsServerlessMode() {
	sshArgs = append(sshArgs, "-o", "UserKnownHostsFile=/dev/null")
} else if opts.UserKnownHostsFile != "" {
ilia-db (Contributor) commented Mar 13, 2026

We've had such options before, but the security team didn't like it.

With auto-generated session names we should not have that many host conflicts, right?

Before, you would get them if you re-used the same name to connect to a different workspace. Re-using the same name for the same workspace is fine, as our server will get the server SSH key from the secrets scope that's tied to the name (and with the same name, the scope will be the same). But across different workspaces we will get a problem, since the server keys will be different.

Can we also add the workspace ID (the real one, or one derived from the host URL) to the generated session name?
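One possible way to fold a workspace identifier into the name is to hash the host URL down to a short stable tag. This is a sketch under assumed names (`workspaceTag` is not in the PR), not the actual implementation:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// workspaceTag derives a short, stable tag from the workspace host URL so
// that sessions against different workspaces get different names (and
// therefore different secret scopes and server keys never collide).
func workspaceTag(host string) string {
	h := strings.ToLower(strings.TrimSuffix(host, "/"))
	sum := sha256.Sum256([]byte(h))
	return hex.EncodeToString(sum[:3]) // 6 hex chars; collisions are unlikely at this scale
}

func main() {
	a := workspaceTag("https://adb-1111.azuredatabricks.net")
	b := workspaceTag("https://adb-2222.azuredatabricks.net")
	fmt.Println(a != b, len(a)) // true 6
}
```

Using a hash of the host (rather than the raw URL) keeps the session name short and free of characters that are invalid in SSH host aliases.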

// resolveServerlessSession handles auto-generation and reconnection for serverless sessions.
// It checks local state for existing sessions matching the workspace and accelerator,
// probes them to see if they're still alive, and prompts the user to reconnect or create new.
func resolveServerlessSession(ctx context.Context, client *databricks.WorkspaceClient, opts *ClientOptions) error {

Nit, but this could be a method on the ClientOptions struct; that would make it clearer that we are mutating the options here.

func resolveServerlessSession(ctx context.Context, client *databricks.WorkspaceClient, opts *ClientOptions) error {
	version := build.GetInfo().Version

	matching, err := sessions.FindMatching(ctx, client.Config.Host, opts.Accelerator)

I feel like the majority of this logic can be moved to the sessions package (up until line 788). getServerMetadata can be passed as an argument; then it will be easier to test.

Same for cleanupStaleSession. Or will there be circular dependencies if we do that? (That function has a lot of them.)


// GenerateSessionName creates a human-readable session name from the accelerator type.
// Format: <prefix>-<random_hex>, e.g. "gpu-a10-f3a2b1c0".
func GenerateSessionName(accelerator string) string {

As mentioned above, will it help with known_hosts conflicts if we add a workspace id/host here?

4 participants