[SPARK-56489][SQL] Add CURRENT_PATH() builtin expression and keywords#55354

Closed
srielau wants to merge 2 commits into apache:master from srielau:SPARK-56489-path-syntax

@srielau srielau commented Apr 15, 2026

What changes were proposed in this pull request?

Add the CURRENT_PATH() builtin function that returns the current SQL resolution search path as a comma-separated string of qualified schema names (e.g. system.builtin,system.session,spark_catalog.default).
Also register the grammar keywords needed by the upcoming SQL PATH feature: CURRENT_PATH, CURRENT_SCHEMA, CURRENT_DATABASE, DEFAULT_PATH, SYSTEM_PATH, PATH. CURRENT_PATH and CURRENT_SCHEMA are reserved in ANSI mode per SQL:2023; the others are non-reserved.
In non-ANSI mode, CURRENT_PATH, CURRENT_DATABASE, and CURRENT_SCHEMA always resolve to their respective expressions (not UnresolvedAttribute), matching the behavior of CURRENT_CATALOG.
This is part 1 of the SQL PATH feature (SPARK-54810), split out to keep the review scope manageable.
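As a rough illustration of the string format described above, a consumer could split the returned value back into its entries. This is a minimal sketch, not code from this PR: `split_search_path` is a hypothetical helper, and it assumes every identifier is plain and unquoted, so that splitting on `,` and `.` is safe.

```python
def split_search_path(path: str) -> list[tuple[str, str]]:
    """Split a CURRENT_PATH()-style string into (catalog, schema) pairs.

    Assumption: identifiers are unquoted and contain no ',' or '.',
    which holds for the example path but not for arbitrary names.
    """
    entries = []
    for qualified in path.split(","):
        catalog, schema = qualified.strip().split(".")
        entries.append((catalog, schema))
    return entries


example = "system.builtin,system.session,spark_catalog.default"
print(split_search_path(example))
```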

Why are the changes needed?

CURRENT_PATH() is a SQL-standard function (SQL:2023) that exposes the resolution search path to users. The grammar keywords are prerequisites for the SET PATH command and path-based resolution coming in follow-up PRs.

Does this PR introduce any user-facing change?

Yes. New builtin function CURRENT_PATH() and new reserved/non-reserved keywords.

How was this patch tested?

  • Added test in FunctionQualificationSuite verifying current_path() returns a non-empty qualified path string.
  • Updated keyword golden files (keywords.sql.out, keywords-enforced.sql.out, nonansi/keywords.sql.out).
  • Updated sql-expression-schema.md and SparkConnectDatabaseMetaDataSuite keyword assertions.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.6

@srielau srielau force-pushed the SPARK-56489-path-syntax branch from 4f162ca to 1e1b838 Compare April 15, 2026 21:52

@dtenedor dtenedor left a comment

Thanks for working on this!

|DEFAULT|non-reserved|non-reserved|non-reserved|
|DEFINED|non-reserved|non-reserved|non-reserved|
|DEFINER|non-reserved|non-reserved|non-reserved|
|DEFAULT_PATH|non-reserved|non-reserved|not a keyword|
Contributor

From an initial look at the PR, some of the new keywords including this one are defined in the parser as keywords but not used elsewhere yet. Is this intended? I see CURRENT_PATH used in here so far.

Also, just curious why we are implementing CURRENT_PATH() using built-in keywords instead of regular SQL functions in the FunctionRegistry?

Contributor Author

The parens are optional. It's in the "currentLike" class. And yes, these keywords will become relevant when I add the SET PATH statement, which will follow next.

Contributor Author

Yes, these keywords (DEFAULT_PATH, PATH, SYSTEM_PATH) are forward-declared for the upcoming SET PATH statement (next PR in the series). CURRENT_PATH is used in this PR. As for why CURRENT_PATH uses the currentLike keyword pattern instead of regular FunctionRegistry: it follows SQL:2023 where CURRENT_PATH is a special value like CURRENT_USER — parentheses are optional and it is reserved in ANSI mode.

Contributor

Makes sense, thanks for explaining!

@srielau srielau requested a review from dtenedor April 16, 2026 01:15

srielau commented Apr 16, 2026

/spark-dev:review


@cloud-fan cloud-fan left a comment

Summary

Prior state and problem. Spark's currentLike grammar rule handles SQL expressions that can appear without parentheses: CURRENT_DATE, CURRENT_TIMESTAMP, CURRENT_TIME, CURRENT_USER, USER, SESSION_USER. Other similar functions — current_database(), current_catalog() — require parentheses because they're resolved through FunctionRegistry as regular function calls, not as parser keywords. SQL:2023 defines CURRENT_PATH as a standard function returning the resolution search path, which Spark doesn't yet support.

Design approach. The PR extends the existing currentLike pattern: adds CurrentPath as an Unevaluable leaf expression (matching CurrentCatalog, CurrentDatabase), registers it in FunctionRegistry, adds it to the currentLike grammar rule, and replaces it with a literal string in the ReplaceCurrentLike optimizer rule. It also promotes CURRENT_DATABASE and CURRENT_SCHEMA from function-only identifiers to parser keywords in the currentLike rule, and adds forward-declared keywords (DEFAULT_PATH, PATH, SYSTEM_PATH) for the upcoming SET PATH command.

Key design decisions.

  1. In non-ANSI mode, CURRENT_PATH, CURRENT_DATABASE, and CURRENT_SCHEMA always resolve to their builtin expressions, while CURRENT_DATE/CURRENT_TIMESTAMP/CURRENT_TIME still produce UnresolvedAttribute. This asymmetry means CURRENT_DATABASE and CURRENT_SCHEMA can no longer be used as bare column references in non-ANSI mode — a breaking behavioral change.
  2. CURRENT_PATH and CURRENT_SCHEMA are ANSI-reserved per SQL:2023; CURRENT_DATABASE is non-reserved in all modes.
  3. The path string is formatted as system.builtin,system.session,spark_catalog.default (unquoted dot-separated, comma-delimited) with ordering determined by sessionFunctionResolutionOrder.
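The asymmetry in decision 1 can be modeled with a toy resolver. This is only a sketch of the behavior as described in this summary, not the actual `AstBuilder.visitCurrentLike` code: the class and function names are hypothetical, and it assumes that in ANSI mode every listed keyword resolves to its builtin expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuiltinExpression:
    name: str


@dataclass(frozen=True)
class UnresolvedAttribute:
    name: str


# Per the summary: these always resolve to builtins, even in non-ANSI mode.
ALWAYS_BUILTIN = {"CURRENT_PATH", "CURRENT_DATABASE", "CURRENT_SCHEMA"}
# Per the summary: these still become column references in non-ANSI mode.
COLUMN_FALLBACK = {"CURRENT_DATE", "CURRENT_TIMESTAMP", "CURRENT_TIME"}


def resolve_bare_keyword(keyword: str, ansi_mode: bool):
    """Decide what a bare (no-parentheses) keyword turns into."""
    key = keyword.upper()
    if key not in ALWAYS_BUILTIN | COLUMN_FALLBACK:
        raise ValueError(f"not modeled in this sketch: {keyword}")
    if ansi_mode or key in ALWAYS_BUILTIN:
        return BuiltinExpression(key)
    # Non-ANSI fallback: the name may still refer to a user column.
    return UnresolvedAttribute(keyword)


print(resolve_bare_keyword("current_schema", ansi_mode=False))
print(resolve_bare_keyword("current_date", ansi_mode=False))
```

Under this model, a table with a column named `current_schema` could no longer be queried by that bare name in non-ANSI mode, which is the breaking change flagged above.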

Implementation sketch. Four components are changed: (1) CurrentPath expression class (misc.scala) — the value node; (2) FunctionRegistry — enables current_path() with parens; (3) SqlBaseParser.g4 + AstBuilder.visitCurrentLike — enables CURRENT_PATH without parens plus CURRENT_DATABASE/CURRENT_SCHEMA; (4) ReplaceCurrentLike (finishAnalysis.scala) — replaces the expression with a string literal using SQLConf.resolutionSearchPath. All other changes are golden file / test assertions for the new keywords.
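Component (4) is the step that gives the placeholder expression a value. The following is a toy model of that idea, assuming a tiny expression tree; `CurrentPath`, `Literal`, `Concat`, and `replace_current_like` here are illustrative Python stand-ins, not Spark's Catalyst classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurrentPath:
    """Placeholder leaf: carries no value until a rewrite replaces it."""


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Concat:
    children: tuple


def replace_current_like(expr, search_path):
    """Recursively swap every CurrentPath node for a string literal."""
    if isinstance(expr, CurrentPath):
        return Literal(",".join(search_path))
    if isinstance(expr, Concat):
        return Concat(tuple(replace_current_like(c, search_path) for c in expr.children))
    return expr  # Literals and anything else pass through unchanged.


plan = Concat((Literal("path="), CurrentPath()))
rewritten = replace_current_like(
    plan, ["system.builtin", "system.session", "spark_catalog.default"]
)
print(rewritten)
```

Because the placeholder is replaced before execution, the expression itself never needs an evaluation method, which is consistent with it being described as an Unevaluable leaf.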


Existing review context. @dtenedor asked why the PR defines keywords that aren't used in grammar rules yet, and why CURRENT_PATH() uses keywords instead of regular function resolution. @srielau replied that parens are optional (it's currentLike-class) and the unused keywords will become relevant with the upcoming SET PATH statement.


Missing DataFrame API function. current_catalog(), current_database(), current_schema(), and current_user() all have DataFrame API entries in functions.scala. current_path() does not, so users of the DataFrame/Dataset API cannot call it programmatically. Is this planned for a follow-up?

Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala Outdated
Comment thread sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala Outdated

srielau commented Apr 16, 2026

Thanks for the thorough review @cloud-fan! Addressed all comments in the upcoming push:

  • DataFrame API: Added current_path() to functions.scala alongside current_catalog(), current_database(), etc.
  • Breaking change: Removed CURRENT_DATABASE/CURRENT_SCHEMA from the currentLike grammar rule and visitCurrentLike to avoid the non-ANSI mode breaking change. They remain available through FunctionRegistry (with parens) and are still registered as keywords for the upcoming SET PATH command.
  • Identifier quoting: Changed to .quoted from CatalogV2Implicits (backtick-quotes identifiers with special characters), matching resolutionSearchPathForError.
  • Comment fix: Applied the suggested wording removing the forward reference.
  • Tests: Expanded coverage — without-parens syntax in ANSI mode, path reflecting USE <database>.
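To make the identifier-quoting bullet concrete, here is a minimal sketch of backtick-quoting only the parts that need it. The rule used (quote anything other than letters, digits, and underscores, doubling embedded backticks) is an assumption for illustration and may not match the `.quoted` helper exactly; `quote_if_needed` and `format_path` are hypothetical names.

```python
import re

# Assumed rule: a part made only of letters, digits, and '_' needs no quoting.
_PLAIN = re.compile(r"[A-Za-z0-9_]+")


def quote_if_needed(part: str) -> str:
    """Backtick-quote a name part unless it is a plain identifier."""
    if _PLAIN.fullmatch(part):
        return part
    # Escape embedded backticks by doubling them.
    return "`" + part.replace("`", "``") + "`"


def format_path(entries) -> str:
    """Join (catalog, schema) pairs as dot-qualified, comma-separated names."""
    return ",".join(
        ".".join(quote_if_needed(p) for p in entry) for entry in entries
    )


path = format_path(
    [("system", "builtin"), ("system", "session"), ("spark_catalog", "my-db")]
)
print(path)  # system.builtin,system.session,spark_catalog.`my-db`
```

Quoting matters here because the output uses `.` and `,` as separators; without it, a schema name containing either character would make the path string ambiguous.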

- Remove CURRENT_DATABASE/CURRENT_SCHEMA from currentLike grammar rule
  to avoid breaking change in non-ANSI mode (they remain available via
  FunctionRegistry with parentheses)
- Use .quoted for proper identifier quoting in CURRENT_PATH() output
- Fix CurrentPath doc comment: remove forward reference to PATH feature
- Add current_path() to DataFrame API (functions.scala)
- Expand test coverage: without-parens ANSI syntax, USE DATABASE context

@dtenedor dtenedor left a comment

Thanks for also adding the DataFrame API support!

@dtenedor dtenedor closed this in 96e0f8f Apr 16, 2026
@dtenedor

LGTM, merging to master.

dtenedor commented Apr 16, 2026

@srielau Update: It looks like this test is now broken in CI:
[screenshot of the failing CI test]

We should fix it ASAP or revert this PR to unblock CI.

@gaogaotiantian

The CI for this PR failed (https://github.com/srielau/spark/actions/runs/24513039822/job/71706695434), and it's pretty obvious that the failure is related:

Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/test_functions.py", line 87, in test_function_parity
    self.assertEqual(
AssertionError: Items in the second set but not the first:
'current_path' : Missing functions in pyspark not as expected

We should be extra careful when we merge PRs that have a failed CI.
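The assertion in the traceback amounts to a set comparison between function names. Below is a minimal sketch of that kind of check, not the actual `test_function_parity` code: `missing_in_python` is a hypothetical helper and the name sets are illustrative.

```python
def missing_in_python(jvm_functions, python_functions, expected_missing=frozenset()):
    """Return names exposed on the JVM side but absent from the Python side,
    ignoring any names that are known and accepted gaps."""
    return (set(jvm_functions) - set(python_functions)) - set(expected_missing)


# Illustrative name sets: the Scala side gained current_path, Python did not.
jvm = {"current_catalog", "current_database", "current_schema", "current_user", "current_path"}
py = {"current_catalog", "current_database", "current_schema", "current_user"}

print(missing_in_python(jvm, py))  # {'current_path'}
```

Under this model, adding `current_path()` only to `functions.scala` produces a non-empty difference, so the check fails until a matching Python function is added or the name is listed as an expected gap.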
