Skip to content

CometFromUnixTime reports unsupported format patterns as Incompatible instead of Unsupported #4575

@andygrove

Description

@andygrove

Describe the bug

CometFromUnixTime currently reports a single support level of Incompatible(None) and folds two unrelated limitations into one getIncompatibleReasons() string:

override def getIncompatibleReasons(): Seq[String] = Seq(
  "Only supports the default datetime format pattern `yyyy-MM-dd HH:mm:ss`." +
    " DataFusion's valid timestamp range differs from Spark" +
    " (https://github.com/apache/datafusion/issues/16594)")

override def getSupportLevel(expr: FromUnixTime): SupportLevel = Incompatible(None)

(spark/src/main/scala/org/apache/comet/serde/unixtime.scala)

These are two different things:

  1. DataFusion's valid timestamp range differs from Spark. This is a genuine incompatibility. It applies even to the default format pattern, where Comet executes natively but can produce different results outside DataFusion's supported range. Incompatible is the correct classification for this.
  2. Only the default datetime pattern yyyy-MM-dd HH:mm:ss is supported. A non-default pattern cannot be executed at all. convert falls back to Spark via withFallbackReason. This is an unsupported limitation, not an incompatibility.

Reporting the unsupported format pattern as an incompatible reason is the misclassification noted in the third bullet of #4074. There is no getUnsupportedReasons() to surface the format limitation properly, so the generated compatibility docs describe a fall-back-only case as an incompatibility.

This is a classification and documentation accuracy problem, not a correctness bug. Runtime behavior is already safe. By default (spark.comet.expr.FromUnixTime.allowIncompatible=false) the whole expression falls back to Spark, and with that flag enabled the default format runs natively while non-default formats fall back.

Steps to reproduce

Look at CometFromUnixTime in spark/src/main/scala/org/apache/comet/serde/unixtime.scala and the generated from_unixtime entry on the datetime compatibility page. The format limitation is rendered as an incompatibility bullet rather than as an unsupported limitation.

Expected behavior

getSupportLevel receives the concrete FromUnixTime, so it can distinguish the two cases. For example:

override def getSupportLevel(expr: FromUnixTime): SupportLevel =
  if (expr.format != Literal(TimestampFormatter.defaultPattern)) {
    Unsupported(Some("Only the default datetime pattern `yyyy-MM-dd HH:mm:ss` is supported"))
  } else {
    Incompatible(None) // DataFusion's valid timestamp range differs from Spark
  }

Then split the reason strings. Keep the DataFusion timestamp-range difference in getIncompatibleReasons(), and add a getUnsupportedReasons() for the non-default-format limitation.

One thing to confirm during implementation: check how GenerateDocs invokes getSupportLevel. If it calls it without a concrete FromUnixTime instance, a format-dependent getSupportLevel needs to degrade sensibly for the docs path.

Additional context

Follow-up to the third item in #4074. The CometSum item from that issue is addressed in #4111. The CometStringRepeat item has been fixed separately.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingdf compatibilityAny issue related to DF & Spark compatibilityrequires-triage

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions