Describe the bug
CometFromUnixTime currently reports a single support level of Incompatible(None) and folds two unrelated limitations into one getIncompatibleReasons() string:
override def getIncompatibleReasons(): Seq[String] = Seq(
"Only supports the default datetime format pattern `yyyy-MM-dd HH:mm:ss`." +
" DataFusion's valid timestamp range differs from Spark" +
" (https://github.com/apache/datafusion/issues/16594)")
override def getSupportLevel(expr: FromUnixTime): SupportLevel = Incompatible(None)
(spark/src/main/scala/org/apache/comet/serde/unixtime.scala)
These are two different things:
- DataFusion's valid timestamp range differs from Spark. This is a genuine incompatibility. It applies even to the default format pattern, where Comet executes natively but can produce different results outside DataFusion's supported range.
Incompatible is the correct classification for this.
- Only the default datetime pattern
yyyy-MM-dd HH:mm:ss is supported. A non-default pattern cannot be executed at all. convert falls back to Spark via withFallbackReason. This is an unsupported limitation, not an incompatibility.
Reporting the unsupported format pattern as an incompatible reason is the misclassification noted in the third bullet of #4074. There is no getUnsupportedReasons() to surface the format limitation properly, so the generated compatibility docs describe a fall-back-only case as an incompatibility.
This is a classification and documentation accuracy problem, not a correctness bug. Runtime behavior is already safe. By default (spark.comet.expr.FromUnixTime.allowIncompatible=false) the whole expression falls back to Spark, and with that flag enabled the default format runs natively while non-default formats fall back.
Steps to reproduce
Look at CometFromUnixTime in spark/src/main/scala/org/apache/comet/serde/unixtime.scala and the generated from_unixtime entry on the datetime compatibility page. The format limitation is rendered as an incompatibility bullet rather than as an unsupported limitation.
Expected behavior
getSupportLevel receives the concrete FromUnixTime, so it can distinguish the two cases. For example:
override def getSupportLevel(expr: FromUnixTime): SupportLevel =
if (expr.format != Literal(TimestampFormatter.defaultPattern)) {
Unsupported(Some("Only the default datetime pattern `yyyy-MM-dd HH:mm:ss` is supported"))
} else {
Incompatible(None) // DataFusion's valid timestamp range differs from Spark
}
Then split the reason strings. Keep the DataFusion timestamp-range difference in getIncompatibleReasons(), and add a getUnsupportedReasons() for the non-default-format limitation.
One thing to confirm during implementation: check how GenerateDocs invokes getSupportLevel. If it calls it without a concrete FromUnixTime instance, a format-dependent getSupportLevel needs to degrade sensibly for the docs path.
Additional context
Follow-up to the third item in #4074. The CometSum item from that issue is addressed in #4111. The CometStringRepeat item has been fixed separately.
Describe the bug
CometFromUnixTimecurrently reports a single support level ofIncompatible(None)and folds two unrelated limitations into onegetIncompatibleReasons()string:(
spark/src/main/scala/org/apache/comet/serde/unixtime.scala)These are two different things:
Incompatibleis the correct classification for this.yyyy-MM-dd HH:mm:ssis supported. A non-default pattern cannot be executed at all.convertfalls back to Spark viawithFallbackReason. This is an unsupported limitation, not an incompatibility.Reporting the unsupported format pattern as an incompatible reason is the misclassification noted in the third bullet of #4074. There is no
getUnsupportedReasons()to surface the format limitation properly, so the generated compatibility docs describe a fall-back-only case as an incompatibility.This is a classification and documentation accuracy problem, not a correctness bug. Runtime behavior is already safe. By default (
spark.comet.expr.FromUnixTime.allowIncompatible=false) the whole expression falls back to Spark, and with that flag enabled the default format runs natively while non-default formats fall back.Steps to reproduce
Look at
CometFromUnixTimeinspark/src/main/scala/org/apache/comet/serde/unixtime.scalaand the generatedfrom_unixtimeentry on the datetime compatibility page. The format limitation is rendered as an incompatibility bullet rather than as an unsupported limitation.Expected behavior
getSupportLevelreceives the concreteFromUnixTime, so it can distinguish the two cases. For example:Then split the reason strings. Keep the DataFusion timestamp-range difference in
getIncompatibleReasons(), and add agetUnsupportedReasons()for the non-default-format limitation.One thing to confirm during implementation: check how
GenerateDocsinvokesgetSupportLevel. If it calls it without a concreteFromUnixTimeinstance, a format-dependentgetSupportLevelneeds to degrade sensibly for the docs path.Additional context
Follow-up to the third item in #4074. The
CometSumitem from that issue is addressed in #4111. TheCometStringRepeatitem has been fixed separately.