Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -21,17 +21,24 @@ import org.apache.daffodil.core.dsom.ElementBase
import org.apache.daffodil.core.dsom.ModelGroup
import org.apache.daffodil.core.dsom.QuasiElementDeclBase
import org.apache.daffodil.core.dsom.Root
import org.apache.daffodil.core.dsom.SequenceTermBase
import org.apache.daffodil.core.dsom.Term
import org.apache.daffodil.lib.exceptions.Assert
import org.apache.daffodil.lib.schema.annotation.props.gen.AlignmentKind
import org.apache.daffodil.lib.schema.annotation.props.gen.AlignmentUnits
import org.apache.daffodil.lib.schema.annotation.props.gen.LengthKind
import org.apache.daffodil.lib.schema.annotation.props.gen.LengthKind.Explicit
import org.apache.daffodil.lib.schema.annotation.props.gen.LengthKind.Prefixed
import org.apache.daffodil.lib.schema.annotation.props.gen.LengthUnits
import org.apache.daffodil.lib.schema.annotation.props.gen.SeparatorPosition
import org.apache.daffodil.lib.util.Math

case class AlignmentMultipleOf(nBits: Long) {
// To combine AlignmentMultipleOfs, use *
def *(that: AlignmentMultipleOf) = AlignmentMultipleOf(Math.gcd(nBits, that.nBits))
def %(that: AlignmentMultipleOf) = nBits % that.nBits

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these changes are making me wonder if we need a def +(that: AlignmentMultipleOf) that has slightly different logic than gcd.

I believe the intention of * was to combine alignments that could all happen at the same point in data (e.g. a choice of elements, optional siblings elements).

But in the new code, we are taking an existing approximate alignment, and potentially adding a new alignment to it, which is slightly different.

For example, say we are currently at 2-byte alignment (AlignmentMultipleof(16)) and we are adding an element that needs to be 1-byte (AlignmentMultipleOf(8)). In this case, we do not need to perform any alignment here because we are already byte align (we are actually 2-byte aligned), and our logic handles that correctly.

But the contentStartAlignment of that new element still really wants to be 2-byte aligned, so it should remain AlignmentMultipleOf(16). But right now, using * it's alignment will be AlignmentMultilpeOf(8) since gcd(16,8) => 8. So we actually have less information about our true alignment, and could miss out on future optimizations.

So it feels like a + operation might want to be something like

def *(that: AlignmentMultipleOf) = if (this.nBits % that.nBits == 0) this else that

The idea is that if our new alignment (that) evenly divides our existing alignment (this), then no alignment is actually needed, and we can keep our more accurate current alignment. Otherwise we use the new alignment.

// To combine AlignmentMultipleOfs and lengths, use +
def +(that: LengthApprox) = AlignmentMultipleOf(Math.gcd(nBits, nBits + that.nBits))
}

Expand Down Expand Up @@ -134,17 +141,17 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
LengthExact(trailingSkipInBits)
}

// FIXME: DAFFODIL-2295
// Does not take into account that in a sequence, what may be prior may be a separator.
// The separator is text in some encoding, might not be the same as this term's encoding, and
// the alignment will be left where that text leaves it.
//
/**
* The priorAlignmentApprox doesn't need to take into account the separator
* alignment/length as that comes after the priorAlignmentApprox, but before
* the leadingSkip.
*/
private lazy val priorAlignmentApprox: AlignmentMultipleOf =
LV(Symbol("priorAlignmentApprox")) {
if (this.isInstanceOf[Root] || this.isInstanceOf[QuasiElementDeclBase]) {
AlignmentMultipleOf(
0
) // root and quasi elements are aligned with anything // TODO: really? Why quasi-elements - they should have implicit alignment ?
) // root and quasi elements are aligned with anything (quasi elements are always 1 aligned)
} else {
val priorSibs = potentialPriorTerms
val arraySelfAlignment =
Expand Down Expand Up @@ -201,7 +208,7 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
// Return 0 here, unordered alignment will be handled by unorderedSequenceSelfAlignment
AlignmentMultipleOf(0)
} else {
ps.endingAlignmentApprox
ps.endingAlignmentWithRightFramingApprox
}
eaa
}
Expand All @@ -228,29 +235,161 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
}
}.value

private lazy val separatorPrefixMTAApprox =
this.optLexicalParent match {
case Some(s: SequenceTermBase) if s.hasSeparator =>
import SeparatorPosition.*
s.separatorPosition match {
case Prefix | Infix => LengthMultipleOf(s.knownEncodingAlignmentInBits)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MTA is an alignment thing, so I think these MTA approx functions what to return AligmentMultipleOf instead of LengthApprox.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that's right.

An MTA "region" like any region is a part of the data stream that has a length. So it depends are we computing the length of something, computing the alignment before or after it, or asking what it's required alignment is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separatorPrefixMTAApprox val, and other MTAApprox vals are calculating where the MTA region needs to align to, not how big the region is.

This way we can look at approximate end alignment (we should probably call this approximate position) of whatever came before the MTA region and determine if the MTA region is known to already be aligned and so can be excluded.

And then based on the previous alignment and the MTA, we can then calculate where that MTA will have put us. For example, we might know it just put us on a byte boundary, but we might know more and know that it actually put us on a 2-byte boundary, allowing for better optimizations if a later elements needs to be 2-byte aligned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I think you are correct.

I think some of the complexity in this code could be reduced by rigorously naming the various things to avoid confusion.

Being clear about lengthApprox vs positionApprox vs. alignmentRequirement would go a long ways. We're overusing the term "alignment" here to mean position and requirement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. The naming could definitely be improved.

Adding more variables might also be helpful. This document has great diagrams of all the different parts that are considered in our alignment

https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/

Some of our current variables kindof mush multiple boxes together which can make it difficult to tease apart what's going on and how well we actually match the document.

It might makes sense to have a variable that calculates the approximate start of every one of the boxes with matching names (e.g. allignFilleApproximateStart, initiatorMTAApproximateStart),. It adds more variables, but might make things more clear.

And that might simplify other logic that tries to statically compile out different grams. For example, the gram guards can become something like:

val needsAlignFillGram = alignFillApproxStart % alignment != 0

val needsInitiatorMTAGram = initiatorMTAApproxStart % initiator.encoding.mandatoryTextAlignment != 0

case Postfix => LengthMultipleOf(0)
}
case _ => LengthMultipleOf(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this really wants to be AlignmentApprox(1) (MTA is alignment after all so using length isn't quite right) and then with the new + operator mentioned above it will work as expected. This is because 1 will always evenly divide the existing alignment so the existing alignment will be used. Same with the other MTA things.

In fact, I think AligmentApprox(0) never wants to be used except for the root element, since it's kindof a 1-based thing.

}

private lazy val separatorPostfixMTAApprox =
this.optLexicalParent match {
case Some(s: SequenceTermBase) if s.hasSeparator =>
import SeparatorPosition.*
s.separatorPosition match {
case Prefix | Infix => LengthMultipleOf(0)
case Postfix => LengthMultipleOf(s.knownEncodingAlignmentInBits)
}
case _ => LengthMultipleOf(0)
}

private lazy val separatorLengthApprox = this.optLexicalParent match {
case Some(s: SequenceTermBase) if s.hasSeparator =>
getEncodingLengthApprox(s)
case _ => LengthMultipleOf(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of LengthMultipleOf(0), it might be more clear to make this LengthExact(0). I imigine all the math ends up the same so it probably doesn't really matter, but it feels a bit odd to say something has a length that is a multiple of zero.

}

private lazy val initiatorMTAApprox =
if (this.hasInitiator) {
AlignmentMultipleOf(this.knownEncodingAlignmentInBits)
} else {
AlignmentMultipleOf(0)
}

private lazy val initiatorLengthApprox = if (this.hasInitiator) {
getEncodingLengthApprox(this)
} else {
LengthMultipleOf(0)
}

private lazy val terminatorMTAApprox =
if (this.hasTerminator) {
AlignmentMultipleOf(this.knownEncodingAlignmentInBits)
} else {
AlignmentMultipleOf(0)
}

private lazy val terminatorLengthApprox = if (this.hasTerminator) {
getEncodingLengthApprox(this)
} else {
LengthMultipleOf(0)
}

private def getEncodingLengthApprox(t: Term) = {
if (t.isKnownEncoding) {
val dfdlCharset = t.charsetEv.optConstant.get
val fixedWidth =
if (dfdlCharset.maybeFixedWidth.isDefined) dfdlCharset.maybeFixedWidth.get else 8
LengthMultipleOf(fixedWidth)
} else {
// runtime encoding, which must be a byte-length encoding
LengthMultipleOf(8)
}
}

/**
* prior alignment with leading skip includes the prefix/infix separator MTA and length,
* any leading skips
*/
private lazy val priorAlignmentWithLeadingSkipApprox: AlignmentMultipleOf = {
priorAlignmentApprox + leadingSkipApprox
val priorAlignmentWithSeparatorApprox = priorAlignmentApprox
+ separatorPrefixMTAApprox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that with the new + operator, this does want to stay a + and not use the old * operator. Since this isn't combining potential alignments, this is taking an existing alignment and adding to it.

+ separatorLengthApprox
val leadingAlignmentApprox = (priorAlignmentWithSeparatorApprox
+ leadingSkipApprox)
leadingAlignmentApprox
}

/**
* The length of the prefix length quasi element.
*
* This is only relevant for quasi elements, or prefixed terms.
*/
private lazy val prefixLengthElementLength = {
this match {
case eb: ElementBase => {
if (eb.isInstanceOf[QuasiElementDeclBase])
LengthExact(
eb.asInstanceOf[QuasiElementDeclBase].elementLengthInBitsEv.constValue.get
)
else if (eb.lengthKind == Prefixed)
LengthExact(
this
.asInstanceOf[ElementBase]
.prefixedLengthElementDecl
.elementLengthInBitsEv
.constValue
.get
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes quasi elements and prefix length elements have constant lengths, but I'm not sure that's always true. For example, I think we allow a prefix length to have a prefix length, in which case one of the prefix lenghts isn't constant. I think prefix lengths is already broken in a number of ways, so maybe this doesn't have to necessarily have be fixed as part of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I updated the code to recursively call prefixLengthElementLength for elements with LK=prefixed to get the length of it's prefix length, though we don't currently support nested prefixed elements in Daffodil as it's a subset error. I left the fix in regardless for when we do implement that. Added test_prefix_alignment_3 test.

else
LengthExact(0)
}
case _ => LengthExact(0)
}
}

/**
* The alignment of the value of the term.
*
* This is only relevant for simple elements.
*/
private lazy val valueMTAApprox = {
this match {
case eb: ElementBase if eb.isSimpleType => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this also only apply if representation is text? For example, if this is a binary number then there is no MTA. I'm not sure if there are other cases where it does/does not apply (e.g. zoned numbers have to be aligned, but maybe that's not MTA?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, the term sharing doc does mention text representation in addition to the simpletType and element term conditions

https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/

AlignmentMultipleOf(eb.knownEncodingAlignmentInBits)
}
case _ => AlignmentMultipleOf(0)
}
}

/**
* This alignment is made up of the alignment of the term itself,
* the initiator MTA and length, the prefix length quasi
* element length, and the value MTA (we add it for optimization
* but is extremely difficult to test, as other things such
* as unparsers will provide MTA, even elementSpecifiedLengthApprox
* considers the encoding also).
*/
protected lazy val contentStartAlignment: AlignmentMultipleOf = {
if (priorAlignmentWithLeadingSkipApprox % alignmentApprox == 0) {
// alignment won't be needed, continue using prior alignment as start alignment
priorAlignmentWithLeadingSkipApprox
} else {
// alignment will be needed, it will be forced to be aligned to alignmentApprox
val leftFramingApprox =
alignmentApprox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe this actually does want to take into account the prirorAlignmentWithLeadingSkip, but do so with the new + operator. For example, if the prior alignment was 16 and this.alignmentApprox is 8, we'll keep the 16-bit alignment moving forward, which is the more accurate alignment and still satisfies alignmentApprox.

Note that the + operator has similar logic to what we used to have (i.e. if alignApprox divides priorAlignmentWithLeadingSkipApprox then we keep using the prior).

}
* initiatorMTAApprox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wants use the new + operator.

+ initiatorLengthApprox
val csa = (leftFramingApprox + prefixLengthElementLength) * valueMTAApprox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also wants to use the new + operator.

csa
}

/**
* Accounts for the start of the content alignment, element length,
* terminator MTA/Length and trailing skip
*/
protected lazy val endingAlignmentApprox: AlignmentMultipleOf = {
this match {
case eb: ElementBase => {
if (eb.isComplexType && eb.lengthKind == LengthKind.Implicit) {
eb.complexType.group.endingAlignmentApprox + trailingSkipApprox
val res = if (eb.isComplexType && eb.lengthKind == LengthKind.Implicit) {
eb.complexType.group.endingAlignmentApprox
} else {
// simple type or complex type with specified length
contentStartAlignment + (elementSpecifiedLengthApprox + trailingSkipApprox)
contentStartAlignment + elementSpecifiedLengthApprox
}
val cea = res * terminatorMTAApprox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wants to be the new + operator

+ terminatorLengthApprox
+ trailingSkipApprox
cea
}
case mg: ModelGroup => {
//
Expand All @@ -260,37 +399,53 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
//
val (lastChildren, couldBeLast) = mg.potentialLastChildren
val lastApproxesConsideringChildren: Seq[AlignmentMultipleOf] = lastChildren.map { lc =>
//
// for each possible last child, add its ending alignment
// to our trailing skip to get where it would leave off were
// it the actual last child.
//
// for each possible last child, gather its endingAlignmentApprox
val lceaa = lc.endingAlignmentApprox
val res = lceaa + trailingSkipApprox
res
lceaa
}
val optApproxIfNoChildren =
//
// gather possibilities for this item itself
//
if (couldBeLast)
if (couldBeLast) {
//
// if this model group could be last, then consider
// if none of its content was present.
// We'd just have the contentStart, nothing, and the trailing Skip.
// We'd just have the contentStart
//
Some(contentStartAlignment + trailingSkipApprox)
else
val res = Some(contentStartAlignment)
res
} else
// can't be last, no possibilities to gather.
None
val lastApproxes = lastApproxesConsideringChildren ++ optApproxIfNoChildren
Assert.invariant(lastApproxes.nonEmpty)
val res = lastApproxes.reduce { _ * _ }
// take all the gathered possibilities and add the ending alignment
// to terminator MTA/length and our trailing skip to get where it would leave off were
// each the actual last child.
val lastApproxesFinal = lastApproxes.map {
_ * terminatorMTAApprox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wants to be a +

+ terminatorLengthApprox
+ trailingSkipApprox
}
Assert.invariant(lastApproxesFinal.nonEmpty)
val res = lastApproxesFinal.reduce {
_ * _
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think right here is the only place we really want to use the * operator. The goal of this is to combine a bunch of different alignemtns that could happen at the same place time due to choice or optional elements, so this finds the the common alignment they all have.

}
res
}
}
}

/**
* The postfix separator MTA/length needs to be added after the trailing skip
*/
protected lazy val endingAlignmentWithRightFramingApprox: AlignmentMultipleOf = {
val res = endingAlignmentApprox
+ separatorPostfixMTAApprox
+ separatorLengthApprox
res
}

protected lazy val elementSpecifiedLengthApprox: LengthApprox = {
this match {
case eb: ElementBase => {
Expand Down Expand Up @@ -321,23 +476,24 @@ trait AlignedMixin extends GrammarMixin { self: Term =>
case LengthKind.Delimited => encodingLengthApprox
case LengthKind.Pattern => encodingLengthApprox
case LengthKind.EndOfParent => LengthMultipleOf(1) // NYI
case LengthKind.Prefixed => LengthMultipleOf(1) // NYI
case LengthKind.Prefixed => {
val prefixElem = eb.prefixedLengthElementDecl
if (prefixElem.lengthKind == Explicit) {
LengthExact(
prefixElem.elementLengthInBitsEv.optConstant.get.get
) + prefixLengthElementLength
} else {
getEncodingLengthApprox(prefixElem)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefix elements dont' necessarily have encoding. They could just be binary numbers. And I think it's possible for prefix lengths to have lengthKind="prefixed", in which case this would have to be LengthMultipleOf(lengthUnits) or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they are binary numbers, wouldn't the isKnownEncoding condition be false, and result in the value being LengthMultipleOf(8)?

}
}
}
}
case _: ModelGroup => Assert.usageError("Only for elements")
}
}

private lazy val encodingLengthApprox: LengthApprox = {
if (isKnownEncoding) {
val dfdlCharset = charsetEv.optConstant.get
val fixedWidth =
if (dfdlCharset.maybeFixedWidth.isDefined) dfdlCharset.maybeFixedWidth.get else 8
LengthMultipleOf(fixedWidth)
} else {
// runtime encoding, which must be a byte-length encoding
LengthMultipleOf(8)
}
getEncodingLengthApprox(this)
}

protected lazy val isKnownToBeByteAlignedAndByteLength: Boolean = {
Expand Down
Loading