-
Notifications
You must be signed in to change notification settings - Fork 71
Add logic for separator/initiator/prefixed/terminator alignment #1604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -21,17 +21,24 @@ import org.apache.daffodil.core.dsom.ElementBase | |
| import org.apache.daffodil.core.dsom.ModelGroup | ||
| import org.apache.daffodil.core.dsom.QuasiElementDeclBase | ||
| import org.apache.daffodil.core.dsom.Root | ||
| import org.apache.daffodil.core.dsom.SequenceTermBase | ||
| import org.apache.daffodil.core.dsom.Term | ||
| import org.apache.daffodil.lib.exceptions.Assert | ||
| import org.apache.daffodil.lib.schema.annotation.props.gen.AlignmentKind | ||
| import org.apache.daffodil.lib.schema.annotation.props.gen.AlignmentUnits | ||
| import org.apache.daffodil.lib.schema.annotation.props.gen.LengthKind | ||
| import org.apache.daffodil.lib.schema.annotation.props.gen.LengthKind.Explicit | ||
| import org.apache.daffodil.lib.schema.annotation.props.gen.LengthKind.Prefixed | ||
| import org.apache.daffodil.lib.schema.annotation.props.gen.LengthUnits | ||
| import org.apache.daffodil.lib.schema.annotation.props.gen.SeparatorPosition | ||
| import org.apache.daffodil.lib.util.Math | ||
|
|
||
| case class AlignmentMultipleOf(nBits: Long) { | ||
| // To combine AlignmentMultipleOfs, use * | ||
| def *(that: AlignmentMultipleOf) = AlignmentMultipleOf(Math.gcd(nBits, that.nBits)) | ||
| def %(that: AlignmentMultipleOf) = nBits % that.nBits | ||
|
|
||
| // To combine AlignmentMultipleOfs and lengths, use + | ||
| def +(that: LengthApprox) = AlignmentMultipleOf(Math.gcd(nBits, nBits + that.nBits)) | ||
| } | ||
|
|
||
|
|
@@ -134,17 +141,17 @@ trait AlignedMixin extends GrammarMixin { self: Term => | |
| LengthExact(trailingSkipInBits) | ||
| } | ||
|
|
||
| // FIXME: DAFFODIL-2295 | ||
| // Does not take into account that in a sequence, what may be prior may be a separator. | ||
| // The separator is text in some encoding, might not be the same as this term's encoding, and | ||
| // the alignment will be left where that text leaves it. | ||
| // | ||
| /** | ||
| * The priorAlignmentApprox doesn't need to take into account the separator | ||
| * alignment/length as that comes after the priorAlignmentApprox, but before | ||
| * the leadingSkip. | ||
| */ | ||
| private lazy val priorAlignmentApprox: AlignmentMultipleOf = | ||
| LV(Symbol("priorAlignmentApprox")) { | ||
| if (this.isInstanceOf[Root] || this.isInstanceOf[QuasiElementDeclBase]) { | ||
| AlignmentMultipleOf( | ||
| 0 | ||
| ) // root and quasi elements are aligned with anything // TODO: really? Why quasi-elements - they should have implicit alignment ? | ||
| ) // root and quasi elements are aligned with anything (quasi elements are always 1 aligned) | ||
| } else { | ||
| val priorSibs = potentialPriorTerms | ||
| val arraySelfAlignment = | ||
|
|
@@ -201,7 +208,7 @@ trait AlignedMixin extends GrammarMixin { self: Term => | |
| // Return 0 here, unordered alignment will be handled by unorderedSequenceSelfAlignment | ||
| AlignmentMultipleOf(0) | ||
| } else { | ||
| ps.endingAlignmentApprox | ||
| ps.endingAlignmentWithRightFramingApprox | ||
| } | ||
| eaa | ||
| } | ||
|
|
@@ -228,29 +235,161 @@ trait AlignedMixin extends GrammarMixin { self: Term => | |
| } | ||
| }.value | ||
|
|
||
| private lazy val separatorPrefixMTAApprox = | ||
| this.optLexicalParent match { | ||
| case Some(s: SequenceTermBase) if s.hasSeparator => | ||
| import SeparatorPosition.* | ||
| s.separatorPosition match { | ||
| case Prefix | Infix => LengthMultipleOf(s.knownEncodingAlignmentInBits) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. MTA is an alignment thing, so I think these MTA approx functions what to return
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure that's right. An MTA "region" like any region is a part of the data stream that has a length. So it depends are we computing the length of something, computing the alignment before or after it, or asking what it's required alignment is.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The This way we can look at approximate end alignment (we should probably call this approximate position) of whatever came before the MTA region and determine if the MTA region is known to already be aligned and so can be excluded. And then based on the previous alignment and the MTA, we can then calculate where that MTA will have put us. For example, we might know it just put us on a byte boundary, but we might know more and know that it actually put us on a 2-byte boundary, allowing for better optimizations if a later elements needs to be 2-byte aligned.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gotcha. I think you are correct. I think some of the complexity in this code could be reduced by rigorously naming the various things to avoid confusion. Being clear about lengthApprox vs positionApprox vs. alignmentRequirement would go a long ways. We're overusing the term "alignment" here to mean position and requirement.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Definitely. The naming could definitely be improved. Adding more variables might also be helpful. This document has great diagrams of all the different parts that are considered in our alignment https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/ Some of our current variables kindof mush multiple boxes together which can make it difficult to tease apart what's going on and how well we actually match the document. It might makes sense to have a variable that calculates the approximate start of every one of the boxes with matching names (e.g. allignFilleApproximateStart, initiatorMTAApproximateStart),. It adds more variables, but might make things more clear. And that might simplify other logic that tries to statically compile out different grams. For example, the gram guards can become something like: val needsAlignFillGram = alignFillApproxStart % alignment != 0
val needsInitiatorMTAGram = initiatorMTAApproxStart % initiator.encoding.mandatoryTextAlignment != 0 |
||
| case Postfix => LengthMultipleOf(0) | ||
| } | ||
| case _ => LengthMultipleOf(0) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this really wants to be AlignmentApprox(1) (MTA is alignment after all so using length isn't quite right) and then with the new In fact, I think |
||
| } | ||
|
|
||
| private lazy val separatorPostfixMTAApprox = | ||
| this.optLexicalParent match { | ||
| case Some(s: SequenceTermBase) if s.hasSeparator => | ||
| import SeparatorPosition.* | ||
| s.separatorPosition match { | ||
| case Prefix | Infix => LengthMultipleOf(0) | ||
| case Postfix => LengthMultipleOf(s.knownEncodingAlignmentInBits) | ||
| } | ||
| case _ => LengthMultipleOf(0) | ||
| } | ||
|
|
||
| private lazy val separatorLengthApprox = this.optLexicalParent match { | ||
| case Some(s: SequenceTermBase) if s.hasSeparator => | ||
| getEncodingLengthApprox(s) | ||
| case _ => LengthMultipleOf(0) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of |
||
| } | ||
|
|
||
| private lazy val initiatorMTAApprox = | ||
| if (this.hasInitiator) { | ||
| AlignmentMultipleOf(this.knownEncodingAlignmentInBits) | ||
| } else { | ||
| AlignmentMultipleOf(0) | ||
| } | ||
|
|
||
| private lazy val initiatorLengthApprox = if (this.hasInitiator) { | ||
| getEncodingLengthApprox(this) | ||
| } else { | ||
| LengthMultipleOf(0) | ||
| } | ||
|
|
||
| private lazy val terminatorMTAApprox = | ||
| if (this.hasTerminator) { | ||
| AlignmentMultipleOf(this.knownEncodingAlignmentInBits) | ||
| } else { | ||
| AlignmentMultipleOf(0) | ||
| } | ||
|
|
||
| private lazy val terminatorLengthApprox = if (this.hasTerminator) { | ||
| getEncodingLengthApprox(this) | ||
| } else { | ||
| LengthMultipleOf(0) | ||
| } | ||
|
|
||
| private def getEncodingLengthApprox(t: Term) = { | ||
| if (t.isKnownEncoding) { | ||
| val dfdlCharset = t.charsetEv.optConstant.get | ||
| val fixedWidth = | ||
| if (dfdlCharset.maybeFixedWidth.isDefined) dfdlCharset.maybeFixedWidth.get else 8 | ||
| LengthMultipleOf(fixedWidth) | ||
| } else { | ||
| // runtime encoding, which must be a byte-length encoding | ||
| LengthMultipleOf(8) | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * prior alignment with leading skip includes the prefix/infix separator MTA and length, | ||
| * any leading skips | ||
| */ | ||
| private lazy val priorAlignmentWithLeadingSkipApprox: AlignmentMultipleOf = { | ||
| priorAlignmentApprox + leadingSkipApprox | ||
| val priorAlignmentWithSeparatorApprox = priorAlignmentApprox | ||
| + separatorPrefixMTAApprox | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that with the new + operator, this does want to stay a |
||
| + separatorLengthApprox | ||
| val leadingAlignmentApprox = (priorAlignmentWithSeparatorApprox | ||
| + leadingSkipApprox) | ||
| leadingAlignmentApprox | ||
| } | ||
|
|
||
| /** | ||
| * The length of the prefix length quasi element. | ||
| * | ||
| * This is only relevant for quasi elements, or prefixed terms. | ||
| */ | ||
| private lazy val prefixLengthElementLength = { | ||
| this match { | ||
| case eb: ElementBase => { | ||
| if (eb.isInstanceOf[QuasiElementDeclBase]) | ||
| LengthExact( | ||
| eb.asInstanceOf[QuasiElementDeclBase].elementLengthInBitsEv.constValue.get | ||
| ) | ||
| else if (eb.lengthKind == Prefixed) | ||
| LengthExact( | ||
| this | ||
| .asInstanceOf[ElementBase] | ||
| .prefixedLengthElementDecl | ||
| .elementLengthInBitsEv | ||
| .constValue | ||
| .get | ||
| ) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This assumes quasi elements and prefix length elements have constant lengths, but I'm not sure that's always true. For example, I think we allow a prefix length to have a prefix length, in which case one of the prefix lenghts isn't constant. I think prefix lengths is already broken in a number of ways, so maybe this doesn't have to necessarily have be fixed as part of this.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So I updated the code to recursively call prefixLengthElementLength for elements with LK=prefixed to get the length of it's prefix length, though we don't currently support nested prefixed elements in Daffodil as it's a subset error. I left the fix in regardless for when we do implement that. Added test_prefix_alignment_3 test. |
||
| else | ||
| LengthExact(0) | ||
| } | ||
| case _ => LengthExact(0) | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * The alignment of the value of the term. | ||
| * | ||
| * This is only relevant for simple elements. | ||
| */ | ||
| private lazy val valueMTAApprox = { | ||
| this match { | ||
| case eb: ElementBase if eb.isSimpleType => { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doesn't this also only apply if representation is text? For example, if this is a binary number then there is no MTA. I'm not sure if there are other cases where it does/does not apply (e.g. zoned numbers have to be aligned, but maybe that's not MTA?)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch, the term sharing doc does mention text representation in addition to the simpletType and element term conditions https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/ |
||
| AlignmentMultipleOf(eb.knownEncodingAlignmentInBits) | ||
| } | ||
| case _ => AlignmentMultipleOf(0) | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * This alignment is made up of the alignment of the term itself, | ||
| * the initiator MTA and length, the prefix length quasi | ||
| * element length, and the value MTA (we add it for optimization | ||
| * but is extremely difficult to test, as other things such | ||
| * as unparsers will provide MTA, even elementSpecifiedLengthApprox | ||
| * considers the encoding also). | ||
| */ | ||
| protected lazy val contentStartAlignment: AlignmentMultipleOf = { | ||
| if (priorAlignmentWithLeadingSkipApprox % alignmentApprox == 0) { | ||
| // alignment won't be needed, continue using prior alignment as start alignment | ||
| priorAlignmentWithLeadingSkipApprox | ||
| } else { | ||
| // alignment will be needed, it will be forced to be aligned to alignmentApprox | ||
| val leftFramingApprox = | ||
| alignmentApprox | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think maybe this actually does want to take into account the prirorAlignmentWithLeadingSkip, but do so with the new Note that the + operator has similar logic to what we used to have (i.e. if alignApprox divides priorAlignmentWithLeadingSkipApprox then we keep using the prior). |
||
| } | ||
| * initiatorMTAApprox | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This wants use the new + operator. |
||
| + initiatorLengthApprox | ||
| val csa = (leftFramingApprox + prefixLengthElementLength) * valueMTAApprox | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this also wants to use the new + operator. |
||
| csa | ||
| } | ||
|
|
||
| /** | ||
| * Accounts for the start of the content alignment, element length, | ||
| * terminator MTA/Length and trailing skip | ||
| */ | ||
| protected lazy val endingAlignmentApprox: AlignmentMultipleOf = { | ||
| this match { | ||
| case eb: ElementBase => { | ||
| if (eb.isComplexType && eb.lengthKind == LengthKind.Implicit) { | ||
| eb.complexType.group.endingAlignmentApprox + trailingSkipApprox | ||
| val res = if (eb.isComplexType && eb.lengthKind == LengthKind.Implicit) { | ||
| eb.complexType.group.endingAlignmentApprox | ||
| } else { | ||
| // simple type or complex type with specified length | ||
| contentStartAlignment + (elementSpecifiedLengthApprox + trailingSkipApprox) | ||
| contentStartAlignment + elementSpecifiedLengthApprox | ||
| } | ||
| val cea = res * terminatorMTAApprox | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wants to be the new + operator |
||
| + terminatorLengthApprox | ||
| + trailingSkipApprox | ||
| cea | ||
| } | ||
| case mg: ModelGroup => { | ||
| // | ||
|
|
@@ -260,37 +399,53 @@ trait AlignedMixin extends GrammarMixin { self: Term => | |
| // | ||
| val (lastChildren, couldBeLast) = mg.potentialLastChildren | ||
| val lastApproxesConsideringChildren: Seq[AlignmentMultipleOf] = lastChildren.map { lc => | ||
| // | ||
| // for each possible last child, add its ending alignment | ||
| // to our trailing skip to get where it would leave off were | ||
| // it the actual last child. | ||
| // | ||
| // for each possible last child, gather its endingAlignmentApprox | ||
| val lceaa = lc.endingAlignmentApprox | ||
| val res = lceaa + trailingSkipApprox | ||
| res | ||
| lceaa | ||
| } | ||
| val optApproxIfNoChildren = | ||
| // | ||
| // gather possibilities for this item itself | ||
| // | ||
| if (couldBeLast) | ||
| if (couldBeLast) { | ||
| // | ||
| // if this model group could be last, then consider | ||
| // if none of its content was present. | ||
| // We'd just have the contentStart, nothing, and the trailing Skip. | ||
| // We'd just have the contentStart | ||
| // | ||
| Some(contentStartAlignment + trailingSkipApprox) | ||
| else | ||
| val res = Some(contentStartAlignment) | ||
| res | ||
| } else | ||
| // can't be last, no possibilities to gather. | ||
| None | ||
| val lastApproxes = lastApproxesConsideringChildren ++ optApproxIfNoChildren | ||
| Assert.invariant(lastApproxes.nonEmpty) | ||
| val res = lastApproxes.reduce { _ * _ } | ||
| // take all the gathered possibilities and add the ending alignment | ||
| // to terminator MTA/length and our trailing skip to get where it would leave off were | ||
| // each the actual last child. | ||
| val lastApproxesFinal = lastApproxes.map { | ||
| _ * terminatorMTAApprox | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This wants to be a + |
||
| + terminatorLengthApprox | ||
| + trailingSkipApprox | ||
| } | ||
| Assert.invariant(lastApproxesFinal.nonEmpty) | ||
| val res = lastApproxesFinal.reduce { | ||
| _ * _ | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think right here is the only place we really want to use the * operator. The goal of this is to combine a bunch of different alignemtns that could happen at the same place time due to choice or optional elements, so this finds the the common alignment they all have. |
||
| } | ||
| res | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * The postfix separator MTA/length needs to be added after the trailing skip | ||
| */ | ||
| protected lazy val endingAlignmentWithRightFramingApprox: AlignmentMultipleOf = { | ||
| val res = endingAlignmentApprox | ||
| + separatorPostfixMTAApprox | ||
| + separatorLengthApprox | ||
| res | ||
| } | ||
|
|
||
| protected lazy val elementSpecifiedLengthApprox: LengthApprox = { | ||
| this match { | ||
| case eb: ElementBase => { | ||
|
|
@@ -321,23 +476,24 @@ trait AlignedMixin extends GrammarMixin { self: Term => | |
| case LengthKind.Delimited => encodingLengthApprox | ||
| case LengthKind.Pattern => encodingLengthApprox | ||
| case LengthKind.EndOfParent => LengthMultipleOf(1) // NYI | ||
| case LengthKind.Prefixed => LengthMultipleOf(1) // NYI | ||
| case LengthKind.Prefixed => { | ||
| val prefixElem = eb.prefixedLengthElementDecl | ||
| if (prefixElem.lengthKind == Explicit) { | ||
| LengthExact( | ||
| prefixElem.elementLengthInBitsEv.optConstant.get.get | ||
| ) + prefixLengthElementLength | ||
| } else { | ||
| getEncodingLengthApprox(prefixElem) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Prefix elements dont' necessarily have encoding. They could just be binary numbers. And I think it's possible for prefix lengths to have lengthKind="prefixed", in which case this would have to be
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If they are binary numbers, wouldn't the isKnownEncoding condition be false, and result in the value being LengthMultipleOf(8)? |
||
| } | ||
| } | ||
| } | ||
| } | ||
| case _: ModelGroup => Assert.usageError("Only for elements") | ||
| } | ||
| } | ||
|
|
||
| private lazy val encodingLengthApprox: LengthApprox = { | ||
| if (isKnownEncoding) { | ||
| val dfdlCharset = charsetEv.optConstant.get | ||
| val fixedWidth = | ||
| if (dfdlCharset.maybeFixedWidth.isDefined) dfdlCharset.maybeFixedWidth.get else 8 | ||
| LengthMultipleOf(fixedWidth) | ||
| } else { | ||
| // runtime encoding, which must be a byte-length encoding | ||
| LengthMultipleOf(8) | ||
| } | ||
| getEncodingLengthApprox(this) | ||
| } | ||
|
|
||
| protected lazy val isKnownToBeByteAlignedAndByteLength: Boolean = { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these changes are making me wonder if we need a
def +(that: AlignmentMultipleOf)that has slightly different logic than gcd.I believe the intention of
*was to combine alignments that could all happen at the same point in data (e.g. a choice of elements, optional siblings elements).But in the new code, we are taking an existing approximate alignment, and potentially adding a new alignment to it, which is slightly different.
For example, say we are currently at 2-byte alignment (
AlignmentMultipleof(16)) and we are adding an element that needs to be 1-byte (AlignmentMultipleOf(8)). In this case, we do not need to perform any alignment here because we are already byte align (we are actually 2-byte aligned), and our logic handles that correctly.But the contentStartAlignment of that new element still really wants to be 2-byte aligned, so it should remain
AlignmentMultipleOf(16). But right now, using*it's alignment will beAlignmentMultilpeOf(8)sincegcd(16,8) => 8. So we actually have less information about our true alignment, and could miss out on future optimizations.So it feels like a
+operation might want to be something likeThe idea is that if our new alignment (
that) evenly divides our existing alignment (this), then no alignment is actually needed, and we can keep our more accurate current alignment. Otherwise we use the new alignment.