feat: public XSD schema fields, get_type_element_order API, and xlink:href support #25

Open
AlexanderWillner wants to merge 14 commits into jonwiggins:main from AlexanderWillner:pr/public-api-xlink

@AlexanderWillner

Summary

Public API improvements and XLink support. Builds on top of PR #24.

Note: This PR includes all commits from #23 and #24. Review/merge those first.

1. Public XSD schema fields (6571ef1)

The fields of XsdSchema, XsdElement, ComplexType, and ImportedSchema are now pub. This lets downstream crates inspect XSD type definitions programmatically (e.g., to extract element ordering for serialization).

Also adds get_type_element_order(type_name, schema) -> Option<Vec<String>> — a convenience API that returns the ordered element names from a type's merged sequence particles.
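As a rough illustration of how a downstream crate might use such an ordering, here is a minimal sketch. The types `SketchSchema` and `SketchComplexType`, their field names, and both functions are hypothetical stand-ins, not the crate's actual definitions; only the `(type_name, schema) -> Option<Vec<String>>` shape is taken from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a complex type whose sequence has already
/// been merged with its extension bases.
pub struct SketchComplexType {
    pub sequence: Vec<String>,
}

/// Hypothetical stand-in for a parsed schema with public fields.
pub struct SketchSchema {
    pub complex_types: HashMap<String, SketchComplexType>,
}

/// Returns the element names of `type_name` in declaration order,
/// or `None` when the type is unknown.
pub fn element_order_sketch(type_name: &str, schema: &SketchSchema) -> Option<Vec<String>> {
    schema
        .complex_types
        .get(type_name)
        .map(|ct| ct.sequence.clone())
}

/// Sorts `(name, value)` pairs into schema order. The sort is stable, so
/// names missing from `order` keep their relative order after the known ones.
pub fn sort_by_schema_order(fields: &mut [(String, String)], order: &[String]) {
    fields.sort_by_key(|(name, _)| {
        order.iter().position(|o| o == name).unwrap_or(usize::MAX)
    });
}
```

A serializer could call `element_order_sketch` once per type and then reorder its fields with `sort_by_schema_order` before writing them out.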

2. Skip content validation for xlink:href elements (bc2d038)

GML/AAA schemas use xlink:href to reference external content (e.g., <crs xlink:href="urn:adv:crs:ETRS89_UTM33"/>). When present, the element's content comes from the referenced resource, not inline. This patch skips content model validation for such elements — matching the behaviour of libxml2 with processContents="lax".

Without this fix, valid AAA/NAS files with <crs xlink:href="..."/> (where the XSD declares crs as AbstractCRS_Type requiring child elements) produce false validation errors.
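The skip rule can be sketched roughly as follows. `SketchElement`, `SketchAttr`, and both functions are hypothetical and reduce the content model to a list of required child names; the only fixed fact used is the XLink namespace URI.

```rust
/// XLink namespace URI as defined by the W3C XLink recommendation.
const XLINK_NS: &str = "http://www.w3.org/1999/xlink";

/// Hypothetical attribute representation.
pub struct SketchAttr {
    pub namespace: Option<String>,
    pub local_name: String,
    pub value: String,
}

/// Hypothetical element representation; children are reduced to names.
pub struct SketchElement {
    pub name: String,
    pub attributes: Vec<SketchAttr>,
    pub children: Vec<String>,
}

/// True when the element carries an `href` attribute in the XLink namespace.
pub fn has_xlink_href(elem: &SketchElement) -> bool {
    elem.attributes
        .iter()
        .any(|a| a.local_name == "href" && a.namespace.as_deref() == Some(XLINK_NS))
}

/// Simplified content check: every required child must be present,
/// unless the element points elsewhere via xlink:href.
pub fn validate_content_sketch(
    elem: &SketchElement,
    required_children: &[&str],
) -> Result<(), String> {
    if has_xlink_href(elem) {
        // Content lives in the referenced resource, so skip the content model.
        return Ok(());
    }
    for req in required_children {
        if !elem.children.iter().any(|c| c.as_str() == *req) {
            return Err(format!(
                "element <{}> is missing required child <{}>",
                elem.name, req
            ));
        }
    }
    Ok(())
}
```

Matching on the namespace URI rather than the literal `xlink:` prefix keeps the check independent of whatever prefix a document binds.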

Files changed

  • src/validation/xsd.rs — ~86 lines net change (pub fields + xlink skip + API)

Testing

  • All 1071 existing tests pass
  • Both xmloxide and libxml2 reference validators pass on NAS roundtrip output

Add XSD 1.0 section 3.3.6 substitution group support to the XSD validator.

When element B declares substitutionGroup='A', B can appear anywhere A is
expected in a content model. This is transitive: if C substitutes for B,
C also substitutes for A.
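A minimal sketch of the transitive check, assuming a simplified member -> head map (the PR itself indexes head -> members). The function name and map shape are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Walks the `substitutionGroup` chain upward from `candidate` and reports
/// whether it reaches `head`. `head_of` maps each member element name to
/// the head it declares. A direct match (candidate == head) is left to the
/// caller.
pub fn is_substitution_member_sketch(
    candidate: &str,
    head: &str,
    head_of: &HashMap<String, String>,
) -> bool {
    let mut visited: HashSet<&str> = HashSet::new();
    let mut current: &str = candidate;
    while let Some(next) = head_of.get(current) {
        if next.as_str() == head {
            return true;
        }
        // Guard against malformed schemas with circular groups.
        if !visited.insert(current) {
            return false;
        }
        current = next.as_str();
    }
    false
}
```

With `C -> B` and `B -> A` in the map, both `B` and `C` are reported as members of `A`'s group.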

Changes:
- Add substitution_group and is_abstract fields to XsdElement
- Add substitution_groups index to XsdSchema (head -> members map)
- Parse substitutionGroup/abstract attributes in parse_element_decl
- Build substitution index after schema parse via build_substitution_index
- Extend element_matches_decl to accept substitution group members
- Add is_substitution_member for transitive chain resolution
- Resolve instance element type in validate_sequence_element for correct
  content validation of substituted elements

Parse <xs:complexContent><xs:extension base='...'> in complex type
definitions. After all schemas are loaded, merge base-type content
model particles with extension particles in derivation order.

Post-processing step merge_extension_bases() resolves the full
inheritance chain recursively (with cycle detection) and prepends
base-type particles to the derived type's sequence.
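The merge step might look roughly like this. `SketchType` and both functions are hypothetical, and particles are reduced to element names; this is a sketch of the idea, not the PR's code.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical complex type: its own particles plus an optional base.
pub struct SketchType {
    pub extension_base: Option<String>,
    pub own_particles: Vec<String>,
}

/// Returns base-type particles followed by the type's own particles,
/// resolving the whole inheritance chain. `visited` stops cycles.
pub fn resolve_particles_sketch(
    name: &str,
    types: &HashMap<String, SketchType>,
    visited: &mut HashSet<String>,
) -> Vec<String> {
    if !visited.insert(name.to_string()) {
        // Already on the current chain: a cycle, contribute nothing.
        return Vec::new();
    }
    let ty = match types.get(name) {
        Some(ty) => ty,
        None => return Vec::new(), // unknown base type
    };
    let mut merged = match &ty.extension_base {
        Some(base) => resolve_particles_sketch(base, types, visited),
        None => Vec::new(),
    };
    merged.extend(ty.own_particles.iter().cloned());
    merged
}

/// Convenience wrapper that starts with an empty visited set.
pub fn merged_particles(name: &str, types: &HashMap<String, SketchType>) -> Vec<String> {
    resolve_particles_sketch(name, types, &mut HashSet::new())
}
```

Because the base is resolved before the derived type's particles are appended, a three-level chain yields the particles in derivation order.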

Adds parse_complex_content() handler, extension_base field on
ComplexType, resolve_base_particles_impl() with visited-set guard,
and 3 unit tests covering simple extension, multi-level chains,
and empty-base extension.

When a schema uses targetNamespace and elementFormDefault='qualified',
type references like adv:DerivedType now correctly resolve to local
types instead of only searching imported namespaces.

Adds targetNamespace self-check in resolve_type_name and
resolve_element_ref, plus a last-resort local-name fallback in
resolve_type_name. Also adds find_complex_type helper that searches
both local and imported types for base particle resolution.
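The lookup order described here can be sketched as below. `SketchSchemaNs`, `Resolved`, and the function are hypothetical, and the namespace URIs in any usage are placeholders rather than the real AdV or GML namespaces.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a schema for name resolution.
pub struct SketchSchemaNs {
    pub target_namespace: Option<String>,
    /// prefix -> namespace URI
    pub prefixes: HashMap<String, String>,
    /// local names of types declared in this schema
    pub local_types: Vec<String>,
    /// namespace URI -> local names of types in that imported schema
    pub imported_types: HashMap<String, Vec<String>>,
}

#[derive(Debug, PartialEq)]
pub enum Resolved {
    Local,
    Imported(String),
    NotFound,
}

pub fn resolve_type_name_sketch(qname: &str, schema: &SketchSchemaNs) -> Resolved {
    let (prefix, local) = match qname.split_once(':') {
        Some((p, l)) => (Some(p), l),
        None => (None, qname),
    };
    let is_local = |name: &str| schema.local_types.iter().any(|t| t.as_str() == name);

    if let Some(ns) = prefix.and_then(|p| schema.prefixes.get(p)) {
        // 1. Self-check: the prefix points at our own targetNamespace.
        if schema.target_namespace.as_ref() == Some(ns) && is_local(local) {
            return Resolved::Local;
        }
        // 2. Otherwise search the imported schema for that namespace.
        if let Some(types) = schema.imported_types.get(ns) {
            if types.iter().any(|t| t.as_str() == local) {
                return Resolved::Imported(ns.clone());
            }
        }
    }
    // 3. Last resort: match on the local name alone.
    if is_local(local) {
        Resolved::Local
    } else {
        Resolved::NotFound
    }
}
```

The last-resort step trades strictness for tolerance: a reference with an unbound prefix still resolves if a local type has the same local name.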

New tests: complex content extension with targetNamespace,
optional element ordering detection.

Three bugs prevented substitution group members declared in imported
schemas from being recognized during XSD validation:

1. build_substitution_index() only scanned local schema.elements,
   missing imported elements that declare substitutionGroup membership.
   Fix: also iterate imported_namespaces.*.elements.

2. element_matches_decl() rejected same-named elements from different
   namespaces without checking substitution group membership.
   Fix: when namespace differs but local name matches, fall back to
   is_substitution_member() check.

3. is_substitution_member() only looked up transitive member
   declarations in local schema.elements.
   Fix: also search imported_namespaces.*.elements for member decls.

Fixes: FeatureCollection substitution group, AbstractCRS abstract element.

element_matches_decl() now resolves the namespace of element
declarations referenced via ref= attributes (e.g. ref="wfs:FeatureCollection")
instead of always checking against the main schema's targetNamespace.

This fixes validation of documents where imported elements have
different namespaces than the main schema, such as WFS FeatureCollection
in NAS/AAA schemas.

Also:
- Allow unqualified child elements for element_ref declarations
- build_substitution_index scans imported elements
- is_substitution_member looks up transitive members in imports

Verifies that FeatureCollection substitution group is correctly
resolved when validating NAS/AAA files. Known remaining limitations
documented: AbstractCRS via xlink:href, boundedBy in FeatureCollection.

validate_sequence() now detects when elements appear in wrong order
within a sequence. When a child doesn't match the current particle,
checks if it matches a later particle. If not, reports an ordering
error instead of silently skipping.
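A minimal sketch of this kind of order check, assuming every particle occurs at most once (no maxOccurs > 1) and reducing particles to a name plus a required flag. `SketchParticle` and `check_sequence_order` are hypothetical names.

```rust
/// Hypothetical sequence particle: an element name and whether it is required.
pub struct SketchParticle {
    pub name: &'static str,
    pub required: bool,
}

/// Checks that `children` follow the order of `particles`.
/// Assumes each particle may appear at most once.
pub fn check_sequence_order(
    children: &[&str],
    particles: &[SketchParticle],
) -> Result<(), String> {
    let mut pos = 0; // index of the next particle still available
    for child in children {
        match particles[pos..].iter().position(|p| p.name == *child) {
            Some(offset) => {
                // Skipping ahead is fine only if everything skipped is optional.
                if let Some(missed) = particles[pos..pos + offset].iter().find(|p| p.required) {
                    return Err(format!(
                        "missing required element <{}> before <{}>",
                        missed.name, child
                    ));
                }
                pos += offset + 1;
            }
            None => {
                // Not among later particles: either misplaced or unknown.
                if particles[..pos].iter().any(|p| p.name == *child) {
                    return Err(format!("element <{}> is out of order", child));
                }
                return Err(format!("unexpected element <{}>", child));
            }
        }
    }
    match particles[pos..].iter().find(|p| p.required) {
        Some(missed) => Err(format!("missing required element <{}>", missed.name)),
        None => Ok(()),
    }
}
```

For a sequence of three optional particles `a`, `b`, `c`, the children `c, a` are rejected because `a` only matches an earlier particle, while `a, c` passes.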

This catches cases like hatDirektUnten appearing before optional
extension properties (bauwerksfunktion, ergebnisDerUeberpruefung,
qualitaetsangaben) in AAA/NAS schemas.

Also removes debug eprintln from element_matches_decl.

merge_extension_bases() now also processes complexContent extension
chains in imported namespaces, not just the main schema. This fixes
FeatureCollectionType (WFS) which extends SimpleFeatureCollectionType
to include boundedBy + member particles.

Also adds sequence order validation that detects misplaced elements
within xs:sequence (e.g. hatDirektUnten before optional extension
properties).

Removes debug eprintln statements.

Adds XsdParticle::Any variant with namespace constraints (##any,
##other, explicit list) and processContents modes (strict/lax/skip).

- parse_any_wildcard() parses <xsd:any> declarations
- validate_any_wildcard() consumes matching child elements
- Choice validation accepts wildcard as valid alternative
- matches_later_particle() treats Any as always matching
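The namespace side of wildcard matching can be sketched as follows, using XSD 1.0 semantics where ##other admits only namespace-qualified elements outside the target namespace. The enum and function are hypothetical, and the explicit list is simplified to plain URIs (no ##targetNamespace or ##local tokens).

```rust
/// Hypothetical namespace constraint of an xs:any wildcard.
pub enum NamespaceConstraint {
    /// namespace="##any"
    Any,
    /// namespace="##other"
    Other,
    /// An explicit list of namespace URIs (simplified).
    List(Vec<String>),
}

/// Decides whether an element in `elem_ns` is admitted by the wildcard of a
/// schema whose targetNamespace is `target_ns`. `None` means "no namespace".
pub fn wildcard_allows(
    constraint: &NamespaceConstraint,
    target_ns: Option<&str>,
    elem_ns: Option<&str>,
) -> bool {
    match constraint {
        NamespaceConstraint::Any => true,
        // XSD 1.0: must be namespace-qualified and differ from the target.
        NamespaceConstraint::Other => elem_ns.is_some() && elem_ns != target_ns,
        NamespaceConstraint::List(allowed) => match elem_ns {
            Some(ns) => allowed.iter().any(|a| a.as_str() == ns),
            None => false,
        },
    }
}
```

Whether an admitted element is then validated depends on processContents: strict requires a declaration, lax validates only if one is found, and skip performs no validation.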

This unblocks validation of NAS features inside <wfs:member> which
uses <xsd:any processContents="lax" namespace="##other"/>.

Expose XsdSchema, XsdElement, ComplexType, ImportedSchema fields
as pub so downstream consumers can query element ordering.

Add get_type_element_order() to retrieve the ordered list of element
names from a complex type's merged sequence (including extension base
inheritance). This enables XSD-based serialization ordering.

GML/AAA schemas use xlink:href to reference external content
(e.g. <crs xlink:href="urn:adv:crs:ETRS89_UTM33"/>). When present,
the element's content comes from the referenced resource, not inline.
Skip content model validation for such elements — same behaviour as
libxml2 with processContents="lax".

validate_xsd now searches imported_namespaces for root element
declarations when not found in the main schema elements map.
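A rough sketch of such a fallback lookup, assuming imported schemas are keyed by namespace URI and element declarations are reduced to a name -> type-name map. All type and field names here are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical imported schema: element name -> type name.
pub struct SketchImported {
    pub elements: HashMap<String, String>,
}

/// Hypothetical main schema with its imports keyed by namespace URI.
pub struct SketchRootSchema {
    pub elements: HashMap<String, String>,
    pub imported_namespaces: HashMap<String, SketchImported>,
}

/// Looks up the declaration for a document's root element: first in the
/// main schema, then in the imported schema matching the root's namespace.
pub fn find_root_decl<'a>(
    root_ns: Option<&str>,
    root_local: &str,
    schema: &'a SketchRootSchema,
) -> Option<&'a String> {
    if let Some(ty) = schema.elements.get(root_local) {
        return Some(ty);
    }
    // Fall back to the import for the root element's namespace, if any.
    let ns = root_ns?;
    schema.imported_namespaces.get(ns)?.elements.get(root_local)
}
```

Keying the fallback on the namespace keeps the result deterministic when several imports declare an element with the same local name.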

This fixes validation of documents whose root element is declared
in an imported schema (e.g., AX_Bestandsdatenauszug in
NAS-Operationen.xsd imported by AAA-Basisschema.xsd).

Also adds test_root_element_from_imported_schema covering both
correct root lookup and element ordering validation against the
full AAA schema chain.