Skip to content

Conversation

@Ontiomacer
Copy link

This PR addresses DAFFODIL-2539 by refactoring TDML tests under
daffodil-test/src/test/resources/test space/test 1/.
The original namespaces.tdml duplicated coverage that already exists under
section06/namespaces, while also mixing multiple test concerns.
This change removes the duplicated TDML file and replaces it with a focused spaceInNames.tdml test that preserves coverage for schemas and TDML files located in directories with spaces, including schema resolution and no-namespace collision behavior.
All existing tests pass locally (sbt daffodil-test/test), and no functional coverage related to paths containing spaces is removed.

rename the TDML file for clarity, and retain coverage
for schemas and TDML files located in paths with spaces.

Signed-off-by: Tejas Tiwari <tt160705@outlook.com>
@Ontiomacer
Copy link
Author

@mbeckerle here's the pull request for the issue DAFFODIL-2539

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with some minor updates.

Though we do need to fix support for spaces in file names, I think that might be currently broken in some cases. That can either be done as part of this PR if you want, or we can create a new ticket and fix that separately.

Comment on lines 30 to 31
correctly, and that duplicate global element names across schemas
with no namespaces result in a Schema Definition Error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this tests really wants to care about global element names across schemas. This only cares about spaces. That part of the description can probably be removed.

-->

<tdml:parserTestCase
name="no_namespace_colliding_element_names"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest just calling this name="spaces_in_names_01" or something similar, to indicate this is just about verifing spaces in file names works. The fact that there are or aren't namespaces isn't really that important, namespaces just make for a more interesting test.

Make sure to update the test name in TestGeneral.scala to match.

See the License for the specific language governing permissions and
limitations under the License.
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this changes the name of a .tdml file, it means we must also update the test that references the tdml file to use the new name. In this case that is in TestGeneral.scala:

https://github.com/apache/daffodil/blob/main/daffodil-test/src/test/scala/org/apache/daffodil/section00/general/TestGeneral.scala#L83-L97

So we should also update this test to use spacesInNames.tdml instead of namespaces.tdml and also update that test to reference the name of the parser test case.


However, that test in TestGeneral.scala expects it to not be able to find namespaces.tdml. Which it couldn't do before this change. That also probably means it won't be able to find spacesInNames.tdml when that is updated.

I think this means that the test is just broken--we should be able to support spaces in paths and we need tests that verify that works correctly but these tests do not work. I'm pretty sure spaces used to work at some point, so I'm not sure why this test expects failure now.

I think the issues is in our Misc class in the getResource* functions:

https://github.com/apache/daffodil/blob/main/daffodil-core/src/main/scala/org/apache/daffodil/lib/util/Misc.scala#L103-L129

In both the functions we do replaceAll("""\s""", "%20") and then call Class.getResource on that path. But that is wrong because Class.getResource does not expect %20 and won't be able to find any resources that contain them. So our Misc.getResource* functions are broken with paths that contain spaces.

So these replaceAll's probably need to be removed or changed somehow. Note that there are probably other places that expect strings that are escaped URI's so we need to identify and fix those as well. I'm not sure exactly where those would be, but it would probably be revealed in testing.

Also note that there are characters other than spaces that need escaping in a URI. Most of those characters are uncommon in files or schema locations, but it does indicate that how we currently handle strings that are sometimes treated as URI's is probably not correct. We don't want to just add more replaceAlls, we probably need to think more carefully about how we handle strings and convert them to URI's. Maybe we need something like 'new URL(...).toURI` or something, since I believe URL's support spaces.


I think if you update the TestGeneral.scala file to replace namespaces.tdml with spaceInNames.tdml, that would be sufficient to remove the duplication as mentioned in the original ticket.

If you also want to attempt to fix the spaces issues, that could also be done as part of this ticket/PR or we can open another ticket to fix support for spaces.

Signed-off-by: Tejas Tiwari <tt160705@outlook.com>
@Ontiomacer
Copy link
Author

Hey @stevedlawrence I did some of the fixes

1. Test intent clarification

  • Renamed the TDML test case to spaces_in_names_01.
  • Updated the TDML description to focus solely on verifying that TDML and schema resources located in paths containing spaces are resolved correctly.
  • Removed emphasis on global element name collisions, since that behavior is incidental and already covered elsewhere.

2. Test reference alignment

  • Updated TestGeneral.scala to reference spaceInNames.tdml instead of the removed namespaces.tdml.
  • Updated the Scala test to invoke spaces_in_names_01, keeping test names consistent across TDML and Scala.

3. Broken space-handling in resource lookup

  • Fixed Misc.getResourceOption (and related lookup paths) by removing the premature replaceAll("\\s", "%20").
  • Class.getResource now receives raw classpath paths as expected, restoring correct resolution for resources whose paths contain spaces.
  • As a result, the updated space-related tests now validate successful resolution instead of incorrectly expecting failure.

4. Scope

  • The fix is intentionally limited to classpath resource lookup.
  • Broader URI/URL normalization concerns are left unchanged and can be addressed in a follow-up ticket if needed.

With these changes, TDML duplication is removed, coverage for paths containing spaces is preserved, and the underlying resource resolution issue that caused the test confusion is fixed.

@Ontiomacer
Copy link
Author

Ontiomacer commented Dec 30, 2025

I'll continue to fix the changes in this PR let me know if there are more originating issues with the recent changes @stevedlawrence

@Ontiomacer
Copy link
Author

Also @stevedlawrence, I would like to clarify that I ran the full sbt test suite, and the failures observed were not caused by the fix itself, but by tests that were asserting the previously broken behavior. Once the %20 encoding was removed from Misc.getResource*, classpath resource lookup for paths containing spaces began working correctly. This caused tests that were expecting “resource not found” errors to fail. Those tests were updated to validate successful resource resolution instead.

@stevedlawrence
Copy link
Member

It looks like there are two tests related to spaces in names that are failing. I don't think they are unrelated failures, but are due to additional changes needed to support spaces. One test is the new spaces_in_names_01 test, which fails with this stack trace:

Test org.apache.daffodil.section00.general.TestSpace2.spaces_in_names_01 failed: java.lang.IllegalArgumentException: schemaLocation is not a valid URI: test space/test 2/multi_A_05_nons.dfdl.xsd, took 0.225 sec
[error]     at org.apache.daffodil.lib.xml.XMLUtils$.resolveSchemaLocation(XMLUtils.scala:1473)
[error]     at org.apache.daffodil.core.dsom.IIBase.$anonfun$4(IIBase.scala:203)
[error]     at scala.Option.flatMap(Option.scala:283)
[error]     at org.apache.daffodil.core.dsom.IIBase.resolvedSchemaLocation$lzyINIT1$$anonfun$1(IIBase.scala:198)
[error]     at org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.body$lzyINIT1(OOLAG.scala:523)
[error]     at org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.body(OOLAG.scala:523)
[error]     at org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.valueAsAny(OOLAG.scala:708)
...
Caused by: java.net.URISyntaxException: Illegal character in path at index 4: test space/test 2/multi_A_05_nons.dfdl.xsd
[error]     at java.net.URI$Parser.fail(URI.java:2995)
[error]     at java.net.URI$Parser.checkChars(URI.java:3166)
[error]     at java.net.URI$Parser.parseHierarchical(URI.java:3248)
[error]     at java.net.URI$Parser.parse(URI.java:3207)
[error]     at java.net.URI.<init>(URI.java:645)
[error]     at org.apache.daffodil.lib.xml.XMLUtils$.resolveSchemaLocation(XMLUtils.scala:1468)

The "Caused by" part of the exception indicates we probably we have something like schemaLocation="test space/test 2/multi_A_05_nons.dfdl.xsd", which we can find in the test schema files in the "test spaces/" directory.

So it seems like we don't currently support spaces in the resolveSchemaLocation function. We may need further investigations to figure out how to correctly fix things and what other functions might need changes.

I think the core issue is that some functions expect string paths to be valid URIs where spaces are escaped with %20 (e.g. resolveSchemaLocation, optRelativeFileURI, optRelativeResourceURI), but some expect them to be normal paths without escapes (e.g. getResourceOption, getRequiredResource) and some I'm not sure it's very well defined (e.g. getResourceAbsoluteOrRelativeOption). I'm sure there are other functions not listed that have some built-in assumptions.

It feels like the right approach to convert our strings to URIs as early as possible and pass the URI around instead of strings. That way we know exactly what we are dealing with. And if we ever need to call a function that expects an unescaped string (e.g. getResourceOption), then we can call uri.getPath() which will do that unescaping and provide us a string that can be used for something like Class.getResource().

Though, this could potentially have a pretty big cascading effect. Changing one variable to a URI might require other functions to need URIs and it might turn into a significant change. Another potential issue is some functions, while not technically part of the public API, are used externally. For example, getRequiredResource, getResourceOption, etc. are used in a lot of tests, and we don't really want to have to change those all to use URIs--they still really do want to use strings.

Or alternatively, instead of changing things to URIs we just kindof have a convention that we expect these string paths to be valid URIs with escaped spaces (and sometimes we'll make escape things for the user), and the functions that don't want escapes can unescape the strings before use.

So all this to say, I'm not really sure what the perfect approach is. It might require making some changes and see what gets affect, what tests start failing, etc. and use that to guide us to the best solution.

You're welcome to dig into this and we're happy to help, but this might also be veering into an issue that isn't quite as beginner friendly since it might start touching a lot of the codebase and dealing with undocumented assumptions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants