Skip to content

Conversation

@Ontiomacer
Copy link

@Ontiomacer Ontiomacer commented Dec 25, 2025

Deduplicate TDML tests under test space/test 1

Remove duplicated TDML coverage by deleting namespaces.tdml and
replacing it with a focused spaceInNames.tdml test. This preserves
coverage for TDML and schema resources located in directories
containing spaces without changing existing behavior.

DAFFODIL-2539

@Ontiomacer
Copy link
Author

@mbeckerle here's the pull request for the issue DAFFODIL-2539

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with some minor updates.

Though we do need to fix support for spaces in file names, I think that might be currently broken in some cases. That can either be done as part of this PR if you want, or we can create a new ticket and fix that separately.

Comment on lines 30 to 31
correctly, and that duplicate global element names across schemas
with no namespaces result in a Schema Definition Error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this tests really wants to care about global element names across schemas. This only cares about spaces. That part of the description can probably be removed.

-->

<tdml:parserTestCase
name="no_namespace_colliding_element_names"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest just calling this name="spaces_in_names_01" or something similar, to indicate this is just about verifing spaces in file names works. The fact that there are or aren't namespaces isn't really that important, namespaces just make for a more interesting test.

Make sure to update the test name in TestGeneral.scala to match.

See the License for the specific language governing permissions and
limitations under the License.
-->

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this changes the name of a .tdml file, it means we must also update the test that references the tdml file to use the new name. In this case that is in TestGeneral.scala:

https://github.com/apache/daffodil/blob/main/daffodil-test/src/test/scala/org/apache/daffodil/section00/general/TestGeneral.scala#L83-L97

So we should also update this test to use spacesInNames.tdml instead of namespaces.tdml and also update that test to reference the name of the parser test case.


However, that test in TestGeneral.scala expects it to not be able to find namespaces.tdml. Which it couldn't do before this change. That also probably means it won't be able to find spacesInNames.tdml when that is updated.

I think this means that the test is just broken--we should be able to support spaces in paths and we need tests that verify that works correctly but these tests do not work. I'm pretty sure spaces used to work at some point, so I'm not sure why this test expects failure now.

I think the issues is in our Misc class in the getResource* functions:

https://github.com/apache/daffodil/blob/main/daffodil-core/src/main/scala/org/apache/daffodil/lib/util/Misc.scala#L103-L129

In both the functions we do replaceAll("""\s""", "%20") and then call Class.getResource on that path. But that is wrong because Class.getResource does not expect %20 and won't be able to find any resources that contain them. So our Misc.getResource* functions are broken with paths that contain spaces.

So these replaceAll's probably need to be removed or changed somehow. Note that there are probably other places that expect strings that are escaped URI's so we need to identify and fix those as well. I'm not sure exactly where those would be, but it would probably be revealed in testing.

Also note that there are characters other than spaces that need escaping in a URI. Most of those characters are uncommon in files or schema locations, but it does indicate that how we currently handle strings that are sometimes treated as URI's is probably not correct. We don't want to just add more replaceAlls, we probably need to think more carefully about how we handle strings and convert them to URI's. Maybe we need something like 'new URL(...).toURI` or something, since I believe URL's support spaces.


I think if you update the TestGeneral.scala file to replace namespaces.tdml with spaceInNames.tdml, that would be sufficient to remove the duplication as mentioned in the original ticket.

If you also want to attempt to fix the spaces issues, that could also be done as part of this ticket/PR or we can open another ticket to fix support for spaces.

@Ontiomacer
Copy link
Author

Hey @stevedlawrence I did some of the fixes

1. Test intent clarification

  • Renamed the TDML test case to spaces_in_names_01.
  • Updated the TDML description to focus solely on verifying that TDML and schema resources located in paths containing spaces are resolved correctly.
  • Removed emphasis on global element name collisions, since that behavior is incidental and already covered elsewhere.

2. Test reference alignment

  • Updated TestGeneral.scala to reference spaceInNames.tdml instead of the removed namespaces.tdml.
  • Updated the Scala test to invoke spaces_in_names_01, keeping test names consistent across TDML and Scala.

3. Broken space-handling in resource lookup

  • Fixed Misc.getResourceOption (and related lookup paths) by removing the premature replaceAll("\\s", "%20").
  • Class.getResource now receives raw classpath paths as expected, restoring correct resolution for resources whose paths contain spaces.
  • As a result, the updated space-related tests now validate successful resolution instead of incorrectly expecting failure.

4. Scope

  • The fix is intentionally limited to classpath resource lookup.
  • Broader URI/URL normalization concerns are left unchanged and can be addressed in a follow-up ticket if needed.

With these changes, TDML duplication is removed, coverage for paths containing spaces is preserved, and the underlying resource resolution issue that caused the test confusion is fixed.

@Ontiomacer
Copy link
Author

Ontiomacer commented Dec 30, 2025

I'll continue to fix the changes in this PR let me know if there are more originating issues with the recent changes @stevedlawrence

@Ontiomacer
Copy link
Author

Also @stevedlawrence, I would like to clarify that I ran the full sbt test suite, and the failures observed were not caused by the fix itself, but by tests that were asserting the previously broken behavior. Once the %20 encoding was removed from Misc.getResource*, classpath resource lookup for paths containing spaces began working correctly. This caused tests that were expecting “resource not found” errors to fail. Those tests were updated to validate successful resource resolution instead.

@stevedlawrence
Copy link
Member

It looks like there are two tests related to spaces in names that are failing. I don't think they are unrelated failures, but are due to additional changes needed to support spaces. One test is the new spaces_in_names_01 test, which fails with this stack trace:

Test org.apache.daffodil.section00.general.TestSpace2.spaces_in_names_01 failed: java.lang.IllegalArgumentException: schemaLocation is not a valid URI: test space/test 2/multi_A_05_nons.dfdl.xsd, took 0.225 sec
[error]     at org.apache.daffodil.lib.xml.XMLUtils$.resolveSchemaLocation(XMLUtils.scala:1473)
[error]     at org.apache.daffodil.core.dsom.IIBase.$anonfun$4(IIBase.scala:203)
[error]     at scala.Option.flatMap(Option.scala:283)
[error]     at org.apache.daffodil.core.dsom.IIBase.resolvedSchemaLocation$lzyINIT1$$anonfun$1(IIBase.scala:198)
[error]     at org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.body$lzyINIT1(OOLAG.scala:523)
[error]     at org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.body(OOLAG.scala:523)
[error]     at org.apache.daffodil.lib.oolag.OOLAG$OOLAGValueBase.valueAsAny(OOLAG.scala:708)
...
Caused by: java.net.URISyntaxException: Illegal character in path at index 4: test space/test 2/multi_A_05_nons.dfdl.xsd
[error]     at java.net.URI$Parser.fail(URI.java:2995)
[error]     at java.net.URI$Parser.checkChars(URI.java:3166)
[error]     at java.net.URI$Parser.parseHierarchical(URI.java:3248)
[error]     at java.net.URI$Parser.parse(URI.java:3207)
[error]     at java.net.URI.<init>(URI.java:645)
[error]     at org.apache.daffodil.lib.xml.XMLUtils$.resolveSchemaLocation(XMLUtils.scala:1468)

The "Caused by" part of the exception indicates we probably we have something like schemaLocation="test space/test 2/multi_A_05_nons.dfdl.xsd", which we can find in the test schema files in the "test spaces/" directory.

So it seems like we don't currently support spaces in the resolveSchemaLocation function. We may need further investigations to figure out how to correctly fix things and what other functions might need changes.

I think the core issue is that some functions expect string paths to be valid URIs where spaces are escaped with %20 (e.g. resolveSchemaLocation, optRelativeFileURI, optRelativeResourceURI), but some expect them to be normal paths without escapes (e.g. getResourceOption, getRequiredResource) and some I'm not sure it's very well defined (e.g. getResourceAbsoluteOrRelativeOption). I'm sure there are other functions not listed that have some built-in assumptions.

It feels like the right approach to convert our strings to URIs as early as possible and pass the URI around instead of strings. That way we know exactly what we are dealing with. And if we ever need to call a function that expects an unescaped string (e.g. getResourceOption), then we can call uri.getPath() which will do that unescaping and provide us a string that can be used for something like Class.getResource().

Though, this could potentially have a pretty big cascading effect. Changing one variable to a URI might require other functions to need URIs and it might turn into a significant change. Another potential issue is some functions, while not technically part of the public API, are used externally. For example, getRequiredResource, getResourceOption, etc. are used in a lot of tests, and we don't really want to have to change those all to use URIs--they still really do want to use strings.

Or alternatively, instead of changing things to URIs we just kindof have a convention that we expect these string paths to be valid URIs with escaped spaces (and sometimes we'll make escape things for the user), and the functions that don't want escapes can unescape the strings before use.

So all this to say, I'm not really sure what the perfect approach is. It might require making some changes and see what gets affect, what tests start failing, etc. and use that to guide us to the best solution.

You're welcome to dig into this and we're happy to help, but this might also be veering into an issue that isn't quite as beginner friendly since it might start touching a lot of the codebase and dealing with undocumented assumptions.

@Ontiomacer
Copy link
Author

I agree @stevedlawrence this PR has surfaced a deeper inconsistency in how paths versus URIs are handled (notably around schemaLocation and related resolution logic).
At this point, I don’t want to assume the right direction or scope for addressing this. Could you advise on how you’d prefer to proceed from here—whether this PR should remain limited in scope, be extended, or if this should be split into a follow-up issue?

@stevedlawrence
Copy link
Member

I would suggest reverting the changes in Misc.scala and change the test in TestGeneral.scala back so that expects a "not found" error (but expects the new spacesInNames.tdml file). This way this PR is limited in scope to only removing the duplicate namespaces.tdml file.

Then we can open a new ticket about spaces in paths not working correctly and we can reference this PR and the test you've added in the ticket.

@Ontiomacer
Copy link
Author

@stevedlawrence
I’ve reverted the changes in Misc.scala to restore the original resource lookup behavior and updated TestGeneral.scala to continue expecting a “required resource not found” error, now referencing the new spaceInNames.tdml file instead of the removed namespaces.tdml.

With this, the scope of the PR is limited strictly to DAFFODIL-2539 (deduplicating TDML tests while preserving coverage), without attempting to fix the broader spaces-in-paths behavior.
Also I have enable the "Allow edits by maintainers"
Let me know what to do next?

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 👍 looks good with just a couple more minor updates.

Can you also squash your commits (including the suggested changes) into a single commit and update the commit message to follow the style mentioned in the code contributor workflow. Note that for the Daffodil project we don't usually put the jira issue in the title, but instead put it at the end of the commit message.

}

}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please undo this change, files should end with a new line. This might require a change in your IDE settings.

@Test def invalidRestrictionPolicyValidate_01 = test
@Test def invalidRestrictionPolicyValidate_02 = test
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a newline to this file.

Comment on lines 83 to 97
object TestSpace2 extends TdmlSuite {
val tdmlResource = "/test space/test 1/namespaces.tdml"
}

class TestSpace2 extends TdmlTests {
val tdmlSuite = TestSpace2

@Test def no_namespace_02(): Unit = {
val e = intercept[Exception] { test }
val m = e.getMessage()
assertTrue(m.toLowerCase.contains("required resource"))
assertTrue(m.contains("/test%20space/test%201/namespaces.tdml"))
assertTrue(m.toLowerCase.contains("not found"))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the tdmlResource to reference spacesInNames.tdml and in the assertTrue. When we fix the spaces issue it will make it clear where the tdml file is to be used for testing.

rename the TDML file for clarity, and retain coverage
for schemas and TDML files located in paths with spaces.

Signed-off-by: Tejas Tiwari <tt160705@outlook.com>

Fix classpath resource lookup for paths containing spaces

Signed-off-by: Tejas Tiwari <tt160705@outlook.com>

Revert resource lookup behavior to keep PR scoped to test deduplication.

Signed-off-by: Tejas Tiwari <tt160705@outlook.com>

Fix formatting and update TestGeneral references

Signed-off-by: Tejas Tiwari <tt160705@outlook.com>
@Ontiomacer Ontiomacer force-pushed the daffodil-2539-tdml-dup-cleanup branch from 772ff8c to b2f4f28 Compare January 1, 2026 16:16
@Ontiomacer Ontiomacer changed the title DAFFODIL-2539 Deduplicate TDML tests while preserving coverage for paths with spaces Deduplicate TDML tests while preserving coverage for paths with spaces Jan 1, 2026
@Ontiomacer
Copy link
Author

@stevedlawrence
I did all the requested also edited the pr title description accordingly with changes is there anything left to cover let me know if anything is incorrect

@Ontiomacer
Copy link
Author

Ontiomacer commented Jan 1, 2026

“Allow edits by maintainers” has been enabled to make it easier to apply
any minor adjustments if needed.
@stevedlawrence

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants