
Refactoring to expose clean metadata and infoset walkers #1112

Merged 1 commit into apache:main on Nov 14, 2023

Conversation

@mbeckerle (Contributor) commented Nov 9, 2023

(Supersedes PR #1092, which remains open because comments in this PR link back to comments there.)

Version 3.7.0-SNAPSHOT

This was designed to support integrating Daffodil directly (without intermediate XML, JSON, or even EXI) with other data handling tools, specifically Apache Drill.

InfosetElement and related InfosetNode traits are now in runtime1.api package.

The InfosetOutputter handler methods now take the InfosetElement and InfosetArray traits as their arguments. This improves the API over making DIArray, DISimple, and DIComplex visible.
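
For illustration, a minimal sketch of what an outputter written against the new traits could look like from Java. The handler-method names follow the existing InfosetOutputter naming (startSimple, startComplex, endComplex, startArray, endArray), but the exact signatures, the metadata()/name() accessors, and the wildcard import are assumptions, not confirmed API:

```java
import org.apache.daffodil.runtime1.api.*; // package name per this PR; contents assumed

// Hypothetical sketch: prints simple values and complex-element boundaries.
// Other required callbacks (startDocument, endDocument, endSimple, ...) are
// omitted for brevity.
public class PrintingOutputter implements InfosetOutputter {
  @Override
  public void startSimple(InfosetSimpleElement simple) {
    // Handler arguments are the new traits, not DISimple/DIComplex/DIArray.
    System.out.println(simple.metadata().name() + " = " + simple.getText());
  }

  @Override
  public void startComplex(InfosetComplexElement complex) {
    System.out.println("start " + complex.metadata().name());
  }

  @Override
  public void endComplex(InfosetComplexElement complex) {
    System.out.println("end " + complex.metadata().name());
  }

  @Override
  public void startArray(InfosetArray array) { /* nothing needed for printing */ }

  @Override
  public void endArray(InfosetArray array) { /* nothing needed for printing */ }
}
```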

Added Metadata, ElementMetadata, etc. (also runtime1.api package) which provide limited exposure to the RuntimeData and CompilerInfo information.

Added MetadataHandler, which is driven by a MetadataWalker that can be invoked from the DataProcessor.
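
As a rough sketch of how the metadata walk might be consumed: the callback names, the name() accessor on the metadata objects, and a walkMetadata entry point on DataProcessor are assumptions inferred from this description, not confirmed signatures.

```java
import org.apache.daffodil.runtime1.api.*; // package name per this PR; contents assumed

// Hypothetical handler: prints one line per element declaration encountered
// during the metadata walk. Callbacks for sequences, choices, arrays, etc.
// are omitted here.
public class SchemaPrinter implements MetadataHandler {
  public void simpleElementMetadata(SimpleElementMetadata md) {
    System.out.println("simple:  " + md.name());
  }

  public void startComplexElementMetadata(ComplexElementMetadata md) {
    System.out.println("complex: " + md.name());
  }

  public void endComplexElementMetadata(ComplexElementMetadata md) {
    // nothing to do at the end of a complex element
  }
}

// Usage (hypothetical entry point): dataProcessor.walkMetadata(new SchemaPrinter());
```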

Walking the runtime1 metadata is easier than walking the DSOM tree. Data fabrics like Apache Drill interface with the runtime, not the compiler, so it is natural for the runtime1 metadata and data structures to be the ones driving the integration.

The InfosetNode types were always supposed to be the API, with the DINodes as the implementation. This settles which classes should show through to SAPI and JAPI for infoset nodes: the InfosetNode types, not the DINode types.

Furthermore, the InfosetNode types can have methods to access the runtime metadata needed by walkers, InfosetOutputters, etc.

These hide our infoset implementation and runtime metadata (RuntimeData classes) implementations.

Added unit test for metadata and data walking to core module

Note: Nothing has changed with InfosetInputters, as those are not needed for Apache Drill integration - which is parse-only.

Refactored InfosetOutputter to enable sharing InfosetOutputterImpl.

BlobMethodImpl is now a class, so in Java declarations it must be extended (first), while InfosetOutputter is a trait and so is implemented (second).
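
Concretely, such a Java declaration would look roughly like this (MyOutputter is a made-up name; the point is only the extends/implements ordering):

```java
// The class (BlobMethodImpl) goes in the extends clause; the trait
// (InfosetOutputter), seen as an interface from Java, goes in implements.
public class MyOutputter extends BlobMethodImpl implements InfosetOutputter {
  // ... InfosetOutputter handler methods overridden here ...
}
```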

Removed use of DIComplex, DISimple from SAPI and JAPI tests

NodeInfo.PrimType is no good (n.g.) for Java access, so PrimTypeNode, which is a class, is used instead.

TODO: maybe just use type name strings for this.

Added features to SchemaUtils to avoid defining the "tns" prefix (which is now officially frowned upon).

Added isHidden to SequenceRuntimeData. Needed to avoid walking hidden elements that appear in metadata structures, but aren't relevant as they do not appear in InfosetOutputter events.

TODO: ensure these are not shared between hidden/non-hidden groups. (PropEnv stuff)
That is: isHidden on SequenceRuntimeData may be broken in the case where a sequence definition is shared both hidden and non-hidden.

Added child access to InfosetComplexElement. This is used by tests.

It is questionable whether this should be allowed, because we don't have fast O(1) hash-based access to all children, only to children referenced by expressions. For now this is implemented as a linear search.
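
For example (the accessor name getChild and the child name "amount" are purely illustrative; the PR only states that some child access exists and that it is a linear search):

```java
// Hypothetical usage: looks up one named child of a complex element.
// Per the note above this is a linear scan over the children, so it should
// not be treated as O(1) access in performance-sensitive code.
static InfosetElement findAmount(InfosetComplexElement record) {
  return record.getChild("amount"); // assumed accessor name
}
```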

Added the QName.toPrefix_localName method. Why allocate these on the fly when they're static compile-time constants?

DAFFODIL-2832

@stevedlawrence (Member) left a comment

Really nice changes! My main questions are about whether it would be possible to make the API even smaller.

@mbeckerle (Contributor, Author)

@stevedlawrence This is ready for another review pass. I believe I've addressed all the prior comments and cleaned up several other things.

Once merged, I want to run a large schema regression test with as many schemas as possible to verify this hasn't broken things.

@mbeckerle requested a review from tuxji on November 12, 2023 at 05:17
@stevedlawrence (Member) left a comment

Looks really good. A number of comments with fresh eyes.

I think my main question is what this all looks like from a Javadoc and Java API perspective, so that weird Scala stuff doesn't leak into it and all the members are visible in the doc. I think the new Metadata and Infoset traits aren't defined anywhere that Javadoc is generated, so we need to figure out how that will work. I don't really want more proxy classes.

Maybe we can plan to drop the japi/sapi stuff, merge that and the new stuff into the api package, and finally get rid of the proxies. 4.0.0 is nearing, so maybe it's time to discuss fixing our API.

@mbeckerle (Contributor, Author)

@stevedlawrence @tuxji This should be very close to ready now.
Please give this a review again.

@tuxji (Contributor) commented Nov 14, 2023

> @stevedlawrence @tuxji This should be very close to ready now. Please give this a review again.

I'll start reviewing this PR. Quick question, though. Is this new walker API intended to be used by the C code generator too, instead of the C generator walking the compiled schema by hand?

@mbeckerle (Contributor, Author) commented Nov 14, 2023 via email

@tuxji (Contributor) left a comment

+1

Looks good overall.

build.sbt Outdated
@@ -219,7 +219,7 @@ val minSupportedJavaVersion: String =

 lazy val commonSettings = Seq(
   organization := "org.apache.daffodil",
-  version := "3.6.0",
+  version := "3.7.0-SNAPSHOT",
Comment (Contributor):

This change doesn't belong in this PR. PR #1113 which hasn't been merged yet will make this change for us.


assertEquals(4, a.dataValue.getAnyRef)
assertEquals(infoset, a.parent)
assertEquals(4, a.getUnsignedLong.longValue().toInt)
Comment (Contributor):

What is the reason for calling .longValue() when it seems possible to call a.getUnsignedLong.toInt directly?

Reply (Contributor Author):

These return Java types, and java.math.BigInteger didn't seem to have a direct conversion to Int. Hence this.

@@ -1209,7 +1209,6 @@ public void testJavaAPI25() throws IOException, ClassNotFoundException, External
         assertEquals(expectedData, bos.toString());
     }

-    @Test
Comment (Contributor):

Is this change intentional and why is the test being disabled?

Reply (Contributor Author):

Looks like a mistake. I'll add this back.


/**
 * This is the supportable API for access to the RuntimeData structures
 * which provide access to static information about a given .
Comment (Contributor):

About a given compiled schema object, or is there a better description than that?

Reply (Contributor Author):

will fix.

@stevedlawrence (Member) left a comment

+1

I looked at the javadoc and it doesn't include any of the new stuff since it's not in the japi project. That's expected, and doesn't have to be fixed for this PR, but we should make sure that is addressed at some point, maybe once the drill integration is complete and we are confident that the new API is sufficient.

@mbeckerle (Contributor, Author)

> +1
>
> I looked at the javadoc and it doesn't include any of the new stuff since it's not in the japi project. That's expected, and doesn't have to be fixed for this PR, but we should make sure that is addressed at some point, maybe once the drill integration is complete and we are confident that the new API is sufficient.

Yeah, we have to look into how to make the doc "hang together" despite some of the code being defined in Java and some (most really) in Scala, and some being in the japi/sapi packages and some in the runtime1/api package.

We may have to put in our own relative hyperlinks between the two bodies of documentation.

We also really need a way to document and exclude some methods/members from the API without having to have a sharp separation between things in the sapi/japi packages which are all API, and things not. We need to avoid the overhead of this proxy japi/sapi structure. It's too hard to maintain. I'd be fine with just adding to the java/scaladoc a standard comment and naming convention indicating that a method is internal not to be considered part of the API. These methods do need to be public. Even just naming them with prefix "_" and always putting them last in the trait/class/object would be ok.

This was designed to support integration of Daffodil directly
(without intermediate XML or JSON or even EXI) to other
data handling tools, specifically, Apache Drill.

InfosetElement and related InfosetNode traits are now in
runtime1.api package.

InfosetOutputter now has methods which use InfosetElement and
InfosetArray traits as the objects passed to the handler methods. This
improves the API over making DIArray, DISimple, and DIComplex visible.
(Though it is backward incompatible.) Also removed use of DIComplex,
DISimple from most SAPI and JAPI tests. The SAX InfosetOutputter still
downcasts somewhat to the DINode classes.

Added Metadata, ElementMetadata, etc. (also runtime1.api package) which
provide limited exposure to the RuntimeData and CompilerInfo information.

Added MetadataHandler - which is walked by MetadataWalker which can be
called from DataProcessor. Added unit test for metadata and data
walking to core module

Walking the runtime1 metadata is easier than walking the DSOM tree.
Data fabrics like Apache Drill interface with runtime1 specifically,
not the schema compiler, so it is natural for the runtime1 metadata
and data structures to be the ones driving the integration.

The InfosetNode types were always supposed to be the API, the DINodes
the implementation. This solves the issue of what classes should show
through to SAPI and JAPI about the infoset nodes. It should be the
InfosetNode types, not the DINode types.

Furthermore, the InfosetNode types can have methods to access the
runtime metadata needed by walkers, InfosetOutputters, etc.

These hide our infoset implementation and runtime metadata
(RuntimeData classes) implementations.

Note: Nothing has changed with InfosetInputters, as those are not
needed for Apache Drill integration - which is parse-only.

BlobMethodMixin factored out of InfosetOutputter as a shared
implementation trait for the basic blob implementation.

Added features to SchemaUtils to avoid defining the "tns" prefix
(which is now officially frowned upon).

Added isHidden to SequenceRuntimeData. Needed to avoid walking
hidden elements that appear in metadata structures, but aren't
relevant as they do not appear in InfosetOutputter events.

Improved Infoset API access to simple type methods

They now use the DFDL type names, not the underlying implementation
type names.

So a decimal is accessed via getDecimal, not getBigDecimal. Improved
the Javadoc accordingly. These accessors throw a predictable exception
on conversion issues.
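
A hedged sketch of what this looks like from calling code: the accessor
names (getDecimal, getInteger, getText) are the ones listed in this
message, while the return types and the exception behavior noted in
comments are assumptions.

```java
// Illustrative only: reads values from a simple element using the
// DFDL-type-named accessors introduced by this change.
static void show(InfosetSimpleElement e) {
  java.math.BigDecimal dec = e.getDecimal(); // DFDL decimal (was getBigDecimal)
  java.math.BigInteger n   = e.getInteger(); // DFDL integer (was getBigInt)
  String text = e.getText();                 // was dataValueAsString
  System.out.println(text + ": " + dec + ", " + n);
  // A value of the wrong DFDL type surfaces as a predictable, documented
  // exception rather than an implementation-specific cast failure.
}
```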

DEPRECATION/COMPATIBILITY

The InfosetOutputter trait methods have changed signatures.

The types of the arguments have been replaced:

DIArray -> InfosetArray
DISimple -> InfosetSimpleElement
DIComplex -> InfosetComplexElement

This was done to hide the "DIxxx" types as they are
internal and subject to change.

Methods of DISimple that were named for implementation types (like
getBigInt, getBigDecimal, etc.) have been replaced by methods named
for the DFDL types.

These methods are replaced:

getBigDecimal -> getDecimal
getBigInt -> getInteger

These methods also changed names:

dataValueAsString -> getText

This method is new:

getNonNegativeInteger

Some methods have been removed (getStatus).

DAFFODIL-2832
@mbeckerle changed the title from "Draft: Refactoring to expose clean metadata and infoset walkers" to "Refactoring to expose clean metadata and infoset walkers" on Nov 14, 2023
@mbeckerle merged commit d26a582 into apache:main on Nov 14, 2023 (8 of 10 checks passed)
@mbeckerle deleted the drill-exp3 branch on November 14, 2023 at 18:27