-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring to expose clean metadata and infoset walkers #1112
Conversation
daffodil-core/src/test/scala/org/apache/daffodil/core/general/TestRuntimeProperties.scala
Outdated
Show resolved
Hide resolved
daffodil-core/src/test/scala/org/apache/daffodil/core/util/TestUtils.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Metadata.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/InfosetImpl.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/InfosetOutputter.scala
Outdated
Show resolved
Hide resolved
daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/MetadataWalker.scala
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/MetadataWalker.scala
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/MetadataWalker.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice changes! Main questions are surrounding if it would be possible to make the API even smaller.
daffodil-core/src/main/scala/org/apache/daffodil/core/runtime1/SequenceTermRuntime1Mixin.scala
Outdated
Show resolved
Hide resolved
daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
Outdated
Show resolved
Hide resolved
daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
Outdated
Show resolved
Hide resolved
daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
Outdated
Show resolved
Hide resolved
daffodil-japi/src/main/scala/org/apache/daffodil/japi/infoset/Infoset.scala
Outdated
Show resolved
Hide resolved
...odil-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/JDOMInfosetOutputter.scala
Outdated
Show resolved
Hide resolved
...-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/ScalaXMLInfosetOutputter.scala
Outdated
Show resolved
Hide resolved
...-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/XMLInfosetOutputterMixin.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/MetadataWalker.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/RuntimeData.scala
Outdated
Show resolved
Hide resolved
daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Infoset.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Infoset.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Infoset.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Infoset.scala
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/InfosetImpl.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Infoset.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Metadata.scala
Outdated
Show resolved
Hide resolved
...odil-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/JDOMInfosetOutputter.scala
Outdated
Show resolved
Hide resolved
...-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/XMLInfosetOutputterMixin.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/MetadataWalker.scala
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Metadata.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Metadata.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Metadata.scala
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/api/Metadata.scala
Outdated
Show resolved
Hide resolved
@stevedlawrence This is ready for a next review pass. I've addressed all the prior comments I believe, and cleaned up several other things. Once merged, I want to run a large schema regression test with as many schemas as possible to verify this hasn't broken things. |
daffodil-core/src/main/scala/org/apache/daffodil/core/dsom/SequenceGroup.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good. A number of comments with fresh eyes.
I think my main question is what this all looks like from a Javadoc and Java API implementation perspective, so that weird scala stuff doesn't leak into it and that all the members are visible in the doc. I think all the new MetaData and Infoset traits aren't defined anywhere where are java doc is generated, so we need to figure out how that will work. I dont' really want more proxy classes.
Maybe we can plan to drop the japi/sapi stuff, and merge that and the new stuff into a the api package and finally get rid of the proxy stuff. 4.0.0 is nearing, so maybe it's time to discuss fixing our API.
daffodil-core/src/main/scala/org/apache/daffodil/core/runtime1/SequenceTermRuntime1Mixin.scala
Outdated
Show resolved
Hide resolved
daffodil-core/src/test/scala/org/apache/daffodil/core/api/TestMetadataWalking.scala
Show resolved
Hide resolved
daffodil-core/src/test/scala/org/apache/daffodil/core/api/TestMetadataWalking.scala
Show resolved
Hide resolved
daffodil-core/src/test/scala/org/apache/daffodil/core/api/TestMetadataWalking.scala
Show resolved
Hide resolved
...odil-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/JsonInfosetOutputter.scala
Outdated
Show resolved
Hide resolved
...odil-runtime1/src/main/scala/org/apache/daffodil/runtime1/infoset/JsonInfosetOutputter.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/RuntimeData.scala
Outdated
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/RuntimeData.scala
Show resolved
Hide resolved
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/unparsers/UState.scala
Show resolved
Hide resolved
@stevedlawrence @tuxji This should be very close to ready now. |
I'll start reviewing this PR. Quick question, though. Is this new walker API intended to be used by the C code generator too, instead of the C generator walking the compiled schema by hand? |
Not intended for that use case.
This new metadata walker is for java (typically) applications to interface
to runtime1 specifically.
The c backend is more of a compiler integration.
…On Tue, Nov 14, 2023, 9:06 AM John Interrante ***@***.***> wrote:
@stevedlawrence <https://github.com/stevedlawrence> @tuxji
<https://github.com/tuxji> This should be very close to ready now. Please
give this a review again.
I'll start reviewing this PR. Quick question, though. Is this new walker
API intended to be used by the C code generator too, instead of the C
generator walking the compiled schema by hand?
—
Reply to this email directly, view it on GitHub
<#1112 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AALUDA2STEIZ6GIJEIZOX53YEN3G5AVCNFSM6AAAAAA7D4HOC2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMJQGI3TEMJWGQ>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Looks good overall.
build.sbt
Outdated
@@ -219,7 +219,7 @@ val minSupportedJavaVersion: String = | |||
|
|||
lazy val commonSettings = Seq( | |||
organization := "org.apache.daffodil", | |||
version := "3.6.0", | |||
version := "3.7.0-SNAPSHOT", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't belong in this PR. PR #1113 which hasn't been merged yet will make this change for us.
|
||
assertEquals(4, a.dataValue.getAnyRef) | ||
assertEquals(infoset, a.parent) | ||
assertEquals(4, a.getUnsignedLong.longValue().toInt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for calling .longValue()
when it seems possible to call a.getUnsignedLong.toInt
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These return java types. java.math.BigInteger didn't seem to have a direct conversion to int. Hence this.
@@ -1209,7 +1209,6 @@ public void testJavaAPI25() throws IOException, ClassNotFoundException, External | |||
assertEquals(expectedData, bos.toString()); | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change intentional and why is the test being disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a mistake. I'll add this back.
|
||
/** | ||
* This is the supportable API for access to the RuntimeData structures | ||
* which provide access to static information about a given . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About a given compiled schema object, or is there a better description than that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I looked at the javadoc and it doesn't include any of the new stuff since it's not in the japi project. That's expected, and doesn't have to be fixed for this PR, but we should make sure that is addressed at some point, maybe once the drill integration is complete and we are confident that the new API is sufficient.
Yeah, we have to look into how to make the doc "hang together" despite some of the code being defined in java and some (most really) in scala, and some being in japi/sapi packages and some in runtime1/api package. We may have to put in our own relative hyperlinks between the two bodies of documentation. We also really need a way to document and exclude some methods/members from the API without having to have a sharp separation between things in the sapi/japi packages which are all API, and things not. We need to avoid the overhead of this proxy japi/sapi structure. It's too hard to maintain. I'd be fine with just adding to the java/scaladoc a standard comment and naming convention indicating that a method is internal not to be considered part of the API. These methods do need to be public. Even just naming them with prefix "_" and always putting them last in the trait/class/object would be ok. |
This was designed to support integration of Daffodil directly (without intermediate XML or JSON or even EXI) to other data handling tools, specifically, Apache Drill. InfosetElement and related InfosetNode traits are now in runtime1.api package. InfosetOutputter now has methods which use InfosetElement and InfosetArray traits as the objects passed to the handler methods. This improves the API over making DIArray, DISimple, and DIComplex visible. (Though it is backward incompatible.) Also removed use of DIComplex, DISimple from most SAPI and JAPI tests. The SAX InfosetOutputter still downcasts somewhat to the DINode classes. Added Metadata, ElementMetadata, etc. (also runtime1.api package) which provide limited exposure to the RuntimeData and CompilerInfo information. Added MetadataHandler - which is walked by MetadataWalker which can be called from DataProcessor. Added unit test for metadata and data walking to core module Walking the runtime1 metadata is easier than walking the DSOM tree. And these data fabrics like Apache Drill are interfacing to the runtime1 specifically, not the schema compiler. It's natural for the runtime1 metadata structures and data structures to be the ones driving the interfacing. The InfosetNode types were always supposed to be the API, the DINodes the implementation. This solves the issue of what classes should show through to SAPI and JAPI about the infoset nodes. It should be the InfosetNode types, not the DINode types. Furthermore, the InfosetNode types can have methods to access the needed runtime metadata information needed by walkers, InfosetOutputters, etc. These hide our infoset implementation and runtime metdata (RuntimeData classes) implementions. Note: Nothing has changed with InfosetInputters, as those are not needed for Apache Drill integration - which is parse-only. BlobMethodMixin factored out of InfosetOutputter as a shared implementation trait for the basic blob implementation. Added features to SchemaUtils to avoid "tns" prefix definition (which is now officially frowned upon) Added isHidden to SequenceRuntimeData. Needed to avoid walking hidden elements that appear in metadata structures, but aren't relevant as they do not appear in InfosetOutputter events. Improved Infoset API access to simple type methods They now use the DFDL type names, not the underlying implementation type names. So a decimal is accessed via getDecimal not getBigDecimal. Improved java doc accordingly. They throw a predictable exception on conversion issues. DEPRECATION/COMPATIBILITY The InfosetOutputter trait methods have changed signatures. The types of the arguments have been replaced: DIArray -> InfosetArray DISimple -> InfosetSimpleElement DIComplex -> InfosetComplexElement This was done to hide the "DIxxx" types as they are internal and subject to change. Methods of DISimple that were named for implementation types (like getBigInt, getBigDecimal, etc.) have been replaced by methods named for the DFDL types. These methods are replaced: getBigDecimal -> getDecimal getBigInt -> getInteger These methods also changed names: dataValueAsString -> getText This method is new: getNonNegativeInteger Some methods have been removed (getStatus). DAFFODIL-2832
6c4283a
to
6900beb
Compare
(Supercedes PR #1092, which is still open because I linked the comments there from comments in this PR.)
Version 3.7.0-SNAPSHOT
This was designed to support integration of Daffodil directly (without intermediate XML or JSON or even EXI) to other data handling tools, specifically, Apache Drill.
InfosetElement and related InfosetNode traits are now in runtime1.api package.
InfosetOutputter now has methods which use InfosetElement and InfosetArray traits as the objects passed to the handler methods. This improves the API over making DIArray, DISimple, and DIComplex visible.
Added Metadata, ElementMetadata, etc. (also runtime1.api package) which provide limited exposure to the RuntimeData and CompilerInfo information.
Added MetadataHandler - which is walked by MetadataWalker which can be called from DataProcessor.
Walking the runtime1 metadata is easier than walking the DSOM tree. And these data fabrics like Apache Drill are interfacing to the runtime, not the compiler, so it's natural for the runtime1 metadata structures and data structures to be the ones driving the interfacing.
The InfosetNode types were always supposed to be the API, the DINodes the implementation. This solves the issue of what classes should show through to SAPI and JAPI about the infoset nodes. It should be the InfosetNode types, not the DINode types.
Furthermore, the InfosetNode types can have methods to access the needed runtime metadata information needed by walkers, InfosetOutputters, etc.
These hide our infoset implementation and runtime metdata (RuntimeData classes) implementions.
Added unit test for metadata and data walking to core module
Note: Nothing has changed with InfosetInputters, as those are not needed for Apache Drill integration - which is parse-only.
Refactored InfosetOutputter to enable sharing InfosetOutputterImpl.
BlobMethodImpl is now a class so must be extended first, InfosetOutputter is a trait so is implemented (second) in Java declarations.
Removed use of DIComplex, DISimple from SAPI and JAPI tests
NodeInfo.PrimType is n.g. for Java access.
Using PrimTypeNode which is a class.
TODO: maybe just use type name strings for this.
Added features to SchemaUtils to avoid "tns" prefix definition (which is now officially frowned upon)
Added isHidden to SequenceRuntimeData. Needed to avoid walking hidden elements that appear in metadata structures, but aren't relevant as they do not appear in InfosetOutputter events.
TODO: ensure these are not shared between hidden/non-hidden groups. (PropEnv stuff)
That is: isHidden on SequenceRuntimeData may be broken in the case where a sequence definition is shared both hidden and non-hidden.
Added child access to InfosetComplexElement. This is used by tests.
It is questionable whether this should be allowed, because we don't have fast O(1) hash-based access to all children, only to children referenced by expressions. This allows linear search for now.
Added QName.toPrefix_localName method. Why allocate these on the fly if they're static compile time constants.
DAFFODIL-2832