Introduce @flatten annotation #642

nox213 · 2024-12-01T12:51:15Z

closes #623

This PR introduces the @flatten annotation.

Usage

The @flatten annotation can only be applied to:

case classes: Flatten fields of a nested case class into the parent structure.

case class A(i: Int, @flatten b: B)
case class B(msg: String)
implicit val rw: ReadWriter[A] = macroRW
implicit val rw: ReadWriter[B] = macroRW
write(A(1, B("Hello"))) // {"i":1, "msg": "Hello"}

Iterable: Flatten key-value pairs of a Iterable[(String, _)] into the parent structure.

case class A(i: Int, @flatten map: Map[String, String])
implicit val rw: ReadWriter[A] = macroRW
val map = Map("a" -> "1", "b" -> "2")
write(A(1, map)) // {"i":1, "a":"1", "b": "2"}

Nested flattening allows you to apply the @flatten annotation recursively to fields within nested case classes.

case class Outer(msg: String, @flatten inner: Inner)
case class Inner(@flatten inner2: Inner2)
case class Inner2(i: Int)

implicit val rw: ReadWriter[Inner2] = macroRW
implicit val rw: ReadWriter[Inner] = macroRW
implicit val rw: ReadWriter[Outer] = macroRW

write(Outer("abc", Inner(Inner2(7)))) // {"msg": "abc", "i": 7}

The Reader also recognizes the @flatten annotation.

case class A(i: Int, @flatten b: B)
case class B(msg: String)
implicit val rw: ReadWriter[A] = macroRW
implicit val rw: ReadWriter[B] = macroRW
read("""{"i": 1, "msg": "Hello"}""") // The top-level field "msg": "Hello" is correctly mapped to the field in B.

For collection, during deserialization, all key-value pairs in the JSON that do not directly map to a specific field in the case class are attempted to be stored in the Map.

If a key in the JSON does not correspond to any field in the case class, it is stored in the collection.

case class A(i: Int,@flatten Map[String, String])
implicit val rw: ReadWriter[A] = macroRW
read("""{"i":1, "a" -> "1", "b" -> "2"}""") // Output: A(1, Map("a" -> "1", "b" -> "2"))

If there are no keys in the JSON that can be stored in the collection, it is treated as an empty collection.

read("""{"i":1}""")
// Output: A(1, Map.empty)

If a key’s value in the JSON cannot be converted to the Map’s value type (e.g., String), the deserialization fails.

read("""{"i":1, "a":{"name":"foo"}}""")
// Error: Failed to deserialize because the value for "a" is not a String, as required by Map[String, String].

Limitations

Flattening more than two collections to a same level is not supported. Flattening multiple collections to a same level feels awkward to support because, when deriving a Reader, it becomes unclear which collection the data should be stored in.
Type parameters do not seem to be properly resolved in the following scenario:

case class Param[T](@flatten t: T)
object Param {
   implicit def rw[T: RW]: RW[Param[T]] = upickle.default.macroRW // compile error when this function is called to derive instance
  implicit val rw[SomeClass]: RW[Param[SomeClass]] = upickle.default.macroRW // works
}

When using the @flatten annotation on a Iterable, the type of key must be String.

Implementation Strategy

Writer Derivation

From my understanding, deriving a Writer for a case class involves implementing a dispatch function that iterates through each field of the case class. In the existing implementation, the Writer is generated by processing the information of each field in the type T being derived.

this.writeSnippetMappedName[R, Int](ctx, upickle.default.objectAttributeKeyWriteMap("i"), implicitly[upickle.default.Writer[Int]], v.i);
this.writeSnippetMappedName[R, upickle.Nested](ctx, upickle.default.objectAttributeKeyWriteMap("n"), implicitly[upickle.default.Writer[upickle.Nested]], v.n);
ctx.visitEnd(-1)

When deriving a Writer, the above snippet shows how the visit function is invoked for each field, passing the corresponding field’s Writer as an argument. If the field is a Map or a case class and needs to be flattened, additional processing is required:

For case classes, instead of delegating to the Writer of the nested case class, the visit function should be called directly for each field in the nested case class. For example:

this.writeSnippetMappedName[R, upickle.Int](ctx, upickle.default.objectAttributeKeyWriteMap("integerValue"), implicitly[upickle.default.Writer[upickle.Int]], v.n.integerValue);

For Map, iterate through all key-value pairs in the Map, calling visit for each pair:

mapField.foreach { case (key, value) =>
  this.writeSnippetMappedName[R, $valueType](
    ctx,
    key.toString,
    implicitly[${c.prefix}.Writer[$valueType]],
    value
  )
}

Reader Derivation

Deriving a Reader is more complex, especially with support for Map, as it introduces several edge cases. A Reader is essentially a Visitor for JSON, as I understand it. The process involves three main steps:

Mapping read keys to indices.
Storing the read values in variables corresponding to the indices.
Constructing the case class instance from the read values after traversal.

To support flattening, additional steps were required in these stages:

Mapping Keys to Indices:
Flattened fields must be accounted for in the key-to-index mapping. For example:

case class A(@flatten b: B)
case class B(i: Int, d: Double)

// Without flattening
key match {
  "b" => 0
  _ => -1
}

// With flattening
key match {
  "i" => 0
  "d" => 1
  _ => -1
}

Allocate storage for flattened fields, similar to step 1, but extended to account for flattened structures.
Modify the class construction logic to recursively handle flattened fields.

Special Case for Maps:

Since Map must capture all key-value pairs not corresponding to a specific field, I implemented logic to handle this:
- If a key’s index is -1 an type T we derive has a flatten annotation on map, the key-value pair is stored in a ListBuffer within the Reader.
- These pairs are used during class construction to populate the Map.

# Conflicts: # upickle/test/src/upickle/MacroTests.scala # upickleReadme/Readme.scalatex

upickle/implicits/src-2/upickle/implicits/internal/Macros.scala

lihaoyi · 2024-12-02T10:37:16Z

@nox213 this looks great. Could you update the PR description with a summary of the implemented semantics, with examples, as well as a summary of the implementation strategy? That would make the PR much easier to review and merge

nox213 · 2024-12-02T14:43:23Z

Sure, I’ll tag you when I’m ready.

upickle/implicits/src-2/upickle/implicits/internal/Macros.scala

nox213 · 2024-12-03T12:45:58Z

@lihaoyi I polished the code a little and updated the PR.

lihaoyi · 2024-12-05T01:44:24Z

@nox213 at I high level, would it be possible to implement this without the case class Field refactor? If we could preserve the existing method signatures mostly-in-place, we could add forwarders and maintain binary compatibility which would allow us to release this without a breaking major version

lihaoyi · 2024-12-05T01:51:43Z

I guess more generally I'd like to see whether we can land this without breaking binary compatibility. Breaking bincompat and releasing Mill 5.0.0 is an option, but it would be better if we didn't need to do that

lihaoyi · 2024-12-05T01:57:09Z

I think this could work:

currentKey/storeToMap variables can be made binary compatible by moving them from the upstream trait into the various downstream impls
The macro changes could be made binary compatible by leaving the existing macros in place but deprecated, and just making a Macros2 class containing all the new implementations

lihaoyi · 2024-12-05T02:01:48Z

For the Scala 2 vs Scala 3 approach, could we use the Scala 3 approach in Scala 2 as well? It's best to keep the two implementations mostly consistent, so future changes can be done more easily across both versions

lihaoyi · 2024-12-05T02:05:23Z

case class Param[T](@flatten t: T)
object Param {
   implicit def rw[T]: RW[Param[T]] = upickle.default.macroRW // compile error when this function is called to derive instance
  implicit val rw[SomeClass]: RW[Param[SomeClass]] = upickle.default.macroRW // works
}

Can this be fixed by making it implicit def rw[T: RW]?

lihaoyi · 2024-12-05T02:14:49Z

When using the @flatten annotation on a Map, the type of key must be String. (Allowing user-defined case classes as keys introduces a high risk of key conflicts, so I’ve restricted it to String.) Also, only immutable.Map is supported. This makes derivation of Reader simpler.

@flatten is only allowed to immutable.Map. This makes derivation of Reader simpler.

How hard would it be to generalize this to other collection types and key types?

One use case for other collection types would be to support Seq[(String, T)] or LinkedHashMap[String, T] for cases where people want to preserve ordering
uPickle already supports non-String map keys via https://com-lihaoyi.github.io/upickle/#JSONDictionaryFormats, so it would be nice if we could be consistent here as well when flattening out maps

lihaoyi · 2024-12-05T02:31:01Z

Left some high level comments. It's a pretty invasive change so I expect it will take some time to discuss the changes, first at a high-level and then down to the lines of code being changed, just to set expectations. But overall the approach looks solid

nox213 · 2024-12-05T12:58:47Z

@lihaoyi Thanks for your review.

Can this be fixed by making it implicit def rw[T: RW]?

I'm not sure whether it's possible. I will investigate more on it.

How hard would it be to generalize this to other collection types and key types?

I believe this is doable. For other collection types, I could store the type of collection that is flatten and then use that type to construct it from Reader. For keys, I didn't do it because I thought that's not the use cases. I will try this.

For the Scala 2 vs Scala 3 approach, could we use the Scala 3 approach in Scala 2 as well? It's best to keep the two implementations mostly consistent, so future changes can be done more easily across both versions

Sure, I will use the Scala 3 approach in Scala 2 (no case class Field).

nox213 · 2024-12-05T13:03:12Z

The challenging part seems to be bincompat. Would the best approach be to implement @flatten without changing the existing function signatures then? And if changing the function signatures is unavoidable, the next best option would be to keep the existing functions (marking them as deprecated) and implement separate macros (e.g., a new Macro class in Scala 2). In this case, would it make sense to introduce a conditional branch in specific functions to decide whether to call the deprecated macro or the new one? For example, determining this based on the presence of the @flatten annotation. Did I understand this correctly?

One thing I’m curious about: as I understand it, macro code is only called internally, so I assumed it would be safe in terms of bincompat. However, can even functions that are not called externally impact bincompat?

lihaoyi · 2024-12-05T13:51:50Z

For bincompat, we could leave the existing Macros file completely untouched and just make a new copy Macros2 that we use going forward. No need to conditionally branch on the presence of @flatten, the new code works great. We just need to leave the old code in place for bincompat in case anyone downstream is using it (probably nobody, but good to be careful in preserving bincompat anyway)

lihaoyi · 2024-12-05T13:54:48Z

I believe this is doable. For other collection types, I could store the type of collection that is flatten and then use that type to construct it from Reader. For keys, I didn't do it because I thought that's not the use cases. I will try this.

I think what you would need to do is take the type of the collection, resolve the Reader for it, and use that Reader instance at runtime in order to construct the collection from the individual key-value pairs

For keys, I think you should be able to do the same thing: take the type, resolve the Reader, and use the Reader to construct the key objects from the JSON keys and then pass those to the collection Reader

nox213 · 2024-12-06T01:13:10Z

Thanks for your explanation and tips.

To summarize the direction I need to take for the changes:

For Scala 2, keep the existing Macros.scala unchanged and create a new Macros2.scala to apply the modifications (for bincompat), and the new code will use Macros2.scala.
Move the definitions of currentKey and storeToMap from the upstream trait to the downstream implementation (for bincompat).
Update the implementation to support flattening for collection types other than immutable.Map.
Allow Map to have key types other than String.
For the Scala 2 macro implementation, aim to follow a design as similar as possible to Scala 3, without using case class Field.

In the existing Scala 2 implementation, all the information necessary for derivation (e.g., symbols, types, mappedName, etc.) is generated upfront and passed to wrapCaseN for construction. That's why I introduced case class Field, so that I can pass all the information to wrapCaseN. If I were to adopt an approach similar to Scala 3, wrapCaseN would instead start with just the type being derived and then call a function to fetch the field information for that type.

Is this the right direction to proceed?

Lastly, for Macros2.scala, would it be better to apply private or [package] private as much as possible if those restrictions can be applied?

lihaoyi · 2024-12-06T01:21:46Z

@nox213 yes that's right. Making the new macros package private would indeed help, though some of the supporting runtime APIs may need to remain public so we won't be able to privatize everything

nox213 · 2024-12-06T02:26:30Z

Okay, I will work on it.

wip make scala2 works Delete some files Add more tests more tests scala3 writer fix polish clean up Add helper function scala3 macro readers fix bug test pass polish polish wip fix remove unused Use ListBuffer to preserve orders polish Keep existing macros for bincompat Keep formatting

nox213 · 2024-12-16T13:51:14Z

What I tried was that

val writeRef = Ref(Symbol.requiredMethod("upickle.default.write"))
          (keyTpe0.asType, valueTpe0.asType, keyWriterTpe0.asType, valueWriterTpe0.asType) match {
            case ('[keyTpe], '[valueTpe], '[keyWriterTpe], '[valueWriterTpe]) =>
              val typeAppliedWrite = TypeApply(writeRef, List(TypeTree.of[keyTpe]))
              val snippet =
                if (keyTpe0 =:= TypeRepr.of[String])
                 ... // be ommitted
                else
                  '{
                      ${select.asExprOf[Iterable[(keyTpe, valueTpe)]]}.foreach { (k, v) =>
                        ${self}.writeSnippetMappedName[R, valueTpe](
                          ${ctx},
                          ${Apply(
                              Apply(
                                typeAppliedWrite,
                                List('{k}.asTerm, '{-1}.asTerm, '{false}.asTerm, '{false}.asTerm)
                              ),
                              List(
                                '{summonInline[keyWriterTpe]}.asTerm
                              )
                            )
                            .asExprOf[String]},
                          summonInline[valueWriterTpe],
                          v,
                        )
                      }
                    }

I used requiredMethod to access the method symbol out of scope and I think I managed to generate a code that calls it, but the error was that

[error]     |               Found:    (upickle.Flatten.KeyClass.rw :
[error]     |                 upickle.default.ReadWriter[upickle.Flatten.KeyClass])
[error]     |               Required: Api.this.Writer[upickle.Flatten.KeyClass]

Those two types are treated as different type because their path is different. I got stuck at this point.

…nto feature/flatten

nox213 · 2024-12-18T13:31:02Z

@lihaoyi

case class Param[T](@flatten t: T)
object Param {
   implicit def rw[T]: RW[Param[T]] = upickle.default.macroRW // compile error when this function is called to derive instance
  implicit val rw[SomeClass]: RW[Param[SomeClass]] = upickle.default.macroRW // works
}

Can this be fixed by making it implicit def rw[T: RW]?

I misunderstood your comment. I tested with your fix but it still doesn't work.

  object FlattenTestWithType {
    implicit def rw[T: RW]: RW[FlattenTestWithType[T]] = upickle.default.macroRW // compile error
  }

I’ve pushed the changes to support collections other than Map (iterable flattening is now possible) so far. a07839d

jodersky · 2024-12-19T22:23:36Z

I just saw your ping now. Let me know if there's still something I can help with

nox213 · 2024-12-20T00:35:35Z

@jodersky Thanks, could you check these comment, comment?
Basically, I tried to make flatten annotation works for collections that has non-string key. To support it, I need to generate code that calls functions defined in upickle.default which lie outside the scope from macros.scala.

nox213 · 2024-12-26T12:10:29Z

@lihaoyi

supporting collections with non-string key

Would you consider merging this PR without the feature for now? The PR is already quite large, and if we come up with a good solution later, we could address it in a separate follow-up.

lihaoyi · 2024-12-26T13:06:03Z

Sure! Would be happy to merge it first

Give me some time to properly review it. Still probably need @jodersky 's help to look through some of the scala 3 macro stuff, and I'll give it a pass as well

lihaoyi · 2024-12-27T09:13:43Z

I read through and think it looks reasonable. I don't have any confidence I'll be able to catch subtle bugs in the implementation, but we have the existing test suite for that so hopefully this doesn't break anything

One main request: can we have some tests for the edge cases and failure modes of @flatten? Some I can think of:

What if the flattened field itself contains a field which collides with the enclosing case class? e.g.

case class Foo(x: Int)
case class Bar(x: String, @flatten foo: Foo)

Can we flatten two fields in a single case class list?
case class Foo(x: Int)
case class Bar(y: String)
case class Qux(@flatten foo: Foo, @flatten bar: Bar)
If we flatten two fields in a case class, what if they have fields which collide?

case class Foo(x: Int)
case class Bar(x: String)
case class Qux(@flatten foo: Foo, @flatten bar: Bar)

What if we @flatten a field whose type type that cannot be flattened?

case class Foo(@flatten x: Int)

What if we @flatten a collection of invalid type?

case class Foo(@flatten x: Seq[(Seq[Int], Int)])

I don't really have any opinion on the exact semantics in these situations, but regardless of what we choose we should have unit tests asserting their success or their failure and error message (these can go in upickle/test/src/upickle/FailureTests.scala)

nox213 · 2024-12-28T07:57:42Z

What if the flattened field itself contains a field which collides with the enclosing case class?
If we flatten two fields in a case class, what if they have fields which collide?

It fails with following errors. Before process the flatten annotation, macro checks whether there are duplicated mappedName.

[error] /Users/devsisters/repos/upickle/upickle/test/src/upickle/MacroTests.scala:225:48: There are multiple fields with the same key.
[error] Following keys are duplicated: x.
[error]     implicit val rw: RW[Bar] = upickle.default.macroRW
[error]                                                ^
[error] one error found

Can we flatten two fields in a single case class list?

Yes, this is possible, but my implementation only supports flattening multiple case classes and a single collection at a time.

What if we @flatten a field whose type that cannot be flattened?

[error] /Users/devsisters/repos/upickle/upickle/test/src/upickle/MacroTests.scala:221:48: Invalid type for flattening: Int.
[error]     implicit val rw: RW[Foo] = upickle.default.macroRW
[error]                                                ^
[error] one error found

What if we @flatten a collection of invalid type?

This also fails with an same error above.

I added a few more tests to cover the parts that were not addressed by the tests I had previously created (2eb1ab8).

nox213 · 2024-12-28T08:24:47Z

@lihaoyi While adding some tests, I thought of a case where a collection could lead to runtime collisions. For example:

case class Foo(i: Int, @flatten m: Map[String, String])

If m = Map("i" -> "value"), there would be a key collision, but this can only be detected at runtime. Referring to the behavior of the Rust library linked in the issue, here’s how it handles such cases:
When writing, it generates JSON like this:

{"i": 1, "i": "value"}

And when reading it back, it fails with an error like:

Error("duplicate field `i`", line: 1, column: 16)

On the other hand, if the map field appears first, like this:

case class Foo(@flatten m: Map[String, String], i: Int)

The generated JSON would look like this:

{"i": "value", "i": 1}

In this case, when trying to read it back, it would result in a runtime error while attempting to convert "value" to an integer.

In upickle, it behaves almost the same except for the first case.
When it reads,

{"i": 1, "i": "value"}

my implementation produces Foo(1, Map.empty).
However, making it raise an error when there are duplicate keys at runtime seems doable.

Do you think it would be better to raise an error in such cases rather than silently succeeding?
A runtime key collision can cause an error when reading a value and attempting a type conversion. Would it be acceptable to keep this behavior? I currently can’t think of a better way to handle it.

lihaoyi · 2024-12-28T08:36:47Z

I think the current behavior for the case you mentioned is reasonable, but would it be possible to make it raise an error on write? That would probably be more intuitive to users than generating JSON a dictionary with duplicate keys

nox213 · 2024-12-28T11:11:21Z

I made it raise an error on write and added a failure test case as well (e4071f6)

X upickle.MacroTests.runtimeCollision 26ms 
  java.lang.Exception: Key collision detected for the following keys: i, s
    upickle.Flatten$Foo$$anon$67.write0(MacroTests.scala:221)
    upickle.Flatten$Foo$$anon$67.write0(MacroTests.scala:221)
    upickle.core.Types$Writer.write(Types.scala:143)

samikrc · 2025-01-01T06:18:16Z

I have limited experience in Scala 3 macros, so probably not a great reviewer of this issue, however posing some general questions:

There seem to be a few classnames like this: CaseClassReader3V2 etc? Would it be better to change the code for the existing class itself? The git system maintains our versions, so we do not need to introduce V2 etc?
The new method CaseClassReader3 starts with a @deprecated tag. Is that expected?

nox213 · 2025-01-03T07:23:13Z

There seem to be a few classnames like this: CaseClassReader3V2 etc? Would it be better to change the code for the existing class itself? The git system maintains our versions, so we do not need to introduce V2 etc?

It's a temporary name that will be changed to CaseClassReader3 later to maintain binary compatibility.

The new method CaseClassReader3 starts with a @deprecated tag. Is that expected?

It's a class that existed before which is replaced by CaseClassReader3V2

jodersky

The Scala 3 macro code looks reasonable. I also appreciate that you cleaned up a bunch of stuff and restricted visibilities of internal methods! I do have some comments on them however, mostly minor issues and questions to help my own understanding.

I also second Haoyi, in that I can't guarantee that there aren't any subtle bugs, but we can use the test suite for that. Speaking of, do we have some kind of performance tests we can use to weed out regressions? From what I see in the code, most flatten-specific macro parts are behind a flag (if the annotation is present), so I don't think there will be any regressions, but it's always good to test this stuff.

Regarding supporting of non-string keys: I think it's probably best to tackle that in a separate PR later on too; this one is already quite large.

To your specific question, @nox213

'{
                ${select.asExprOf[Iterable[(keyTpe, valueTpe)]]}.foreach { (k, v) =>
                  ${self}.writeSnippetMappedName[R, valueTpe](
                    ${ctx},
                    upickle.default.write(key)(summonInline[keyWriterTpe]), // this doesn't compile
                    summonInline[writerTpe],
                    v,
                  )
                }
              }

one thing you could try is take the outer "API trait" as a parameter (similar to what is done in writeSnippets(..., inline thisOuter: upickle.core.Types with upickle.implicits.MacrosCommon...)), and use some typecasting if necessary.

Maybe something like

def foo(api: Expr[upickle.core.Types with upickle.implicits.MacrosCommon]) =
  ${outer}.write(key.asInstanceOf[Any])(summonInline[keyWriterTpe])

jodersky · 2025-01-05T16:19:44Z