[Avro] Add logicalType support for some java.time types; add AvroJavaTimeModule for native ser/deser #283

30 changes: 30 additions & 0 deletions avro/README.md
Expand Up @@ -111,6 +111,36 @@ byte[] avroData = mapper.writer(schema)

and that's about it, for now.

## Java Time Support
`AvroJavaTimeModule` provides serialization and deserialization of a limited set of `java.time` classes, mapping them to Avro [logical types](http://avro.apache.org/docs/current/spec.html#Logical+Types).

```java
AvroMapper mapper = AvroMapper.builder()
.addModule(new AvroJavaTimeModule())
.build();
```

#### Note
Please note that time zone information is at serialization. Serialized values represent point in time,
Member

Should "is at serialization" instead be "is not included at serialization" (or something like that)?

Contributor Author

Right,
I meant "is lost at serialization"

independent of a particular time zone or calendar. Upon reading a value back, the time instant is reconstructed, but not the original time zone.
Member

In the case of OffsetDateTime and ZonedDateTime, aren't they deserialized using whatever defaultZoneId is passed to AvroInstantDeserializer.fromLong()?

Contributor Author

Yes, that is correct.
They are deserialized using whatever `defaultZoneId` is passed to `AvroInstantDeserializer.fromLong()`.

Member

I think it would be clearer to say something like, "Note that time zone & offset information is not serialized—the serialized representation is only a point in time. For local time types (OffsetDateTime and ZonedDateTime) the time zone defaultZoneId in the Avro context is used for converting to & from the serialized representation. The same defaultZoneId must be used at serialization & deserialization time to obtain meaningful results."

Member

Having written that, the behavior feels weird to me. Would it be possible to store the offset/zoneId for the OffsetDateTime and ZonedDateTime types?

Member

To me, zoned things absolutely should retain time zone and/or offset, and changing that to something else feels very much Wrong.

For local variants it may be necessary to do interim binding if (but only if) representation uses a fixed timepoint (like long for "milliseconds since 1. 1. 1970") -- but if so, it should be something really fixed like UTC and not whatever user happens to configure (because "default" zone id would otherwise have to match on writer and reader).

Contributor Author

@cowtowncoder

> To me, zoned things absolutely should retain time zone and/or offset, and changing that to something else feels very much Wrong.

The Avro specification does not aim to preserve the time zone for non local-XYZ logical types.

For correct deserialization into `OffsetDateTime` or `ZonedDateTime`, reader and writer have to agree on the time zone.

Support for `OffsetDateTime` and `ZonedDateTime` is here for convenience; I can drop it.

> For local variants it may be necessary to do interim binding if (but only if) representation uses a fixed timepoint (like long for "milliseconds since 1. 1. 1970") -- but if so, it should be something really fixed like UTC and not whatever user happens to configure (because "default" zone id would otherwise have to match on writer and reader).

For local variants, the contextual time zone is not used at all.
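
The agreement requirement can be sketched with plain `java.time`, without any Jackson or Avro classes. The class and method names below (`ZonedRoundTripSketch`, `write`, `read`) are made up for illustration; the sketch only assumes that the stored value is epoch milliseconds and that the reader supplies the zone, as the serializer and deserializer in this PR do.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class ZonedRoundTripSketch {

    // Writer side: only the instant survives, as milliseconds since the Unix epoch.
    static long write(ZonedDateTime value) {
        return value.toInstant().toEpochMilli();
    }

    // Reader side: the zone comes from the reader's configuration, not from the data.
    static ZonedDateTime read(long epochMillis, ZoneId readerZone) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), readerZone);
    }

    public static void main(String[] args) {
        ZonedDateTime original =
                ZonedDateTime.of(2021, 6, 1, 12, 0, 0, 0, ZoneId.of("Europe/Prague"));
        long stored = write(original);

        ZonedDateTime sameZone = read(stored, ZoneId.of("Europe/Prague"));
        ZonedDateTime otherZone = read(stored, ZoneId.of("UTC"));

        System.out.println(sameZone.equals(original));   // true: zone and wall clock recovered
        System.out.println(otherZone.isEqual(original)); // true: same instant
        System.out.println(otherZone.equals(original));  // false: different zone and wall clock
    }
}
```

With matching zones the original value comes back unchanged; with a different reader zone only the instant is preserved.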

Member

@MichalFoksa Ok, I think I better go over the changes once again & try to find what Avro specification says. Although handling of local/zoned types has expected semantics in Java 8, I vaguely recall Avro prescribing behavior that seemed to differ... and I think for interoperability the letter of Avro spec should usually have precedence (even if I disagreed with how it was defined).

Member

Even after reading and re-reading what Avro spec says about timestamp, regular and local, I have no idea what are those supposed to mean -- it seems nonsense to be blunt.

Not the part about physical storage itself (although why on earth are there separate milli- vs micro-second types?) but ... well, if NEITHER stores timezone information NOR is there ANY WAY to sync send/receiver zones, then... there seems to be no actual reason for 2 types. At all. I mean, timestamp in this sense can NEITHER be local (it is concrete physical time offset) NOR non-local (no time offset or time zone!). It is much like java.util.Date yet named in a confusing way, using 4 different but rather non-distinct types.
What a mess!

But to try to untangle the mess I guess there is only the one question of how would read and write operations handle these differently.

On writing side there probably cannot be any difference: physical timestamp is what it is. Whatever "local" timezone could be thought to be does not matter; change of zone/offset would not change that value.

On reading side timezone/offset is sort of arbitrary as well: value itself is concrete, although we can use whatever zone we might want.
I guess use of "context timezone" could make sense for "local" variant? For non-local I don't have strong opinion: either UTC or context timezone would be fine as far as I see it.

@MichalFoksa WDYT? Apologies for this taking long -- but I have some time now and will get this merged during this week :)
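
The point about the writing side can be checked with a small `java.time` sketch: the same instant viewed in different zones or offsets yields the same epoch-millisecond value. `WriteSideSketch` and `millisOf` are hypothetical names used only for this illustration.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class WriteSideSketch {

    // Both zoned variants reduce to the underlying instant before writing.
    static long millisOf(ZonedDateTime value) {
        return value.toInstant().toEpochMilli();
    }

    static long millisOf(OffsetDateTime value) {
        return value.toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2021-01-15T08:30:00Z");

        // Three different "views" of the same physical point in time.
        ZonedDateTime tokyo = instant.atZone(ZoneId.of("Asia/Tokyo"));
        ZonedDateTime newYork = instant.atZone(ZoneId.of("America/New_York"));
        OffsetDateTime plusFive = instant.atOffset(ZoneOffset.ofHours(5));

        // All three print the same number: the zone does not affect the stored value.
        System.out.println(millisOf(tokyo));
        System.out.println(millisOf(newYork));
        System.out.println(millisOf(plusFive));
    }
}
```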

Contributor Author

> I guess use of "context timezone" could make sense for "local" variant?

"local" variants do not contain a time zone.

> For non-local I don't have strong opinion: either UTC or context timezone would be fine as far as I see it

I would leave it to the user.

Yeah, Avro is Avro ... :) But it is not bad.


#### Supported java.time types:

Supported java.time types with Avro schema.

| Type | Avro schema
| ------------------------------ | -------------
| `java.time.OffsetDateTime` | `{"type": "long", "logicalType": "timestamp-millis"}`
| `java.time.ZonedDateTime` | `{"type": "long", "logicalType": "timestamp-millis"}`
| `java.time.Instant` | `{"type": "long", "logicalType": "timestamp-millis"}`
| `java.time.LocalDate` | `{"type": "int", "logicalType": "date"}`
| `java.time.LocalTime` | `{"type": "int", "logicalType": "time-millis"}`
| `java.time.LocalDateTime` | `{"type": "long", "logicalType": "local-timestamp-millis"}`
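
As a rough illustration of what each row stores, the following sketch computes the raw `int` or `long` value for each logical type using only `java.time`. It is not the module's code: `LogicalTypeValuesSketch` and its method names are invented here, and binding `LocalDateTime` to UTC on the write side is an assumption inferred from the `LocalDateTime` deserializer in this PR.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoField;

final class LogicalTypeValuesSketch {

    // timestamp-millis: milliseconds since 1970-01-01T00:00:00Z.
    static long timestampMillis(Instant value) {
        return value.toEpochMilli();
    }

    // date: days since 1970-01-01; toIntExact guards against int overflow.
    static int date(LocalDate value) {
        return Math.toIntExact(value.toEpochDay());
    }

    // time-millis: milliseconds after midnight.
    static int timeMillis(LocalTime value) {
        return value.get(ChronoField.MILLI_OF_DAY);
    }

    // local-timestamp-millis: milliseconds since 1970-01-01T00:00:00 on the local
    // timeline. Assumption: the value is bound to UTC for the conversion.
    static long localTimestampMillis(LocalDateTime value) {
        return value.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(date(LocalDate.of(1970, 1, 2)));                       // 1
        System.out.println(timeMillis(LocalTime.of(1, 0)));                       // 3600000
        System.out.println(localTimestampMillis(LocalDateTime.of(1970, 1, 1, 0, 0, 1))); // 1000
        System.out.println(timestampMillis(Instant.ofEpochSecond(1)));            // 1000
    }
}
```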

#### Precision

Avro supports millisecond and microsecond precision for date- and time-related logical types. This module supports only millisecond precision.
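
One practical consequence is that sub-millisecond digits do not survive a round trip. The sketch below shows this with plain `java.time`; `MillisPrecisionSketch` is a hypothetical name, and the conversion through `Instant.toEpochMilli()` mirrors what the instant serializer in this PR does.

```java
import java.time.Instant;

final class MillisPrecisionSketch {

    // Simulates write followed by read: the value passes through epoch milliseconds.
    static Instant roundTrip(Instant value) {
        return Instant.ofEpochMilli(value.toEpochMilli());
    }

    public static void main(String[] args) {
        Instant precise = Instant.parse("2021-03-04T05:06:07.123456789Z");
        // The micro- and nanosecond digits (456789) are dropped.
        System.out.println(roundTrip(precise)); // 2021-03-04T05:06:07.123Z
    }
}
```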

## Generating Avro Schema from POJO definition

Ok but wait -- you do not have to START with an Avro Schema. This module can
@@ -0,0 +1,46 @@
package com.fasterxml.jackson.dataformat.avro.jsr310;

import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.dataformat.avro.PackageVersion;
import com.fasterxml.jackson.dataformat.avro.jsr310.deser.AvroInstantDeserializer;
import com.fasterxml.jackson.dataformat.avro.jsr310.deser.AvroLocalDateDeserializer;
import com.fasterxml.jackson.dataformat.avro.jsr310.deser.AvroLocalDateTimeDeserializer;
import com.fasterxml.jackson.dataformat.avro.jsr310.deser.AvroLocalTimeDeserializer;
import com.fasterxml.jackson.dataformat.avro.jsr310.ser.AvroInstantSerializer;
import com.fasterxml.jackson.dataformat.avro.jsr310.ser.AvroLocalDateSerializer;
import com.fasterxml.jackson.dataformat.avro.jsr310.ser.AvroLocalDateTimeSerializer;
import com.fasterxml.jackson.dataformat.avro.jsr310.ser.AvroLocalTimeSerializer;

import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;

/**
* A module that installs a collection of serializers and deserializers for java.time classes.
*/
public class AvroJavaTimeModule extends SimpleModule {

private static final long serialVersionUID = 1L;

public AvroJavaTimeModule() {
super(AvroJavaTimeModule.class.getName(), PackageVersion.VERSION);

addSerializer(Instant.class, AvroInstantSerializer.INSTANT);
addSerializer(OffsetDateTime.class, AvroInstantSerializer.OFFSET_DATE_TIME);
addSerializer(ZonedDateTime.class, AvroInstantSerializer.ZONED_DATE_TIME);
addSerializer(LocalDateTime.class, AvroLocalDateTimeSerializer.INSTANCE);
addSerializer(LocalDate.class, AvroLocalDateSerializer.INSTANCE);
addSerializer(LocalTime.class, AvroLocalTimeSerializer.INSTANCE);

addDeserializer(Instant.class, AvroInstantDeserializer.INSTANT);
addDeserializer(OffsetDateTime.class, AvroInstantDeserializer.OFFSET_DATE_TIME);
addDeserializer(ZonedDateTime.class, AvroInstantDeserializer.ZONED_DATE_TIME);
addDeserializer(LocalDateTime.class, AvroLocalDateTimeDeserializer.INSTANCE);
addDeserializer(LocalDate.class, AvroLocalDateDeserializer.INSTANCE);
addDeserializer(LocalTime.class, AvroLocalTimeDeserializer.INSTANCE);
}

}
@@ -0,0 +1,50 @@
package com.fasterxml.jackson.dataformat.avro.jsr310.deser;

import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.Temporal;
import java.util.function.BiFunction;

/**
* Deserializer for variants of java.time classes (Instant, OffsetDateTime, ZonedDateTime) from an integer value.
*
* Deserialized value represents an instant on the global timeline, independent of a particular time zone or
* calendar, with a precision of one millisecond from the unix epoch, 1 January 1970 00:00:00.000 UTC.
* Time zone information is lost at serialization. Zoned types receive their time zone from the deserialization context.
*
* Deserialization from string is not supported.
*
* @param <T> The type of an instant class that can be deserialized.
*/
public class AvroInstantDeserializer<T extends Temporal> extends AvroJavaTimeDeserializerBase<T> {

private static final long serialVersionUID = 1L;

public static final AvroInstantDeserializer<Instant> INSTANT =
new AvroInstantDeserializer<>(Instant.class, (instant, zoneID) -> instant);

public static final AvroInstantDeserializer<OffsetDateTime> OFFSET_DATE_TIME =
new AvroInstantDeserializer<>(OffsetDateTime.class, OffsetDateTime::ofInstant);

public static final AvroInstantDeserializer<ZonedDateTime> ZONED_DATE_TIME =
new AvroInstantDeserializer<>(ZonedDateTime.class, ZonedDateTime::ofInstant);

protected final BiFunction<Instant, ZoneId, T> fromInstant;

protected AvroInstantDeserializer(Class<T> supportedType, BiFunction<Instant, ZoneId, T> fromInstant) {
super(supportedType);
this.fromInstant = fromInstant;
}

@Override
protected T fromLong(long longValue, ZoneId defaultZoneId) {
/**
* Number of milliseconds, independent of a particular time zone or calendar,
* from 1 January 1970 00:00:00.000 UTC.
*/
return fromInstant.apply(Instant.ofEpochMilli(longValue), defaultZoneId);
}

}
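
The conversion-function pattern used by this class can be tried in isolation. The following is a stripped-down sketch without the Jackson base class; `InstantConverterSketch` is a hypothetical name, and only the `BiFunction<Instant, ZoneId, T>` idea is taken from the code above.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.Temporal;
import java.util.function.BiFunction;

final class InstantConverterSketch<T extends Temporal> {

    // One shared implementation, parameterized by how an Instant becomes the target type.
    static final InstantConverterSketch<Instant> INSTANT =
            new InstantConverterSketch<>((instant, zone) -> instant);
    static final InstantConverterSketch<OffsetDateTime> OFFSET_DATE_TIME =
            new InstantConverterSketch<>(OffsetDateTime::ofInstant);
    static final InstantConverterSketch<ZonedDateTime> ZONED_DATE_TIME =
            new InstantConverterSketch<>(ZonedDateTime::ofInstant);

    private final BiFunction<Instant, ZoneId, T> fromInstant;

    private InstantConverterSketch(BiFunction<Instant, ZoneId, T> fromInstant) {
        this.fromInstant = fromInstant;
    }

    // The zone is ignored for Instant and applied for the two zoned types.
    T fromLong(long epochMillis, ZoneId zone) {
        return fromInstant.apply(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        ZoneId tokyo = ZoneId.of("Asia/Tokyo");
        System.out.println(INSTANT.fromLong(0L, tokyo));
        System.out.println(OFFSET_DATE_TIME.fromLong(0L, tokyo));
        System.out.println(ZONED_DATE_TIME.fromLong(0L, tokyo));
    }
}
```

The same `long` produces three different Java values, and only the zoned ones depend on the zone passed in.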
@@ -0,0 +1,34 @@
package com.fasterxml.jackson.dataformat.avro.jsr310.deser;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.deser.std.StdScalarDeserializer;
import com.fasterxml.jackson.databind.type.LogicalType;

import java.io.IOException;
import java.time.ZoneId;

public abstract class AvroJavaTimeDeserializerBase<T> extends StdScalarDeserializer<T> {

protected AvroJavaTimeDeserializerBase(Class<T> supportedType) {
super(supportedType);
}

@Override
public LogicalType logicalType() {
return LogicalType.DateTime;
}

@SuppressWarnings("unchecked")
@Override
public T deserialize(JsonParser p, DeserializationContext context) throws IOException {
final ZoneId defaultZoneId = context.getTimeZone().toZoneId().normalized();
switch (p.getCurrentToken()) {
case VALUE_NUMBER_INT:
return fromLong(p.getLongValue(), defaultZoneId);
}
return (T) context.handleUnexpectedToken(_valueClass, p);
}

protected abstract T fromLong(long longValue, ZoneId defaultZoneId);
}
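
The `defaultZoneId` derivation in `deserialize` can be explored on its own. In this sketch a plain `java.util.TimeZone` argument stands in for `context.getTimeZone()`; `DefaultZoneSketch` is a hypothetical name. It shows what `normalized()` does: zones with a fixed offset collapse to a `ZoneOffset`, while region zones with daylight-saving rules are returned unchanged.

```java
import java.time.ZoneId;
import java.util.TimeZone;

final class DefaultZoneSketch {

    // Same expression as in the base deserializer, with the TimeZone passed in directly.
    static ZoneId defaultZoneId(TimeZone contextTimeZone) {
        return contextTimeZone.toZoneId().normalized();
    }

    public static void main(String[] args) {
        // A fixed-offset zone such as UTC is normalized to a ZoneOffset.
        System.out.println(defaultZoneId(TimeZone.getTimeZone("UTC")));
        // A region with changing rules keeps its region id.
        System.out.println(defaultZoneId(TimeZone.getTimeZone("Europe/Prague")));
    }
}
```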
@@ -0,0 +1,31 @@
package com.fasterxml.jackson.dataformat.avro.jsr310.deser;

import java.time.LocalDate;
import java.time.ZoneId;

/**
* Deserializer for {@link LocalDate} from an integer value.
*
* Deserialized value represents number of days from the unix epoch, 1 January 1970.
*
* Deserialization from string is not supported.
*/
public class AvroLocalDateDeserializer extends AvroJavaTimeDeserializerBase<LocalDate> {

private static final long serialVersionUID = 1L;

public static final AvroLocalDateDeserializer INSTANCE = new AvroLocalDateDeserializer();

protected AvroLocalDateDeserializer() {
super(LocalDate.class);
}

@Override
protected LocalDate fromLong(long longValue, ZoneId defaultZoneId) {
/**
* Number of days from the unix epoch, 1 January 1970.
*/
return LocalDate.ofEpochDay(longValue);
}

}
@@ -0,0 +1,35 @@
package com.fasterxml.jackson.dataformat.avro.jsr310.deser;

import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/**
* Deserializer for {@link LocalDateTime} from an integer value.
*
* Deserialized value represents timestamp in a local timezone, regardless of what specific time zone
* is considered local, with a precision of one millisecond from 1 January 1970 00:00:00.000.
*
* Deserialization from string is not supported.
*/
public class AvroLocalDateTimeDeserializer extends AvroJavaTimeDeserializerBase<LocalDateTime> {

private static final long serialVersionUID = 1L;

public static final AvroLocalDateTimeDeserializer INSTANCE = new AvroLocalDateTimeDeserializer();

protected AvroLocalDateTimeDeserializer() {
super(LocalDateTime.class);
}

@Override
protected LocalDateTime fromLong(long longValue, ZoneId defaultZoneId) {
/**
* Number of milliseconds in a local timezone, regardless of what specific time zone is considered local,
* from 1 January 1970 00:00:00.000.
*/
return LocalDateTime.ofInstant(Instant.ofEpochMilli(longValue), ZoneOffset.ofTotalSeconds(0));
}

}
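
The fixed UTC binding used above means the result does not depend on any configured zone. A minimal `java.time` sketch of the round trip follows; `LocalTimestampSketch` is a hypothetical name, and the `write` method is an assumption about the matching serializer, which is not shown in this part of the diff.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class LocalTimestampSketch {

    // Assumed write side: treat the local date-time as if it were in UTC.
    static long write(LocalDateTime value) {
        return value.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Read side, as in the deserializer above: interpret the millis at offset zero.
    static LocalDateTime read(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        LocalDateTime original = LocalDateTime.of(2021, 6, 1, 12, 0);
        // The wall-clock value comes back unchanged, whatever the JVM default zone is.
        System.out.println(read(write(original)).equals(original)); // true
    }
}
```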
@@ -0,0 +1,33 @@
package com.fasterxml.jackson.dataformat.avro.jsr310.deser;

import java.time.LocalTime;
import java.time.ZoneId;

/**
* Deserializer for {@link LocalTime} from an integer value.
*
* Deserialized value represents time of day, with no reference to a particular calendar,
* time zone or date, where the int stores the number of milliseconds after midnight, 00:00:00.000.
*
* Deserialization from string is not supported.
*/
public class AvroLocalTimeDeserializer extends AvroJavaTimeDeserializerBase<LocalTime> {

private static final long serialVersionUID = 1L;

public static final AvroLocalTimeDeserializer INSTANCE = new AvroLocalTimeDeserializer();

protected AvroLocalTimeDeserializer() {
super(LocalTime.class);
}

@Override
protected LocalTime fromLong(long longValue, ZoneId defaultZoneId) {
/**
* Number of milliseconds, with no reference to a particular calendar, time zone or date, after
* midnight, 00:00:00.000.
*/
return LocalTime.ofNanoOfDay(longValue * 1_000_000L);
}

}
@@ -0,0 +1,76 @@
package com.fasterxml.jackson.dataformat.avro.jsr310.ser;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JavaType;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.jsonFormatVisitors.JsonFormatVisitorWrapper;
import com.fasterxml.jackson.databind.jsonFormatVisitors.JsonIntegerFormatVisitor;
import com.fasterxml.jackson.databind.ser.std.StdScalarSerializer;

import java.io.IOException;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;
import java.time.temporal.Temporal;
import java.util.function.Function;

/**
* Serializer for variants of java.time classes (Instant, OffsetDateTime, ZonedDateTime) into a long value.
*
* Serialized value represents an instant on the global timeline, independent of a particular time zone or
* calendar, with a precision of one millisecond from the unix epoch, 1 January 1970 00:00:00.000 UTC.
* Please note that time zone information gets lost in this process. Upon reading a value back, we can only
* reconstruct the instant, but not the original representation.
*
* Note: In combination with {@link com.fasterxml.jackson.dataformat.avro.schema.DateTimeVisitor} it aims to produce
* Avro schema with type long and logicalType timestamp-millis:
* {
* "type" : "long",
* "logicalType" : "timestamp-millis"
* }
*
* {@link AvroInstantSerializer} does not support serialization to string.
*
* @param <T> The type of an instant class that can be serialized.
*/
public class AvroInstantSerializer<T extends Temporal> extends StdScalarSerializer<T> {

private static final long serialVersionUID = 1L;

public static final AvroInstantSerializer<Instant> INSTANT =
new AvroInstantSerializer<>(Instant.class, Function.identity());

public static final AvroInstantSerializer<OffsetDateTime> OFFSET_DATE_TIME =
new AvroInstantSerializer<>(OffsetDateTime.class, OffsetDateTime::toInstant);

public static final AvroInstantSerializer<ZonedDateTime> ZONED_DATE_TIME =
new AvroInstantSerializer<>(ZonedDateTime.class, ZonedDateTime::toInstant);

private final Function<T, Instant> getInstant;

protected AvroInstantSerializer(Class<T> t, Function<T, Instant> getInstant) {
super(t);
this.getInstant = getInstant;
}

@Override
public void serialize(T value, JsonGenerator gen, SerializerProvider provider) throws IOException {
/**
* Number of milliseconds, independent of a particular time zone or calendar,
* from 1 January 1970 00:00:00.000 UTC.
*/
final Instant instant = getInstant.apply(value);
gen.writeNumber(instant.toEpochMilli());
}

@Override
public void acceptJsonFormatVisitor(JsonFormatVisitorWrapper visitor, JavaType typeHint) throws JsonMappingException {
JsonIntegerFormatVisitor v2 = visitor.expectIntegerFormat(typeHint);
if (v2 != null) {
v2.numberType(JsonParser.NumberType.LONG);
}
}

}