To generate an Avro schema from a Java class, you can use Jackson's AvroMapper together with AvroSchemaGenerator:
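import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;
import com.fasterxml.jackson.dataformat.avro.jsr310.AvroJavaTimeModule;
import com.fasterxml.jackson.dataformat.avro.schema.AvroSchemaGenerator;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public static void main(String[] args) throws IOException {
    AvroMapper avroMapper = AvroMapper.builder()
            // keep fields in declaration order instead of sorting them alphabetically
            .disable(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY)
            // serializers/deserializers for java.time types (Instant, LocalDate, ...)
            .addModule(new AvroJavaTimeModule())
            .build();
    createAvroSchemaFromClass(Universe.class, avroMapper);
    createAvroSchemaFromClass(Earth.class, avroMapper);
    createAvroSchemaFromClass(Mars.class, avroMapper);
}

private static void createAvroSchemaFromClass(Class<?> clazz, AvroMapper avroMapper) throws IOException {
    AvroSchemaGenerator gen = new AvroSchemaGenerator();
    // emit Avro logical types (e.g. for java.time values) instead of plain primitives
    gen.enableLogicalTypes();
    avroMapper.acceptJsonFormatVisitor(clazz, gen);
    AvroSchema schemaWrapper = gen.getGeneratedSchema();
    org.apache.avro.Schema avroSchema = schemaWrapper.getAvroSchema();
    String avroSchemaInJSON = avroSchema.toString(true); // pretty-printed JSON
    // Write the schema to <SimpleClassName>.avsc in the working directory
    Path fileName = Path.of(clazz.getSimpleName() + ".avsc");
    Files.writeString(fileName, avroSchemaInJSON);
}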
The following dependencies are needed (Lombok is only used for annotations such as @Getter on the example POJOs):
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.10.2</version>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.18.24</version>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-avro</artifactId>
    <version>2.14.2</version>
</dependency>
However, this does not support polymorphism. If you have the following class:
package com.example.generateavroschemafrompojo.pojo;

import lombok.Getter;
import java.util.List;

@Getter
public class Universe {
    private Long numberOfPlanets;
    private List<Planet> planets;
}
then the generated Avro schema (Universe.avsc) looks like this:
{
  "type" : "record",
  "name" : "Universe",
  "namespace" : "com.example.generateavroschemafrompojo.pojo",
  "fields" : [ {
    "name" : "numberOfPlanets",
    "type" : [ "null", {
      "type" : "long",
      "java-class" : "java.lang.Long"
    } ]
  }, {
    "name" : "planets",
    "type" : [ "null", {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "Planet",
        "fields" : [ {
          "name" : "planetName",
          "type" : [ "null", "string" ]
        }, {
          "name" : "planetSize",
          "type" : [ "null", "string" ]
        }, {
          "name" : "distanceFromSun",
          "type" : [ "null", {
            "type" : "long",
            "java-class" : "java.lang.Long"
          } ]
        }, {
          "name" : "numberOfMoons",
          "type" : [ "null", {
            "type" : "int",
            "java-class" : "java.lang.Integer"
          } ]
        }, {
          "name" : "details",
          "type" : [ "null", {
            "type" : "record",
            "name" : "Details",
            "namespace" : "com.example.generateavroschemafrompojo.pojo.common",
            "fields" : [ ]
          } ]
        } ]
      }
    } ]
  } ]
}
The planets field is typed as a (nullable) array of Planet records, and the Planet record contains only the fields declared in the Planet class itself. Classes that extend Planet are not mentioned anywhere in the schema.
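To see the limitation concretely, here is a small sketch with hypothetical, trimmed-down Planet, Earth and Universe classes (public fields instead of Lombok, and AvroMapper.schemaFor() used as a shortcut for the generator code above, so no logical types). The class names and the population field are illustrative only. The generated Planet record contains only what Planet declares; the field that exists only on the Earth subclass does not appear.

```java
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import org.apache.avro.Schema;

import java.util.List;

// Hypothetical, trimmed-down stand-in for the Planet class in the question.
class Planet {
    public String planetName;
}

// Hypothetical subclass with a field that only exists on Earth.
class Earth extends Planet {
    public Long population;
}

// Hypothetical stand-in for Universe: a list typed with the base class.
class Universe {
    public List<Planet> planets;
}

public class PolymorphismLimitationDemo {

    public static void main(String[] args) throws Exception {
        // The schema is derived from the declared Java types only.
        Schema universe = new AvroMapper().schemaFor(Universe.class).getAvroSchema();

        // planets is generated as [ "null", { "type" : "array", "items" : Planet } ]
        Schema planetsArray = universe.getField("planets").schema().getTypes().get(1);
        Schema planetRecord = planetsArray.getElementType();

        System.out.println(planetRecord.getName());                      // Planet
        System.out.println(planetRecord.getField("planetName") != null); // true
        // Nothing declared only on the Earth subclass is included
        System.out.println(planetRecord.getField("population") == null); // true
    }
}
```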
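There is also a problem with record types being redefined. For example, Details is a common class used by several other classes, but there is no way to mark it as shared during generation, so every generated schema that uses it embeds its own inline definition of the Details record (as in the Universe schema above).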
You will need to modify the generated schemas manually so that only a single .avsc file defines the Details record type, while all the others reference it by its full name, “com.example.generateavroschemafrompojo.pojo.common.Details”.
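A minimal sketch of what the hand-edited result can look like, assuming the Avro 1.10.2 dependency listed above. Details is defined once, and a trimmed Planet schema (hypothetical: only its details field is kept) refers to it by full name instead of redefining it. Both are parsed with one Schema.Parser, which remembers the named types from earlier parse() calls, so the schema that defines Details is parsed before the one that references it. The class and constant names are illustrative only.

```java
import org.apache.avro.Schema;

public class SharedDetailsReferenceDemo {

    // Details.avsc: the only schema that defines the shared Details record.
    static final String DETAILS_AVSC =
            "{\"type\":\"record\",\"name\":\"Details\","
            + "\"namespace\":\"com.example.generateavroschemafrompojo.pojo.common\","
            + "\"fields\":[]}";

    // Hypothetical, trimmed Planet schema: only the details field is kept, and it
    // names Details by its full name instead of containing an inline definition.
    static final String PLANET_AVSC =
            "{\"type\":\"record\",\"name\":\"Planet\","
            + "\"namespace\":\"com.example.generateavroschemafrompojo.pojo\","
            + "\"fields\":[{\"name\":\"details\",\"type\":[\"null\","
            + "\"com.example.generateavroschemafrompojo.pojo.common.Details\"]}]}";

    public static void main(String[] args) {
        // One parser instance keeps every named type it has already parsed,
        // so the defining schema must be parsed before the referencing one.
        Schema.Parser parser = new Schema.Parser();
        parser.parse(DETAILS_AVSC);
        Schema planet = parser.parse(PLANET_AVSC);

        // The union is [ "null", Details ]; index 1 is the resolved reference.
        Schema referenced = planet.getField("details").schema().getTypes().get(1);
        System.out.println(referenced.getFullName());
        // prints com.example.generateavroschemafrompojo.pojo.common.Details
    }
}
```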