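1. API reference documentation
https://avro.apache.org/docs/current/api/java/index.html
2. Avro schema definition
Official specification: https://avro.apache.org/docs/current/spec.html
Here we define a simple schema file, user.avsc. Note that the file extension must be .avsc, which is the extension the avro-maven-plugin schema goal picks up by default. Its content is as follows: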
{
  "namespace": "com.yyj.avro.demo",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "string", "default": ""},
    {"name": "name", "type": ["string", "null"]},
    {"name": "age", "type": ["int", "null"]}
  ]
}
The specification describes records as follows:
Records use the type name “record” and support the following attributes:
- name: a JSON string providing the name of the record (required).
- namespace: a JSON string that qualifies the name (optional).
- doc: a JSON string providing documentation to the user of this schema (optional).
- aliases: a JSON array of strings, providing alternate names for this record (optional).
- fields: a JSON array, listing fields (required). Each field is a JSON object with the following attributes:
  - name: a JSON string providing the name of the field (required).
  - doc: a JSON string describing this field for users (optional).
  - type: a schema, as defined above.
  - default: a default value for this field, only used when reading instances that lack the field for schema evolution purposes. The presence of a default value does not make the field optional at encoding time. Permitted values depend on the field’s schema type, according to the table below. Default values for union fields correspond to the first schema in the union. Default values for bytes and fixed fields are JSON strings, where Unicode code points 0-255 are mapped to unsigned 8-bit byte values 0-255. Avro encodes a field even if its value is equal to its default.
  - order: specifies how this field impacts sort ordering of this record (optional). Valid values are “ascending” (the default), “descending”, or “ignore”. For more details on how this is used, see the sort order section of the specification.
  - aliases: a JSON array of strings, providing alternate names for this field (optional).

Field default values:

| Avro type     | JSON type | Example  |
|---------------|-----------|----------|
| null          | null      | null     |
| boolean       | boolean   | true     |
| int, long     | integer   | 1        |
| float, double | number    | 1.1      |
| bytes         | string    | "\u00FF" |
| string        | string    | "foo"    |
| record        | object    | {"a": 1} |
| enum          | string    | "FOO"    |
| array         | array     | [1]      |
| map           | object    | {"a": 1} |
| fixed         | string    | "\u00ff" |

How these attributes are used in user.avsc:
- namespace: the package name of the class generated from the schema file.
- type: always "record" for a record schema.
- name: the name of the generated class.
- fields: the names and types of the properties of the generated class. For example, "type": ["int", "null"] means that age is an int that may also be null.

Primitive types: null, boolean, int, long, float, double, bytes, string
Complex types: record, enum, array, map, union, fixed
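The rule quoted above for bytes and fixed defaults (Unicode code points 0-255 map to byte values 0-255) is the same mapping that the ISO-8859-1 (Latin-1) charset performs. The following is a minimal sketch using only the Java standard library, not Avro's own code; the class name BytesDefaultDemo and the method decodeDefault are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: shows how the JSON-string form of a bytes/fixed
// default corresponds to raw bytes. This is not part of the Avro API.
final class BytesDefaultDemo {

    // Each character must be a code point in 0-255; it becomes one byte
    // with the same unsigned value (this is exactly ISO-8859-1 encoding).
    static byte[] decodeDefault(String jsonDefault) {
        for (int i = 0; i < jsonDefault.length(); i++) {
            if (jsonDefault.charAt(i) > 0xFF) {
                throw new IllegalArgumentException(
                        "code point above 255 at index " + i);
            }
        }
        return jsonDefault.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = decodeDefault("\u00FF");
        // Mask with 0xFF to print the unsigned value of the byte.
        System.out.println(bytes.length + " byte, value " + (bytes[0] & 0xFF));
        // prints: 1 byte, value 255
    }
}
```

So the table entry "\u00FF" for bytes stands for the single byte 0xFF, not for the two-byte UTF-8 encoding of that character.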
3. Configure the Maven dependencies
<dependencies>
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.8.2</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>1.8.2</version>
      <executions>
        <execution>
          <phase>generate-sources</phase>
          <goals>
            <goal>schema</goal>
          </goals>
          <configuration>
            <sourceDirectory>${project.basedir}/src/main/resources/</sourceDirectory>
            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
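4. Schema evolution: four compatibility types
- Backward: backward compatible. A new schema can read old data; fields that are missing from the old data take their default values.
- Forward: forward compatible. An old schema can read new data; Avro ignores the newly added fields.
- Full: fully compatible, that is, both backward and forward compatible. Only fields that have default values may be added or removed.
- Breaking: incompatible.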
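
The backward and forward cases above can be pictured with a toy model of how a reader schema is matched against written data. This is only a sketch in plain Java and does not use the Avro API; real Avro resolution also covers type promotion and aliases. The class SchemaEvolutionSketch, its methods, and the added email field are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of reader/writer field resolution (not the Avro API).
final class SchemaEvolutionSketch {

    // Marker meaning "this reader field declares no default value".
    static final Object NO_DEFAULT = new Object();

    // Old reader schema: the fields id and name.
    static Map<String, Object> readerV1() {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", "");
        fields.put("name", NO_DEFAULT);
        return fields;
    }

    // New reader schema: adds a hypothetical email field with a default.
    static Map<String, Object> readerV2() {
        Map<String, Object> fields = readerV1();
        fields.put("email", "unknown");
        return fields;
    }

    // For every reader field: take the written value if present, else the
    // reader's default, else fail. Fields only the writer knows are ignored.
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerFields.entrySet()) {
            String name = field.getKey();
            if (written.containsKey(name)) {
                result.put(name, written.get(name));
            } else if (field.getValue() != NO_DEFAULT) {
                result.put(name, field.getValue());
            } else {
                throw new IllegalStateException(
                        "field '" + name + "' is missing and has no default");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> oldData = new LinkedHashMap<>();
        oldData.put("id", "1");
        oldData.put("name", "Ann");

        Map<String, Object> newData = new LinkedHashMap<>();
        newData.put("id", "2");
        newData.put("name", "Bob");
        newData.put("email", "bob@example.com");

        // Backward: the new schema reads old data, email falls back to its default.
        System.out.println(resolve(oldData, readerV2()));
        // prints: {id=1, name=Ann, email=unknown}

        // Forward: the old schema reads new data, the extra email field is ignored.
        System.out.println(resolve(newData, readerV1()));
        // prints: {id=2, name=Bob}
    }
}
```

In this model a change is breaking when resolve throws, for example when the reader requires a field without a default that the written data does not contain.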