I. Using the avro-maven-plugin to generate Java classes from avsc files:
Add the following dependency and plugins to the project's pom.xml:
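...
<dependencies>
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.8.1</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
    </plugin>
    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>1.8.1</version>
      <executions>
        <execution>
          <phase>generate-sources</phase>
          <goals>
            <goal>schema</goal>
          </goals>
          <configuration>
            <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>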
Running mvn install at this point fails with the following error:
[INFO] Final Memory: 16M/217M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.avro:avro-maven-plugin:1.8.1:schema (default) on project study: neither sourceDirectory: D:\fvp-workspace\study\src\main\avro or testSourceDirectory: D:\fvp-workspace\study\src\test\avro are directories -> [Help 1]
[ERROR]
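Note that the plugin does not create its source directories: the build fails when neither one exists. Create an avro folder by hand under ${project.basedir}/src/main and ${project.basedir}/src/test. These avro folders are where the Avro schema files (*.avsc) will be stored.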
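If you would rather create the folders from code than by hand, the step can be sketched with the standard java.nio.file API. The class name AvroDirs and the method ensureAvroDirs are hypothetical names chosen for this illustration; they are not part of Avro or Maven.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AvroDirs {

    // Hypothetical helper: creates src/main/avro and src/test/avro under baseDir.
    // Files.createDirectories does nothing if a directory already exists.
    static void ensureAvroDirs(Path baseDir) throws IOException {
        Files.createDirectories(baseDir.resolve(Paths.get("src", "main", "avro")));
        Files.createDirectories(baseDir.resolve(Paths.get("src", "test", "avro")));
    }

    public static void main(String[] args) throws IOException {
        // Use the first argument as the project directory, or the current directory.
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        ensureAvroDirs(baseDir);
        System.out.println("Avro source directories ready under " + baseDir.toAbsolutePath());
    }
}
```

Run it once from the project root before the first build; creating the two folders in the IDE or a file manager has the same effect.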
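1.1. Defining a schema
Avro schemas are defined in JSON. A schema is composed of primitive types (null, boolean, int, long, float, double, bytes and string) and complex types (record, enum, array, map, union and fixed). As an example, the following defines a schema for a user. Create an avro directory under the main directory, then create a new file named user.avsc in it: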
{
  "namespace": "com.sf.study.avro",
  "type": "record",
  "name": "User",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "favorite_number", "type": ["int", "null"] },
    { "name": "favorite_color", "type": ["string", "null"] }
  ]
}
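To check that a schema like this is well formed before running the build, it can be parsed with Avro's Schema.Parser. The following is a minimal sketch, assuming the avro 1.8.1 dependency from the pom.xml is on the classpath. The class name UserSchemaCheck is hypothetical, and the schema text is inlined as a string (an assumption made for illustration) so that the example does not depend on the file location.

```java
import org.apache.avro.Schema;

public class UserSchemaCheck {

    // Same content as user.avsc, inlined for the sake of a self-contained example.
    static final String USER_SCHEMA_JSON =
          "{\"namespace\": \"com.sf.study.avro\", \"type\": \"record\", \"name\": \"User\","
        + " \"fields\": ["
        + " {\"name\": \"name\", \"type\": \"string\"},"
        + " {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
        + " {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
        + " ]}";

    // Parses the schema text; Avro throws SchemaParseException if it is invalid.
    static Schema parseUserSchema() {
        return new Schema.Parser().parse(USER_SCHEMA_JSON);
    }

    public static void main(String[] args) {
        Schema schema = parseUserSchema();
        // The full name is the namespace plus the record name.
        System.out.println(schema.getFullName());
        // List each field with the kind of type it holds (STRING, UNION, ...).
        for (Schema.Field field : schema.getFields()) {
            System.out.println(field.name() + " : " + field.schema().getType());
        }
    }
}
```

The two union fields, favorite_number and favorite_color, are the ones that may hold null; name is a plain string and must always be set.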
In the IDE, the file then appears at src/main/avro/user.avsc.
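1.2. Generating class files from the schema
Because the Avro plugin is configured, running the following command is enough; the Maven plugin generates the class files automatically during the generate-sources phase: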
mvn clean install
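The corresponding class is then generated under the output directory configured above (${project.basedir}/src/main/java/).
If you are not using the plugin, the class can also be generated with avro-tools: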
java -jar /path/to/avro-tools-1.8.1.jar compile schema <schema file> <destination>
1.3. Using the generated class
Now that the class file exists, the generated class can be used to create users:
package com.sf.study.avro;

public class CreateUserTest {
    public static void main(String[] args) {
        User user1 = new User();
        user1.setName("zhangsan");
        user1.setFavoriteNumber(256);
        // Leave favorite color null

        // Alternate constructor
        User user2 = new User("lisi", 7, "red");

        // Construct via builder
        User user3 = User.newBuilder()
                .setName("wangwu")
                .setFavoriteColor("blue")
                .setFavoriteNumber(null)
                .build();
    }
}
1.4. Serialization
Serialize the users created above and store them in a file on disk:
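import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

// Serialize user1, user2 and user3 to disk
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
try {
    // Write the schema into the file header, then append the records
    dataFileWriter.create(user1.getSchema(), new File("users.avro"));
    dataFileWriter.append(user1);
    dataFileWriter.append(user2);
    dataFileWriter.append(user3);
    dataFileWriter.close();
} catch (IOException e) {
    e.printStackTrace();
}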
This serializes the three users to the file users.avro.
1.5. Deserialization
Next, read the serialized data back and deserialize it:
import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

public static void unserialize() {
    // Deserialize Users from disk
    DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
    // try-with-resources closes the reader even if reading fails
    try (DataFileReader<User> dataFileReader =
            new DataFileReader<User>(new File("users.avro"), userDatumReader)) {
        User user = null;
        while (dataFileReader.hasNext()) {
            // Reuse user object by passing it to next(). This saves us from
            // allocating and garbage collecting many objects for files with
            // many items.
            user = dataFileReader.next(user);
            System.out.println(user);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
The output is:
{"name": "zhangsan", "favorite_number": 256, "favorite_color": null}
{"name": "lisi", "favorite_number": 7, "favorite_color": "red"}
{"name": "wangwu", "favorite_number": null, "favorite_color": "blue"}