An Avro schema supports nested record definitions, i.e., a custom record whose fields are themselves custom records. One of the questions I had while working on an Avro schema was: why do I have to repeat the name field in a subrecord definition?
Let’s take a look at the “address” part of this sample schema:
It turns out that the first “address” is the field name and the second “Address” (note the uppercase “A”) is the type name.
The difference between a field name and a type name becomes clear when the Avro schema is compiled into Java classes. In our example, “address” is used as the name of a member field in the Company class, and “Address” is used as the class name for the generated Address class.
Our sample schema (Company.avsc) can be compiled to Java classes as follows:
The generated classes are:
$ tree output/
output/
`-- io
    `-- github
        `-- ouyi
            `-- avro
                |-- Address.java
                `-- Company.java
The Company class has a member field “address” of the type “Address”:
Given Java naming conventions, it makes sense that the field name starts with a lowercase letter (“address”), while the type name starts with an uppercase letter (“Address”).
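As a rough illustration of this naming relationship, here is a hand-written, simplified sketch. It is not the code emitted by the Avro compiler, which is considerably more involved. Only the “address” field and the Company/Address class names come from the example; the fields “name”, “street”, and “city”, the schema fragment in the comment, and the NamingSketch helper class are assumptions made for illustration.

```java
import java.lang.reflect.Field;

// Simplified, hand-written sketch; NOT the code emitted by the Avro compiler.
// Assumed schema fragment (the field list of Address is hypothetical):
//   {"name": "address",              <- field name: becomes a member of Company
//    "type": {"type": "record",
//             "name": "Address",     <- type name: becomes a class of its own
//             "fields": [...]}}

class Address {
    // Hypothetical fields, for illustration only.
    String street;
    String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }
}

class Company {
    String name;      // hypothetical field
    Address address;  // field name "address", type name "Address"

    Company(String name, Address address) {
        this.name = name;
        this.address = address;
    }
}

public class NamingSketch {
    // Returns "<field name> : <simple type name>" for a declared field.
    static String describe(Class<?> owner, String fieldName) {
        try {
            Field f = owner.getDeclaredField(fieldName);
            return f.getName() + " : " + f.getType().getSimpleName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Company company = new Company("Acme", new Address("1 Main St", "Springfield"));
        // The nested record is reached through the lowercase field name.
        System.out.println(company.address.city);                   // prints "Springfield"
        // Reflection shows both names side by side.
        System.out.println(describe(Company.class, "address"));     // prints "address : Address"
    }
}
```

The reflection call is only there to print the two names next to each other; in everyday code you would simply write company.address.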
For completeness, the following are a few JSON records based on our sample schema:
Note that, in contrast to an Avro binary data file, which stores the schema in its header, the JSON representation does not carry the schema (type information). Therefore, we only see the lowercase field names here.