Protocol Buffers

26 Feb 2016

This post is about the syntax of protocol buffer (.proto) files, specifically the proto2 dialect, which has required fields and groups.

A .proto file is a possibly-empty sequence of top-level declarations: options, packages, imports, enums, extends, messages, and services. This is an example:

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

message AddressBook {
  repeated Person person = 1;
}

A package starts with the word "package", continues with a nonempty sequence of idents separated by periods, and is terminated with a semicolon. A package declaration helps to prevent naming conflicts between different projects. This is an example of a package:

package tutorial;

A message starts with the word "message", then an ident, then a messageBody. A messageBody is a possibly-empty sequence of items (options, groups, extensions, extends, messages, enums, fields, and semicolons), surrounded by curly braces. A bare semicolon is an empty statement, just as in enum, extend, and service bodies. A message is just an aggregate containing a set of typed fields. These are examples of messages:

message AddressBook {
  repeated Person person = 1;
}

message PhoneNumber {
  required string number = 1;
  optional PhoneType type = 2 [default = HOME];
}

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}

An enum starts with the word "enum", then an ident, then a possibly-empty sequence of enumFields, options, and semicolons, surrounded by curly braces. An enumField is an ident, an equals, an integer literal and then a semicolon. This is an example of an enum:

enum PhoneType {
  MOBILE = 0;
  HOME = 1;
  WORK = 2;
}

An ident starts with a letter or underscore, and continues with any number of letters, digits, or underscores. These are examples of idents:

foo
_bar
baz_qux123

An extend starts with the word "extend", then a userType, then a possibly-empty sequence of groups, fields, and semicolons, surrounded by curly braces.
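A minimal sketch of an extend, assuming proto2 and a hypothetical message named Animal. The target message has to reserve field numbers with an extensions declaration before another declaration can extend it, and an extension field may be optional or repeated but not required:

```protobuf
// Hypothetical message that reserves numbers 100..199 for extensions.
message Animal {
  required string name = 1;
  extensions 100 to 199;
}

// The extend block adds a field to Animal, using a number from that range.
extend Animal {
  optional int32 legs = 100;
}
```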

A constant is one of: an ident, an integer literal, a float literal, a string literal, or a boolean literal. An ident constant usually names an enum value, as in [default = HOME] above.
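One way to see all five kinds of constant is as field defaults. This is a sketch with a hypothetical message named Settings; the comment on each line says which kind of constant it uses:

```protobuf
message Settings {
  enum Mode {
    FAST = 0;
    SAFE = 1;
  }
  optional Mode mode = 1 [default = SAFE];       // ident (an enum value)
  optional int32 retries = 2 [default = 3];      // integer literal
  optional double ratio = 3 [default = 0.75];    // float literal
  optional string label = 4 [default = "none"];  // string literal
  optional bool verbose = 5 [default = false];   // boolean literal
}
```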

A userType is a nonempty sequence of idents, separated by periods. It may optionally be prefixed by a period, in which case name resolution starts from the outermost scope (just like a leading slash for absolute paths in Unix).
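A sketch contrasting a relative and an absolute userType, assuming package tutorial and hypothetical messages Outer, Inner, and User. Both fields refer to the same type:

```protobuf
package tutorial;

message Outer {
  message Inner {
    optional int32 value = 1;
  }
}

message User {
  // Relative: searched for starting in the current scope and moving outward.
  optional Outer.Inner relative = 1;
  // Absolute: the leading period starts resolution at the outermost scope.
  optional .tutorial.Outer.Inner absolute = 2;
}
```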

A group is a label, the word "group", a camelIdent, an equals, an integer literal, and a messageBody.
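A sketch of a group, in the style of the proto2 documentation. A group declares a nested message type (named by the camelIdent, here Result) and a field of that type in one go; groups are deprecated in proto2, and a nested message plus an ordinary field is the usual replacement:

```protobuf
message SearchResponse {
  // label, "group", camelIdent, "=", integer literal, messageBody
  repeated group Result = 1 {
    required string url = 2;
    optional string title = 3;
  }
}
```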

A field is a label, a type, an ident, an equals, an integer literal, then optionally a nonempty sequence of fieldOptions separated by commas inside square brackets, and finally a semicolon. A fieldOption is either an optionBody, or the word "default" followed by an equals and a constant.
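A sketch of fields with zero, two, and one fieldOptions, using a hypothetical message named Reading. Here deprecated and packed are real field options written as optionBodys, while default is the special form:

```protobuf
message Reading {
  // label, type, ident, "=", integer literal, ";"
  required string sensor = 1;
  // Two fieldOptions separated by a comma: the "default" form, then an optionBody.
  optional int32 unit_code = 2 [default = 0, deprecated = true];
  // packed is only allowed on repeated numeric scalar fields.
  repeated int32 samples = 3 [packed = true];
}
```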

An optionBody is a nonempty sequence of idents separated with periods, then an equals, and then a constant.

An extensions is the word "extensions", followed by a nonempty sequence of extensions separated with commas, and terminated with a semicolon. An extension is an integer literal, optionally followed by the word "to" and then either another integer literal or the word "max".
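A sketch of the forms an extensions declaration can take, using a hypothetical message named Extensible. Ranges written with "to" are inclusive, and "max" stands for the largest allowed field number:

```protobuf
message Extensible {
  required int32 id = 1;
  extensions 100;               // a single field number
  extensions 200 to 299;        // an inclusive range
  extensions 500, 1000 to max;  // comma-separated; "max" only appears after "to"
}
```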

A label is "required", "optional", or "repeated". "required" means that a value for the field must be provided; otherwise the message is considered "uninitialized". "optional" means that the field may or may not be set; if it is not set, a default value is used. For simple types you can specify your own default value (as with [default = HOME] above); otherwise a system default is used: zero for numeric types, the empty string for strings, and false for bools. "repeated" means that the field may be repeated any number of times, including zero, and the order of the repeated values is preserved in the protocol buffer.

A type is one of: double, float, int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, bool, string, bytes, or a userType.
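For reference, a hypothetical message that uses each built-in scalar type once, plus one userType (the field names are illustrative only):

```protobuf
message AllTypes {
  optional double d = 1;
  optional float f = 2;
  optional int32 i32 = 3;
  optional int64 i64 = 4;
  optional uint32 u32 = 5;
  optional uint64 u64 = 6;
  optional sint32 s32 = 7;      // zigzag-encoded, efficient for negative values
  optional sint64 s64 = 8;
  optional fixed32 f32 = 9;     // always four bytes on the wire
  optional fixed64 f64 = 10;    // always eight bytes on the wire
  optional sfixed32 sf32 = 11;
  optional sfixed64 sf64 = 12;
  optional bool b = 13;
  optional string s = 14;       // UTF-8 text
  optional bytes raw = 15;      // arbitrary bytes
  optional AllTypes child = 16; // a userType (here, a recursive reference)
}
```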

A camelIdent is a capital letter, followed by a possibly-empty sequence of letters and/or underscores.

A boolean literal is either "true" or "false".

A string literal is a possibly-empty sequence of ordinary or escaped characters, surrounded by matching quotes.

A quote is either a single or a double quote.
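A sketch showing string literals with both kinds of quote, some escapes, and the empty string, written as defaults on a hypothetical message named Strings:

```protobuf
message Strings {
  optional string a = 1 [default = "double-quoted"];
  optional string b = 2 [default = 'single-quoted'];
  optional string c = 3 [default = "tab:\t quote:\" newline:\n"];
  optional string d = 4 [default = ""];  // the empty string literal
}
```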

A service is the word "service" followed by an ident, followed by a possibly-empty sequence of rpcs and options and semicolons, surrounded by curly braces.

An rpc is the word "rpc", followed by an ident, a userType in parentheses, the word "returns", another userType in parentheses, and a semicolon.
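A sketch of a service with a single rpc. The service and message names are hypothetical, loosely following the address book example:

```protobuf
message FindPersonRequest {
  required string name = 1;
}

message FindPersonResponse {
  optional string email = 1;
}

service AddressBookService {
  // "rpc", ident, (userType), "returns", (userType), ";"
  rpc FindPerson (FindPersonRequest) returns (FindPersonResponse);
}
```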

An option starts with the word "option", continues with an optionBody, and is terminated with a semicolon.
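A sketch of options at the top level and inside an enum body. The file options are real ones read by protoc's code generators (the values are illustrative), and allow_alias is a real enum option; the enum itself is hypothetical:

```protobuf
// File-level options; SPEED is an ident constant.
option java_package = "com.example.tutorial";
option java_outer_classname = "AddressBookProtos";
option optimize_for = SPEED;

enum Status {
  option allow_alias = true;  // permits two names to share one number
  UNKNOWN = 0;
  STARTED = 1;
  RUNNING = 1;
}
```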

An import starts with the word "import", continues with a string literal, and is terminated with a semicolon.
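A sketch of an import. The path and its contents are hypothetical: it assumes a file common/date.proto that declares "package common;" and a message named Date:

```protobuf
package tutorial;

// Hypothetical file; the path is resolved against protoc's import paths.
import "common/date.proto";

message Birthday {
  optional .common.Date date = 1;
}
```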

This is an example of the sort of network of ideas that I was talking about previously. Most of these paragraphs are definitions. The word being defined is the "definiendum", and the rest of the definition is the "definiens"; those are standard terms in philosophy and logic for the parts of a definition. There are a few examples and a few descriptions. Clearly this needs more examples and more descriptions in order to really function as documentation.

However, I think it works as an example of a network-of-ideas.