
Protocol Buffer in Python

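Python has objects that wrap access to an underlying memory array or buffer. These include the built-in bytes and bytearray types, as well as extension types such as array.array. For special purposes, such as numeric computation, simulation, or image processing, third-party libraries can define their own buffer types. This mechanism is Python's buffer protocol. Despite the similar name, it is unrelated to Protocol Buffers (protobuf), Google's language-neutral format for serializing structured data, which is what the rest of this article covers.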
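To make the buffer protocol concrete, here is a minimal sketch using only the standard library. memoryview gives zero-copy access to the memory owned by a bytearray or an array.array, so writes through the view change the original object. Nothing here involves Protocol Buffers.

```python
import array

# bytearray owns a mutable buffer; memoryview exposes it without copying
buf = bytearray(b"hello")
view = memoryview(buf)
view[0] = ord("H")  # writes straight into buf's memory
print(buf)  # bytearray(b'Hello')

# array.array also implements the buffer protocol, with a typed item format
nums = array.array("i", [1, 2, 3])
nums_view = memoryview(nums)
nums_view[1] = 20  # modifies the underlying array in place
print(nums_view.format, nums.tolist())  # i [1, 20, 3]
```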

Here we’ll cover working with Protocol Buffers in Python and show:

How to define message formats in a .proto file.
How to use the protocol buffer compiler.
How to use the Python protocol buffer API to write and read messages.

Defining Your .proto File

To build an address book application that saves contacts, start with a .proto file. In it you define each data structure you want to serialize as a message, and give every field in that message a name and a data type. Here is the .proto file for the address book:

syntax = "proto2";  // required fields and [default = ...] are proto2 features

package tutorial;

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;

    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }

    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }

    repeated PhoneNumber phones = 4;
}

message AddressBook {
    repeated Person people = 1;
}

Now let's go through each part of this file and what it does.

The package declaration helps prevent naming conflicts between different projects. In Python, packages are normally determined by the directory structure, so the package you define in your .proto file has no effect on the generated Python code. You should still declare one, though, to avoid name collisions in the Protocol Buffers namespace and in the code generated for non-Python languages.

Next come the message definitions. A message is simply an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string.

You can also add further structure to your messages by using other message types as field types. In the example above the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages—as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values—here you want to specify that a phone number can be one of MOBILE, HOME, or WORK.

The " = 1", " = 2" markers on each element identify the unique "tag" that field uses in the binary encoding.
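To see what these tags do on the wire, here is a minimal hand-written sketch. It is for illustration only, since the generated classes do this for you. It assumes the standard protobuf binary format: every field is prefixed by a key, (tag << 3) | wire_type, encoded as a base-128 varint, where wire type 0 means varint and 2 means length-delimited. The helper names (encode_varint, field_key, encode_person) are hypothetical.

```python
VARINT = 0            # wire type for int32, enums, bool, ...
LENGTH_DELIMITED = 2  # wire type for string, bytes, embedded messages

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first,
    high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def field_key(tag: int, wire_type: int) -> bytes:
    """The key written before each field value: (tag << 3) | wire_type."""
    return encode_varint((tag << 3) | wire_type)

def encode_person(name: str, person_id: int) -> bytes:
    """Hand-encode Person.name (tag 1) and Person.id (tag 2)."""
    name_bytes = name.encode("utf-8")
    return (field_key(1, LENGTH_DELIMITED) + encode_varint(len(name_bytes)) + name_bytes
            + field_key(2, VARINT) + encode_varint(person_id))

print(encode_person("John Doe", 1234).hex(" "))
# 0a 08 4a 6f 68 6e 20 44 6f 65 10 d2 09
```

Only the tag numbers, not the field names, end up in the bytes. That is why the tags must stay stable once messages are in use.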

Each field must be annotated with one of the following three modifiers:

required: a value for the field must be provided, otherwise the message will be considered "uninitialized". Serializing an uninitialized message will raise an exception. Parsing an uninitialized message will fail. Other than this, a required field behaves exactly like an optional field.

optional: the field may or may not be set. If an optional field value isn't set, a default value is used. For simple types, you can specify your own default value, as we've done for the phone number type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field which has not been explicitly set always returns that field's default value.

repeated: the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
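The reading side of these modifiers can be sketched by hand as well. This minimal illustration uses hypothetical helper names and handles only the varint and length-delimited wire types. It parses raw bytes into a dict mapping each tag to a list of values, so repeated tags keep their order. A PhoneNumber reader then raises if the required number (tag 1) is missing, and falls back to the declared default, HOME, when the optional type (tag 2) is absent. It illustrates the semantics and does not replace the generated parser.

```python
from typing import Dict, List, Tuple

HOME = 1  # Person.PhoneType.HOME in the .proto above

def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def read_fields(data: bytes) -> Dict[int, List[object]]:
    """Group raw values by tag number, preserving the order they appear in."""
    fields: Dict[int, List[object]] = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        tag, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.setdefault(tag, []).append(value)
    return fields

def read_phone_number(data: bytes) -> dict:
    fields = read_fields(data)
    if 1 not in fields:  # required string number = 1
        raise ValueError("uninitialized PhoneNumber: 'number' is required")
    return {
        # for a non-repeated field, the last value on the wire wins
        "number": fields[1][-1].decode("utf-8"),
        # optional PhoneType type = 2 [default = HOME]
        "type": fields.get(2, [HOME])[-1],
    }

print(read_phone_number(b"\x0a\x08555-1234"))
# {'number': '555-1234', 'type': 1}
```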

Compiling Your Protocol Buffers

Now that you have a .proto file, you need to generate the classes that read and write AddressBook (and therefore Person and PhoneNumber) messages. To do this, run the protocol buffer compiler, protoc, on your .proto file:

If you haven't installed the compiler, download the package and follow the instructions in the README.

Now run the compiler, specifying the source directory (where your application's source code lives; the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto file. In this case:

protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/addressbook.proto

Because you want Python classes, you use the --python_out option; similar options are provided for other supported languages.

This generates addressbook_pb2.py in your specified destination directory.
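If you drive your build from Python, you can assemble the same invocation as an argument list and pass it to subprocess. This sketch assumes protoc is on your PATH and that the .proto file sits directly in the source directory. The helper names are hypothetical. For a simple file name like addressbook.proto, the generated module takes the file's name plus a _pb2 suffix.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

def protoc_command(src_dir: str, dst_dir: str, proto_name: str) -> List[str]:
    """Equivalent of: protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/<proto_name>"""
    return ["protoc", f"-I={src_dir}", f"--python_out={dst_dir}",
            str(Path(src_dir) / proto_name)]

def generated_module(dst_dir: str, proto_name: str) -> Path:
    """Output path for a top-level file such as addressbook.proto:
    <dst_dir>/addressbook_pb2.py."""
    return Path(dst_dir) / (Path(proto_name).stem + "_pb2.py")

def compile_proto(src_dir: str, dst_dir: str, proto_name: str) -> Path:
    """Run protoc (raises if it is missing or fails) and return the output path."""
    if shutil.which("protoc") is None:
        raise FileNotFoundError("protoc not found on PATH; install the compiler first")
    subprocess.run(protoc_command(src_dir, dst_dir, proto_name), check=True)
    return generated_module(dst_dir, proto_name)
```

For example, compile_proto(".", ".", "addressbook.proto") would compile the address book file in the current directory and return the path to addressbook_pb2.py.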

The Protocol Buffer API

Unlike the Java and C++ code generators, the Python protocol buffer compiler doesn't generate your data access code directly. Instead, the generated addressbook_pb2.py contains special descriptors for all your messages, enums, and fields. In older protobuf releases (Python 2 era), it also contained a seemingly empty class for each message type, like this:

# Excerpt from addressbook_pb2.py (older generated code, Python 2 syntax)
class Person(message.Message):
    __metaclass__ = reflection.GeneratedProtocolMessageType

    class PhoneNumber(message.Message):
        __metaclass__ = reflection.GeneratedProtocolMessageType
        DESCRIPTOR = _PERSON_PHONENUMBER

    DESCRIPTOR = _PERSON

class AddressBook(message.Message):
    __metaclass__ = reflection.GeneratedProtocolMessageType
    DESCRIPTOR = _ADDRESSBOOK

__metaclass__ = reflection.GeneratedProtocolMessageType is the important line here. It makes GeneratedProtocolMessageType the metaclass of each class, and a metaclass can be thought of as a template for creating classes. At load time, the GeneratedProtocolMessageType metaclass uses the specified descriptors to create all the Python methods you need to work with each message type and adds them to the relevant classes. You can then use the fully populated classes in your code.
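The idea of a metaclass filling in a class from a descriptor can be illustrated with a toy, pure-Python stand-in. ToyMessageType, ToyMessage, ToyPerson, and the dict-based DESCRIPTOR below are all hypothetical and far simpler than the real GeneratedProtocolMessageType, with no serialization and no type checking. The sketch uses Python 3's metaclass= keyword rather than the __metaclass__ class attribute, which only works in Python 2.

```python
class ToyMessageType(type):
    """Hypothetical stand-in for GeneratedProtocolMessageType: when a class is
    created, read its DESCRIPTOR (here just {field name: default value}) and
    add one property per field."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        for field, default in namespace.get("DESCRIPTOR", {}).items():
            setattr(cls, field, mcls._field_property(field, default))
        return cls

    @staticmethod
    def _field_property(field, default):
        def getter(self):
            # unset fields report their default value
            return self._values.get(field, default)

        def setter(self, value):
            self._values[field] = value

        return property(getter, setter)


class ToyMessage(metaclass=ToyMessageType):
    def __init__(self):
        self._values = {}

    def has_field(self, field):
        return field in self._values


class ToyPerson(ToyMessage):
    # the class body is "empty" apart from the descriptor
    DESCRIPTOR = {"name": "", "id": 0, "email": ""}


person = ToyPerson()
person.id = 1234
print(person.id, repr(person.email), person.has_field("email"))
# 1234 '' False
```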

You can then use the generated Person class as if each field of the message were a regular attribute, as in the following example:

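import addressbook_pb2  # generated by protoc from addressbook.proto

person = addressbook_pb2.Person()
person.id = 1234
person.name = "John Doe"
person.email = "jdoe@example.com"

# phones is a repeated field: add() appends a new PhoneNumber and returns it
phone = person.phones.add()
phone.number = "555-1234"
phone.type = addressbook_pb2.Person.HOME  # enum values live on the enclosing message class

# Write the message to bytes, then read it back into a new Person
data = person.SerializeToString()
parsed = addressbook_pb2.Person()
parsed.ParseFromString(data)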

Extending a Protocol Buffer

Sooner or later after you release code that uses your protocol buffer, you will want to improve its definition. If you want new code to read old messages (backward compatibility) and old code to read new messages (forward compatibility), the new version of your .proto must follow these rules:

You must not change the tag numbers of any existing fields.
You must not add or delete any required fields.
You may delete optional or repeated fields.
You may add new optional or repeated fields, but you must use fresh tag numbers (numbers that were never used in this message, not even by deleted fields).
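These rules are mechanical enough to check in code. Below is a minimal sketch that assumes a hypothetical schema representation, a dict mapping field name to (tag, label), rather than parsing real .proto files. It matches fields by name, so a renamed field is reported as reusing its old tag even though renaming alone is safe on the wire. The retired_tags parameter stands in for tags used by fields deleted in earlier versions.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema format: field name -> (tag number, label)
Schema = Dict[str, Tuple[int, str]]

def compatibility_problems(old: Schema, new: Schema,
                           retired_tags: Iterable[int] = ()) -> List[str]:
    """Return the ways `new` breaks the evolution rules relative to `old`."""
    problems = []
    used_tags = {tag for tag, _ in old.values()} | set(retired_tags)

    for name, (tag, label) in old.items():
        if name not in new:
            if label == "required":  # deleting a required field
                problems.append(f"required field '{name}' was deleted")
            continue                 # deleting optional/repeated is allowed
        new_tag, new_label = new[name]
        if new_tag != tag:
            problems.append(f"tag of '{name}' changed from {tag} to {new_tag}")
        if (label == "required") != (new_label == "required"):
            problems.append(f"'{name}' was changed to or from required")

    for name, (tag, label) in new.items():
        if name in old:
            continue
        if label == "required":
            problems.append(f"new field '{name}' is required")
        if tag in used_tags:
            problems.append(f"new field '{name}' reuses tag {tag}")
    return problems


v1 = {"name": (1, "required"), "id": (2, "required"),
      "email": (3, "optional"), "phones": (4, "repeated")}
print(compatibility_problems(v1, {**v1, "nickname": (5, "optional")}))  # []
print(compatibility_problems(v1, {**v1, "id": (6, "required")}))
# ["tag of 'id' changed from 2 to 6"]
```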

What do you gain by following these rules?

If you follow these rules, old code will happily read new messages and simply ignore any new fields. To the old code, optional fields that were deleted will simply have their default value, and deleted repeated fields will be empty. New code will also transparently read old messages. However, keep in mind that new optional fields will not be present in old messages, so you will need to either check explicitly whether they're set (in Python, with HasField()) or provide a reasonable default value in your .proto file with [default = value] after the tag number. If no default is specified for an optional field, a type-specific default is used instead: the empty string for strings, false for booleans, and zero for numeric types. Note also that if you’ve added a new repeated field, your new code will not be able to tell whether it was left empty (by new code) or never set at all (by old code), since repeated fields cannot be checked with HasField().
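How old code ignores fields it doesn't know can be sketched at the wire level. A parser that only knows PhoneNumber's tags 1 and 2 reads each key, and when the tag is unknown it uses the wire type alone to skip the value. This minimal illustration uses hypothetical names. It assumes the four common wire types (0 varint, 1 fixed 64-bit, 2 length-delimited, 5 fixed 32-bit) and ignores the deprecated group wire types.

```python
from typing import Tuple

KNOWN_TAGS = {1, 2}  # what "old code" knows about PhoneNumber: number, type

def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def skip_value(data: bytes, pos: int, wire_type: int) -> int:
    """Return the position just past a value, using only its wire type."""
    if wire_type == 0:  # varint
        return read_varint(data, pos)[1]
    if wire_type == 1:  # fixed 64-bit
        return pos + 8
    if wire_type == 2:  # length-delimited
        length, pos = read_varint(data, pos)
        return pos + length
    if wire_type == 5:  # fixed 32-bit
        return pos + 4
    raise ValueError(f"unsupported wire type {wire_type}")

def read_known_fields(data: bytes) -> dict:
    """Collect values for known tags; unknown tags are silently skipped."""
    known, pos = {}, 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        tag, wire_type = key >> 3, key & 0x7
        if tag in KNOWN_TAGS and wire_type == 0:
            known[tag] = read_varint(data, pos)[0]
        elif tag in KNOWN_TAGS and wire_type == 2:
            length, start = read_varint(data, pos)
            known[tag] = data[start:start + length]
        # always advance past the value; for unknown tags this is all we do
        pos = skip_value(data, pos, wire_type)
    return known

# Bytes from a hypothetical newer writer that added tag 3 (a string)
# and tag 4 (a fixed32) to PhoneNumber
new_bytes = b"\x0a\x08555-1234" + b"\x10\x02" + b"\x1a\x0242" + b"\x25\x01\x00\x00\x00"
print(read_known_fields(new_bytes))  # {1: b'555-1234', 2: 2}
```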

Hope you enjoyed this tutorial.

By L.R. | 2/22/2017 | General