By Gabor Laszlo Hajba | 10/31/2016 | General | Beginners

Python Serialization with Pickle

pickle is a module in Python's standard library that lets you serialize objects, so that they survive between sessions or runs of your application.

Why is serialization important?

Imagine you are playing a game and you quit after a while. When you start the game again, you expect to continue where you left off. It would be a real pity if you had to start all over again every time because no state of your gameplay was saved.

There are multiple ways to store the data that makes up the application's current state. One approach is to open a file and write the information into it. If the application stores this data in plain text, cheaters get a great opportunity to change variables such as the remaining lives. You want to avoid this.

I am not saying that serializing the application state is a silver bullet for this problem, but it makes things harder: pickle stores the objects in its own binary format between application runs, and this makes the result hard to read.

Why pickle?

In this article we will use the pickle module to serialize and retrieve data between application runs. We use pickle because it is bundled with the standard Python installation, so it is available on every computer where your application runs (you cannot run the application without a Python interpreter anyway).

Naturally, to use a tool we have to know how it works and which types it can serialize. Let's take a look at the types which can be pickled:

  • Native data-types like numbers, strings, booleans, byte arrays and None
  • Lists, tuples, sets and dictionaries
  • Functions and classes defined at the top level of a module (these are pickled by name, not by value), and instances of such classes
  • Types you tell pickle to serialize because pickle can be extended
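
To make the list above concrete, here is a minimal sketch that sends one value of each built-in kind through pickle and back. The sample values are made up for illustration. It also shows what "pickled by name" means for functions: a function with an importable name comes back as the very same object, while a lambda, which has no such name, cannot be pickled.

```python
import math
import pickle

# One illustrative value for each built-in category from the list above.
samples = [
    42, 3.14, "text", True, b"raw bytes", None,   # native data types
    [1, 2, 3], (1, 2), {1, 2}, {"hp": 120},       # containers
]

for value in samples:
    restored = pickle.loads(pickle.dumps(value))
    assert restored == value

# Functions are stored by name, so loading returns the object that the
# name refers to in the running interpreter.
same_function = pickle.loads(pickle.dumps(math.sqrt))
print(same_function is math.sqrt)

# A lambda has no importable name, so pickling it fails.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)
```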

The example project

We will take a simple example for serialization. We have an imaginary 2D game in which we save the player's name, position, HP, ammo and XP. To make things a bit more interesting, I’ll present two kinds of serialization: one uses a plain old dictionary, the other a class called PlayerInfo:

class PlayerInfo:
    def __init__(self, name, position_x, position_y, hp, ammo, xp):
        self.name = name
        self.position_x = position_x
        self.position_y = position_y
        self.hp = hp
        self.ammo = ammo
        self.xp = xp

    def __str__(self):
        return 'name: {}\nposition_x: {}\nposition_y: {}\nhp: {}\nammo: {}\nxp: {}'.format(self.name, self.position_x, self.position_y, self.hp, self.ammo, self.xp)

 

Serializing objects

As mentioned previously, serializing objects is a good way to store information between application runs, or to share data between different applications, in a form that is not easy to read without effort.

Now let's see some examples of how we can serialize data. First, let's create the dictionary which will hold the information about the player:

>>> player_info = {'name': 'GHajba', 'position_x' : 135, 'position_y': 335, 'hp': 120, 'ammo': 53, 'xp': 59443.9}
>>> import pickle
>>> with open('player.pickle', 'wb') as f:
...     pickle.dump(player_info, f)
...
>>>

As you can see, the data is very simple: we created a dictionary and filled it with relevant data. Naturally, this is a very basic example; in a real-life application you would gather the information continuously and wouldn't fill the dictionary just before saving.

One thing to note is that we use 'wb' when opening the file. This is because pickle works with bytes, both when writing and when reading.

After saving the information we can take a look at the player.pickle file. The contents will be similar to this:

ð}q(XxpqG@ð|ððððXammoqK5X
position_yqMOX
position_xqKðXnameqXGHajbaqXhpqKxu.

The exact bytes can vary based on your Python version and the pickle protocol in use. But as you can see, only the strings are recognizable; the numbers are not.
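
The numbers are still in there, they are just stored as raw bytes instead of digits. As a rough sketch of how anyone with Python could read them back without even loading the data, the standard pickletools module can print the stored values one by one. The dictionary here is a shortened version of the one above.

```python
import io
import pickle
import pickletools

player_info = {'name': 'GHajba', 'hp': 120, 'ammo': 53}
data = pickle.dumps(player_info)

# pickletools.dis writes a human-readable listing of the pickle's
# instructions; here it is collected in a string buffer and printed.
listing = io.StringIO()
pickletools.dis(data, out=listing)
print(listing.getvalue())
```

In the listing, the values 120 and 53 show up in plain sight next to the names of the instructions that store them, which is why pickling only makes the data harder to read, not secret.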

Naturally, you can serialize to a byte string in memory too. To do this, you only need to call the pickle.dumps() function. To demonstrate this I use the PlayerInfo class introduced earlier:

>>> p = PlayerInfo('GHajba', 135, 335, 120, 53, 59443.9)
>>> import pickle
>>> ps = pickle.dumps(p)
>>> ps
b'\x80\x03cplayer\nPlayerInfo\nq\x00)\x81q\x01}q\x02(X\x02\x00\x00\x00hpq\x03KxX\x04\x00\x00\x00nameq\x04X\x06\x00\x00\x00GHajbaq\x05X\x04\x00\x00\x00ammoq\x06K5X\n\x00\x00\x00position_xq\x07K\x87X\n\x00\x00\x00position_yq\x08MO\x01X\x02\x00\x00\x00xpq\tG@\xed\x06|\xcc\xcc\xcc\xcdub.'

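As you can see, the result is a byte string which contains the exported information. Note that it also records the module and class name (player and PlayerInfo here), because pickle has to find the class again when loading.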

De-serializing objects

After serializing our objects we can quit the example script without losing the information created there. When we restart the application, it should load the saved information and display it to us. Let's see how to load information with pickle and display it.

We take the previous example: we saved the contents of the player_info dictionary to a file called player.pickle. Now let's load it back.

You have two options: you can open a new interactive interpreter session, or you can take care to give the variable a different name from the one you serialized before. I'll change the name slightly so that loading also works in the same interactive interpreter session:

>>> import pickle
>>> with open('player.pickle', 'rb') as f:
...     player = pickle.load(f)
...
>>> player
{'hp': 120, 'ammo': 53, 'position_x': 135, 'xp': 59443.9, 'position_y': 335, 'name': 'GHajba'}

As you can see, the result is the same: we serialized the player information from a dictionary to a file and got the same player information back when reading that file.

Loading a byte string created with pickle.dumps() is done with the pickle.loads() function. Let's load back the PlayerInfo instance serialized in the previous example:
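
In an application, the saving and loading steps usually end up in two small helper functions. The following is only a sketch of how that could look: the names save_state and load_state and the default argument are my own choices for illustration, and a temporary folder is used so the example leaves no files behind.

```python
import os
import pickle
import tempfile


def save_state(state, path):
    """Write the state to path; pickle needs the file in binary mode."""
    with open(path, 'wb') as f:
        pickle.dump(state, f)


def load_state(path, default=None):
    """Return the saved state, or default when no save file exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, 'rb') as f:
        return pickle.load(f)


# Simulate two application runs against the same save file.
with tempfile.TemporaryDirectory() as folder:
    save_path = os.path.join(folder, 'player.pickle')

    first_run = load_state(save_path, default={})    # nothing saved yet
    save_state({'name': 'GHajba', 'hp': 120}, save_path)
    second_run = load_state(save_path, default={})   # state from "last time"

print(first_run)
print(second_run)
```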

>>> ps = b'\x80\x03cplayer\nPlayerInfo\nq\x00)\x81q\x01}q\x02(X\x02\x00\x00\x00hpq\x03KxX\x04\x00\x00\x00nameq\x04X\x06\x00\x00\x00GHajbaq\x05X\x04\x00\x00\x00ammoq\x06K5X\n\x00\x00\x00position_xq\x07K\x87X\n\x00\x00\x00position_yq\x08MO\x01X\x02\x00\x00\x00xpq\tG@\xed\x06|\xcc\xcc\xcc\xcdub.'
>>> import pickle
>>> player_info = pickle.loads(ps)
>>> player_info
<player.PlayerInfo object at 0x10147f550>

The deserialized object is of type PlayerInfo, but does it hold exactly the information we serialized? To check, let's print the object, which calls the __str__ method of the PlayerInfo class:

>>> print(player_info)
name: GHajba
position_x: 135
position_y: 335
hp: 120
ammo: 53
xp: 59443.9

Everything is fine: we got back an object with the same contents as the one we serialized previously.
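
It is worth being precise about what "the same" means here. The sketch below uses a shortened, hypothetical variant of PlayerInfo (two fields only, plus a list that records constructor calls) to show that loading creates a new object with equal attribute values, and that pickle restores those attributes directly instead of calling __init__ again.

```python
import pickle

init_calls = []  # illustration only: records every constructor call


class PlayerInfo:  # shortened, hypothetical variant of the article's class
    def __init__(self, name, hp):
        init_calls.append(name)
        self.name = name
        self.hp = hp


original = PlayerInfo('GHajba', 120)
restored = pickle.loads(pickle.dumps(original))

print(restored is original)              # identity: is it the original object?
print(vars(restored) == vars(original))  # do the attribute values match?
print(len(init_calls))                   # how often did __init__ run?
```

Because the class itself is stored only by name, the PlayerInfo class has to be importable in the session that loads the data.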

 

Switching between Python 2 and 3

Depending on the Python versions you use, you may be able to de-serialize data that was serialized with a different Python version. Let's take the previous dictionary example, serialize it with Python 2.7 and de-serialize it with Python 3.5:

python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> player_info = {'name': 'GHajba', 'position_x' : 135, 'position_y': 335, 'hp': 120, 'ammo': 53, 'xp': 59443.9}
>>> pickle.dumps(player_info)
"(dp0\nS'name'\np1\nS'GHajba'\np2\nsS'hp'\np3\nI120\nsS'position_x'\np4\nI135\nsS'position_y'\np5\nI335\nsS'xp'\np6\nF59443.9\nsS'ammo'\np7\nI53\ns."

python3
Python 3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads(b"(dp0\nS'name'\np1\nS'GHajba'\np2\nsS'hp'\np3\nI120\nsS'position_x'\np4\nI135\nsS'position_y'\np5\nI335\nsS'xp'\np6\nF59443.9\nsS'ammo'\np7\nI53\ns.")
{'name': 'GHajba', 'ammo': 53, 'position_y': 335, 'hp': 120, 'position_x': 135, 'xp': 59443.9}

As you can see, it goes well for these basic types. The only thing you have to do is put a b prefix before the string passed to pickle.loads, because you have to tell the interpreter that this is a bytes literal and not just a plain old string.

How about the other way around?

python3
Python 3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> player_info = {'name': 'GHajba', 'position_x' : 135, 'position_y': 335, 'hp': 120, 'ammo': 53, 'xp': 59443.9}
>>> pickle.dumps(player_info)
b'\x80\x03}q\x00(X\x02\x00\x00\x00hpq\x01KxX\x04\x00\x00\x00ammoq\x02K5X\n\x00\x00\x00position_yq\x03MO\x01X\x02\x00\x00\x00xpq\x04G@\xed\x06|\xcc\xcc\xcc\xcdX\x04\x00\x00\x00nameq\x05X\x06\x00\x00\x00GHajbaq\x06X\n\x00\x00\x00position_xq\x07K\x87u.'

python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads('\x80\x03}q\x00(X\x02\x00\x00\x00hpq\x01KxX\x04\x00\x00\x00ammoq\x02K5X\n\x00\x00\x00position_yq\x03MO\x01X\x02\x00\x00\x00xpq\x04G@\xed\x06|\xcc\xcc\xcc\xcdX\x04\x00\x00\x00nameq\x05X\x06\x00\x00\x00GHajbaq\x06X\n\x00\x00\x00position_xq\x07K\x87u.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 886, in load_proto
    raise ValueError, "unsupported pickle protocol: %d" % proto
ValueError: unsupported pickle protocol: 3

As you can see, pickling data in Python 3 and de-serializing it in Python 2 results in an error. Can we solve this problem?

Yes, we can: we just have to use a protocol which is known to Python 2 when we serialize the data with Python 3.

The protocol describes how Python should convert information while serializing. Higher numbers were introduced in more recent versions of Python. The original protocol 0 is human-readable and is compatible with every Python version. Protocol 2 was introduced in Python 2.3, protocol 3 has been available since Python 3.0 and protocol 4 since Python 3.4.
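
You can check these protocol numbers on your own interpreter. The following sketch prints the default and the highest protocol your Python supports, then looks at the raw output: protocol 0 produces plain ASCII text, while pickles written with protocol 2 or later begin with the byte 0x80 followed by the protocol number. The dictionary is a shortened version of the player data.

```python
import pickle

player_info = {'name': 'GHajba', 'hp': 120, 'xp': 59443.9}

# Both values depend on the Python version you run this with.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# Protocol 0 output consists of ASCII characters only.
text_pickle = pickle.dumps(player_info, protocol=0)
is_ascii = all(byte < 128 for byte in text_pickle)
print(text_pickle)

# From protocol 2 on, the first two bytes announce the protocol.
binary_pickle = pickle.dumps(player_info, protocol=2)
header = binary_pickle[:2]
print(header)
```

This header is the same \x80\x03 and \x80\x02 you can spot at the start of the byte strings in the interpreter sessions of this section.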

This means we have to use protocol 2 while pickling in Python 3 to enable de-serializing in Python 2:

python3
Python 3.5.0 (v3.5.0:374f501f4567, Sep 12 2015, 11:00:19)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> player_info = {'name': 'GHajba', 'position_x' : 135, 'position_y': 335, 'hp': 120, 'ammo': 53, 'xp': 59443.9}
>>> pickle.dumps(player_info, protocol=2)
b'\x80\x02}q\x00(X\x04\x00\x00\x00nameq\x01X\x06\x00\x00\x00GHajbaq\x02X\x02\x00\x00\x00xpq\x03G@\xed\x06|\xcc\xcc\xcc\xcdX\n\x00\x00\x00position_xq\x04K\x87X\x02\x00\x00\x00hpq\x05KxX\n\x00\x00\x00position_yq\x06MO\x01X\x04\x00\x00\x00ammoq\x07K5u.'

python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads('\x80\x02}q\x00(X\x04\x00\x00\x00nameq\x01X\x06\x00\x00\x00GHajbaq\x02X\x02\x00\x00\x00xpq\x03G@\xed\x06|\xcc\xcc\xcc\xcdX\n\x00\x00\x00position_xq\x04K\x87X\x02\x00\x00\x00hpq\x05KxX\n\x00\x00\x00position_yq\x06MO\x01X\x04\x00\x00\x00ammoq\x07K5u.')
{u'name': u'GHajba', u'hp': 120, u'position_x': 135, u'position_y': 335, u'xp': 59443.9, u'ammo': 53}

As you can see, the problem was the protocol that pickle used in Python 3. If we tell pickle to use a protocol that Python 2 knows too, the two versions work together perfectly.

The pickle.load and pickle.loads functions detect the protocol automatically, so you do not have to pass it as a parameter to these function calls.
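
A quick way to convince yourself of this is to write the same data with every protocol your interpreter supports and load each result with a plain pickle.loads call. This is just a small sketch; the results dictionary is only there to collect what came back for each protocol.

```python
import pickle

player_info = {'name': 'GHajba', 'hp': 120, 'ammo': 53}

results = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(player_info, protocol=protocol)
    # No protocol argument here: loads reads it from the data itself.
    results[protocol] = pickle.loads(data)

for protocol, loaded in results.items():
    print(protocol, loaded == player_info)
```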

Conclusion

We have seen that pickle implements binary serialization of data and that you can use different Python interpreter versions for serializing and de-serializing, as long as you take care to use the right protocol.

Naturally, serialization does not make your data secure, but it makes the data harder to access, and you can use this approach to save the state of your application between runs.

 
