Problem
I’ve made an object that looks like this:
company1.name = 'banana'
company1.value = 40
I’d like to be able to save this object so I can load it again later. How can I do that?
Asked by Peterstone
Solution #1
You could use the pickle module from the standard library. Here’s an elementary application of it to your example:
import pickle
class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value
with open('company_data.pkl', 'wb') as outp:
    company1 = Company('banana', 40)
    pickle.dump(company1, outp, pickle.HIGHEST_PROTOCOL)
    company2 = Company('spam', 42)
    pickle.dump(company2, outp, pickle.HIGHEST_PROTOCOL)
del company1
del company2
with open('company_data.pkl', 'rb') as inp:
    company1 = pickle.load(inp)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40
    company2 = pickle.load(inp)
    print(company2.name)  # -> spam
    print(company2.value)  # -> 42
You could also define your own simple utility that opens a file and writes a single object to it, like this:
def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)
# sample usage
save_object(company1, 'company1.pkl')
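A matching loader is a natural companion to save_object. The sketch below is an assumption rather than part of the original answer: load_object is a hypothetical name, and a plain dict stands in for the Company instance so that the snippet runs on its own.

```python
import os
import pickle
import tempfile

def save_object(obj, filename):
    with open(filename, 'wb') as outp:  # Overwrites any existing file.
        pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)

def load_object(filename):
    """Hypothetical counterpart to save_object: read back one pickled object."""
    with open(filename, 'rb') as inp:
        return pickle.load(inp)

# Sample usage with a throwaway file in a temporary directory.
# A dict is used here as a stand-in for a Company instance.
path = os.path.join(tempfile.mkdtemp(), 'company1.pkl')
save_object({'name': 'banana', 'value': 40}, path)
restored = load_object(path)
print(restored['name'], restored['value'])
```

The same pair of helpers should work for any picklable object, including instances of classes that can be imported when the file is loaded.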
Since this is such a popular answer, I’d like to touch on a few slightly more advanced topics.
In Python 2, it’s almost always preferable to use the cPickle module rather than pickle, because cPickle is written in C and is much faster. There are some subtle differences between them, but in most situations they’re interchangeable, and the C version will perform far better than the pure-Python one. Switching to it is simple; just change the import statement to this:
import cPickle as pickle
In Python 3, cPickle was renamed _pickle, but importing it explicitly is no longer necessary because the pickle module now uses the C implementation automatically when it is available; see What is the difference between pickle and _pickle in Python 3? for more information.
The short version is that you can use something like the following to ensure that your code always uses the C version when it is available, in both Python 2 and 3:
try:
    import cPickle as pickle  # Python 2: C implementation.
except ImportError:  # Python 3 has no cPickle; ImportError also covers ModuleNotFoundError.
    import pickle
As mentioned in the documentation, pickle can read and write files in several different Python-specific formats, called protocols. “Protocol version 0” is ASCII and therefore “human-readable.” Versions greater than 0 are binary, and the highest one available depends on which version of Python is being used. The default also depends on the Python version: in Python 2 the default was protocol version 0, while in Python 3.8.1 it is protocol version 4. In Python 3.x the module gained a pickle.DEFAULT_PROTOCOL attribute, which does not exist in Python 2.
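To make the difference between protocols concrete, here is a small sketch for Python 3 that pickles the same data with protocol 0 and with the highest available protocol. The dict and the variable names are illustrative assumptions; the printed protocol numbers depend on the interpreter you run it with.

```python
import pickle

# Illustrative data; a dict stands in for the attributes of a Company.
data = {'name': 'banana', 'value': 40}

# Protocol 0 produces ASCII-only output.
ascii_blob = pickle.dumps(data, protocol=0)
# The highest protocol produces a compact binary format.
binary_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print('highest:', pickle.HIGHEST_PROTOCOL, 'default:', pickle.DEFAULT_PROTOCOL)
print(ascii_blob.decode('ascii'))

# Both blobs restore to the same object, regardless of the protocol used.
assert pickle.loads(ascii_blob) == pickle.loads(binary_blob) == data
```

Note that the protocol only has to be given when writing; pickle.load and pickle.loads detect it from the data.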
Fortunately, there is a shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that’s what you want, and it usually is): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing:
pickle.dump(obj, outp, pickle.HIGHEST_PROTOCOL)
you can simply write:
pickle.dump(obj, outp, -1)
Either way, you only need to specify the protocol once if you create a Pickler object and use it for multiple pickle operations:
pickler = pickle.Pickler(outp, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# etc.
Note: If you’re working in an environment that runs several different Python versions, you’ll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).
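As a sketch of that advice, the protocol can be pinned in a single module-level constant. The name PICKLE_PROTOCOL is a hypothetical choice, and the value 2 assumes that Python 2 still has to read the files (protocol 2 is the newest one Python 2 understands).

```python
import pickle

# Hypothetical constant: protocol 2 is assumed here because it is the
# newest format that Python 2 can also read.
PICKLE_PROTOCOL = 2

data = {'name': 'banana', 'value': 40}
blob = pickle.dumps(data, PICKLE_PROTOCOL)

# Any Python version that supports protocol 2 can load this.
restored = pickle.loads(blob)
print(restored)
```

If only Python 3 versions are involved, a higher number can be chosen in the same way.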
While a pickle file can hold any number of pickled objects, as shown in the examples above, when there’s an unknown number of them it’s usually easier to store them all in some sort of variably-sized container, such as a list, tuple, or dict, and write them all to the file in a single call:
tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')
Later, you can restore the list and everything in it with:
with open('tech_companies.pkl', 'rb') as inp:
    tech_companies = pickle.load(inp)
The main benefit is that you don’t need to know how many object instances were saved in order to load them back later (doing so without that information is possible, but it requires some slightly specialized code). For different ways to do this, see the answers to the related question Saving and loading multiple objects in pickle file? Personally, I liked @Lutz Prechelt’s answer best, so that’s the approach used in the sample code below:
class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value
def pickle_loader(filename):
    """ Deserialize a file of pickled objects. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break
print('Companies in pickle file:')
for company in pickle_loader('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))
Answered by martineau
Solution #2
I think it’s a pretty strong assumption that the object is an instance of a class. What if it isn’t? There’s also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? And what if the attributes were added dynamically? When some Python objects have attributes added to their __dict__ after creation, pickle doesn’t respect the addition of those attributes (i.e. it ‘forgets’ they were added, because pickle serializes by reference to the object definition).
In all these cases, pickle and cPickle can fail you horribly.
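The “by reference” behaviour can be seen with the standard library alone. The following is a minimal sketch for Python 3; the exact exception type and message vary between Python versions, so both likely types are caught.

```python
import pickle

# A built-in function is stored as a reference (its module and name), so
# loading the pickle simply looks the same function up again.
assert pickle.loads(pickle.dumps(len)) is len

# A lambda has no importable name, so pickle has nothing to refer to.
try:
    pickle.dumps(lambda x: x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print('pickle refused the lambda:', exc)
```

Because only the reference is stored, the definition has to be importable under the same name when the pickle is loaded.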
If what you want to save is an arbitrary object with attributes (whether added in the object definition or afterward), your best bet is dill, which can serialize almost anything in Python.
We begin with a class…
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
...
>>>
Now shut down the interpreter and restart it…
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>>
Pickle isn’t up to the task. Let’s see what we can do with dill. For good measure, we’ll throw in another object type (a lambda).
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>>
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>>
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
...
>>>
Now it’s time to read the file.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
...
>>> company1
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>
It works. Pickle fails and dill doesn’t because dill treats __main__ like a module (for the most part) and can also pickle class definitions themselves instead of pickling by reference (as pickle does). Dill can pickle a lambda because it gives it a name, and then the pickling magic can happen.
There’s actually an easier way to save all of these objects, especially if you have a lot of them: just dump the entire Python session and come back to it later.
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
...
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>>
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>>
>>> dill.dump_session('dill.pkl')
>>>
Now turn off your computer, go have a cup of coffee, or whatever you like, and come back later…
Python 2.7.8 (default, Jul 13 2014, 02:29:54)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>
The only significant disadvantage is that dill is not included in the Python standard library. You won’t be able to utilize it if you can’t install a Python package on your server.
If you are able to install Python packages on your machine, you can get the latest development version of dill from git+https://github.com/uqfoundation/dill.git@master#egg=dill, and the latest released version with pip install dill.
Answered by Mike McKerns
Solution #3
Here’s a quick example using Python 3 and company1 from your question.
import pickle
# Save the file (the with block makes sure it is flushed and closed)
with open("company1.pickle", "wb") as f:
    pickle.dump(company1, f)
# Reload the file
with open("company1.pickle", "rb") as f:
    company1_reloaded = pickle.load(f)
However, as Solution #2 pointed out, pickle often fails, so you should really use dill.
import dill
# Save the file (the with block makes sure it is flushed and closed)
with open("company1.pickle", "wb") as f:
    dill.dump(company1, f)
# Reload the file
with open("company1.pickle", "rb") as f:
    company1_reloaded = dill.load(f)
Answered by Anthony Ebert
Solution #4
You can use anycache to do the job for you. It takes care of all the details:
Assume you have a function called myfunc that creates the instance:
from anycache import anycache
class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value
@anycache(cachedir='/path/to/your/cache')
def myfunc(name, value):
    return Company(name, value)
The first time myfunc is called, anycache runs it and pickles the result to a file in cachedir, using a unique identifier (derived from the function name and its arguments) as the filename. On every subsequent call, the pickled object is loaded instead. If cachedir is preserved between Python runs, the pickled object is taken from the previous run.
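To illustrate the general idea only, here is a minimal standard-library sketch of such a caching decorator. It is not anycache’s implementation: pickle_cache, make_company and the repr-based key are hypothetical choices, and the key assumes the arguments have a stable repr.

```python
import functools
import hashlib
import os
import pickle
import tempfile

def pickle_cache(cachedir):
    """Hypothetical decorator: cache a function's return value as a pickle file."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Derive the file name from the function name and its arguments.
            key = repr((func.__name__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(key.encode('utf-8')).hexdigest()
            path = os.path.join(cachedir, digest + '.pkl')
            if os.path.exists(path):
                # Cache hit: load the stored result instead of calling func.
                with open(path, 'rb') as inp:
                    return pickle.load(inp)
            result = func(*args, **kwargs)
            os.makedirs(cachedir, exist_ok=True)
            with open(path, 'wb') as outp:
                pickle.dump(result, outp, pickle.HIGHEST_PROTOCOL)
            return result
        return wrapper
    return decorator

calls = []  # Records how often the wrapped function really runs.

@pickle_cache(tempfile.mkdtemp())
def make_company(name, value):
    calls.append(name)
    # A dict stands in for a Company instance in this sketch.
    return {'name': name, 'value': value}

first = make_company('banana', 40)   # Runs the function and writes the file.
second = make_company('banana', 40)  # Loaded from the pickle file.
print(first == second, len(calls))
```

A real library has more to handle than this sketch does, such as concurrent access and stale cache entries.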
For further details, see the documentation.
Answered by c0fec0de
Solution #5
The ability to save pickles has been added to newer versions of pandas.
I find it easier, e.g.:
import pandas as pd
pd.to_pickle(object_to_save, '/temp/saved_pkl.pickle')
Answered by George Sotiropoulos
Post is based on https://stackoverflow.com/questions/4529815/saving-an-object-data-persistence