2012/05/22

Python tips: unpickling a large object is very slow because of garbage collection

In Python, unpickling a gigantic object from a file can be very time consuming because of Python's cyclic garbage collection. When cPickle loads an object that contains a lot of container objects, the number of newly allocated containers keeps exceeding the collector's threshold (the values set with gc.set_threshold()), so a collection is triggered again and again during the load.
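To make the threshold mentioned above concrete, here is a small sketch that only inspects the collector's current settings. The helper name gc_settings is made up for this illustration; the gc functions it calls are from the standard library. In CPython 2.7 the default threshold is usually (700, 10, 10), but the value can differ between versions, so check it on your own interpreter.

```python
import gc


def gc_settings():
    """Return the current state of the cyclic garbage collector.

    gc_settings is a hypothetical helper name used only in this sketch.
    """
    return {
        # True unless gc.disable() has been called
        "enabled": gc.isenabled(),
        # (threshold0, threshold1, threshold2): a generation-0 collection
        # starts when container allocations minus deallocations exceed
        # threshold0
        "threshold": gc.get_threshold(),
        # current collection counts for the three generations
        "count": gc.get_count(),
    }


settings = gc_settings()
print(settings)
```

Unpickling a dictionary with millions of entries allocates far more containers than a generation-0 threshold in the hundreds, which is why the collector can be triggered so many times during a single load.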
Now, assume you have a 2GB pickle file on your file system that contains a huge dictionary object, and your Python code loads it. I compared the loading time with and without garbage collection in Python 2.7.2 on Ubuntu 12.04.
#!/usr/local/bin/python
#coding: utf-8

import cPickle as pickle
from datetime import datetime
import gc

# first run: cyclic garbage collection enabled (the default)
F = open('gigantic_data.pickle', 'rb')
s = datetime.now()
myobj = pickle.load(F)
e = datetime.now()
F.close()

print("load_pickle takes {0}".format(e-s))

# disabling cyclic garbage collection
gc.disable()

# second run: same load with the collector switched off
F = open('gigantic_data.pickle', 'rb')
s = datetime.now()
myobj = pickle.load(F)
e = datetime.now()
F.close()

print("load_pickle takes {0}".format(e-s))

# re-enable garbage collection once loading is done
gc.enable()
> /usr/local/bin/python above_script.py
load_pickle takes 0:15:16.530199
load_pickle takes 0:00:14.195409
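If you use this trick in more than one place, it may be convenient to wrap it so that the collector is always restored afterwards, even when loading fails. The following is only a sketch: gc_disabled and load_pickle_without_gc are names I made up for this example, and the small temporary file stands in for the real 2GB pickle. The import fallback is there so the same code runs on Python 2 and Python 3.

```python
import gc
import os
import tempfile
from contextlib import contextmanager

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


@contextmanager
def gc_disabled():
    """Turn the cyclic collector off inside the with-block (hypothetical helper)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # restore the previous state, even if an exception was raised
        if was_enabled:
            gc.enable()


def load_pickle_without_gc(path):
    """Load a pickle file while cyclic garbage collection is disabled."""
    with gc_disabled():
        with open(path, 'rb') as f:
            return pickle.load(f)


# small throwaway file standing in for the gigantic pickle
demo = {'key%d' % i: [i] for i in range(1000)}
fd, demo_path = tempfile.mkstemp(suffix='.pickle')
with os.fdopen(fd, 'wb') as f:
    pickle.dump(demo, f, pickle.HIGHEST_PROTOCOL)

loaded = load_pickle_without_gc(demo_path)
os.remove(demo_path)
print(loaded == demo)
```

With the file from this post, the call would be myobj = load_pickle_without_gc('gigantic_data.pickle'). Reference counting still frees most objects while the cyclic collector is off, so disabling it only for the duration of the load should be safe in typical cases.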
