All absolute statements are false.

Greedy Python


Python is slow partly because it is CPU-hungry, and its multithreading can't make effective use of multiple cores. But it doesn't only devour CPU; it is just as greedy with memory.

In my previous article I mentioned a project that I had never paid much attention to. It loads roughly 70 MB of files from disk into memory as a dict, and memory usage immediately soars by several hundred megabytes.. (⊙v⊙)

Let's see how much memory Python objects eat (the numbers below are from Python 3.6; on Python 2.x they only get bigger, never smaller):

>>> import sys
>>> sys.getsizeof(1)
 28  # bytes
>>> sys.getsizeof(1<<64)  # long in Py2
 36
>>> sys.getsizeof(1.1)
 24
>>> sys.getsizeof('s')
 50
>>> sys.getsizeof('ss')
 51
>>> sys.getsizeof(b'b')
 34
>>> sys.getsizeof(b'bb')
 35
>>> from decimal import Decimal
>>> sys.getsizeof(Decimal(3.4))
 104

For container objects, sys.getsizeof only counts the container itself, so we need code that walks the contents recursively. The following recipe is copied here as-is:

from __future__ import print_function
from sys import getsizeof, stderr
from itertools import chain
from collections import deque
try:
    from reprlib import repr
except ImportError:
    pass

def total_size(o, handlers={}, verbose=False):
    """ Returns the approximate memory footprint an object and all of its contents.

    Automatically finds the contents of the following builtin containers and
    their subclasses:  tuple, list, deque, dict, set and frozenset.
    To search other containers, add handlers to iterate over their contents:

        handlers = {SomeContainerClass: iter,
                    OtherContainerClass: OtherContainerClass.get_elements}
    """
    dict_handler = lambda d: chain.from_iterable(d.items())
    all_handlers = {
        tuple: iter,
        list: iter,
        deque: iter,
        dict: dict_handler,
        set: iter,
        frozenset: iter,
    }
    all_handlers.update(handlers)     # user handlers take precedence
    seen = set()                      # track which object id's have already been seen
    default_size = getsizeof(0)       # estimate sizeof object without __sizeof__

    def sizeof(o):
        if id(o) in seen:       # do not double count the same object
            return 0
        seen.add(id(o))
        s = getsizeof(o, default_size)

        if verbose:
            print(s, type(o), repr(o), file=stderr)

        for typ, handler in all_handlers.items():
            if isinstance(o, typ):
                s += sum(map(sizeof, handler(o)))
                break
        return s

    return sizeof(o)


##### Example call #####
if __name__ == '__main__':
    d = dict(
        a=1, b=2.5, c=1<<64,
        d=(1, 2, 3), e=[4, 5, 6], f={7, 8, 9},
        g=b'bytes', h='unicode'
    )
    print(total_size(d, verbose=True))

The output looks like this:

368 <class 'dict'> {'a': 1, 'b': 2.5, 'c': 18446744073709551616, 'd': (1, 2, 3), ...}
50 <class 'str'> 'a'
28 <class 'int'> 1
50 <class 'str'> 'b'
24 <class 'float'> 2.5
50 <class 'str'> 'c'
36 <class 'int'> 18446744073709551616
50 <class 'str'> 'd'
72 <class 'tuple'> (1, 2, 3)
28 <class 'int'> 2
28 <class 'int'> 3
50 <class 'str'> 'e'
88 <class 'list'> [4, 5, 6]
28 <class 'int'> 4
28 <class 'int'> 5
28 <class 'int'> 6
50 <class 'str'> 'f'
224 <class 'set'> {7, 8, 9}
28 <class 'int'> 8
28 <class 'int'> 9
28 <class 'int'> 7
50 <class 'str'> 'g'
38 <class 'bytes'> b'bytes'
50 <class 'str'> 'h'
56 <class 'str'> 'unicode'
1558

You read that right: this little dict eats nearly 1.5 KB of memory… Impressive; you can see just how greedy Python is here.

Of course, Python does optimize memory for small objects with object pools and caches, but that mostly pays off when a large number of identical small objects are reused. And as this article points out:

CPython manages small objects (less than 256 bytes) in special pools on 8-byte boundaries. There are pools for 1-8 bytes, 9-16 bytes, and all the way to 249-256 bytes. When an object of size 10 is allocated, it is allocated from the 16-byte pool for objects 9-16 bytes in size. So, even though it contains only 10 bytes of data, it will cost 16 bytes of memory. If you allocate 1,000,000 objects of size 10, you actually use 16,000,000 bytes and not 10,000,000 bytes as you may assume. This 60% overhead is obviously not trivial.
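To make the 60% figure concrete, here is a quick back-of-the-envelope sketch of the rounding described in the quote (the helper name is mine, and newer CPython versions may use a different small-object threshold):

def pool_alloc_size(requested_bytes):
    # pymalloc serves small requests from size classes that are
    # multiples of 8 bytes, so round up to the next multiple of 8
    return ((requested_bytes + 7) // 8) * 8

objects = 1000000
payload = 10                              # bytes of real data per object
allocated = pool_alloc_size(payload)      # -> 16
print(allocated * objects)                # 16000000 bytes actually allocated
print(payload * objects)                  # 10000000 bytes of data
print((allocated - payload) / payload)    # 0.6, i.e. the 60% overhead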

Therefore, be cautious about caching large data sets as plain Python objects. The almighty Google tells us that we can replace dict-like data sets with the standard library's shelve or sqlite3.connect(':memory:'), third-party tools such as numpy or redis, or more memory-efficient data structures such as a trie.
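For example, here is a minimal sketch of the sqlite3.connect(':memory:') idea: keep key/value pairs in an in-memory SQLite table instead of one huge dict (the table name and sample data are made up):

import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)')
db.executemany('INSERT INTO kv VALUES (?, ?)',
               [('a', '1'), ('b', '2.5'), ('h', 'unicode')])
db.commit()

# look up a single value instead of holding everything as Python objects
row = db.execute('SELECT value FROM kv WHERE key = ?', ('b',)).fetchone()
print(row[0])   # '2.5'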

2017/4/14: Today I finally found time to optimize the project mentioned at the beginning of this article. With the help of the guppy tool I tracked the memory usage of the objects, and it is indeed the dict that takes up most of the memory. Because this is an SDK that has to stay compatible with the file format used on the Java side, pull in as few dependencies as possible, and avoid unnecessary complexity, replacing the dict itself isn't really justified. Looking more closely, though, I found that unicode strings occupy a lot of memory: everything json.load parses comes back as unicode, so I simply converted all of those strings to bytes.

What I do want to complain about is that Python's json library is weak at supporting custom decoding: you either trade time complexity for flexibility or try to hack the built-in Decoder, and digging through several versions of the Python source for that just isn't worth it. Loading is slower now, but saving more than a hundred megabytes of memory makes it worthwhile. Most of these strings actually hold numbers, and converting them to real numbers would save even more, another few tens of megabytes, but they would have to be converted back when writing the files, and that extra complexity isn't cost-effective. This is as much as can be optimized for now; no other objects take up too much memory, and the online machines are powerful enough anyway.
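As an illustration, a minimal sketch of this str-to-bytes conversion done by post-processing the result of json.load (assuming UTF-8 data; the data.json file name and helper name are hypothetical):

import json

def encode_strings(obj):
    # recursively turn every str produced by json.load into bytes
    if isinstance(obj, str):
        return obj.encode('utf-8')
    if isinstance(obj, dict):
        return {encode_strings(k): encode_strings(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [encode_strings(x) for x in obj]
    return obj

with open('data.json') as f:              # hypothetical input file
    data = encode_strings(json.load(f))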

BTW

This article provides a case study of memory optimization: it mainly uses the Heapy tool to locate the objects that consume the most memory, then reduces usage by deleting temporary objects (del large_data), using the __slots__ magic, eliminating tuples, using Cython, turning object methods into plain functions, and so on.
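A quick illustration of the __slots__ trick mentioned there (the class names are made up; exact byte counts vary by Python version):

import sys

class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlotPoint:
    __slots__ = ('x', 'y')      # no per-instance __dict__
    def __init__(self, x, y):
        self.x = x
        self.y = y

p, q = PlainPoint(1, 2), SlotPoint(1, 2)
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # instance plus its dict
print(sys.getsizeof(q))                              # noticeably smaller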

In addition, objgraph can be used to track down memory leaks.
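For reference, a tiny sketch of what such an objgraph session might look like (assuming objgraph is pip-installed; the cache list is a stand-in for the code you actually suspect of leaking):

import objgraph

objgraph.show_growth(limit=10)              # establish a baseline of object counts
cache = [dict(n=i) for i in range(10000)]   # stand-in for suspect code
objgraph.show_growth(limit=10)              # which object types grew since the baseline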