Wednesday, August 13, 2008

Custom hash objects in Python

It's quite common to use strings, integers and other 'native' Python data types as dictionary (hash) keys. But sometimes it is much more convenient to use instances of your own classes as keys. Python's magic methods make this possible.

Note: this is not a tip on implementing hash functions. It shows how to avoid peeking into your objects (for example, pulling out a name string by hand) every time you need a dictionary key.

__hash__ and __cmp__


Consider a useless HTML parser with a simple node design, where you want to associate each node's name with its attributes.

You want to use the node's absolute name (its own name prefixed by the names of all its ancestors) as a unique key.
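To see why this needs anything special, here is a minimal Python 3 sketch. The post's own code targets Python 2. `PlainNodeName` and `absolute_name` are hypothetical names used only for illustration, and the '/' separator is an assumption. Without custom methods, instances hash and compare by identity. A second object describing the same node is therefore a different key, and the workaround is to peek into the object and build a string key by hand.

```python
class PlainNodeName(object):
    """Hypothetical NodeName *without* custom __hash__/__cmp__ methods."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


html = PlainNodeName('html')
d = {PlainNodeName('body', parent=html): {'class': 'main'}}

# A second object describing the same node is a different key: the default
# __hash__ and == are based on object identity.
same_node = PlainNodeName('body', parent=html)
print(same_node in d)  # False


def absolute_name(node):
    """Hypothetical helper: peek into a node and build a string key by hand."""
    prefix = absolute_name(node.parent) if node.parent is not None else ''
    return prefix + '/' + node.name


# The manual workaround: re-key everything by strings.
by_name = {absolute_name(k): v for k, v in d.items()}
print(by_name[absolute_name(same_node)])  # {'class': 'main'}
```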

The solution is to define custom implementations of two magic methods, __hash__ and __cmp__. For details and the constraints they must satisfy (most importantly, objects that compare equal must have equal hashes), see the Python docs.
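The key constraint on these methods is that objects which compare equal must have equal hashes. Here is a minimal Python 3 sketch of what breaks when that rule is violated. `BadKey` is a hypothetical class whose `__eq__` compares names but whose `__hash__` is deliberately identity-based. Two equal objects therefore carry different hashes. A dictionary compares the stored hash before calling `__eq__`, so the lookup never matches.

```python
class BadKey(object):
    """Hypothetical key whose __eq__ and __hash__ disagree (on purpose)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, BadKey):
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        # BUG (deliberate): identity-based hash, unrelated to __eq__
        return id(self)


a, b = BadKey('body'), BadKey('body')
table = {a: 'attrs'}
print(a == b)      # True
print(b in table)  # False: the hashes differ, so __eq__ is never consulted
```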

The builtin functions hash(obj) and cmp(obj1, obj2) will attempt to call their __underscored__ counterparts on the objects.
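In Python 3, `cmp()` and `__cmp__` were removed and `==` is handled by `__eq__`, but the delegation works the same way. Here is a small sketch with a hypothetical `Probe` class that forwards to a wrapped value:

```python
class Probe(object):
    """Hypothetical class that forwards hashing and equality to a value."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        if not isinstance(other, Probe):
            return NotImplemented
        return self.value == other.value


p = Probe('body')
print(hash(p) == p.__hash__())  # True: hash(obj) calls obj.__hash__()
print(hash(p) == hash('body'))  # True
print(p == Probe('body'))       # True: == calls __eq__
print(p == 'body')              # False: NotImplemented, so Python falls back to identity
```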



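import UserDict  # lets NodeAttrs behave like a dictionary; not significant for this example

class NodeName(object):
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # the parent NodeName, or None for the root

    def __str__(self):
        # the absolute name: the parent's absolute name, a '/' separator, then our name
        prefix = str(self.parent) if self.parent is not None else ''
        return prefix + '/' + self.name

    def __hash__(self):
        # the hash of our string is our unique hash
        return hash(str(self))

    def __cmp__(self, other):
        # similarly, the strings are good for comparisons
        return cmp(str(self), str(other))

class NodeAttrs(UserDict.UserDict):
    def __init__(self, attrs=None):
        # UserDict.__init__ creates self.data before copying attrs into it;
        # defaulting to None avoids sharing one mutable {} between instances
        UserDict.UserDict.__init__(self, attrs)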



Here we assume that the parser does the heavy lifting of parsing the name and putting the attributes in a dictionary. To use these objects in a dictionary, you would do the following:



>>> d = {}
>>> d[node_name] = attrs  # node_name is a NodeName instance, attrs a NodeAttrs instance

... # do anything that can be done with a dictionary and its keys
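In Python 3, `cmp()` and `__cmp__` no longer exist, and `UserDict` lives in the `collections` module. A minimal sketch of the same classes for Python 3 could define `__eq__` next to `__hash__`. It assumes a '/' separator between names, so that for example 'ab' + 'c' and 'a' + 'bc' cannot produce the same absolute name. The node names and attribute values are made up for illustration.

```python
from collections import UserDict


class NodeName(object):
    """Python 3 take on NodeName: __eq__ replaces the removed __cmp__."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # the parent NodeName, or None for the root

    def __str__(self):
        # absolute name: parent's absolute name, a '/' separator, our name
        prefix = str(self.parent) if self.parent is not None else ''
        return prefix + '/' + self.name

    def __hash__(self):
        # equal absolute names give equal hashes, as dicts require
        return hash(str(self))

    def __eq__(self, other):
        if not isinstance(other, NodeName):
            return NotImplemented
        return str(self) == str(other)


class NodeAttrs(UserDict):
    """A dict-like container for a node's attributes."""


html = NodeName('html')
body = NodeName('body', parent=html)

d = {}
d[body] = NodeAttrs({'class': 'main'})

# A freshly built, equal NodeName finds the same entry: no peeking required.
lookup = NodeName('body', parent=NodeName('html'))
print(lookup in d)         # True
print(d[lookup]['class'])  # main
print(str(lookup))         # /html/body
```

Returning `NotImplemented` for non-`NodeName` operands is a design choice of this sketch. A `NodeName` never compares equal to a plain string. The Python 2 `__cmp__` version, by contrast, compares against `str(other)` and so can also match the string form of the name.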



That's it! For more magic methods, see Python __Underscore__ Methods.

1 comment:

  1. A few typos:

    __hash should be __hash__

    there counterparts should be their counterparts
