2010-08-29

Benchmarking "associative array"

Some time ago I needed a fast way to look up objects in what people would call an associative array. In C++ this most of the time means a std::map or something similar.
A std::map is a great generic container that associate any kind of object with any kind of other object, and it's usually very fast as well. This blog post will be about benchmarking different kinds of ways to associate strings with objects.

Lets look at what the different alternatives we have:
  • std::map: The most generic way that almost always is available on all platforms. Often using binary trees.
  • boost::unordered_map: Also a generic container using hash index instead of binary trees.
  • stringmap: An attempt by myself to make a specialized container with strings as the key and a trie index.
There are many other containers out there and I will follow up on this post when I have the time comparing other trie/radix implementations. Right now I'm mostly interested in how fast the std::map and boost::unordered_map are.

So to try them out we need to make different kind of tests. We are interested in the performance when doing a "find" in the container, and when doing "inserts". We should also do different tests for different kind of strings.

Lets cut to the chase and do some benchmarking:

Round 1: "find" totaly random data.
In this test each string is totally randomized with a size between 1 and 255. Each char is a char between 1 and 255. Basically a worst case scenario for doing hash index.


It turns out that boost::unordered_map is pretty fast. But my own implementation that specialize in strings is almost twice as fast.

Round 2: "find" some more normal data.
Total random data is not very common, so I thought that I should do a test of something a little more common data. I made a "find /usr/" on my linux-box and used that as the keys.


boost::unordered is still pretty damn fast. And it does not seem to degrade much when the container size gets larger.

For me, this tests would be enough since I'm mostly interested in how the "find" performs. But I have also made a benchmark with the same data on how "inserts" performs.

Round 3: "insert" random data:
Same data as the round 1 data.


Not much to say. They all seem to perform nearly equal when inserting random data.

Round 4: "insert" normal data:
Same data as the round 2.


Now, this is interesting. boost::unordered performs very well in this test. You also notice the typical characteristics for a hash index.

Conclusion:
Well. Comparing std::map and boost::unordered_map was interesting. If you are using std::map today, replace it with boost::unordered_map unless you are using map-specific methods like equal_range. I also concluded that there is room for some specialized string-specific containers.

I have found some specialized trie/radix index containers and will follow up on them later on.
If they aren't drop in replacements for a std::map, I will probably continue on my own "stringmap" container and make it fully compatible with std::map.

Note: I should also do a test on memory usage.

4 comments:

  1. Hi.

    Could you describe your "stringmap" implementation ? What specifically are you doing ?

    Thanks,
    Shashwat

    ReplyDelete
  2. This is a mesmerizing article.. You’ve really impressed me today with this. You’ve shown your outstanding writing skills once again here today. I am dazzled to see that ! Hypnoterapi

    ReplyDelete
  3. Nice article! You might be interested to try my new sparse hash implementation,fast and extremely memory efficient: https://github.com/greg7mdp/sparsepp

    ReplyDelete
  4. Nice article! You might be interested to try my new sparse hash implementation,fast and extremely memory efficient: https://github.com/greg7mdp/sparsepp

    ReplyDelete