wotreplays.org: Adventures in MongoDB indexing
Any application backed by MongoDB sooner or later lives or dies by the indexes you have on your data…
For a while the indexes that were used on wotreplays.org weren’t too extensive, they served their purpose and brought query times down to an average of 0.05 miliseconds, fast enough to not worry me. However, those same indexes used to be the cause of the massive slowdown when vbAddict was cross-posting replays; when you get upwards of 3000 new replays a day, they add up really quickly.
So I spent some time mucking about with the code that actually does all the grunt work when you browse the site, the query module. This module basically takes the filter options passed, and constructs a query from it that gets fed to MongoDB. There are a few queries that are considerably fast to handle, anything that puts a clamp on things, as it were.
Things like player searches; the “main” index here is the one that restricts the set of replays to look for to a given player. But the second you browse a player and select a certain map, your earlier index doesn’t quite work. It’ll still be the one selected by MongoDB’s query optimizer, but it doesn’t “cover” the query anymore. Of course, on 10.000 replays you won’t notice that. Try it when you’ve got 500.000 and it will slow things down considerably, because the query itself takes longer.
Now imagine a query that happens quite regularly: browsing for a specific vehicle, on a specific map, in a specific game mode, on a specific server.
That’s 4 distinct things to look for. The problem arises when you start making more of these combinations, which the eventual “everything can be filtered” is going to be doing. So for the time being, the query module generates indexes on the fly that will pretty much guarantee each query it can perform is covered by an index. I can already hear the DBA’s out there gnashing their teeth, but it’s a way to see what actually works and what doesn’t. It also lets me twiddle a configuration file to change the order in which certain fields are stuck into compound indexes, so… it’s for science!
Oddly enough, query times have gone down by 0.1 second on the whole. The first time any given query runs it may take longer because the index needs to be built, but they’re speedier afterwards.