Today I read this article about rewriting Pushpin's connection manager in Rust which resulted in 90% reduction in CPU usage.

Author describes the reasoning behind doing rewrite (as we all know that rewriting applications from scratch is almost always a bad idea), presents architecture overview and shows various optimization techniques which were used to achieve the result. I highly recommend reading it as it is inspiring.

One of the techniques was Hierarchical Timing Wheels. I vaguely remembered the idea behind this concept and decided to refresh my knowledge a bit. It turned out I was thinking about Simple Timing Wheels instead of Hierarchical ones -.-.

I'll describe the why and how next.

Why

Why this data structure was created? In a lot of applications, time is crucial component for correct and fast execution of the software. For example, we may need to restrict time of request processing in our high performance server to process more small workloads and move the long-running tasks to separate queue.

If you have a lot of processing happening then you'll have a lot of timers as well. In order to efficiently manage those timers we can use few data structures. One of them is Hierarchical Timing Wheels. I'll focus on this structure, but you can see more approaches and evolution of them here.

How

Imagine that we are implementing a system which processes a lot of data in some specific time window. In order to maximize throughput we want to defer temporary data cleanup until processing window is over. We know that window lasts for 2h 35min and 13s (no idea — don't ask :D), but we don't know exact time the window will start. All we know that the window starts with the first request.

So what we need to do is when first request arrives, we set up a timer which will do data cleanup after our specified time.

To manage a lot of such windows we want to use Hierarchical Timing Wheels. Let's break it down into smaller parts starting from the end.

Timing wheels

Timing Wheel looks like a Wall Clock — we have round shape, and it's divided into smaller slices. Each slice is the same size and represents time unit — for example seconds.

We would use Ring Buffer to implement that and the index points to the current time. With every tick we move the index to the next slice.

Behind each slice we have a task to execute. We execute it when the index points on slice. For example, we scheduled task to be executed in 30 seconds, so we are putting the task in a place current_time_idx + 30. After 30 seconds, when index moves to our task, the task is executed.

Hierarchical

Now we want to schedule our cleanup task which should happen after 2h 35min and 13s. For each part of the time hierarchy (one part is hours, then minutes, then seconds) we have separate timing wheel. Additionally, for each slice, besides the task, we are also storing its timeout.

With every tick we are moving the index of the shortest part (seconds in our case). When we reach the end of the buffer, we are moving index to the first position, and_ we are moving index of the next wheel (minutes) one step further (just like the minute hand on the clock moves when the second hand finishes full circle).

When the index points to a task, we are removing the largest portion of the time hierarchy, and we are moving the task to the smaller hierarchy wheel under appropriate slice.

Example

Inserting timer with expire time of 2h 35min 13s:

  • create three wheels — hours, minutes, seconds — index of each is set to slice number 0
  • go to the hour wheel, insert the task with time 2h 35min and 13s under slice number 1 (2h → second slice)
  • when seconds wheel's index will make full circle, we are moving the minutes wheel's index to the next slice
  • when minutes wheel's index will make full circle, we are moving the hours wheel's index to the next slice
  • when hour wheel's index will point to our task, we are removing hours from its timeout
  • then put the task with time 35min and 13s under 35th slice of minutes wheel
  • we continue this until minutes wheel's index points to our task
  • then again — remove minutes from the time and put task with time 13s under 13th slice of seconds wheel
  • after 13 seconds, the index of seconds wheel will point to our task
  • execute our task