This was developed as part of a recent rethink of how the popular API proxy ApiAxle handles its statistical data, so it's quite specific to that domain. It is certainly not the only way to tackle the problem, and may not be the best; it is simply the one that worked for this developer on this project. The requirements were:
- Super fast inserts.
- Query by arbitrary time range.
- Support for both near real-time (per second) and historical data.
- Reasonable DB space usage.
After looking at various solutions, including Sorted Sets, we decided we could get the best performance/space trade-off by storing each API hit in a range of hashes representing different granularities of time (e.g. minutes, seconds). Each hash would hold a suitable number of values to provide useful data at that granularity.
```coffeescript
granularities =
  seconds:         # kept for 1 hour
    size: 3600
    ttl: 7200
    factor: 1
  minutes:         # available for 24 hours
    size: 1440     # minutes in 24 hours
    ttl: 172800    # seconds in 48 hours
    factor: 60     # number of seconds that make up this granularity
```
This structure is easily extensible and customisable to suit a project's needs.
Each key is then assigned a TTL (using Redis EXPIREAT) of twice the duration of storage; the doubling accommodates rollover between one day/hour/minute and the next.
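As a rough sketch (in plain JavaScript rather than the project's CoffeeScript), one plausible way to derive the EXPIREAT argument is the key's rounded timestamp plus the granularity's `ttl`; `expireAtFor` is a hypothetical helper, not ApiAxle's actual function:

```javascript
// Hypothetical helper: absolute expiry timestamp for a stats key.
// `granularity.ttl` is twice the window the key covers (size * factor),
// so the key survives the rollover into the next window.
function expireAtFor(roundedTimestamp, granularity) {
  return roundedTimestamp + granularity.ttl;
}

const seconds = { size: 3600, ttl: 7200, factor: 1 };
// A key covering the hour starting at 1364832000 expires two hours later.
console.log(expireAtFor(1364832000, seconds)); // 1364839200
```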
Each key name then includes the timestamp at which the hit occurred, rounded down to the nearest multiple of the granularity's span (its size multiplied by its factor).
```coffeescript
# Round a timestamp (in seconds) for a given granularity
getRoundedTimestamp: ( timestamp, granularity ) ->
  factor = granularity.size * granularity.factor
  return Math.floor( timestamp / factor ) * factor
```
A hit occurring at 1364833411 would create the following keys:
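The key names follow from `getRoundedTimestamp` and the `<API_ID>:stats:<granularity>:<rounded ts>` layout used in the query code below. A minimal JavaScript sketch (`statsKey` is a hypothetical helper for illustration):

```javascript
// Round a timestamp (in seconds) down to a granularity's span,
// mirroring getRoundedTimestamp above.
function getRoundedTimestamp(timestamp, granularity) {
  const factor = granularity.size * granularity.factor;
  return Math.floor(timestamp / factor) * factor;
}

const granularities = {
  seconds: { size: 3600, ttl: 7200,   factor: 1 },
  minutes: { size: 1440, ttl: 172800, factor: 60 },
};

// Hypothetical helper building the Redis key for one granularity.
function statsKey(apiId, name, timestamp) {
  const rounded = getRoundedTimestamp(timestamp, granularities[name]);
  return `${apiId}:stats:${name}:${rounded}`;
}

console.log(statsKey("<API_ID>", "seconds", 1364833411)); // <API_ID>:stats:seconds:1364832000
console.log(statsKey("<API_ID>", "minutes", 1364833411)); // <API_ID>:stats:minutes:1364774400
```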
Each key then holds a hash mapping timestamp to hit count. Here the timestamp field is rounded down to the nearest multiple of the granularity's factor (60 for minutes, giving one field per minute). The values are updated using the atomic HINCRBY operation.
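Putting the two rounding steps together, recording a hit looks roughly like this. This is a JavaScript sketch with an in-memory object standing in for Redis; `recordHit` is a hypothetical name, and a real implementation would issue HINCRBY (and EXPIREAT) instead of mutating a plain object:

```javascript
const granularities = {
  seconds: { size: 3600, ttl: 7200,   factor: 1 },
  minutes: { size: 1440, ttl: 172800, factor: 60 },
};

function getRoundedTimestamp(timestamp, granularity) {
  const factor = granularity.size * granularity.factor;
  return Math.floor(timestamp / factor) * factor;
}

// In-memory stand-in for Redis hashes: key -> { field -> count }.
const store = {};

// Record one hit at `timestamp` across every configured granularity.
function recordHit(apiId, timestamp) {
  for (const [name, granularity] of Object.entries(granularities)) {
    const key = `${apiId}:stats:${name}:${getRoundedTimestamp(timestamp, granularity)}`;
    // Field is the timestamp rounded to the granularity's factor (60 for minutes).
    const field = Math.floor(timestamp / granularity.factor) * granularity.factor;
    // Real code: redis.hincrby(key, field, 1), plus an EXPIREAT on the key.
    store[key] = store[key] || {};
    store[key][field] = (store[key][field] || 0) + 1;
  }
}

recordHit("<API_ID>", 1364833411);
recordHit("<API_ID>", 1364833412); // same minute, so the minute field reaches 2
```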
The statistics data is made available via the ApiAxle API, with the consumer specifying the required granularity and time range as query parameters. The from and to timestamps are rounded to the required granularity, reusing the logic from saving, and a simple loop iterates over the range, incrementing by the number of seconds per unit.
```coffeescript
i = from
while i <= to
  rounded_ts = getRoundedTimestamp( i, granularities["seconds"] )
  redis_key  = "<API_ID>:stats:seconds:#{rounded_ts}"
  results[i] = hget( redis_key, i )
  i += granularities["seconds"].factor
```
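The same loop can be sketched in plain JavaScript against an in-memory stand-in for Redis; `queryRange` and the sample counts in `store` are invented for illustration, and `hget` stands in for the Redis HGET call:

```javascript
const granularities = { seconds: { size: 3600, ttl: 7200, factor: 1 } };

function getRoundedTimestamp(timestamp, granularity) {
  const factor = granularity.size * granularity.factor;
  return Math.floor(timestamp / factor) * factor;
}

// Stand-in for Redis: one hash holding a few per-second counts.
const store = {
  "<API_ID>:stats:seconds:1364832000": { 1364833411: 3, 1364833412: 1 },
};
const hget = (key, field) => (store[key] || {})[field];

// Walk the requested range one granularity unit at a time,
// fetching each timestamp's count from the hash that covers it.
function queryRange(from, to, granularity) {
  const results = {};
  let i = from;
  while (i <= to) {
    const roundedTs = getRoundedTimestamp(i, granularity);
    const redisKey = `<API_ID>:stats:seconds:${roundedTs}`;
    results[i] = hget(redisKey, i);
    i += granularity.factor;
  }
  return results;
}

console.log(queryRange(1364833410, 1364833412, granularities.seconds));
```

Seconds with no recorded hits come back as missing fields, which the caller can treat as zero.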
The full source code for this implementation is available in ApiAxle's GitHub repo. This version is specific to ApiAxle, but if anyone is particularly interested, get in touch with me; I'd be happy to help create a more generic JS/CoffeeScript library for this.