HyperLogLog

Posted Sep 7, 2023 Updated Sep 8, 2023

By M_Falcon

1 min read

HyperLogLog

HyperLogLog is an algorithm for the count distinct problem.
Redis provides HyperLogLog algorithm string data
It only uses 12 KB.

When to use?

Guarantee approximate uniqueness when handle large dataset using small memory

-- It's like a set, doesn't actually store true value
PFADD <string | number>

Duplicate -> 0 (false)
Unique -> 1 (true)

-- Get the approximate count of key.
PFCOUNT <key>

Suppose we need to guarantee uniqueness of item views.
So, It’s not allowed to increment many view counts by same user.

We can think about simple solution using SET data structure keeping username of items:views Set for item views

This solution guarantee exact number of view count.
However, requires many memory usage even though it’s simple feature

HyperLogLog for item views
This solution requires only 12KB memory usage.

HyperLogLog error
However, it has error rate approximately 0.81%.

This solution is appropriate than SET solution since it’s important reduce redis memory usage and it’s not much important counting views exactly.

Consider using HyperLogLog for ensuring uniqueness with some error.
It depends on your feature.
It’s appropriate when handle count views, likes etc.

Don’t use it when you need guaranteeing exact uniqueness like username , userId , email etc.

This post is licensed under CC BY 4.0 by the author.