Merged
6 changes: 5 additions & 1 deletion DIRECTORY.md
@@ -549,6 +549,11 @@
* [Test Hashmap](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/hashmap/test_hashmap.py)
* Hashset
* [Test My Hashset](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/hashset/test_my_hashset.py)
* Lfucache
* [Lfu Cache](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/lfucache/lfu_cache.py)
* [Lfu Cache Node](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/lfucache/lfu_cache_node.py)
* [Lfu Cache V2](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/lfucache/lfu_cache_v2.py)
* [Test Lfu Cache](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/lfucache/test_lfu_cache.py)
* Linked Lists
* Circular
* [Circular Linked List Utils](https://github.com/BrianLusina/PythonSnips/blob/master/datastructures/linked_lists/circular/circular_linked_list_utils.py)
@@ -1154,7 +1159,6 @@
* [Test Flatten Array](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/test_flatten_array.py)
* [Test Is Sorted How](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/test_is_sorted_how.py)
* [Test Length Of Missing Array](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/test_length_of_missing_array.py)
* [Test Lfu Cache](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/test_lfu_cache.py)
* [Test List Ops](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/test_list_ops.py)
* [Test Manipulate Data](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/test_manipulate_data.py)
* [Test Min Max](https://github.com/BrianLusina/PythonSnips/blob/master/tests/datastructures/test_min_max.py)
74 changes: 73 additions & 1 deletion datastructures/lfucache/README.md
@@ -17,4 +17,76 @@ To determine the least frequently used key, a use counter is maintained for each
smallest use counter is the least frequently used key.

When a key is first inserted into the cache, its use counter is set to 1 (due to the put operation). The use counter
for a key in the cache is incremented whenever a get or put operation is called on it.

## Solution

The LFU cache algorithm tracks how often each key is accessed to determine which keys to remove when the cache is full.
It uses one hash map to store key-value pairs and another to group keys by their access frequency. Each group in this
frequency hash map contains nodes arranged in a doubly linked list. Additionally, it keeps track of the current least
frequency to quickly identify the least used keys. When the cache reaches its limit, the key with the lowest frequency
is removed first, specifically from the head of the corresponding linked list.

Each time a key is accessed, its frequency increases, and its position in the frequency hash map is updated, ensuring
that the least used keys are prioritized for removal. This is where the doubly linked list is helpful, as the node being
updated might be located somewhere in the middle of the list. Shifting the node to the next frequency level can be done
in constant time, making the update process efficient.

Let’s discuss the algorithm of the LFU cache data structure in detail. We maintain two hash maps, `lookup` and `frequencyMap`,
and an integer, `minimum_frequency`, as follows:

- `lookup` keeps the key-node pairs.
- The node contains three values: `key`, `value`, and `frequency`.

- `frequencyMap` maintains doubly linked lists against every frequency existing in the data.
  - For example, all the keys that have been accessed only once reside in the doubly linked list stored at `frequencyMap[1]`,
    all the keys that have been accessed twice reside in the doubly linked list stored at `frequencyMap[2]`, and so on.

- `minimum_frequency` keeps the frequency of the least frequently used key.
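The state described above can be sketched as a small Python class. This is an illustrative layout, not the repository's implementation: an `OrderedDict` stands in for each doubly linked list (it preserves insertion order and supports O(1) removal by key, which is all the frequency buckets need), and the names `lookup`, `frequency_map`, and `minimum_frequency` mirror the text. The class name `LFUCacheSketch` is made up for this example.

```python
from collections import defaultdict, OrderedDict


class LFUCacheSketch:
    """Illustrative state layout for the LFU cache described above.

    Each OrderedDict bucket stands in for one doubly linked list;
    insertion order plays the role of head-to-tail order.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        # key -> (value, frequency); the text stores these on a node object
        self.lookup = {}
        # frequency -> ordered bucket of keys accessed that many times
        self.frequency_map = defaultdict(OrderedDict)
        # frequency of the least frequently used key currently in the cache
        self.minimum_frequency = 0
```

Using a `defaultdict` means a bucket for a new frequency springs into existence on first access, which matches the "create the list if it doesn't exist" step in `PromoteKey` below.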

Apart from the required functions, i.e., Get and Put, we implement a helper function, `PromoteKey`, that maintains
the order of the keys with respect to how frequently they are used. This function is implemented as follows:

- First, retrieve the node associated with the key.
- If the node's `frequency` is 0, the key is new. We simply increment its `frequency` and insert the node at the tail
  of the linked list corresponding to frequency 1.
- Otherwise, detach the `node` from its corresponding linked list.
- If the corresponding linked list becomes empty after detaching the node, and the node’s `frequency` equals `minimum_frequency`,
there's no key left with a frequency equal to `minimum_frequency`. Hence, increment `minimum_frequency`.
- Increment the key's `frequency`.
- Insert the node at the tail of the linked list associated with the updated `frequency`.
  - Before inserting it, check whether that linked list exists; if it doesn't, create one.

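The steps above can be sketched as a standalone function. This is a simplification under stated assumptions: `OrderedDict` buckets replace the doubly linked lists, `lookup` maps each key to a `(value, frequency)` pair instead of a node object, and the function returns the possibly-updated minimum frequency rather than mutating a class attribute. The name `promote_key` follows the text's `PromoteKey`.

```python
from collections import defaultdict, OrderedDict


def promote_key(lookup, frequency_map, minimum_frequency, key):
    """Move `key` to the bucket for its next frequency.

    Returns the (possibly updated) minimum frequency.
    """
    value, frequency = lookup[key]
    if frequency == 0:
        # New key: it goes to the tail of the frequency-1 bucket,
        # and the minimum frequency is necessarily 1.
        lookup[key] = (value, 1)
        frequency_map[1][key] = None
        return 1
    # Detach the key from its current bucket (O(1) for an OrderedDict,
    # just as it is for a doubly linked list node).
    del frequency_map[frequency][key]
    if not frequency_map[frequency] and frequency == minimum_frequency:
        # No key left at the old minimum frequency.
        minimum_frequency += 1
    # Re-insert at the tail of the next bucket; defaultdict creates it
    # on demand if it does not exist yet.
    lookup[key] = (value, frequency + 1)
    frequency_map[frequency + 1][key] = None
    return minimum_frequency
```

Note the constant-time detach-and-reinsert: this is exactly why the text uses doubly linked lists, since the node being promoted may sit anywhere in the middle of its bucket.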
After implementing `PromoteKey()`, the LFU cache functions are implemented as follows:
- `Get`: We check if the key exists in the cache.
- If it doesn't, we return `None`
  - Otherwise, we promote the key using the `PromoteKey()` function and return the value associated with it.
- `Put`: We check if the key exists in the cache.
- If it doesn't, we must add this (key, value) pair to our cache.
- Before adding it, we check if the cache has already reached capacity. If it has, we remove the LFU key. To do that,
we remove the head node of the linked list associated with the frequency equal to `minimum_frequency`.
- Then we add the new key.
- If the key already exists, we simply update its value.
- At the end of both cases, we adjust the frequency order of the key using `PromoteKey()`.
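Putting the Get and Put recipes together gives a compact, runnable sketch. As above, this is illustrative rather than the repository's implementation (which lives in `lfu_cache.py`): `OrderedDict` buckets replace the doubly linked lists, so evicting from the head of the minimum-frequency list becomes `popitem(last=False)`, and the class name `SimpleLFUCache` is made up for this example.

```python
from collections import defaultdict, OrderedDict


class SimpleLFUCache:
    """A compact LFU cache following the Get/Put recipe above."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.lookup = {}                        # key -> (value, frequency)
        self.frequency_map = defaultdict(OrderedDict)
        self.minimum_frequency = 0

    def _promote(self, key):
        value, frequency = self.lookup[key]
        if frequency:
            # Detach from the current bucket; bump the minimum if this
            # empties the minimum-frequency bucket.
            del self.frequency_map[frequency][key]
            if not self.frequency_map[frequency] and frequency == self.minimum_frequency:
                self.minimum_frequency += 1
        else:
            self.minimum_frequency = 1          # brand-new key
        self.lookup[key] = (value, frequency + 1)
        self.frequency_map[frequency + 1][key] = None

    def get(self, key):
        if key not in self.lookup:
            return None
        self._promote(key)
        return self.lookup[key][0]

    def put(self, key, value):
        if self.capacity == 0:
            return
        if key in self.lookup:
            self.lookup[key] = (value, self.lookup[key][1])
        else:
            if len(self.lookup) == self.capacity:
                # Evict the head of the minimum-frequency bucket.
                evicted, _ = self.frequency_map[self.minimum_frequency].popitem(last=False)
                del self.lookup[evicted]
            self.lookup[key] = (value, 0)       # frequency 0 marks a new key
        self._promote(key)
```

A short usage run: with capacity 2, inserting keys 1 and 2, reading key 1, then inserting key 3 evicts key 2, since key 2 is the only key still at frequency 1.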

![Solution 1](./images/solutions/lfu_cache_solution_1.png)
![Solution 2](./images/solutions/lfu_cache_solution_2.png)
![Solution 3](./images/solutions/lfu_cache_solution_3.png)
![Solution 4](./images/solutions/lfu_cache_solution_4.png)
![Solution 5](./images/solutions/lfu_cache_solution_5.png)
![Solution 6](./images/solutions/lfu_cache_solution_6.png)
![Solution 7](./images/solutions/lfu_cache_solution_7.png)
![Solution 8](./images/solutions/lfu_cache_solution_8.png)
![Solution 9](./images/solutions/lfu_cache_solution_9.png)
![Solution 10](./images/solutions/lfu_cache_solution_10.png)

### Time Complexity

The time complexity of `PromoteKey()` is `O(1)` because detaching a node from a doubly linked list and inserting a node
at the tail of a linked list both take `O(1)` time. The time complexity of both the Put and Get functions is also `O(1)`
because they rely on `PromoteKey()` and a handful of other constant-time operations.

### Space Complexity

The space complexity of this algorithm is linear, `O(n)`, where `n` refers to the capacity of the data structure. This
is the space occupied by the hash maps.
130 changes: 4 additions & 126 deletions datastructures/lfucache/__init__.py
@@ -1,127 +1,5 @@
from collections import defaultdict
from typing import Any, Union, Dict
from datastructures.lfucache.lfu_cache_node import LfuCacheNode
from datastructures.lfucache.lfu_cache import LFUCache
from datastructures.lfucache.lfu_cache_v2 import LFUCacheV2

from datastructures.linked_lists.doubly_linked_list import DoublyLinkedList
from datastructures.linked_lists.doubly_linked_list.node import DoubleNode


class LfuCacheNode(DoubleNode):
def __init__(self, data):
super().__init__(data)
self.frequency = 1


class LFUCache:
def __init__(self, capacity: int):
"""
Initializes an instance of a LFUCache
@param capacity: Capacity of the cache
@type capacity int

1. Dict named node self._lookup for retrieval of all nodes given a key. O(1) time to retrieve a node given a key
2. Each frequency has a DoublyLinkedList stored in self._frequency where key is the frequency and value is an
object of DoublyLinkedList
3. minimum frequency through all nodes, this can be maintained in O(1) time, taking advantage of the fact that
the frequency can only increment by 1. use the following 2 rules:
i. Whenever we see the size of the DoublyLinkedList of current min frequency is 0, increment min_frequency
by 1
ii. Whenever we put in a new (key, value), the min frequency must be 1 (the new node)
"""
self.capacity = capacity
self._current_size = 0
self._lookup = dict()
self._frequency: Dict[int, DoublyLinkedList] = defaultdict(DoublyLinkedList)
self._minimum_frequency = 0

def __update(self, node: LfuCacheNode):
"""
Helper function used in 2 cases:
1. When get(key) is called
2. When put(key, value) is called and key exists

Common point of the 2 cases:
1. no new node comes in
2. node is visited one more time -> node.frequency changed -> thus the place of this node will change

Logic:
1. Pop node from 'old' DoublyLinkedList with frequency
2. Append node to 'new' DoublyLinkedList with frequency + 1
3. If 'old' DoublyLinkedList has size 0 & self.minimum_frequency is frequency, update self.minimum_frequency
to frequency + 1

Complexity Analysis:
Time Complexity: O(1) time

@param node: Node to update in the Cache
@type node LfuCacheNode
"""
frequency = node.frequency

# pop the node from the 'old' DoublyLinkedList
self._frequency[frequency].delete_node(node)

if self._minimum_frequency == frequency and not self._frequency[frequency]:
self._minimum_frequency += 1

node.frequency += 1
frequency = node.frequency

# add to 'new' DoublyLinkedList with new frequency
self._frequency[frequency].prepend(node)

def get(self, key: int) -> Union[Any, None]:
"""
Gets an item from the Cache given the key
@param key: Key to use to fetch data from Cache
@return: Data mapped to the key
"""
if key not in self._lookup:
return None

node = self._lookup[key]
data = node.data
self.__update(node)
return data

def put(self, key: int, value: Any) -> None:
"""
If key is already present in the self._lookup, we perform same operations as get, except updating the node data
to new value

Otherwise, below operations are performed:
1. If cache reaches capacity, pop least frequently used item.
2 Facts:
a. we maintain self._minimum_frequency, minimum possible frequency in cache
b. All cache with the same frequency are stored as a DoublyLinkedList, with recently used order (Always
append to head).

Consequence is that the tail of the DoublyLinkedList with self._minimum_frequency is the least recently used
one, pop it.

2. Add new node to self._lookup
3. add new node to DoublyLinkedList with frequency of 1
4. reset minimum_frequency to 1

@param key: Key to use for lookup
@param value: Value to store in the cache
@return: None
"""

if self.capacity == 0:
return None

if key in self._lookup:
node = self._lookup[key]
self.__update(node)
node.data = value
else:
if self._current_size == self.capacity:
node = self._frequency[self._minimum_frequency].pop()
self._lookup.pop(node.key)
self._current_size -= 1

node = DoubleNode(data=value, key=key)
self._lookup[key] = node
self._frequency[1].append(node)
self._minimum_frequency = 1
self._current_size += 1
__all__ = ["LFUCache", "LFUCacheV2", "LfuCacheNode"]
123 changes: 123 additions & 0 deletions datastructures/lfucache/lfu_cache.py
@@ -0,0 +1,123 @@
from collections import defaultdict
from typing import Any, Union, Dict

from datastructures.linked_lists.doubly_linked_list import DoublyLinkedList
from datastructures.lfucache.lfu_cache_node import LfuCacheNode


class LFUCache:
def __init__(self, capacity: int):
"""
Initializes an instance of a LFUCache
@param capacity: Capacity of the cache
@type capacity int

1. Dict named node self._lookup for retrieval of all nodes given a key. O(1) time to retrieve a node given a key
2. Each frequency has a DoublyLinkedList stored in self._frequency where key is the frequency and value is an
object of DoublyLinkedList
3. minimum frequency through all nodes, this can be maintained in O(1) time, taking advantage of the fact that
the frequency can only increment by 1. use the following 2 rules:
i. Whenever we see the size of the DoublyLinkedList of current min frequency is 0, increment min_frequency
by 1
ii. Whenever we put in a new (key, value), the min frequency must be 1 (the new node)
"""
self.capacity = capacity
self._current_size = 0
self._lookup = dict()
self._frequency: Dict[int, DoublyLinkedList] = defaultdict(DoublyLinkedList)
self._minimum_frequency = 0

def __update(self, node: LfuCacheNode):
"""
Helper function used in 2 cases:
1. When get(key) is called
2. When put(key, value) is called and key exists

Common point of the 2 cases:
1. no new node comes in
2. node is visited one more time -> node.frequency changed -> thus the place of this node will change

Logic:
1. Pop node from 'old' DoublyLinkedList with frequency
2. Append node to 'new' DoublyLinkedList with frequency + 1
3. If 'old' DoublyLinkedList has size 0 & self.minimum_frequency is frequency, update self.minimum_frequency
to frequency + 1

Complexity Analysis:
Time Complexity: O(1) time

@param node: Node to update in the Cache
@type node LfuCacheNode
"""
frequency = node.frequency

# pop the node from the 'old' DoublyLinkedList
self._frequency[frequency].delete_node(node)

if self._minimum_frequency == frequency and not self._frequency[frequency]:
self._minimum_frequency += 1

node.frequency += 1
frequency = node.frequency

# add to 'new' DoublyLinkedList with new frequency
self._frequency[frequency].prepend(node)

def get(self, key: int) -> Union[Any, None]:
"""
Gets an item from the Cache given the key
@param key: Key to use to fetch data from Cache
@return: Data mapped to the key
"""
if key not in self._lookup:
return None

node = self._lookup[key]
data = node.data
self.__update(node)
return data

def put(self, key: int, value: Any) -> None:
"""
If key is already present in the self._lookup, we perform same operations as get, except updating the node data
to new value

Otherwise, below operations are performed:
1. If cache reaches capacity, pop least frequently used item.
2 Facts:
a. we maintain self._minimum_frequency, minimum possible frequency in cache
                b. All cache entries with the same frequency are stored in a DoublyLinkedList, in recently-used order
                (new nodes are always added at the head).

Consequence is that the tail of the DoublyLinkedList with self._minimum_frequency is the least recently used
one, pop it.

2. Add new node to self._lookup
3. add new node to DoublyLinkedList with frequency of 1
4. reset minimum_frequency to 1

@param key: Key to use for lookup
@param value: Value to store in the cache
@return: None
"""

if self.capacity == 0:
return None

if key in self._lookup:
node = self._lookup[key]
self.__update(node)
node.data = value
return None
else:
if self._current_size == self.capacity:
node = self._frequency[self._minimum_frequency].pop()
self._lookup.pop(node.key)
self._current_size -= 1

node = LfuCacheNode(data=value, key=key)
self._lookup[key] = node
self._frequency[1].prepend(node)
self._minimum_frequency = 1
self._current_size += 1
return None
13 changes: 13 additions & 0 deletions datastructures/lfucache/lfu_cache_node.py
@@ -0,0 +1,13 @@
from datastructures.linked_lists.doubly_linked_list.node import DoubleNode


class LfuCacheNode(DoubleNode):
def __init__(self, data, key):
super().__init__(data, key=key)
self.frequency = 1


class LfuCacheNodeV2(DoubleNode):
def __init__(self, data, key):
super().__init__(data, key=key)
self.frequency = 0