Ticker Flow Cache

This is another interview question that I was asked and I wanted to fully develop to "see it working".

The problem is about saving streaming data and discarding the Least Recently Used. It is a simpler variation of LRU caching where cache faults don't need to be implemented (if the data has been discarded, instead of retrieving it from a secondary storage as a non-streaming cache would do, the program just returns a null)

The streaming data to be saved is the content of a "ticker tape": a security (e.g., stock) symbol and its corresponding price. The cache will only save the N most recent prices for the M most recently updated securities.

The problem is interesting in that the cache has 2 "levels" with slightly different characteristics. The lower level is composed of prices. When a new price comes in and the cache is full, the oldest one is always discarded, we never preserve an existing price. The higher level is composed of security symbols and re-use can happen, we may get multiple prices for the same security and when this happens the symbol needs to have its most recent use reset, without discarding any other security from the cache.

The lower level is easy to implement with a list of simple queues (FIFO), one list per security. For the higher level I chose a linked list, so that the "remove item in the middle" operation is reasonably easy and doesn't involve memory copying (note that removing from the middle is slower than adding a new item because it requires finding the item, which is an O(n) operation in a linked list). It is also necessary to have a relation between the two levels and I used a dictionary for this, with the security symbol as the key and the lower level index as the value. This dictionary also provides HashSet functionality to determine if a security is present in the cache.

Another design decision to consider is how to handle partially full situations. For the lower level no special consideration is needed, if the queue size would grow beyond its limits we just dequeue an item. For the higher level however, because we need to maintain the relation between higher an lower levels, we would have to handle differently the cases of partially full and "fully primed". I decided to initialize the higher level with dummy data because it simplifies (and makes it faster) the "fully primed" run time at the expense of slower initialization time and a slightly higher memory usage initially. I would expect that the main operation mode of the cache would be "fully primed" however if the expected usage is that the cache would be initialized and destroyed frequently and would operate more often in "not-fully-primed" mode (that is, if the number of securities in the cache is not filled to full capacity) then the code should be changed.

Another decision relates to multi-threaded access. For a cache like this I would generally expect it to be associated to a consolidated stream of data and this code is written with an expectatino of being single threaded. If there are multiple streams of data and Tick() needs to be thread-safe it would be necessary to modify the writes/updates so that they use interlocked writing and updating. Another possible multithreaded scenario would be when reading and writing happens on different threads. If the read happens between the time an item starts being dropped from the cache and when the new item has been added, then it could get incomplete/corrupt data (and locks/interlocked would be necessary to prevent this situation).

erufian / ticker Goto Github PK

ticker's Introduction

Ticker Flow Cache

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent