
lrb's Introduction

webcachesim2

A simulator for CDN caching and web caching policies.

Simulate a variety of existing caching policies by replaying request traces, and use this framework as a basis to experiment with new ones. A 14-day long Wikipedia trace is released alongside the simulator.

The webcachesim2 simulator was built to evaluate the Learning Relaxed Belady (LRB) algorithm, a new machine-learning-based caching algorithm. The simulator builds on top of webcachesim; see References for more information.

Currently supported caching algorithms:

  • Learning Relaxed Belady (LRB)
  • LR (linear-regression based ML caching)
  • Belady (heap-based)
  • Belady (a sample-based approximate version)
  • Relaxed Belady
  • Inf (infinite-size cache)
  • LRU
  • B-LRU (Bloom Filter LRU)
  • ThLRU (LRU with threshold admission control)
  • LRUK
  • LFUDA
  • S4LRU
  • ThS4LRU (S4LRU with threshold admission control)
  • FIFO
  • Hyperbolic
  • GDSF
  • GDWheel
  • Adaptive-TinyLFU (via Java library integration)
  • LeCaR
  • UCB
  • LHD
  • AdaptSize
  • Random (random eviction)

The configuration parameters for these algorithms can be found in the config file config/algorithm_params.yaml.

The prototype implementation on top of Apache Traffic Server is available here.

Traces

The Wikipedia trace used in the paper: download link. To uncompress:

tar -xzvf wiki2018.tr.tar.gz

A newer version of Wikipedia trace is also available: download link.

Trace Format

Request traces are expected to be in a space-separated format with 3 required columns (time, id, size), plus optional additional columns for extra features.

  • time should be a long long int, but can be arbitrary (reserved for a future TTL feature, not currently in use)
  • id should be a long long int, used to uniquely identify objects
  • size should be uint32; this is the object's size in bytes
  • extra features are optional uint16 features. LRB currently interprets them as categorical features (e.g., object type).
time id size [extra_feature(s)]
1 1 120
2 2 64
3 1 120
4 3 14
4 1 120

The simulator runs a sanity check on the trace at startup.
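
For concreteness, below is a minimal Python sketch (hypothetical, not part of the simulator) that writes the example trace above and performs a basic consistency check in the same spirit as the startup sanity check: each object id must always appear with the same size.

# write_and_check_trace.py -- hypothetical helper, not part of webcachesim2
def write_example_trace(path):
    # The five example requests from the trace format section above.
    rows = [(1, 1, 120), (2, 2, 64), (3, 1, 120), (4, 3, 14), (4, 1, 120)]
    with open(path, "w") as f:
        for t, obj_id, size in rows:
            f.write(f"{t} {obj_id} {size}\n")

def check_trace(path, n_extra_fields=0):
    sizes = {}
    with open(path) as f:
        for line_no, line in enumerate(f, 1):
            fields = line.split()
            assert len(fields) == 3 + n_extra_fields, f"line {line_no}: wrong column count"
            obj_id, size = int(fields[1]), int(fields[2])
            assert size > 0, f"line {line_no}: non-positive size"
            if sizes.setdefault(obj_id, size) != size:
                raise ValueError(f"line {line_no}: id {obj_id} size inconsistent")

if __name__ == "__main__":
    write_example_trace("example.tr")
    check_trace("example.tr")
    print("example.tr written and passed basic checks")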

Installation

For ease of use, we provide a docker image which contains the simulator. Our documentation assumes that you use this image. To run it:

 docker run -it -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim ${traceFile} ${cacheType} ${cacheSize} [--param=value]

Alternatively, you may follow the instructions to install the simulator manually.

Common issues

If you see the following error when running docker:

error: running sanity check on trace: /trace/wiki2018.tr
terminate called after throwing an instance of 'std::runtime_error'
what(): Exception opening file /trace/wiki2018.tr

Please verify that the trace directory is correctly mounted by running:

docker run -it --entrypoint /bin/bash -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim -c "ls /trace"
# You should be able to see wiki2018.tr listed

Using an existing policy

The basic interface is

./webcachesim_cli traceFile cacheType cacheSize [--param=value]

where

  • traceFile: a request trace (see trace format)
  • cacheType: one of the caching policies
  • cacheSize: the cache capacity in bytes
  • param, value: optional cache parameters and values that can be used to tune cache policies

Global parameters

parameter                   type    description
bloom_filter                0/1     use a Bloom filter as admission control in front of the cache algorithm
dburi, dbcollection         string  upload simulation results to MongoDB
is_metadata_in_cache_size   0/1     deduct metadata overhead from the cache size
n_early_stop                int     stop the simulation after n requests; a value < 0 means no early stop
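
Assuming each run's final results are stored as one MongoDB document shaped like the final JSON string shown in the Examples section (the database and collection names below, mydb and dev, are simply the ones that appear in the window-tuning output later on this page), they can be read back with pymongo. A minimal sketch:

# read_results.py -- hypothetical sketch, assumes results were uploaded with
# --dburi=mongodb://127.0.0.1:27017/mydb --dbcollection=dev
from pymongo import MongoClient

client = MongoClient("mongodb://127.0.0.1:27017")
coll = client["mydb"]["dev"]

# Each simulation result is assumed to be one document shaped like the final
# JSON string printed by webcachesim_cli (trace_file, cache_type, cache_size, ...).
for doc in coll.find({"trace_file": "wiki2018.tr", "cache_type": "LRU"}):
    print(doc["cache_size"], doc.get("no_warmup_byte_miss_ratio"))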

Examples

Running a single simulation

Running LRU on the Wiki trace with a 1 TB cache:
docker run -it -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim wiki2018.tr LRU 1099511627776

# running sanity check on trace: /trace/wiki2018.tr
# ...
# pass sanity check
simulating

## intermediate results are printed every million requests
seq: 1000000
# current_cache_size/max_cache_size (% full) 
cache size: 29196098273/1099511627776 (0.0265537)
# time taken to simulate this million requests
delta t: ...
# byte miss ratio for this million requests
segment bmr: 0.786717
# resident set size in bytes
rss: 60399616

# ...
# Final results are printed as a JSON string. Byte misses and byte requests are aggregated in segment_byte_miss and segment_byte_req.
# The default segment size is 1 million requests. This allows you to compute the final byte miss ratio with your own warmup length.
# The LRB evaluation warmup length is given in the NSDI paper.
# Alternatively, check no_warmup_byte_miss_ratio for the byte miss ratio without considering warmup.
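
As a small illustration (a hypothetical post-processing sketch; the warmup length of 10 segments is an arbitrary example, not the value from the paper), the final byte miss ratio after a chosen warmup can be computed from the final JSON string:

# miss_ratio.py -- hypothetical post-processing sketch.
# Reads the final JSON string from a captured run (e.g. "... > result.txt")
# and computes the byte miss ratio after skipping some warmup segments
# (1 segment = 1 million requests by default).
import json

warmup_segments = 10  # example value; choose your own warmup length

with open("result.txt") as f:
    json_line = next(line for line in f if line.lstrip().startswith("{"))
result = json.loads(json_line)

byte_miss = sum(result["segment_byte_miss"][warmup_segments:])
byte_req = sum(result["segment_byte_req"][warmup_segments:])
print("byte miss ratio after warmup:", byte_miss / byte_req)
print("byte miss ratio without warmup:", result["no_warmup_byte_miss_ratio"])
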
Running B-LRU on the Wiki trace with a 1 TB cache:
docker run -it -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim wiki2018.tr LRU 1099511627776 --bloom_filter=1
Running LRB on the Wiki trace with a 1 TB cache:
docker run -it -v ${YOUR TRACE DIRECTORY}:/trace sunnyszy/webcachesim wiki2018.tr LRB 1099511627776 --memory_window=671088640

LRB memory window values for the Wikipedia trace at different cache sizes in the paper (based on the first 20% validation prefix):

cache size (GB)    memory window
64                 58720256
128                100663296
256                167772160
512                335544320
1024               671088640
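
The table above can be wrapped in a tiny lookup helper (hypothetical, for convenience only; cache sizes not listed here still need to be tuned as described in the next section):

# memory_window.py -- hypothetical lookup of the paper's tuned LRB
# memory_window values for the Wikipedia trace.
PAPER_MEMORY_WINDOW = {
    64: 58720256,
    128: 100663296,
    256: 167772160,
    512: 335544320,
    1024: 671088640,
}

def memory_window_for(cache_size_gb):
    # Only the cache sizes evaluated in the paper are covered; any other
    # size should be tuned on a validation prefix (see LRB_WINDOW_TUNING.md).
    return PAPER_MEMORY_WINDOW[cache_size_gb]

print(memory_window_for(1024))  # 671088640, matching the LRB example above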

Automatically tune LRB memory window on a new trace

LRB_WINDOW_TUNING.md describes how to tune LRB memory window on a new trace.

Contributors are welcome

Want to contribute? Great! We follow the GitHub contribution workflow: submissions should fork the repository and use GitHub pull requests to get merged into this code base.

Bug Reports

If you come across a bug in webcachesim, please file a bug report by creating a new issue. This is an early-stage project, which depends on your input!

Contribute a new caching policy

If you want to add a new caching policy, please augment your code with a reference, a test case, and an example. Use pull requests as usual.

References

We ask that academic works built on this code reference the LRB and AdaptSize papers:

Learning Relaxed Belady for Content Distribution Network Caching
Zhenyu Song, Daniel S. Berger, Kai Li, Wyatt Lloyd
USENIX NSDI 2020.

AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN
Daniel S. Berger, Ramesh K. Sitaraman, Mor Harchol-Balter
USENIX NSDI 2017.

lrb's People

Contributors

dasebe, eloiseh, lingfennan, suhjohn, sunnyszy, theanoli


lrb's Issues

LRB window tuning

Hi, I was trying to tune the LRB window for a new trace, but I'm not sure whether I'm doing it correctly. I use the command below

python3 lrb_window_search.py ~/job_dev.yaml ~/algorithm_params.yaml ~/trace_params.yaml mongodb://127.0.0.1:27017/mydb?compressors=disabled&gssapiServiceName=mongodb

And I got output in terminal as

2021-04-09 10:41:12.005 | INFO     | __main__:get_cache_size_and_parameter_list:158 - cdn1_500m_sigmetrics18.tr LRB ignore config: memory_window=10000000
2021-04-09 10:41:12.005 | INFO     | __main__:get_cache_size_and_parameter_list:165 - cdn1_500m_sigmetrics18.tr LRB ignore config: n_early_stop=[-1]
/mnt/data/miniconda3/lib/python3.8/site-packages/pymongo/compression_support.py:55: UserWarning: Unsupported compressor: disabled
  warnings.warn("Unsupported compressor: %s" % (compressor,))
lrb_window_search.py:38: UserWarning: no byte_million_req info, estimate memory window lower bound as 4k
  warnings.warn("no byte_million_req info, estimate memory window lower bound as 4k")
2021-04-09 10:41:12.011 | INFO     | __main__:get_validation_tasks_per_cache_size:54 - For cdn1_500m_sigmetrics18.tr/size=68719441552/LRB, memory windows to validate: [4095, 5622, 7718, 10594, 14543, 19963, 27404, 37617, 51637, 70883, 97301, 133565, 183345, 251678, 345479, 474239, 650989, 893613, 1226663, 1683842, 2311411, 3172875, 4355409, 5978673, 8206928, 11265657, 15464376, 21227960, 29139636, 40000000]
2021-04-09 10:41:12.012 | INFO     | __main__:get_cache_size_and_parameter_list:158 - cdn1_500m_sigmetrics18.tr LRB ignore config: memory_window=10000000
2021-04-09 10:41:12.012 | INFO     | __main__:get_cache_size_and_parameter_list:165 - cdn1_500m_sigmetrics18.tr LRB ignore config: n_early_stop=[-1]
2021-04-09 10:41:12.017 | INFO     | __main__:get_validation_tasks_per_cache_size:54 - For cdn1_500m_sigmetrics18.tr/size=137438883103/LRB, memory windows to validate: [4095, 5622, 7718, 10594, 14543, 19963, 27404, 37617, 51637, 70883, 97301, 133565, 183345, 251678, 345479, 474239, 650989, 893613, 1226663, 1683842, 2311411, 3172875, 4355409, 5978673, 8206928, 11265657, 15464376, 21227960, 29139636, 40000000]
2021-04-09 10:41:12.017 | INFO     | __main__:get_cache_size_and_parameter_list:158 - cdn1_500m_sigmetrics18.tr LRB ignore config: memory_window=10000000
2021-04-09 10:41:12.017 | INFO     | __main__:get_cache_size_and_parameter_list:165 - cdn1_500m_sigmetrics18.tr LRB ignore config: n_early_stop=[-1]
2021-04-09 10:41:12.022 | INFO     | __main__:get_validation_tasks_per_cache_size:54 - For cdn1_500m_sigmetrics18.tr/size=274877766207/LRB, memory windows to validate: [4095, 5622, 7718, 10594, 14543, 19963, 27404, 37617, 51637, 70883, 97301, 133565, 183345, 251678, 345479, 474239, 650989, 893613, 1226663, 1683842, 2311411, 3172875, 4355409, 5978673, 8206928, 11265657, 15464376, 21227960, 29139636, 40000000]
2021-04-09 10:41:12.023 | INFO     | __main__:get_cache_size_and_parameter_list:158 - cdn1_500m_sigmetrics18.tr LRB ignore config: memory_window=10000000
2021-04-09 10:41:12.023 | INFO     | __main__:get_cache_size_and_parameter_list:165 - cdn1_500m_sigmetrics18.tr LRB ignore config: n_early_stop=[-1]
2021-04-09 10:41:12.030 | INFO     | __main__:get_validation_tasks_per_cache_size:54 - For cdn1_500m_sigmetrics18.tr/size=549755532413/LRB, memory windows to validate: [4095, 5622, 7718, 10594, 14543, 19963, 27404, 37617, 51637, 70883, 97301, 133565, 183345, 251678, 345479, 474239, 650989, 893613, 1226663, 1683842, 2311411, 3172875, 4355409, 5978673, 8206928, 11265657, 15464376, 21227960, 29139636, 40000000]
2021-04-09 10:41:12.030 | INFO     | __main__:get_cache_size_and_parameter_list:158 - cdn1_500m_sigmetrics18.tr LRB ignore config: memory_window=10000000
2021-04-09 10:41:12.030 | INFO     | __main__:get_cache_size_and_parameter_list:165 - cdn1_500m_sigmetrics18.tr LRB ignore config: n_early_stop=[-1]
2021-04-09 10:41:12.037 | INFO     | __main__:get_validation_tasks_per_cache_size:54 - For cdn1_500m_sigmetrics18.tr/size=1099511064826/LRB, memory windows to validate: [4095, 5622, 7718, 10594, 14543, 19963, 27404, 37617, 51637, 70883, 97301, 133565, 183345, 251678, 345479, 474239, 650989, 893613, 1226663, 1683842, 2311411, 3172875, 4355409, 5978673, 8206928, 11265657, 15464376, 21227960, 29139636, 40000000]
n_task: 150
 generating job file to /tmp/1617982872.job
first task: bash --login -c "$WEBCACHESIM_ROOT/build/bin/webcachesim_cli cdn1_500m_sigmetrics18.tr LRB 274877766207 --dbcollection=dev --enable_trace_format_check=0 --segment_window=1000000 --real_time_segment_window=600 --uni_size=0 --is_metadata_in_cache_size=0 --dburi=mongodb://127.0.0.1:27017/mydb?compressors=disabled --sample_rate=64 --batch_size=131072 --max_n_past_timestamps=32 --num_iterations=32 --num_leaves=32 --num_threads=4 --learning_rate=0.1 --objective=byte_miss_ratio --n_edc_feature=10 --range_log=1000000 --version=opensource --n_req=500000000 --n_early_stop=100000000 --memory_window=893613 --task_id=1617982872040439" &> /tmp/1617982872040439.log

parallel -v --eta --shuf --sshdelay 0.1 -S 1/: < /tmp/1617982872.job
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  Tange, O. (2021, March 22). GNU Parallel 20210322 ('2002-01-06').
  Zenodo. https://doi.org/10.5281/zenodo.4628277

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.


Computers / CPU cores / Max jobs to run
1:local / 1 / 1

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 150 AVG: 0.00s  local:1/0/100%/0.0s

And it just gets stuck there. I was wondering whether this is running correctly, and how I can view the tuning result?

having a cache size of zero

Hey, I wonder why every run of the simulator reports that the cache size is zero?
I'm getting the following line:
cache size: 0/1099511627776 (0)
Do you have an idea why is it happens?

something wrong with mongodb

Hello, I met a problem recently.
When I run "docker run --network="host" -it -v /home:/trace sunnyszy/webcachesim wiki2018.tr LRU 1099511627776 --dburi=mongodb://localhost:27017/cachebase",
there is something wrong:
fd0bf176090246828fb34f9fb20f39d
warning: db connection failed: Invalid namespace specified 'cachebase.': generic server error

a bug in LHD

There is a bug in LHD: because size is set after LHD initialization, LHD uses -1/INT64_MAX to initialize explorerBudget, which effectively becomes infinite.

Smaller Traces available?

Hi!

I am trying to work with this code and run it on my laptop, but the wikipedia traces are huge and given the compute used in the paper, I'm not sure if this is feasible or not. I had two questions about this:

  1. Are there any (much) smaller traces that are (a) available and (b) would work? (At least compared to the Wikipedia dataset, which is huge)
  2. Is it possible to adjust the cache parameters such that it would work feasibly on a laptop? Specifically an M2 MacbookAir or an x86 macbook.

Thanks! Any help is appreciated
-Kyle

Final json string is not printed for LeCaR

Hi,
When I was trying to run LeCaR, for example with the command sudo docker run -it -v ~/lrb/trace:/trace sunnyszy/webcachesim wiki2018.tr LeCaR 549755813888 > wiki18_LeCaR_512GB.txt, I did not see the final result JSON string printed in the txt file. Instead I just have
[screenshot]
And it got stuck there. I was wondering why this would happen?

Calculate hit rate

Hi, thanks for open-sourcing this amazing project. In your paper you focus on the byte miss ratio, but I prefer to know the general hit rate.

So in simulation.h, I added two counters inside the class Framework: uint64_t cnt = 0, hit = 0;
And in simulation.cpp, in the simulate() function,
originally,
if (is_admitting) {
    bool is_hit = webcache->lookup(*req);
    if (!is_hit) {
        update_metric_req(byte_miss, obj_miss, size);
        update_metric_req(rt_byte_miss, rt_obj_miss, size);
        webcache->admit(*req);
    }
} else {
    update_metric_req(byte_miss, obj_miss, size);
    update_metric_req(rt_byte_miss, rt_obj_miss, size);
}
Now,
cnt++; // here!!!
if (is_admitting) {
    bool is_hit = webcache->lookup(*req);
    if (!is_hit) {
        update_metric_req(byte_miss, obj_miss, size);
        update_metric_req(rt_byte_miss, rt_obj_miss, size);
        webcache->admit(*req);
    } else {
        hit++; // here!!!
    }
} else {
    update_metric_req(byte_miss, obj_miss, size);
    update_metric_req(rt_byte_miss, rt_obj_miss, size);
}

And finally I use hit / cnt to get the hit rate. Is this correct? Thanks a lot!

if I want to run the algorithm as a baseline

Hello, recently I've been doing research on access algorithms. I want to compare the performance of LRB before adding the admission policy. Do I need to turn the Bloom filter on to run the baseline?

How did you collect the Dataset?

Hi,

Thanks for your interesting work and for making the code and dataset available.

I just wanted to know where you got the dataset. Did you collect it yourself?
I would appreciate it if you could give me some information.

Reporting Good Decision Ratio

Hi!

I was wondering if there was an easy way of reporting the good decision ratio. I see that the existing behavior already reports byte miss ratio, and was wondering how hard it would be to also report the good decision ratio, and any guidance/pointers to particular files to look at to do so. Also, is segment_window or real_time_segment_window the Belady boundary? How would I change that parameter when running LRB on a trace?

Much thanks!
-Kyle

fail sanity check

Hi, I would like to ask: when I run the docker image, n_req stops at 658000000 and the command line shows:
"req: 658361220, key: 2334 size inconsistent. Old size: 356 new size: 35
terminate called after throwing an instance of 'std::runtime_error'
what(): fail sanity check"
Could you please tell me the reason? I'm new to this, sorry to bother you!

Wiki trace format

Hi, I was wondering whether the object's size in the wiki18 trace is in bytes or kilobytes?
Thanks.

LRB result not printed

Hi, I was using the cli to run the wiki_18.tr trace, but I found that the LRB final result is not printed for 256 GB and it has been running for a very long time now (other algorithms, like the ones below, all finished).
[screenshot]
Result folder:
[screenshot]
Log folder (the last line of the LRB log):
[screenshot]
Is it still running or is there any problem that makes it stuck? When I run the 512 GB the tot seq is 2800000000.

Unexpected behaviour of the simulator on sample trace

I get unexpected exceptions for the sample trace shown below; normally the LHD algorithm should ignore a request that requires more memory than the current cache size.

ubuntu@x:~/trace$ cat sample_trace.tr 
0 0 10
1 1 10000000
ubuntu@x:~/trace$ docker run -it -v /home/ubuntu/trace:/trace sunnyszy/webcachesim sample_trace.tr LHD 9
/opt/webcachesim/build/bin/webcachesim_cli sample_trace.tr LHD 9 
running sanity check on trace: /trace/sample_trace.tr
n_extra_fields: 0
n_req: 0
pass sanity check
unrecognized parameter: n_extra_fields
simulating
Request too big: 84 > 9

For other algorithms it works as expected:

ubuntu@x:~/trace$ docker run -it -v /home/ubuntu/trace:/trace sunnyszy/webcachesim sample_trace.tr LRU 9
/opt/webcachesim/build/bin/webcachesim_cli sample_trace.tr LRU 9 
running sanity check on trace: /trace/sample_trace.tr
n_extra_fields: 0
n_req: 0
pass sanity check
simulating

seq: 2
cache size: 0/9 (0)
delta t: 0
segment bmr: 1
rss: 11653120
{ "trace_file" : "sample_trace.tr", "cache_type" : "LRU", "cache_size" : "9", "no_warmup_byte_miss_ratio" : 1.0, "segment_byte_miss" : [ 10000010 ], "segment_byte_req" : [ 10000010 ], "segment_object_miss" : [ 2 ], "segment_object_req" : [ 2 ], "segment_rss" : [ 11653120 ], "segment_byte_in_cache" : [ 0 ], "real_time_segment_byte_miss" : [ 0, 10000010 ], "real_time_segment_byte_req" : [ 0, 10000010 ], "real_time_segment_object_miss" : [ 0, 2 ], "real_time_segment_object_req" : [ 0, 2 ], "real_time_segment_rss" : [ 11653120, 11653120 ], "simulation_time" : "0", "simulation_timestamp" : "2023-11-30 19:55:29" }
warning: db connection failed: an invalid MongoDB URI was provided

Cache Size changes during simulation

Hi,

Thanks for sharing the code. I encountered an issue when running the program. Why does the cache size change during the simulation? The following is the output I got when I set the cache_size parameter to 33554432; from the second seq onward, the cache_size somehow increases to a very large value.

seq: 1000000
cache size: 33491309/33554432 (0.998119)
delta t: 23.916
segment bmr: 0.814654
rss: 128008192
in/out metadata: 1273 / 72420
memory_window: 67108864
percent_beyond: 6.32882e-06
feature overhead per entry: 104
sample overhead per entry: 26
n_training: 41691
training_time: 192.614 ms
inference_time: 39.1755 us

seq: 2000000
cache size: 9462298711/18446744073615097856 (5.12952e-10)
delta t: 2.621
segment bmr: 0.268707
rss: 141557760
in/out metadata: 75467 / 30402
memory_window: 67108864
percent_beyond: 0
feature overhead per entry: 108
sample overhead per entry: 56
n_training: 62444
training_time: 249.561 ms
inference_time: 39.1755 us

seq: 3000000
cache size: 14666415799/18446744073601548288 (7.95068e-10)
delta t: 2.2
segment bmr: 0.124146
rss: 149803008
in/out metadata: 110383 / 21958
memory_window: 67108864
percent_beyond: 0
feature overhead per entry: 111
sample overhead per entry: 76
n_training: 59373
training_time: 281.636 ms
inference_time: 39.1755 us

.......

Training LRB on non-CDN data

I want to get some information on the ML model itself and I am not sure if it's in this repo. I am looking to train it on my specific dataset unrelated to CDN. I am not quite sure if the extra_features column allows me flexibility or if there is somewhere else I should look.

Any information would be helpful and thanks so much!

How to set the value of Belady boundary?

In the paper of LRB, the Belady boundary is defined as the minimum time-to-next-request of objects evicted by Belady’s MIN algorithm. However, the MIN algorithm is offline, so it is reasonable to assume that the Belady boundary is approximately stationary and is updated continuously during the machine learning warmup period.

But in practice, the Belady boundary is set to the length of the sliding memory window, which is equal to 67108864. I am confused about whether the Belady boundary should be implicitly related to the cache capacity and the average size of cached objects.

Thank you and have a nice day!
