comcast / superior-cache-analyzer

A tool for inspecting the contents of Apache Traffic Server caches

License: Apache License 2.0

Python 98.83% Shell 1.17%

superior-cache-analyzer's Introduction

SCAN

Superior Cache ANalyzer

F.A.Q

Q: Scan says it can't find my cache file, but I know it's there. What do?

A: Sometimes, ATS installs point to cache files that are installed relative to the ATS root directory. This is pretty common in test setups right after a basic install. There's not really any way for SCAN to 'detect' when this is happening (though it will try), so the best solution is often just to try running scan from that ATS root directory. For example, if you have a directory /opt/trafficserver that holds all of the trafficserver files, try going to that directory before running scan.
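To see why the working directory matters, here is a minimal sketch of how a relative span path could be resolved. The helper name resolve_span and the example paths are made up for illustration; this is not SCAN's actual code.

```python
import os

def resolve_span(span_path, ats_root=None):
    """Return an absolute path for a span entry (hypothetical helper).

    Relative entries are resolved against ats_root if given, otherwise
    against the current working directory.
    """
    if os.path.isabs(span_path):
        return span_path
    base = ats_root if ats_root is not None else os.getcwd()
    return os.path.normpath(os.path.join(base, span_path))

# With an explicit root, the result no longer depends on where you run from.
print(resolve_span("var/trafficserver/cache.db", "/opt/trafficserver"))
```

Without a root, the same relative entry resolves differently depending on where the program was started, which is why running from the ATS root directory tends to help.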
Q: Scan is giving me an error, and I don't know what it means/how to fix it. How fix?

A: Congratulations, you've just been drafted! Try running scan like this: scan --debug <other options that you specified last time> 2>scan.err (you may not see the error message this time) and then create a github issue and upload/pastebin/^C^V the scan.err file that should've been created and link/paste it into the box, along with a description of what you were trying to do and what went wrong. I'll fix it as soon as I can.
Q: Why can't Scan see the thing that I KNOW is in the cache?

A: It's possible that you scanned for 'thing' before it was written. The Apache Traffic Server™ will only sync directories every 60 seconds by default (effectively, this means scan can only see cache changes at that frequency). You could either wait a bit, or set the ATS configuration parameter proxy.config.cache.dir.sync_frequency to a lower value (in seconds). If that doesn't work, check out the above question.

User Guide

SCAN's primary use is as a library for inspecting Apache Traffic Server™ (ATS) caches. SCAN also provides a command-line utility (scan), which is described here.

Installation

Prerequisites

scan requires the following dependencies:

numpy - A highly-performant library for working with vectorized functions on huge data structures (used for reading/manipulating cache directories) link.
psutil - A cross-platform process and system interface library (used for ionice setting) link.
setuptools - "Easily download, build, install, upgrade, and uninstall Python packages" link.
typing - Provides a backport of type-hinting for old versions of Python (< v3.5) link.

If you have Python version 3.5 or greater, you already have typing. If you have pip3 (for any Python version > 3.4.0), you likely already have setuptools. To install a dependency "DEP" on Python versions < 3.5, simply run sudo -H pip3 install DEP. If you don't have pip3, then to install a dependency on CentOS/Fedora/RHEL distros, do sudo yum install -y python34-DEP, on Ubuntu/Mint/Debian distros do sudo apt-get install python3-DEP, and on Arch/Manjaro/Gentoo(?) do sudo pacman -S python3-DEP. If you need the dependencies and you're on MacOS/BSD/Windows, then gods help you - I can't.

Installing 'scan'

Via `pip`

By far the easiest way to install SCAN is to simply use pip like so:

pip install Superior-Cache-ANalyzer

Note that you'll probably need to run that command as an administrator (Windows), with sudo (Everything Else), or with the --user option (Everything Including Windows)

From a Release

On the Releases page you can download the wheel (the .whl file) and install that manually with

sudo -H python3 -m pip install /path/to/Superior-Cache-ANalyzer.<version stuff>.whl

Note that this may require you to upgrade/install the pip module, so if you get an error like No module named 'pip' try installing the python3-pip package (python34-pip on RedHat/CentOS/Fedora) or running sudo -H python3 -m ensurepip. Other errors could possibly be fixed by running sudo -H python3 -m pip install -U pip and then trying the install again. If all else fails, then you can probably install from source.

From Source

To install from source, you'll first want to download the source from the Comcast Github. Once you've done that, go to the downloaded folder and run

sudo -H pip3 install .

... or, if you don't have pip3:

sudo python3 setup.py install

Note that SCAN is only guaranteed to work for Python versions 3.4.1 and greater. If you want to run the tests see 'Tests' below.

Usage

The basic usage of scan is pretty simple at the moment; to start the utility simply run:

scan [ --debug ] [ -f --fips ] [ -d --dump [ SPAN ] ] [ -c --config-dir DIR ] [ --tgm ]
scan [ --debug ] [ -f --fips ] [ -D --dump-breakdown [ SPAN ] ] [ -c --config-dir DIR ] [ --tgm ]

where the options have the following meanings:

-c or --config-dir DIR

This option allows you to directly specify the config dir of your ATS install. This allows you to skip the prompt when scan first starts where you must input your configuration directory. In non-interactive mode (-d/--dump given), this option must be used if ATS is not installed under /opt/trafficserver.
--debug

When provided, this flag causes scan to output some verbose debugging information and exception stack traces. It also causes it to be run without optimization, which - depending on your Python interpreter - can have a serious impact on performance.
-d or --dump [SPAN]

Dumps the contents of the cache in Tabular YAML format to stdout, then exits. This will cause any -D/--dump-breakdown flags given to be ignored. If specified, SPAN should be the path to a cache span to dump as specified in storage.config e.g. /dev/sdk. WARNING: As of the time of this writing, scan's "ionice" value is being set to the lowest possible value on startup, which means that this operation could take several hours to complete if you do not specify a single span. Currently, if you do not use the -l or --loadavg option, it takes about 400-500 seconds to dump a 1TB hard disk cache and about 3-7 seconds to dump an 8GB RAM cache. Use of this option with -l or --loadavg is not recommended at this time, as it will radically increase the time it takes to complete.
-D or --dump-breakdown [SPAN]

Dumps the usage of the cache to stdout in Tabular YAML format, broken down by host, then exits. If -d/--dump was given on the command line, this flag will be ignored if present. If specified, SPAN should be the path to a cache span to dump as specified in storage.config e.g. /dev/sdk. WARNING: As of the time of this writing, scan's "ionice" value is being set to the lowest possible value on startup, which means that this operation could take several hours to complete if you do not specify a single span. Currently, if you do not use the -l or --loadavg option, it takes about 400-500 seconds to dump a 1TB hard disk cache and about 3-7 seconds to dump an 8GB RAM cache. Use of this option with -l or --loadavg is not recommended at this time, as it will radically increase the time it takes to complete.
-f or --fips

You must use this option if the ATS running on your system was compiled with ENABLE_FIPS enabled. If you don't, everything will be messed up. Actually, some things will still be messed up even if you do.
-l or --loadavg LOADAVG

This flag allows the specification of a maximum system load average to be respected by the program. This is expected to be a comma-separated list of floating-point numbers (see man uptime). For example: scan -l "25.0, 25.0, 25.0" ensures that no more than 25 processes will be waiting for CPU time or disk I/O on average every 1, 5 or 15 minutes. Note that this option assumes that the system's loadavg at the time scan starts is representative of the system's loadavg for the entirety of its execution; if you start a very long scan job on e.g. a 1TB span, and then decide to play Crysis on Medium settings using integrated graphics, your system may very well exceed a specified maximum loadavg, through no fault of scan itself. Note that if your system is already at or above the LOADAVG specified, scan will immediately exit as it cannot possibly run. (Implementation note: effectively this controls the number of sub-processes that can be used to scan a stripe at once, since each sub-process is potentially another process that will wait for CPU time or Disk I/O.) Note that this is only available on POSIX-compliant systems. Usage of this flag alongside -d or --dump is discouraged.
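As a rough illustration of the LOADAVG format, the sketch below parses such a string and compares it with the system's current load averages. The function names parse_loadavg and exceeds are invented for this example and are not part of scan.

```python
import os

def parse_loadavg(text):
    """Parse a string like "25.0, 25.0, 25.0" into a tuple of three floats."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 3:
        raise ValueError("expected three comma-separated numbers")
    return tuple(parts)

def exceeds(current, maximum):
    """True if any of the 1-, 5- or 15-minute averages is at or above its limit."""
    return any(c >= m for c, m in zip(current, maximum))

limit = parse_loadavg("25.0, 25.0, 25.0")

# os.getloadavg() only exists on Unix-like systems and may raise OSError.
if hasattr(os, "getloadavg"):
    try:
        print("over limit:", exceeds(os.getloadavg(), limit))
    except OSError:
        print("load average unavailable")
```

The "at or above" comparison mirrors the note that scan exits immediately when the system is already at the specified limit.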
--tgm

Toggles 'God Mode'. When passed on the command line, scan will not set its ionice level, and will ignore any -l/--loadavg flags provided. Using this can slow down your system, and may even have a non-negligible impact on running Apache Traffic Server instances.
-V or --version

Prints the version information and exits. This will print both scan's version and then on the next line the version and implementation of the Python interpreter used to run it. This second line would - for example - usually look like the following on CentOS7.x systems: Running on CPython v3.4.5.

Once the utility is started (provided the -d/--dump or -c/--config-dir flags are not given) you'll be faced with a pretty basic prompt. At first, your only option will be [1] Read Storage Config. After you select this option, you'll be prompted to enter the location of your ATS configuration files. "Tab-completion" is supported for most interactive prompts, including the ATS configuration file prompt. SCAN will expect all of them to be in the same directory, and will guess that they are in /opt/trafficserver/etc/trafficserver/ by default. Note that the use of FIPS at compilation time cannot be determined from the config files, and MUST be given on the command line. Once the configuration has been read, all menu options will be unlocked. They are as follows:

`[1] Show Cache Setup`

This option will print out the spans and volumes declared in the configuration. Output will look like:

Cache files:
/path/to/a/span	Span of <n> stripes	XXX.XB

Volumes:
#1	<type>	XXX.XB

where <n> is the number of stripes in the span on that line, and XXX.XB is the size of a span/volume (but it will be displayed in human-readable approximations in units of B, kB, MB, or GB as appropriate). Volumes defined as a percent of total storage will have their size calculated at runtime, and displayed in absolute terms. <type> will be the type of volume declared. In nearly all cases, this will be http, but certain plugins could define other volume types. Finally, it should be noted that while this example shows one volume on one span, this menu option will display all volumes and all spans, in no particular order and with no distinction between cache spans on files, block devices, or ram devices.
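The human-readable sizes can be pictured with a small formatter like the one below. This is only a sketch: the name human_size is hypothetical, and it assumes 1 kB = 1024 B, which may not match what scan does internally.

```python
def human_size(num_bytes):
    """Format a byte count using B, kB, MB or GB (assumes 1 kB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB"):
        if size < 1024:
            return "%.1f%s" % (size, unit)
        size /= 1024
    # Anything larger stays in GB, the biggest unit the output uses.
    return "%.1f%s" % (size, "GB")

print(human_size(8 * 1024**3))   # 8.0GB
```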

`[2] List Settings`

This option will list the settings declared in records.config, in proper ATS syntax. An example:

proxy.config.log.collation_host STRING NULL
proxy.config.ssl.compression INT 1

Only one or two of these settings actually has any impact on the function of scan, but all values are read in to facilitate future extension.

`[3] Search for Setting`

This option will bring up a prompt to type a search string for a specific setting from records.config. Python-syntax regex is supported and enabled by default (meaning searching for 'proxy.config' will match 'proxyZconfig' as well as the exact string typed).
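The regex behaviour can be reproduced with Python's re module. The setting names below are examples only; the second one is invented to show the loose match.

```python
import re

settings = ["proxy.config.ssl.compression", "proxyZconfig.example"]

# An unescaped '.' matches any character, so both names match.
loose = [s for s in settings if re.search("proxy.config", s)]

# re.escape turns the pattern into a literal string match.
exact = [s for s in settings if re.search(re.escape("proxy.config"), s)]

print(loose)
print(exact)
```

Since the prompt takes Python-syntax regex, escaping the dot by hand (proxy\.config) should presumably give the same literal match.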

`[4] List Stripes in a Span`

This option will prompt you to enter a span (which is the full path to the span file) and then list all stripes within it. The output is in the format:

XXX.XB stripe, created Www Mmm D hh:mm:ss (version XX.X)

where XXX.XB is the size of a stripe (but it will be displayed in human-readable approximations in units of B, kB, MB, or GB as appropriate), Www Mmm D hh:mm:ss is the date of the stripe's creation (in the system's ctime(3) format) and XX.X is the decimal-separated major and minor version numbers of the cache system that created it. Note that this version is not the same as the version of ATS using the cache. Also note that as of this time only version 24.0+ is supported by scan, and using lower versions with scan will cause it to crash and/or give incorrect output.

`[5] View URLs of objects in a Span`

When selected, this option will first prompt you for a span. It will then search all of the stripes on that span for stored objects, and catalog their URLs, printing them to the screen as they are found. Each URL is printed in the format:

protocol://[[user]:password@]host/path/to/content	 - XXX.XB - 	x<Y>

where protocol is the protocol used to retrieve the content (nearly always http or https), [[user]:password@] is the username (if used, usually not) 'colon' password (if used, usually not) used to access the content 'at' the host - which is the fully-qualified domain name of the content host, and path/to/content is the location on that host of the content stored in the cache. A typical example of a path is images/test/testquest.png. XXX.XB is the size of this content (but it will be displayed in human-readable approximations in units of B, kB, MB, or GB as appropriate). Finally, <Y> will be the number of times this same URL is stored in the cache (typically in 'alternate' forms). For example, if a given item is stored only once in the cache span, its line will end in x1, and if it is encountered 42 times, then it will end in x42. Note that the size of a given object is reported as the size of one instance of this item, regardless of the number actually stored.
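If you want to pull those pieces back out of a printed URL, the standard library's urllib.parse.urlsplit does the job. The URL below is a made-up example, not real cache output.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://user:secret@cdn.example.com/images/test/testquest.png")

print(parts.scheme)    # the protocol
print(parts.username)  # the user, or None if absent
print(parts.password)  # the password, or None if absent
print(parts.hostname)  # the host
print(parts.path)      # the path, with a leading slash
```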

Warning: When tested on a span of a single, roughly 830GB stripe, this operation took between 39 and 44 seconds to complete. Be aware that the time this takes is directly proportional to the size of the spans, and the number of spans that it is searching. However, results are cached so that subsequent searches (or uses of menu option 6) on the same span should be significantly quicker. To help recognize that the program has not frozen, findings are printed to the screen as they are found, and the main menu will display upon completion.

`[6] View Usage of a Span broken down by host`

This option will first prompt for a span, then it will list the hosts that have content stored in that span, as well as the total storage size used, the storage size as a percent of the total available storage, and the storage size as a percent of the storage currently in use. The output format for each host is as follows:

<host>	 - XXX.XB - 	YY.YY% of available space - 	ZZ.ZZ% of used space

where <host> is the fully-qualified domain name of the host, XXX.XB is the total size of that host's content on disk (but it will be displayed in human-readable approximations in units of B, kB, MB, or GB as appropriate), YY.YY is the percent of available space taken up by this host's content, and ZZ.ZZ is the percent of space currently being used to store objects that is taken up by this host's content.
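The two percentages differ only in their denominator. A minimal sketch, with a hypothetical helper host_usage and invented numbers:

```python
def host_usage(host_bytes, used_bytes, available_bytes):
    """Return (percent of available space, percent of used space) for one host."""
    pct_available = 100.0 * host_bytes / available_bytes
    pct_used = 100.0 * host_bytes / used_bytes
    return pct_available, pct_used

# Invented numbers: a host holds 2 GB in a 100 GB span that has 8 GB in use.
GB = 1024 ** 3
avail, used = host_usage(2 * GB, 8 * GB, 100 * GB)
print("%.2f%% of available space, %.2f%% of used space" % (avail, used))
```

A host can therefore look small against the whole span while still dominating what is actually stored.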

Warning: When tested on a span of a single, roughly 830GB stripe, this operation took between 39 and 44 seconds to complete. Be aware that the time this takes is directly proportional to the size of the spans, and the number of spans that it is searching. However, results are cached so that subsequent searches (or uses of menu option 5) on the same span should be significantly quicker. To help recognize that the program has not frozen, findings are printed to the screen as they are found, and the main menu will display upon completion.

`[7] Dump cache usage stats to file (Tabular YAML format)`

This option will ask you to first name a file for output (relative or absolute paths - doesn't matter which), then it will dump the output of a call to the 'View URLs of objects in a Span' for ALL spans in the cache system to the named file in Tabular YAML (TYAML) format (which is just YAML but indented with tabs instead of spaces and accepts None as a null value.)

Tests

If you want to run the tests, be sure you're in the project's root directory and run the test.sh script. Note that the unit tests will download and attempt to build Apache Traffic Server from source and as such will also require all of the dependencies of Apache Traffic Server. A minimal linting test (good for auditing your contribution at a glance) can be run with pylint by just running pylint --rcfile=./.pylintrc scan/ from the project's root directory. A pylint score above 9.5 and with no errors (e.g. E0001: syntax-error) is considered "passing".

Tabular YAML Format

The output of the interactive mode's 7th option and the -d or --dump option are given in what's been referred to as "Tabular YAML Format". As the name implies, this is similar to YAML. In fact, it should be considered syntactically identical to YAML but for one exception: indentation is always done via the tab character, never with spaces. This was done because without harming its human readability, it allows for much easier pipelining of output e.g. via cut.
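Standard YAML does not allow tabs for indentation, so a regular YAML parser will likely reject this output as-is. One way around that is to swap leading tabs for spaces first. The helper tyaml_to_yaml and the sample keys below are assumptions for illustration; the real keys in scan's output may differ.

```python
def tyaml_to_yaml(text, width=2):
    """Replace each leading tab with `width` spaces, leaving other tabs alone."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip("\t")
        depth = len(line) - len(stripped)   # number of leading tabs
        out.append(" " * (width * depth) + stripped)
    return "\n".join(out)

# Made-up sample in the tab-indented style.
sample = "example.com:\n\tdocs: 2\n\ttotal_size: 10\n"
print(tyaml_to_yaml(sample))
```

The result can then be handed to whatever YAML library you prefer.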

superior-cache-analyzer's Issues

`ui` Module Tech. Debt

The ui module is a mess of repeated code, especially for dumping cache info. There's also some stuff that could be just cleaner in general.

URL parsing issues on Python 3.6

On Python v3.6 (+?) the debug output of scan shows

DEBUG: TypeError('must be str, not numpy.int64',)

when attempting to read in some (many, actually) URLs. Suspect that the URL.__str__ method is to blame, trying to coerce something to the wrong type.
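This kind of TypeError typically shows up when a string is concatenated with a non-string number. The sketch below uses a plain int as a stand-in for numpy.int64, and the port field is purely a guess at what might be involved; the fix pattern (coerce with str() first) is the same either way.

```python
def join_host_port(host, port):
    """Build "host:port", coercing port to str first."""
    return host + ":" + str(port)

try:
    "example.com:" + 8080      # str + int is not allowed
except TypeError as err:
    print("TypeError:", err)

print(join_host_port("example.com", 8080))
```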

Performance enhancements

Currently, the amount of time it takes to dump an entire node in production (~20 1 TB HDD caches and ~4 8GB RAM device caches) with a loadavg specified that allows for fewer processes than the number of available cores is frankly unacceptably long. Profiling seems to indicate that nearly 98% of any given dump is spent in the read calls which... is pretty much exactly what you want. I don't know where to go from there, I can't optimize system calls from my end.

Better HdrHeap Handling

Most of the interesting data (request/response headers etc.) is stored in alternates in a marshaled HdrHeap struct. I don't understand very well how this structure works, and the current method of extracting a URL from one of them involves some assumptions and guesswork. To go further, a deeper understanding is needed (I'm looking at you, amc).

Calculation for offset of Copy B metadata is wrong

When attempting to find the up-to-date directory for a stripe, for certain (very large) stripes the offset for the copy B metadata will not be calculated correctly, and will result in scan being unable to initialize.

Sorted outputs

Currently, various outputs such as e.g. "Show Cache Setup" print their outputs in a seemingly random order. For a friendlier, more consistent UX, such things ought to be sorted alphabetically.

Command-Line parameter for config directory

When you start scan in interactive mode, it'll ask you where your config files are. But if you use -d/--dump, then it just assumes that they can be found in /opt/trafficserver/etc/trafficserver, which is definitely not standard. It'd be nice if there was something like -c/--config-dir to specify on the command line.

Dump by hostname/domain

Currently, -d/--dump only supports output sorted by url. However, it'd be nice to be able to dump all of the content stored according to hostname/domain, e.g.

...
steampowered.com:
    docs: 170
    total_size: 38GB
play.google.com:
    docs: 82
    total_size: 5.2GB
...
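A per-host breakdown like this boils down to grouping objects by hostname. Here is a minimal sketch, assuming the cache contents are available as (url, size) pairs; that shape, the function name, and the sample data are all made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def breakdown_by_host(objects):
    """Group (url, size_in_bytes) pairs by hostname."""
    hosts = defaultdict(lambda: {"docs": 0, "total_size": 0})
    for url, size in objects:
        entry = hosts[urlsplit(url).hostname]
        entry["docs"] += 1
        entry["total_size"] += size
    return dict(hosts)

# Made-up cache contents.
objects = [
    ("http://steampowered.com/a.bin", 100),
    ("http://steampowered.com/b.bin", 50),
    ("https://play.google.com/c.apk", 30),
]
for host, stats in sorted(breakdown_by_host(objects).items()):
    print("%s:\n\tdocs: %d\n\ttotal_size: %d" % (host, stats["docs"], stats["total_size"]))
```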

Windows tab-completion support

Currently, the package doesn't even build on Windows because of the readline dependency. A possible solution would be based on pyreadline.

Implement query/params in URL string coercion

Currently, if you print a URL object, you get something like https://example.com/example.html . However, many request URLs include query strings and parameters like e.g. https://example.com/example.html?querystring&param=val . In general, this information should exist when the URL is created, all that needs to be done is some printing logic in URL.__str__ (defined as URLtoString).
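The printing logic could look roughly like this. The URL class below is a hypothetical stand-in, and its field names (params, query) are assumptions rather than scan's real attributes.

```python
class URL:
    """Hypothetical stand-in for scan's URL type; field names are assumptions."""

    def __init__(self, scheme, host, path, params="", query=""):
        self.scheme, self.host, self.path = scheme, host, path
        self.params, self.query = params, query

    def __str__(self):
        out = "%s://%s/%s" % (self.scheme, self.host, self.path)
        if self.params:            # ';params' comes before the query string
            out += ";" + self.params
        if self.query:
            out += "?" + self.query
        return out

print(URL("https", "example.com", "example.html", query="querystring&param=val"))
```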

Unit Tests crash while looking for a span

test.sh crashes while looking for a span, including on TravisCI:


Traceback (most recent call last):
  File "./test.py", line 275, in <module>
    exit(main())
  File "./test.py", line 264, in main
    results = testSpan()
  File "./test.py", line 53, in testSpan
    s = config.spans()['storage/cache.db'][1]
KeyError: 'storage/cache.db'

The config.spans() dict actually contains keys of an absolute path instead of a relative path
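One possible fix on the test side is to normalise the key before the lookup. In this sketch a plain dict stands in for the return value of config.spans(), and find_span is a hypothetical helper.

```python
import os

def find_span(spans, path):
    """Look up a span by path, trying the absolute form of a relative path too."""
    if path in spans:
        return spans[path]
    return spans[os.path.abspath(path)]

# Stand-in for config.spans(): keys are absolute paths.
spans = {os.path.abspath("storage/cache.db"): "span object"}
print(find_span(spans, "storage/cache.db"))
```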

scan crashes while reading config files

Traceback (most recent call last):
  File "/bin/scan", line 11, in <module>
    sys.exit(main())
  File "/usr/lib/python3.6/site-packages/scan/__init__.py", line 206, in main
    ui.mainmenu()
  File "/usr/lib/python3.6/site-packages/scan/ui.py", line 557, in mainmenu
    output = MENU_ENTRIES[choice][1]()
  File "/usr/lib/python3.6/site-packages/scan/ui.py", line 142, in getConfig
    config.init(choice)
  File "/usr/lib/python3.6/site-packages/scan/config.py", line 181, in init
    num = readStorageConfig()
  File "/usr/lib/python3.6/site-packages/scan/config.py", line 335, in readStorageConfig
    caches = parseStorageConfig(contents)
  File "/usr/lib/python3.6/site-packages/scan/config.py", line 311, in parseStorageConfig
    ret[cache] = (utils.fileSize(cache), span.Span(cache))
  File "/usr/lib/python3.6/site-packages/scan/span.py", line 70, in __init__
    spanblock.read()
  File "/usr/lib/python3.6/site-packages/scan/stripe.py", line 443, in read
    if B[0] == this.MAGIC and B[10] > A[10]:
NameError: name 'this' is not defined

Config files could not be read in scan

Error in config file - cache not found: [Errno 20] Not a directory: '/opt/trafficserver/etc/trafficserver/storage.config/records.config'
Choose an option (or option number)

[1] Read Storage config

(option, or use ^C or ^D to quit):
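The path in that error (storage.config/records.config) suggests the path to the storage.config file was entered where a directory was expected. A forgiving prompt could fall back to the containing directory. This is only a sketch; config_dir_from is a hypothetical helper, not something scan provides.

```python
import os
import tempfile

def config_dir_from(path):
    """Return the directory holding the ATS config files.

    If the user typed the path to a file such as storage.config,
    fall back to the directory that contains it.
    """
    return os.path.dirname(path) if os.path.isfile(path) else path

# Demonstrate with a throwaway directory containing an empty storage.config.
with tempfile.TemporaryDirectory() as tmp:
    storage = os.path.join(tmp, "storage.config")
    open(storage, "w").close()
    print(os.path.join(config_dir_from(storage), "records.config"))
```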

Retrieve Objects by URL

This sounds pretty basic (actually doing it is a total nightmare), but it's a feature S.C.AN. doesn't have. You should be able to pull an object out of the cache by looking up a specific URL.

FIPS support

If ATS is compiled with "FIPS" enabled, the size of INK_MD5 structure changes, which causes cascading effects in the cache, since Docs contain two of them. At this time, some very minor handling is being done for this, but more work needs to be done to determine what exactly this changes (from the cache's POV). Plus, I wouldn't hate it if someone could tell me what FIPS even is...

Invalid argument

The cache analyzer shows the following

Choose an option (or option number)

[1] Read Storage config

(option, or use ^C or ^D to quit):

after I input the path to my configs.

My configs look as follows

storage.config
/dev/sda4
/dev/sda3
volume.config
volume=1 scheme=http size=30%
volume=2 scheme=http size=70%

Use builtin logging

When I made this, I didn't know anything about the logging module, so I've kind of rolled my own. It'd be better (and likely much faster) to use the builtin library for that.
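For reference, a switch to the standard logging module might start out like this. The logger name "scan" and the setup_logging helper are assumptions; the message format is only chosen to resemble the existing DEBUG: output.

```python
import logging

def setup_logging(debug=False):
    """Configure a logger; debug=True mirrors what --debug might enable."""
    logger = logging.getLogger("scan")
    logger.setLevel(logging.DEBUG if debug else logging.WARNING)
    if not logger.handlers:
        handler = logging.StreamHandler()          # writes to stderr by default
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger

log = setup_logging(debug=True)
# Arguments are only formatted if the message is actually emitted.
log.debug("reading %s", "storage.config")
```

Because StreamHandler writes to stderr by default, debug output stays out of anything dumped to stdout.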

Un-Pythonic config system

The way scan currently works is you must call config.init(<config_dir>) before reading in a span or really doing anything at all. That's super un-Pythonic, so at the very least it'd be nice to move that logic into __init__.py, since it always must be done to use the package. Not sure how that'd go with importing as a module though, since you can't really pass arguments to import, and ATS can legally exist anywhere on the system.

Readline fixes

Currently, the readline library isn't behaving as expected. On the initial test system, tab-completions appeared to be working, but this is no longer the case. It should ideally feature tab-completion wherever textual input is expected: file system completion for the config file prompt, setting completion for setting search, and tab completion for span files.
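For the setting-search prompt, a completer can be as small as the sketch below. The make_completer helper is hypothetical; the readline calls in the trailing comment (set_completer, parse_and_bind) are the standard ones, left commented out so the example also runs where readline is unavailable.

```python
def make_completer(candidates):
    """Return a function usable with readline.set_completer()."""
    def complete(text, state):
        # readline calls this with state = 0, 1, 2, ... until it gets None.
        matches = [c for c in candidates if c.startswith(text)]
        return matches[state] if state < len(matches) else None
    return complete

# Example setting names taken from the List Settings output above.
completer = make_completer([
    "proxy.config.log.collation_host",
    "proxy.config.ssl.compression",
])
print(completer("proxy.config.s", 0))

# To hook it up on systems that ship the readline module:
#   import readline
#   readline.set_completer(completer)
#   readline.parse_and_bind("tab: complete")
```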
