yarn-logs-helpers

Scripts for parsing / making sense of yarn logs.

`yarn-container-logs`

The main script of note here is yarn-container-logs:

$ yarn-container-logs 0018

It can take a full application ID (e.g. application_1416279928169_0018) or just the last 4 digits of one (0018).

It downloads the YARN logs for that application into a local directory (defaulting to the application ID, but can be overriden with an optional second argument, after the app ID) and splits them into per-container files:

# Directory created by yarn-container-logs
$ cd application_1416279928169_0018

# Directory with per-container logs
$ cd containers

# Per-container log files have prefix /container_/
$ ls container_*
container_1416279928169_0018_01_000015
container_1416279928169_0018_01_000016
container_1416279928169_0018_01_000017
...

# The files contain exactly what was pulled down from YARN.
$ head container_1416279928169_0018_01_000015
Container: container_1416279928169_0018_01_000015 on my-node-11-10.rest.of.domain.name_port
===================================================================================================
LogType: stderr
LogLength: 700
Log Contents:
...

It also creates a directory per node (a.k.a. "host") containing symlinks to the log-files of all containers that ran on that node:

$ cd hosts
$ ls -l my-node-*
my-node-08-1:
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000065 -> ../container_1416279928169_0018_01_000065
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000094 -> ../container_1416279928169_0018_01_000094
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000123 -> ../container_1416279928169_0018_01_000123
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000258 -> ../container_1416279928169_0018_01_000258
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000338 -> ../container_1416279928169_0018_01_000338

my-node-08-10:
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000041 -> ../container_1416279928169_0018_01_000041
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000158 -> ../container_1416279928169_0018_01_000158
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000275 -> ../container_1416279928169_0018_01_000275
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000354 -> ../container_1416279928169_0018_01_000354
lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000424 -> ../container_1416279928169_0018_01_000424
...

This functionality lives in rename-and-link-hosts.
In this example, the per-node directories have had a shared suffix of the form .rest.of.domain.name_<port> removed for brevity; this is enabled by setting the $YARN_HELPERS_DROP_HOST_SUFFIX_FROM environment variable; see the Installing section for more details on setting $YARN_HELPERS_DROP_HOST_SUFFIX_FROM.

Spark-specific parsing

A common use case is parsing logs from Spark apps running on YARN, for which yarn-container-logs has some specific functionality:

It can identify the logs corresponding to Spark driver containers. It greps all container logs for spark.SparkContext to identify drivers (you can override this by setting the $YARN_HELPERS_DRIVER_GREP_NEEDLE environment variable), and creates symlinks to them in the drivers directory:
```
  $ ls -l drivers
  lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 0 -> ../container_1416279928169_0018_01_000015
  lrwxrwxrwx 1 <user> <group> 41 Nov 20 04:42 container_1416279928169_0018_01_000015 -> ../container_1416279928169_0018_01_000015
```
If exactly one was found, an additional top-level driver symlink will point to it:
```
  $ ls -l driver
  lrwxrwxrwx 1 <user> <group> 9 Nov 20 04:42 driver -> drivers/0
```
This functionality lives in link-driver-logs.
It will create a tids directory and populate it with symlinks for each Spark task ID that it finds evidence of in the logs to the container-log-file where that TID seemingly ran.
- This functionality lives in link-tids.
- Note that if a Spark job had multiple Application Masters ("drivers"), it will likely have had multiple tasks with some task IDs, which will cause errors to be emitted by this stage. See discussion at #2.

Stack Trace Parsing / Histogram

yarn-logs-stack-traces uses a stack-trace-parsing library on the output of yarn-logs. Example usage:

$ yls 0018 -d  # -d means "show a histogram in descending order"
635 stacks in total

71 occurrences:
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 4
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:386)
        at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$1.apply(MapOutputTracker.scala:383)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        ...
        at java.lang.Thread.run(Thread.java:744)

60 occurrences:
java.io.IOException: Failed to connect to demeter-csmaz11-16.demeter.hpc.mssm.edu/172.29.46.86:33263
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:141)
        at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
        at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
        ...
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)

...

Other Miscellaneous Scripts

This repo contains several other scripts that basically wrap YARN commands in calls to yarn-appid, allowing last-4-lookup of application IDs:

yarn-kill: wrapper for yarn application -kill <appid>.
yarn-logs: wrapper for yarn logs -applicationId <appid>.
- yarn-logs-less: pipes yarn-logs to less.

Installing

Download this repository with:

    git clone --recursive https://github.com/hammerlab/yarn-logs-helpers.git

In your .bashrc (or equivalent), source .yarn-logs-helpers.sourceme:

    $ source /path/to/repo/.yarn-logs-helpers.sourceme

This will:

try to fetch your cluster's ID using the yarn-refresh-cluster-id script.
- If found, the result will be cached in $yarn_cluster_id_file (default: $HOME/.yarn-cluster-id).
- This will allow all scripts in this repo to look up YARN application IDs by their last 4 digits (using yarn-appid).
set aliases for most functionality in this repo.
add the root directory of this repo to your $PATH.

Env vars

Setting $YARN_LOGS_USER may allow yarn-container-logs to fetch logs from apps run by users other than you.

You can set it permanently in your .bashrc to a user that has permissions to read all YARN users' logs, or just on the cmdline for one call:

YARN_LOGS_USER=someone yarn-logs 1234

You may also want to export YARN_HELPERS_DROP_HOST_SUFFIX_FROM (discussed above):

    # Pattern for abbreviating host names when creating per-host log directories.
    export YARN_HELPERS_DROP_HOST_SUFFIX_FROM=".rest.of.domain.name_"

`stack-traces` submodule

Finally, ryan-williams/stack-traces is included in this repository as a git submodule, and used by yarn-log-stack-traces.

You'll need to git clone --recursive when you check out the project, or run git submodule init && git submodule update from within the stack-traces subdirectory, for it to work. git-scm.com has a good intro to using git submodules if you are not familiar.

With those done you should be all set!

gsquare94 / yarn-logs-helpers Goto Github PK