jeremydouglass / edger

an edge list converter

edger's Introduction

edger: an edge list converter

record simple graphs, convert into multiple formats

Edger is a simple batch processor for textual graph data. It takes a directory of text files, parses them, and outputs graph files, images, and stats logs for them.

It was developed for use in data processing of interactive narratives (such as gamebooks).

Install

Edger is implemented as a cross-platform Processing (Java) sketch. It can be run in the Processing Development Environment (PDE) or exported from the PDE to a standalone application.

Install Processing
Download Edger
(optional) Install Graphviz for Mac or Windows to enable PNG image output.
(optional) Export an application
- Launch Edger.pde in Processing
- File > Export Application to create a Mac or Win app.

Edger relies on Graphviz being installed separately in order to perform image rendering, although it will run without it. It also uses the GraphStream core for summary statistics, which is built in.

Use

To use Edger as a Processing sketch:

Launch Edger.pde
Press Run (">")
Select working directory (location of txt files)
Edger will process files and produce output
- Click floating window to re-process files
- SPACE to toggle PNG image generation
- ESC or Quit when finished

To use Edger after exporting it as an application:

Launch Edger.app / Edger.exe
Select working directory (location of txt files)
Edger will process files and produce output
- Click floating window to re-process files
- SPACE to toggle PNG image generation
- ESC or Quit when finished

On run, Edger requests a working directory and processes all .txt files in that directory. Original text files are untouched, while output files are replaced on each re-run. Note that if source file names change, old graph and image outputs may be left behind, although these can be spotted by checking file dates.

Output

For each input text file name.txt, Edger outputs:

/gv/name.gv: a Graphviz DOT file (for use with Graphviz)
/tgf/name.tgf: a Trivial Graph Format file (for use with yEd)
/log/name.log: a log file of descriptive graph statistics
/gv/name.gv.png: an image, rendered by Graphviz

In addition, for each batch of files processed it produces:

/log/_graph_stats.log.csv: a summary file of key statistics
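
As a rough illustration of the two graph formats listed above, the following sketch writes a small list of directed edges as DOT and as TGF text. It is not Edger's actual writer: the class name EdgeListWriters and its methods are hypothetical, edges are assumed to be directed, and labels and styling are left out.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper, not part of Edger: writes directed edges as DOT and TGF text. */
final class EdgeListWriters {

    /** Each edge is an int pair {source, target}. */
    static String toDot(String graphName, List<int[]> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph \"").append(graphName).append("\" {\n");
        for (int[] e : edges) {
            sb.append("  ").append(e[0]).append(" -> ").append(e[1]).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** TGF: one node id per line, a '#' separator line, then one edge per line. */
    static String toTgf(List<int[]> edges) {
        TreeSet<Integer> nodes = new TreeSet<>();
        for (int[] e : edges) {
            nodes.add(e[0]);
            nodes.add(e[1]);
        }
        StringBuilder sb = new StringBuilder();
        for (int n : nodes) {
            sb.append(n).append('\n');
        }
        sb.append("#\n");
        for (int[] e : edges) {
            sb.append(e[0]).append(' ').append(e[1]).append('\n');
        }
        return sb.toString();
    }
}
```

For the edges 1 2, 2 3 and 2 4, toDot produces a digraph with three "->" statements, and toTgf produces four node lines, a "#" line, and three edge lines.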

Input

Edger processes a directory of plain text files (.txt). Specifically, these text files are sparse edge lists, a custom graph data format designed for quick data entry. This means that Edger supports the simple edge list format:

1 2
2 3
2 4

...as well as numerous extensions to the edge list format, including:

whitespace
graph labels
code comments
sparse entries

Here is an example of a sparse edge list:

# File is tab-separated (tsv)
# Filename ends in .txt

# These are edges, with or without comments
1 2
2 3 edge  # a labeled edge w/comment
3   node  # a labeled node w/comment
4 5
4 8   # separate node lines are optional

# These are whole-line comments
     # ## Comments begin with '#' after any amount of whitespace
# Blank lines may be used to organize material

# repeat edges may be specified
5 6
5 7
5 8

# repeat edges may have an implied first node
6 9 choice1
  10  choice2
  11  choice3
9 12  c1
  13  c2

# nodes and edges may be listed in any order
1   Start
12    End1
13    End2

# unlabeled node lines
# previous node 2 unaffected
2
# new floating node 100 created 
100
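
The rules shown in the example above can be sketched as a small parser. This is an illustration inferred from the example only, not Edger's parser: it assumes tab-separated fields in the order source, target, label, treats an empty first field as "reuse the previous source", and keeps node ids as strings. The class name SparseEdgeListSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a sparse edge list parser (field layout is an assumption). */
final class SparseEdgeListSketch {
    /** Edges as {source, target, label}; label is "" when absent. */
    final List<String[]> edges = new ArrayList<>();
    /** Node id -> label ("" when the node has no label). */
    final Map<String, String> nodes = new LinkedHashMap<>();

    static SparseEdgeListSketch parse(List<String> lines) {
        SparseEdgeListSketch g = new SparseEdgeListSketch();
        String lastSource = null;
        for (String raw : lines) {
            // Drop everything from the first '#' (whole-line and trailing comments).
            int hash = raw.indexOf('#');
            String line = hash >= 0 ? raw.substring(0, hash) : raw;
            if (line.trim().isEmpty()) {
                continue; // blank or comment-only line
            }
            // Split before trimming so a leading tab still yields an empty first field.
            String[] f = line.split("\t", -1);
            String source = f[0].trim();
            String target = f.length > 1 ? f[1].trim() : "";
            String label = f.length > 2 ? f[2].trim() : "";
            if (source.isEmpty()) {
                if (lastSource == null) {
                    continue; // no earlier source to imply; a real tool would log this
                }
                source = lastSource; // implied first node
            } else {
                lastSource = source;
            }
            if (target.isEmpty()) {
                // Node line: an unlabeled repeat leaves an existing node unaffected.
                if (!label.isEmpty() || !g.nodes.containsKey(source)) {
                    g.nodes.put(source, label);
                }
            } else {
                g.edges.add(new String[] {source, target, label});
                g.nodes.putIfAbsent(source, "");
                g.nodes.putIfAbsent(target, "");
            }
        }
        return g;
    }
}
```

Splitting on tabs rather than on arbitrary whitespace is what lets "3, empty, node" be read as a labeled node instead of an edge from 3 to "node".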

edger's Issues

Buttons

Edger currently has a small collection of hotkeys -- it could use a UI such as G4P to tie functions to a small floating interface.

Use GraphStream for Graphviz export?

Depending on how attributes may be attached to this object, exporting via FileSinkDOT might be a robust replacement for the manual construction of Graphviz DOT templates.

https://data.graphstream-project.org/api/gs-core/current/org/graphstream/stream/file/FileSinkDOT.html

Graphviz call fails on some Win machines

The Graphviz rendering call fails on some Win machines -- even with a correct path. ML laptop is an example.
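
One way to investigate might be to launch dot with an explicit argument list and capture its output and exit code. The sketch below is not Edger's rendering code, and the cause of the failure is not known from this issue. DotRenderSketch and its methods are hypothetical names; the only Graphviz options used are the standard -Tpng and -o.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

/** Hypothetical diagnostic wrapper around the Graphviz dot command. */
final class DotRenderSketch {

    /** Builds: dotPath -Tpng gvFile -o gvFile.png (one list element per argument). */
    static List<String> command(String dotPath, String gvFile) {
        return Arrays.asList(dotPath, "-Tpng", gvFile, "-o", gvFile + ".png");
    }

    /** Runs dot, prints its combined output on failure, and returns the exit code. */
    static int render(String dotPath, String gvFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(dotPath, gvFile));
        pb.redirectErrorStream(true); // merge stderr into stdout so errors are not lost
        Process process = pb.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            System.err.println("dot exited with code " + exitCode + ":\n" + output);
        }
        return exitCode;
    }
}
```

Passing each argument as its own list element avoids shell quoting, which can matter when the Graphviz path or the working directory contains spaces.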

Node ids restricted to ints

They could/should be supported as strings, both internally and for output formats as appropriate.

Live entry

G4P or ControlP5 probably support a text field that could be checked live -- and used to update a live GraphStream window.

Possibly gratuitous, although it might be helpful for debugging bad data entries.

Logger degreeDistribution crashes on empty txt file

Graphviz style files are brittle

At present the graphviz style files work but they are a brittle hack --

the files mix required fields with optional fields and custom fields
material could be hierarchically organized for lookup -- e.g. JSON
contents are raw fragments of DOT code -- the method can't be generalized to the other renderers
comma separation of arguments isn't handled well -- the final separator is omitted (allowed in DOT, but messy)
the required fields must have contents (they can't be empty, or they print invalid "null" text into the gv, rendering it un-renderable).

Depth meter for graphviz output

For some rank separated directed graphs it is possible to print a line of nodes down the edge as a visual aid -- kind of "depth meter" indicating which nodes are at depth 1,2,3...20,21 etc.

A template for this can be done fairly easily in a subgraph, but the question is how to determine the appropriate length of the node series, as some graphs are very short and others are extremely long.

Perhaps use the GraphStream diameter, which should almost always correspond, although this then creates a dependency for the graphviz renderer.

Contiguous node compression (for long linear works)

Add an option to compress numerical node series (continuous runs) into a single node.

Related - option to simplify graphs to just their choices by contracting nodes with in/out degree 1:1.
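
The related 1:1 contraction could look roughly like the sketch below, which repeatedly replaces a -> v -> b with a -> b whenever v has exactly one incoming and one outgoing edge. This is an illustration only: ContractSketch is a hypothetical name, edges are assumed to be directed int pairs, and labels on removed nodes and edges are simply dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: contract nodes with in-degree 1 and out-degree 1. */
final class ContractSketch {

    /** Edges are int pairs {source, target}; returns a new, simplified edge list. */
    static List<int[]> contract(List<int[]> input) {
        List<int[]> edges = new ArrayList<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Recount degrees after every contraction (simple, not fast).
            Map<Integer, Integer> in = new HashMap<>();
            Map<Integer, Integer> out = new HashMap<>();
            for (int[] e : edges) {
                out.merge(e[0], 1, Integer::sum);
                in.merge(e[1], 1, Integer::sum);
            }
            for (int i = 0; i < edges.size(); i++) {
                int[] inEdge = edges.get(i);
                int v = inEdge[1];
                if (in.get(v) != 1 || out.getOrDefault(v, 0) != 1) {
                    continue;
                }
                int[] outEdge = null;
                for (int[] e : edges) {
                    if (e[0] == v) {
                        outEdge = e;
                    }
                }
                int a = inEdge[0];
                int b = outEdge[1];
                if (a == v || b == v || a == b) {
                    continue; // skip self-loops and two-node cycles
                }
                edges.set(i, new int[] {a, b}); // a -> v becomes a -> b
                edges.remove(outEdge);          // v -> b is no longer needed
                changed = true;
                break;
            }
        }
        return edges;
    }
}
```

With this approach a linear run such as 1 2, 2 3, 3 4 collapses to the single edge 1 4, while branching nodes (the choices) are kept. A contraction may produce a duplicate of an edge that already exists; the sketch leaves such duplicates in place.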

UI listener mode

Rather than requesting re-runs, Edger could run like a daemon, triggering on a file update and checking every n seconds for updates.
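
A minimal polling sketch, assuming that "file update" means a changed last-modified time on a .txt file in the working directory. PollSketch and its methods are hypothetical names, and deleted files are ignored here.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: detect new or modified .txt files by comparing snapshots. */
final class PollSketch {

    /** Maps each .txt file name in dir to its last-modified time in milliseconds. */
    static Map<String, Long> snapshot(File dir) {
        Map<String, Long> snap = new HashMap<>();
        File[] files = dir.listFiles((d, name) -> name.endsWith(".txt"));
        if (files != null) { // listFiles returns null if dir is not a readable directory
            for (File f : files) {
                snap.put(f.getName(), f.lastModified());
            }
        }
        return snap;
    }

    /** Names that are new, or whose timestamp differs, between two snapshots. */
    static Set<String> changed(Map<String, Long> before, Map<String, Long> after) {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            if (!e.getValue().equals(before.get(e.getKey()))) {
                names.add(e.getKey());
            }
        }
        return names;
    }
}
```

A listener loop would take a snapshot every n seconds, call changed against the previous snapshot, and re-process only the returned files.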

Style preset options

A collection of pre-defined stylesheet options for LR and TD layouts, small and large, bw / grayscale / color.

Subfolder searching

Currently the working directory is flat. Running edger on recursive subdirectories of txt files (Box 1, Box 2, Box 3...) would help when working with large projects, but requires careful thinking about whether output would be pooled at the top level or created per-subdirectory -- each might be desirable in different circumstances.
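
To make the two output policies concrete, here is a sketch with a recursive file search and a helper that computes where the .gv file would go under each policy. SubfolderSketch, findTxt and gvOutput are hypothetical names; the gv/ folder name follows the Output section.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: recursive .txt search plus two output-location policies. */
final class SubfolderSketch {

    /** Recursively lists .txt files under root, in sorted order. */
    static List<Path> findTxt(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".txt"))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    /** pooled: root/gv/name.gv; otherwise: (folder containing the txt)/gv/name.gv */
    static Path gvOutput(Path root, Path txt, boolean pooled) {
        String name = txt.getFileName().toString().replaceAll("\\.txt$", "");
        Path base = pooled ? root : txt.getParent();
        return base.resolve("gv").resolve(name + ".gv");
    }
}
```

Pooling is simpler to browse, but two files with the same name in different subdirectories would overwrite each other's output; per-subdirectory output avoids that at the cost of scattering results.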

Use graphviz dot label wildcards (escString)

Currently building custom labels and xlabels is handled in Java using the in-memory label name, but this could be made standard in stylesheets using \N

label

Text label attached to objects. If a node's shape is record, then the label can have a special format which describes the record layout. Note that a node's default label is "\N", so the node's name or ID becomes its label.

A label is an escString:

escString

A string allowing escape sequences which are replaced according to the context. For node attributes, the substring "\N" is replaced by the name of the node, and the substring "\G" by the name of the graph. For graph or cluster attributes, the substring "\G" is replaced by the name of the graph or cluster. For edge attributes, the substring "\E" is replaced by the name of the edge, the substring "\G" is replaced by the name of the graph or cluster, and the substrings "\T" and "\H" by the names of the tail and head nodes, respectively. The name of an edge is the string formed from the name of the tail node, the appropriate edge operator ("--" or "->") and the name of the head node. In all cases, the substring "\L" is replaced by the object's label attribute.

http://www.graphviz.org/doc/info/attrs.html#k:escString

Whitespace breaks node ids

Extra spaces render CSV fields into non-ints (0) -- need to trim all fields and catch errors on non-ints (and support strings after trimming, as in #1 ).
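
A minimal sketch of the trim-then-parse step, assuming node ids stay integers for now. NodeIdSketch and parseNodeId are hypothetical names; returning an empty OptionalInt instead of 0 lets the caller tell a bad field apart from a real node 0.

```java
import java.util.OptionalInt;

/** Hypothetical sketch: parse a node id field, tolerating surrounding whitespace. */
final class NodeIdSketch {

    /** Returns the id, or an empty OptionalInt if the trimmed field is not an int. */
    static OptionalInt parseNodeId(String field) {
        try {
            return OptionalInt.of(Integer.parseInt(field.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }
}
```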

Skip and log invalid files

Edger doesn't have a lot of error handling -- invalid files can take down the process mid-stream, or produce junk output. Better to not produce output for bad files, and to log them and print warnings.
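
One possible shape for this is a batch loop that catches failures per file and reports them at the end. The sketch is hypothetical (BatchSketch and processAll are not Edger names) and assumes the per-file processor signals bad input by throwing a RuntimeException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: keep a batch running when individual files are invalid. */
final class BatchSketch {

    /** Runs processor on each file name and returns the names that were skipped. */
    static List<String> processAll(List<String> fileNames, Consumer<String> processor) {
        List<String> skipped = new ArrayList<>();
        for (String name : fileNames) {
            try {
                processor.accept(name);
            } catch (RuntimeException e) {
                System.err.println("WARNING: skipping " + name + ": " + e);
                skipped.add(name);
            }
        }
        return skipped;
    }
}
```

To avoid junk output, the processor would need to parse and validate a file completely before writing any of its output files.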

Layout for disconnected components

Some works are disconnected graphs with several separate large connected components -- e.g. one component graph per chapter.

The current dot layout algorithm attempts to pack these disconnected components together, rather than clearly separating them vertically into a stack of horizontal lanes, which would be preferable and more legible.

One approach when using graphviz as a renderer is to identify cluster subgraphs:

http://www.graphviz.org/Gallery/directed/cluster.html

...however a challenge for auto-generating this from the data is that the components are not known at encoding time, and the edges are recorded in book page order, not in component groups.
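
Components can be recovered after parsing, whatever order the edges were recorded in. The sketch below uses a union-find structure to group edges by connected component and then writes each group as a DOT cluster subgraph. It is an illustration under assumptions: ComponentClusterSketch is a hypothetical name, nodes without edges are ignored, and whether dot then lays the clusters out as separate lanes is not guaranteed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: group edges by connected component, emit DOT clusters. */
final class ComponentClusterSketch {
    private final Map<Integer, Integer> parent = new HashMap<>();

    /** Union-find lookup with path compression (iterative, so long chains are safe). */
    private int find(int x) {
        parent.putIfAbsent(x, x);
        int root = x;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        while (parent.get(x) != root) {
            int next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    /** Edges are int pairs {source, target}, in any order. */
    static String toClusteredDot(List<int[]> edges) {
        ComponentClusterSketch uf = new ComponentClusterSketch();
        for (int[] e : edges) {
            uf.parent.put(uf.find(e[0]), uf.find(e[1])); // union the two components
        }
        // Group edges by component root, keeping first-seen component order.
        Map<Integer, List<int[]>> byRoot = new LinkedHashMap<>();
        for (int[] e : edges) {
            byRoot.computeIfAbsent(uf.find(e[0]), k -> new ArrayList<>()).add(e);
        }
        StringBuilder sb = new StringBuilder("digraph G {\n");
        int n = 0;
        for (List<int[]> group : byRoot.values()) {
            // Graphviz only treats a subgraph as a cluster if its name starts with "cluster".
            sb.append("  subgraph cluster_").append(n++).append(" {\n");
            for (int[] e : group) {
                sb.append("    ").append(e[0]).append(" -> ").append(e[1]).append(";\n");
            }
            sb.append("  }\n");
        }
        return sb.append("}\n").toString();
    }
}
```

Because the grouping happens after all edges are read, the input can stay in book page order.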

Return arrows

Reference-and-return instructions need special annotation in data encoding -- R for reference/return?

For example, the DOT digraph supports double arrows with specially marked return directions through [dir=both arrowtail=tee]

http://www.graphviz.org/doc/info/attrs.html#k:arrowType

Persist last settings

Currently preferences such as the style file (Mac, Win) and the working directory (Win) do not necessarily persist across different runs.

Last settings could persist through an auto-updated preferences file -- with a reset-to-default option.
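
A preferences file of this kind could be handled with java.util.Properties, as in the sketch below. The key names workingDir and styleFile and the class PrefsSketch are hypothetical. The sketch reads and writes strings to stay self-contained; a real version would use a file reader and writer on a preferences file instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sketch: persist last-used settings in Properties format. */
final class PrefsSketch {

    /** Serializes the settings to Properties text. */
    static String save(String workingDir, String styleFile) {
        Properties p = new Properties();
        p.setProperty("workingDir", workingDir);
        p.setProperty("styleFile", styleFile);
        StringWriter out = new StringWriter();
        try {
            p.store(out, "Edger preferences (hypothetical keys)");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Parses Properties text; missing keys can fall back to defaults via getProperty(key, def). */
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }
}
```

Properties escapes backslashes on store and restores them on load, so Windows paths survive a round trip. A reset-to-default option could simply delete the file, so that every lookup falls back to its default value.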