edger's Introduction

edger: an edge list converter

record simple graphs, convert into multiple formats

Edger is a simple batch processor for textual graph data. It takes a directory of text files, parses them, and outputs graph files, images, and stats logs for them.

It was developed for use in data processing of interactive narratives (such as gamebooks).

Install

Edger is implemented as a cross-platform Processing (Java) sketch -- it can be run in the Processing Development Environment (PDE) or exported from the PDE to a standalone application.

  1. Install Processing
  2. Download Edger
  3. (optional) Install Graphviz for Mac or Windows to enable PNG image output.
  4. (optional) Export an application
    • Launch Edger.pde in Processing
    • File > Export Application to create a Mac or Win app.

Edger relies on Graphviz being installed separately in order to perform image rendering, although it will run without it. It also uses the GraphStream core for summary statistics, which is built in.
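
Because image output depends on a separate Graphviz install, a tool like this has to detect whether the `dot` executable is available before rendering. The following is a minimal Python sketch of that check, not Edger's own code (Edger is a Processing sketch); the function names are hypothetical, and the output naming follows the /gv/name.gv.png convention described under Output.

```python
import shutil
import subprocess

def find_dot(executable="dot"):
    """Return the full path to the Graphviz `dot` binary, or None if absent."""
    return shutil.which(executable)

def dot_command(gv_path, dot_path="dot"):
    """Build the command that renders gv_path to gv_path + '.png'."""
    return [dot_path, "-Tpng", gv_path, "-o", gv_path + ".png"]

def render_png(gv_path):
    """Render a DOT file to PNG; return False without rendering if
    Graphviz is not installed, so the rest of the batch can continue."""
    dot_path = find_dot()
    if dot_path is None:
        return False
    subprocess.run(dot_command(gv_path, dot_path), check=True)
    return True
```

Returning False instead of raising mirrors the behavior described above: everything except the PNG step still runs without Graphviz.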

Use

To use Edger as a Processing sketch:

  1. Launch Edger.pde
  2. Press Run (">")
  3. Select working directory (location of txt files)
  4. Edger will process files and produce output
    • Click floating window to re-process files
    • SPACE to toggle PNG image generation
    • ESC or Quit when finished

To use Edger after exporting it as an application:

  1. Launch Edger.app / Edger.exe
  2. Select working directory (location of txt files)
  3. Edger will process files and produce output
    • Click floating window to re-process files
    • SPACE to toggle PNG image generation
    • ESC or Quit when finished

On run, Edger requests a working directory and processes all .txt files in that directory. Original text files are untouched, while output files are replaced on each re-run. Note that if source file names change, then old graph and image outputs may be left behind -- although this will be visible by checking file dates.
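
Leftover outputs from renamed sources can also be found by name instead of by date. This is a hedged Python sketch, assuming the output suffixes listed under Output; `stale_outputs` is a hypothetical helper and not part of Edger.

```python
def stale_outputs(input_names, output_names):
    """Return output file names whose base name matches no current .txt input.

    Files with other suffixes (such as the batch summary CSV) are ignored.
    """
    stems = {name[:-4] for name in input_names if name.endswith(".txt")}
    suffixes = (".gv.png", ".gv", ".tgf", ".log")
    stale = []
    for out in output_names:
        for suffix in suffixes:
            if out.endswith(suffix):
                if out[:-len(suffix)] not in stems:
                    stale.append(out)
                break
    return stale
```

For example, after renaming old.txt to a.txt, the outputs old.gv and old.gv.png would be reported while a.gv would not.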

Output

For each input text file name.txt, Edger outputs:

  • /gv/name.gv: a Graphviz DOT file (for use with Graphviz)
  • /tgf/name.tgf: a Trivial Graph Format file (for use with yEd)
  • /log/name.log: a log file of descriptive graph statistics
  • /gv/name.gv.png: an image, rendered by Graphviz

In addition, for each batch of files processed it produces:

  • /log/_graph_stats.log.csv: a summary file of key statistics
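
To illustrate the two graph formats, here is a rough Python sketch that writes the same edges as DOT and as TGF. It is not Edger's implementation: the function names are hypothetical, the graph is assumed to be directed, and labels are assumed not to contain double quotes.

```python
def to_dot(edges, labels=None, name="G"):
    """Serialize (source, target, label) edges as a Graphviz DOT digraph."""
    labels = labels or {}
    lines = ["digraph %s {" % name]
    for node, label in labels.items():
        lines.append('  "%s" [label="%s"];' % (node, label))
    for source, target, label in edges:
        attrs = ' [label="%s"]' % label if label else ""
        lines.append('  "%s" -> "%s"%s;' % (source, target, attrs))
    lines.append("}")
    return "\n".join(lines)

def to_tgf(edges, labels=None):
    """Serialize the same edges as Trivial Graph Format:
    node lines, a '#' separator line, then edge lines."""
    labels = labels or {}
    nodes = []
    for source, target, _ in edges:
        for node in (source, target):
            if node not in nodes:
                nodes.append(node)
    for node in labels:
        if node not in nodes:
            nodes.append(node)
    lines = ["%s %s" % (node, labels.get(node, node)) for node in nodes]
    lines.append("#")
    for source, target, label in edges:
        lines.append(("%s %s %s" % (source, target, label)).rstrip())
    return "\n".join(lines)
```

Both functions take the same inputs, which is the main design point: one parsed graph, several serializers.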

Input

Edger processes a directory of plain text files (.txt). Specifically, these text files are sparse edge lists, a custom graph data format designed for quick data entry. This means that Edger supports the simple edge list format:

1 2
2 3
2 4

...as well as numerous extensions to the edge list format, including:

  • whitespace
  • graph labels
  • code comments
  • sparse entries

Here is an example of a sparse edge list:

# File is tab-separated (tsv)
# Filename ends in .txt

# These are edges, with or without comments
1 2
2 3 edge  # a labeled edge w/comment
3   node  # a labeled node w/comment
4 5
4 8   # separate node lines are optional

# These are whole-line comments
     # ## Comments begin with '#' after any amount of whitespace
# Blank lines may be used to organize material

# repeat edges may be specified
5 6
5 7
5 8

# repeat edges may have an implied first node
6 9 choice1
  10  choice2
  11  choice3
9 12  c1
  13  c2

# nodes and edges may be listed in any order
1   Start
12    End1
13    End2

# unlabeled node lines
# previous node 2 unaffected
2
# new floating node 100 created 
100
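
The rules shown in the example can be sketched as a small parser. This is an illustrative Python sketch and not Edger's actual parser. It assumes three tab-separated columns (source, target, label), that a labeled node line leaves the target column empty, and that a line with an empty source column reuses the previous source; `parse_sparse_edge_list` is a hypothetical name.

```python
def parse_sparse_edge_list(text):
    """Parse a sparse edge list into (nodes, edges).

    nodes maps node id -> label ("" when unlabeled);
    edges is a list of (source, target, label) tuples in file order.
    """
    nodes = {}
    edges = []
    last_source = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]            # drop whole-line and trailing comments
        if not line.strip():
            continue                           # skip blank lines
        fields = [field.strip() for field in line.split("\t")]
        fields += [""] * (3 - len(fields))     # pad short lines to three columns
        source, target, label = fields[:3]
        if not source:
            source = last_source               # implied first node
        if source is None:
            raise ValueError("no source node for line: %r" % raw)
        last_source = source
        nodes.setdefault(source, "")           # existing nodes are left unaffected
        if target:
            nodes.setdefault(target, "")
            edges.append((source, target, label))
        elif label:
            nodes[source] = label              # labeled node line
    return nodes, edges
```

Node ids are kept as strings here, which sidesteps the integer-parsing problems discussed in the issues below; a stricter parser could convert them after trimming.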


edger's People

Contributors

jeremydouglass

edger's Issues

Buttons

Edger currently has a small collection of hotkeys -- it could use a UI such as G4P to tie functions to a small floating interface.

Live entry

G4P or ControlP5 probably support a text field that could be checked live -- and used to update a live GraphStream window.

Possibly gratuitous, although it might be helpful for debugging bad data entries.

Graphviz style files are brittle

At present the Graphviz style files work, but they are a brittle hack:

  1. the files mix required fields with optional fields and custom fields
  2. material could be hierarchically organized for lookup -- e.g. JSON
  3. contents are raw fragments of DOT code -- the method can't be generalized to the other renderers
  4. comma separation of arguments isn't handled well -- the final separator is omitted (allowed in DOT, but messy)
  5. the required fields must have contents (can't be empty, or they print invalid "null" text into the gv, rendering it un-render-able).
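
One possible direction for points 2, 4, and 5 is a JSON style file whose sections are turned into DOT attribute lists by code, so that empty values are skipped and separators are always consistent. This is only a sketch under assumptions: the JSON layout, `STYLE_JSON`, and both function names are invented for illustration and do not describe Edger's current style files.

```python
import json

# Hypothetical style file contents: one section per DOT statement type.
STYLE_JSON = """
{
  "graph": {"rankdir": "LR", "label": null},
  "node": {"shape": "box", "fontname": ""},
  "edge": {}
}
"""

def dot_attrs(attrs):
    """Format a dict as a DOT attribute list, skipping None and empty values
    so that no "null" text can leak into the output."""
    parts = ['%s="%s"' % (key, value)
             for key, value in attrs.items()
             if value not in (None, "")]
    return "[" + ", ".join(parts) + "]" if parts else ""

def style_lines(style_json):
    """Turn a JSON style document into DOT default-attribute statements."""
    style = json.loads(style_json)
    lines = []
    for section in ("graph", "node", "edge"):
        attrs = dot_attrs(style.get(section, {}))
        if attrs:
            lines.append("%s %s;" % (section, attrs))
    return lines
```

Because the attribute names are data and not DOT fragments, the same file could in principle feed another renderer, which addresses point 3 as well.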

Depth meter for graphviz output

For some rank-separated directed graphs it is possible to print a line of nodes down the edge as a visual aid -- a kind of "depth meter" indicating which nodes are at depth 1, 2, 3 ... 20, 21 etc.

A template for this can be done fairly easily in a subgraph, but the question is how to determine the appropriate length of the node series, as some graphs are very short and others are extremely long.

Perhaps use the GraphStream diameter, which should almost always correspond -- although this then creates a dependency for the Graphviz renderer.
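
As a rough illustration of the subgraph template, the Python sketch below generates a chain of plain-text nodes for a given number of levels. The level count is taken as an input (for instance from a diameter statistic, as suggested above); the node names and the `depth_meter` function are hypothetical, and aligning the meter with the story graph's ranks is not handled here.

```python
def depth_meter(levels):
    """Return a DOT subgraph with a chain of plain-text nodes labeled 1..levels,
    joined by invisible edges so that each lands one rank below the previous."""
    if levels < 1:
        return ""
    names = ["depth%d" % i for i in range(1, levels + 1)]
    lines = [
        "subgraph depth_meter {",
        "  node [shape=plaintext];",
        "  edge [style=invis];",
    ]
    for number, name in enumerate(names, start=1):
        lines.append('  %s [label="%d"];' % (name, number))
    lines.append("  " + " -> ".join(names) + ";")
    lines.append("}")
    return "\n".join(lines)
```

The returned text would be pasted inside the main digraph body before rendering.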

UI listener mode

Rather than requesting re-runs, Edger could run like a daemon, triggering on a file update and checking every n seconds for updates.
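
A polling version of this idea can be sketched in a few lines of Python. This is an assumption-laden illustration and not a design for Edger: `snapshot`, `changed`, and `watch` are hypothetical names, and `process` stands in for whatever re-runs the conversion on one file.

```python
import os
import time

def snapshot(directory):
    """Map each .txt file name in directory to its modification time."""
    return {entry.name: entry.stat().st_mtime
            for entry in os.scandir(directory)
            if entry.is_file() and entry.name.endswith(".txt")}

def changed(previous, current):
    """Names that are new, or whose modification time differs."""
    return sorted(name for name, mtime in current.items()
                  if previous.get(name) != mtime)

def watch(directory, process, interval=2.0, max_polls=None):
    """Call process(path) for every new or updated .txt file, checking
    every `interval` seconds. max_polls=None means run until interrupted."""
    previous = {}
    polls = 0
    while max_polls is None or polls < max_polls:
        current = snapshot(directory)
        for name in changed(previous, current):
            process(os.path.join(directory, name))
        previous = current
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Polling modification times is the simplest cross-platform option; only changed files are re-processed, so an idle directory costs one directory scan per interval.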

Style preset options

A collection of pre-defined stylesheet options for LR and TD layouts, small and large, bw / grayscale / color.

Subfolder searching

Currently the working directory is flat. Running Edger on recursive subdirectories of txt files (Box 1, Box 2, Box 3...) would help when working with large projects, but requires careful thinking about whether output would be pooled at the top level or created per-subdirectory -- each might be desirable in different circumstances.
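
The two output layouts can be compared with a short Python sketch. Everything here is hypothetical (the function names, the `pooled` flag, and the choice to join folder names with underscores to avoid collisions when pooling); it only illustrates the trade-off described above.

```python
from pathlib import Path

def find_sources(root):
    """All .txt files under root, including subdirectories, in sorted order."""
    return sorted(Path(root).rglob("*.txt"))

def output_path(root, source, kind, pooled=True):
    """Where the output of type `kind` ("gv", "tgf", "log") would go.

    pooled=True : one output folder at the top level; the subfolder names
                  are folded into the file name to keep names unique.
    pooled=False: an output folder next to each source file.
    """
    root, source = Path(root), Path(source)
    if pooled:
        stem = "_".join(source.relative_to(root).with_suffix("").parts)
        return root / kind / (stem + "." + kind)
    return source.parent / kind / (source.stem + "." + kind)
```

For a source at books/Box 1/a.txt, pooling yields books/gv/Box 1_a.gv, while per-subdirectory output yields books/Box 1/gv/a.gv.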

Use graphviz dot label wildcards (escString)

Currently, building custom labels and xlabels is handled in Java using the in-memory label name, but this could be made standard in stylesheets using \N.

label

Text label attached to objects. If a node's shape is record, then the label can have a special format which describes the record layout. Note that a node's default label is "\N", so the node's name or ID becomes its label.

A label is an escString:

escString

A string allowing escape sequences which are replaced according to the context. For node attributes, the substring "\N" is replaced by the name of the node, and the substring "\G" by the name of the graph. For graph or cluster attributes, the substring "\G" is replaced by the name of the graph or cluster. For edge attributes, the substring "\E" is replaced by the name of the edge, the substring "\G" is replaced by the name of the graph or cluster, and the substrings "\T" and "\H" by the names of the tail and head nodes, respectively. The name of an edge is the string formed from the name of the tail node, the appropriate edge operator ("--" or "->") and the name of the head node. In all cases, the substring "\L" is replaced by the object's label attribute.

http://www.graphviz.org/doc/info/attrs.html#k:escString
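
As a small illustration of how the escapes quoted above could appear in generated style statements, the Python sketch below builds two default-attribute lines. The function name and its parameters are hypothetical. Note that the backslash has to be doubled in the string literal so that a literal \N reaches the DOT file; the same applies to Java string literals.

```python
def label_style_lines(node_prefix="", edge_separator=" to "):
    """DOT default-attribute statements that let Graphviz expand the labels:
    \\N becomes the node name, \\T and \\H the edge's tail and head names."""
    return [
        'node [label="%s\\N"];' % node_prefix,
        'edge [label="\\T%s\\H"];' % edge_separator,
    ]
```

With `node_prefix="p. "`, a node named 12 would be drawn with the label "p. 12" without any per-node label being written by the converter.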

Whitespace breaks node ids

Extra spaces render CSV fields into non-ints (0) -- need to trim all fields and catch errors on non-ints (and support strings after trimming, as in #1).
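
The trim-then-parse behavior asked for here might look like the following Python sketch. `parse_node_id` is a hypothetical helper, and falling back to a string id is one possible policy, not a decision the project has made.

```python
def parse_node_id(field):
    """Trim a field, then return an int id when possible, the trimmed
    string otherwise, or None for an empty field."""
    text = field.strip()
    if not text:
        return None
    try:
        return int(text)
    except ValueError:
        return text          # keep non-numeric ids as strings instead of 0
```

Returning None for empty fields keeps "missing" distinct from node 0, which is the failure described above.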

Skip and log invalid files

Edger doesn't have a lot of error handling -- invalid files can take down the process mid-stream, or produce junk output. Better to not produce output for bad files, and to log them and print warnings.
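
A skip-and-log loop along these lines can be sketched in Python as follows. The names are hypothetical, and `convert` stands in for whatever parses and converts one file's text; any exception it raises is treated as "invalid file".

```python
import logging

def process_batch(sources, convert):
    """Convert each (name, text) pair; skip and record the ones that fail.

    Returns (done, skipped): done holds (name, result) pairs, skipped holds
    (name, error message) pairs. No output is kept for a failed file.
    """
    done, skipped = [], []
    for name, text in sources:
        try:
            result = convert(text)
        except Exception as error:   # a bad file must not stop the batch
            logging.warning("skipping %s: %s", name, error)
            skipped.append((name, str(error)))
            continue
        done.append((name, result))
    return done, skipped
```

The skipped list could be written to the batch log so that bad files are visible after the run.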

Layout for disconnected components

Some works are disconnected graphs with several separate large connected components -- e.g. one component graph per chapter.

The current dot layout algorithm attempts to pack these disconnected components together:

[image: 42-48 you are there bible adventures txt gv]

..., rather than clearly separating them vertically into a stack of horizontal lanes, which would be preferable and more legible.

One approach when using graphviz as a renderer is to identify cluster subgraphs:

...however a challenge for auto-generating this from the data is that the components are not known at encoding time, and the edges are recorded in book page order, not in component groups.
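
One way around the ordering problem is to compute the components after parsing, so they do not need to be known at encoding time. The Python sketch below uses a union-find pass over the edges and then emits one cluster subgraph per component. It is an illustration with hypothetical function names; whether dot then lays the clusters out as separate lanes is not guaranteed by this alone.

```python
def components(edges):
    """Group nodes into connected components (edge direction ignored),
    using union-find. Edges may arrive in any order."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for source, target in edges:
        root_source = find(source)
        root_target = find(target)
        parent[root_source] = root_target

    groups = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())

def cluster_dot(edges):
    """Emit a digraph with one cluster subgraph per connected component."""
    lines = ["digraph G {"]
    for index, group in enumerate(components(edges)):
        lines.append("  subgraph cluster_%d {" % index)
        lines.append("    " + "; ".join('"%s"' % node for node in group) + ";")
        lines.append("  }")
    for source, target in edges:
        lines.append('  "%s" -> "%s";' % (source, target))
    lines.append("}")
    return "\n".join(lines)
```

In the usage below, the edge 2 -> 3 appears after an unrelated edge (as it might in page order) and still ends up in the first component:

```python
print(cluster_dot([("1", "2"), ("10", "11"), ("2", "3")]))
```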

Persist last settings

Currently preferences such as the style file (Mac, Win) and the working directory (Win) do not necessarily persist across different runs.

Last settings could persist through an auto-updated preferences file -- with a reset-to-default option.
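
A preferences file with a reset option could be as simple as the Python sketch below. The setting names in `DEFAULTS` and the three function names are invented for illustration; only the overall idea (load with fallback to defaults, save after changes, reset on request) comes from the note above.

```python
import json

# Hypothetical settings and their defaults.
DEFAULTS = {"working_directory": "", "style_file": "", "png_output": True}

def load_settings(path):
    """Read settings from a JSON file, falling back to defaults when the
    file is missing or unreadable. Unknown keys are ignored."""
    settings = dict(DEFAULTS)
    try:
        with open(path, encoding="utf-8") as handle:
            saved = json.load(handle)
    except (OSError, ValueError):
        return settings
    if isinstance(saved, dict):
        settings.update({key: value for key, value in saved.items()
                         if key in DEFAULTS})
    return settings

def save_settings(path, settings):
    """Write the settings back; intended to be called after every change."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, indent=2)

def reset_settings(path):
    """Restore the defaults on disk and return them."""
    save_settings(path, DEFAULTS)
    return dict(DEFAULTS)
```

Falling back to defaults on a corrupt file means a bad preferences file can never prevent startup.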
