TextArc was initially developed as a text analysis tool, and we are continuing that development. We see it as a possible way to make sense of the ever-increasing deluge of information that knowledge workers have to absorb in day-to-day working situations.

Here are some of the notes I've given to curators or text researchers who show the tool to new audiences: they speak to just a few of the "visual analysis strategies" people have developed to take advantage of this unusual reorganization of a text. (It might help to open the running Alice TextArc in another window to click and drag along, if you're not too familiar with the tool yet.)

My honed four-minute demo goes something like this,
and even unmotivated audiences can typically follow it:


TextArc was built as a Structuralist text analysis
tool to show distribution of words in texts that have
no "meta-data" descriptions, such as a table of contents
or an index.


It first draws the entire text, line by line, around
the outside of the screen (in a font too tiny to read), 
to give a context. Even at this scale, typographic
layout conveys some of the structure of the text: see
the chapter breaks, the poem at 10:00; the "Mouse's
Tale" at 2:00.
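
The layout step above can be sketched in a few lines of Python. This is only an illustration, not TextArc's actual code: the function name, the ellipse radii, and the choice to start at 12:00 and run clockwise are all assumptions made to match the clock positions used in this demo.

```python
import math

def ellipse_positions(n_lines, rx=1.0, ry=0.75):
    """Return one (x, y) point per line, evenly spaced around an ellipse.

    Assumption: line 0 sits at 12:00 and later lines proceed clockwise,
    with y pointing up. rx and ry are hypothetical radii.
    """
    positions = []
    for i in range(n_lines):
        angle = 2 * math.pi * i / n_lines  # 0 at 12:00, growing clockwise
        x = rx * math.sin(angle)
        y = ry * math.cos(angle)
        positions.append((x, y))
    return positions
```

Under these assumptions, a line one quarter of the way through the text lands at 3:00, and a line halfway through lands at 6:00.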

Then, since we are looking to get some meaning from
the text and meaning is tied to words more than lines, 
every word is drawn in a similar ellipse, just inside 
the lines. Words that appear more often are brighter
so they draw the eye to what may be important.

Any word that appears more than once is drawn at its 
average position--essentially held in place by rubber
bands attached to each place it is used. [Drag a word, 
like "mouse," to show this.]
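
The "rubber band" idea is just an average of positions, and can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's real implementation: the function name, the simple lowercase word splitter, and the ellipse layout (12:00 start, clockwise) are all hypothetical. The returned count is the kind of number a brightness value could be derived from.

```python
import math
import re
from collections import defaultdict

def word_layout(lines, rx=1.0, ry=0.75):
    """Map each word to (avg_x, avg_y, count).

    Each line is placed on an ellipse (assumed: 12:00 start, clockwise).
    A word's position is the mean of the positions of every line it
    appears on, counted once per occurrence.
    """
    n = len(lines)
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # word -> [sum_x, sum_y, count]
    for i, line in enumerate(lines):
        angle = 2 * math.pi * i / n
        x, y = rx * math.sin(angle), ry * math.cos(angle)
        for word in re.findall(r"[a-z']+", line.lower()):
            entry = sums[word]
            entry[0] += x
            entry[1] += y
            entry[2] += 1
    return {w: (sx / c, sy / c, c) for w, (sx, sy, c) in sums.items()}
```

A word used only once stays on the ellipse next to its line, while a word used evenly throughout the text is pulled toward the center.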


The thing that makes it useful is that it exposes the
structure implied by word distribution for experts to
analyse and interpret: it lets them ask questions
that they can go back to the text to answer.

We can see that rolling over "mouse" or "Griffin"
shows that they are mostly used in a single chapter.

We can see that rolling over "Rabbit" shows that he
shows up between chapters; perhaps that's why we
think of him as important, even though he's mentioned
less often than the Duchess.

Alice is mentioned everywhere.
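
What the rollover shows visually can also be approximated numerically. The sketch below is an assumption-laden stand-in for it, not something TextArc does: it cuts the text into equal-sized segments (a hypothetical substitute for chapters, since the text has no meta-data) and counts a word's occurrences in each.

```python
import re

def segment_counts(lines, word, n_segments=12):
    """Count occurrences of `word` in each of n_segments equal slices.

    Assumption: equal slices of lines stand in for chapters.
    """
    counts = [0] * n_segments
    target = word.lower()
    for i, line in enumerate(lines):
        segment = i * n_segments // len(lines)
        counts[segment] += re.findall(r"[a-z']+", line.lower()).count(target)
    return counts
```

A word like "mouse" would be expected to pile up in one or two segments, while "Alice" would spread across nearly all of them.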

DEEPER INTERPRETATION (This is the important stuff)

But even higher levels of interpretation are possible:
this is the reason Brad built the tool, and some
features of this text may never have been seen with
such clarity before this tool existed.

For example: rolling over "Hatter," "Dormouse," and
"March Hare" shows that they all appear in the same
two places: the tea party (bottom) and near the end.
This may mean that during the dramatic development
of this book they are a single force--essentially a 
three-headed character, or "ensemble character."
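
In the tool this comparison is made by eye. One way to put a rough number on "appear in the same places" is a Jaccard index over the parts of the text each word occurs in. This is a swapped-in technique for illustration only, and the per-segment count lists it takes as input are hypothetical.

```python
def segments_used(counts):
    """Return the set of segment indexes where a word occurs at all."""
    return {i for i, c in enumerate(counts) if c > 0}

def overlap(counts_a, counts_b):
    """Jaccard index of the segments two words occur in (0.0 to 1.0)."""
    a, b = segments_used(counts_a), segments_used(counts_b)
    if not a and not b:
        return 0.0  # neither word occurs anywhere
    return len(a & b) / len(a | b)
```

Two words that occur in exactly the same segments score 1.0, which is the pattern an "ensemble character" would be expected to show; words that never share a segment score 0.0.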

Clicking on the "King" shows that he also appears
in only two places, and rolling over the "Queen" lets 
us compare the two to see that they seem almost to be 
another ensemble character--except for references
that appear in four places for the queen before the
king appears.

If we look at the text itself, though, [here, click
on the "Show text" box, then click on "Queen"] we
can see that those early references just 
mention her; she has not appeared yet.

This is what foreshadowing looks like in this text.
[An "aha" point for most audiences; I pause a beat.]

Lewis Carroll, storytelling genius that he was,
knew that the queen would be a major player toward
the end of the book, so he introduced her early,
and repeated the introduction every time you are
just about to forget her. 

Carroll's regular, almost musical structure (two 
references initially to lock the name in your 
memory, then one, then one, then two more just 
before she appears) is clear in TextArc, perhaps 
clearer than ever before. It is also clearly not 
accidental, as shown by the equally-spaced lines:
informed by the length of a little girl's memory.

PURPOSE, again

Revealing previously hidden genius like this,
exposed in the timing and interconnections in a
text, is what makes creating tools like TextArc
fulfilling to Brad. 

Some of the other things we've considered or tested TextArc on include:

  • E-mail archives
  • Web sites
  • Source code (needs some scoping info & other pre-processing)
  • Video or Film
  • Image collections
  • Legal archives, contracts, or depositions
  • Financial news updates (this works well)
  • Music (with a transposition-invariant pattern matcher)
  • Genomics (multi-sequence comparisons)
  • Intelligence

Watch this site, and the upcoming sister site, for applications as they are explored and released.

And please drop us a note if you're interested in one of these areas, have another need, or want us to apply this and related information-management technologies to your own field.