This is a summary of the rules for repeating strings, which are linked by an arc. Basically there are 3 rules:
An essential matching pair is a pair of substrings of S, X and Y, which are:
A maximal matching pair not contained in any repetition region,
Or, a maximal matching pair contained in the same fundamental substring of any repetition region that contains it
Or, two consecutive fundamental substrings for a repetition region.
Furtherer more you should know what maximal matching pairs and repetition regions are.
A maximal matching pair is a pair of substrings of S, X and Y, which are:
Identical. X and Y consist of the same sequence of symbols.
Non-overlapping. X and Y do not intersect.
Consecutive. X occurs before Y, and there is no substring Z, identical to X and Y, whose beginning falls between the beginning of X and the beginning of Y.
Maximal. There do not exist longer identical non-overlapping subsequences X’ and Y’ with X’ containing X and Y’ containing Y’
A repetition region R is a substring R of S such that R is made up of a string P repeated two or more times in immediate succession. Each repetition of P is called a fundamental substring for R.
The definition of repetition regions is fairly unclear if you think about it. Which of the following arcs will be correctly? The ones above or the ones beneath the string?
So we decided to contact Mr. Wattenberg to get clear information about this special problem. When we got response we received an updated definition about repetition regions:
Whichever has the smallest fundamental pattern wins
When the fundamental patterns are the same lengths, whichever starts nearest to the beginning of the string wins
The program code was implementet in Java. For graphic visualization we used the Piccolo framework and Java Graphics 2D.
The algorithm to analyze an input string uses several lists to find repeating substrings. Based on the rules mentioned above arcs are drawn to visualize a match.
The following list of features were implemented in the Arc Diagrams Program:
analysis of strings and midi music
Setting the minimal and maximal size of substings that will be searched
Reading from a file or typing text directly into the program window
Searching only for matching words
Case sensivity on or off
Ignorance of whitespaces between words
Using the soundex algorithm which identifies words that are not equal but sound similar
Switching between several tracks of a midi file
Fuzzy Factor which draws arcs even if the matching notes do not actually have the same duration
Zooming to a selected arc to see more details
Displaying real notes instead of notes in midi interpretation format
Plugin based code to create new plugins easily (For example a DNA - Analysis plugin)