Smart Word Wrap

Recursion? Oh, we all love it!

This post will try to illustrate how useful the recursion is and how complicated it can get sometimes.
We, programmers, use recursion frequently to create small and elegant programs which can do complicated things.

Just think about how would you split the following sequence "aaa bb cc ddddd" into individual rows with a maximum width of 6 characters per row.

Well, one solution is to go from the left side of the string and just take words, and when the width limit is reached, append a newline. This is called the 'greedy word wrap algorithm'. It is very fast, indeed, but the output is not always pretty.

Here enters another algorithm into play. We call it the 'smart word wrap algorithm'. Don't be fooled by the name. It's not smart at all! It can be, but not this one. For now it just tries in its head all the possible combinations, does some math and returns the prettiest result possible. It is much slower than the first algorithm, especially on very large strings with a great width value because it has to create and remember more combinations.

How many combinations do we have for the above sequence? Well, there are three of them:
  1. ["aaa", "bb", "cc", "ddddd"]
  2. ["aaa", "bb cc", "ddddd"]
  3. ["aaa bb", "cc", "ddddd"]
How do we know which one is the best? We can find this by summing the squared number of remaining spaces on each line.

The combinations are represented as arrays. Each element of the array is a line. For example, the first array has four lines. Let's do the math to see what it says. Remember, the maximum width == 6;

1. "aaa"    -> (6-3)**2 =  9
   "bb"     -> (6-2)**2 = 16
   "cc"     -> (6-2)**2 = 16
   "ddddd"  -> (6-5)**2 =  1
   TOTAL:                 42

2. "aaa"    -> (6-3)**2 =  9
   "bb cc"  -> (6-5)**2 =  1
   "ddddd"  -> (6-5)**2 =  1
   TOTAL:                 11

3. "aaa bb" -> (6-6)**2 =  0
   "cc"     -> (6-2)**2 = 16
   "ddddd"  -> (6-5)**2 =  1
   TOTAL:                 17

Clearly, we can see that the first combination is the worst of all. A lower sum indicates more uniformity near the edges. The best solution is the second one, which has a sum of 11, and is the result of a smart word wrap algorithm. The third combination is achieved by a greedy word wrap algorithm.

Here are the steps of a smart word wrap algorithm with combinations:
  1. Split the string into words
  2. Create all the possible paths
  3. Normalize the paths
  4. Create all the possible combinations
  5. Normalize the combinations
  6. Find the best result

In phase 1, the words will look like this:
       ("aaa", "bb", "cc", "ddddd")

In phase 2, we will have an array with two sub-arrays, which contains other sub-arrays:
     ["aaa", ["bb", ["cc", ["ddddd"]]], ["bb", "cc", ["ddddd"]]],
     ["aaa", "bb", ["cc", ["ddddd"]]],

The phase 3 represents a small transformation of the paths from the phase 2:
        { "aaa" => [{ "bb" => [{ "cc" => "ddddd" }] }] },
        { "aaa" => [{ "bb cc" => "ddddd" }] },
      [{ "aaa bb" => [{ "cc" => "ddddd" }] }],

In phase 4, we need to create the combinations using the paths from the phase 3:
      [[[["aaa", "bb", "cc", "ddddd"]]]],
      [[["aaa", "bb cc", "ddddd"]]],
      [[["aaa bb", "cc", "ddddd"]]],

In phase 5, the combinations have to be arrays of strings, so we need to normalize them:
      ["aaa", "bb", "cc", "ddddd"],
      ["aaa", "bb cc", "ddddd"],
      ["aaa bb", "cc", "ddddd"],

Finally, in phase 6, after some calculations, the best result pops up:
    ("aaa", "bb cc", "ddddd")

As shown in the above phases (or steps), the algorithm does many useless transformations before it gets to the best result. The transformations take time, but they are beautiful. :)

Here is an example for a random text with MAX_WIDTH=20:

As shown in the     |
above phases        |
(or steps), the     |
algorithm does      |
many useless        |
transformations     |

*** GREEDY WRAP (Text::Wrap)
As shown in the     |
above phases (or    |
steps), the         |
algorithm does many |
useless             |
transformations     |

An implementation of the algorithm described in this post can be found by clicking on one of the following links: