Unique prefixes

This is a problem about finding the shortest unique prefixes for a list of words.


To illustrate this, bellow is an example with the prefixes highlighted:

  • playing
  • player

To solve this problem, we are going to implement a generic algorithm which will work for any number of words. A very fast way of doing this, is by using multidimensional hashes.

The simplest way to create a multidimensional hash, is to create a reference to the original hash, iterate over the keys you want to add and create a new hash for each key which doesn't exists inside the current hash, and change the hash reference to this new hash.

Let's define a function which takes a reference to a hash and a string. First, it splits the string into characters, then it traverses the hash, adding each character to the corresponding branch:

sub add_entry {
    my ($ref, $word) = @_;
    foreach my $char (split(//, $word)) {
        $ref = $ref->{$char} //= {};
    }
}

Now, let's define an %hash and call the above function to add an 'word' into it:
my %hash;
add_entry(\%hash, 'word');

The %hash now contains:
{ w => { o => { r => { d => {} } } } }

We can add more words afterwards:
add_entry(\%hash, 'world');

...making the hash to look like this:
{ w => { o => { r => { d => {},
                       l => { d => {} } } } } }

In order to find the unique prefixes, we have to walk the nested hash and find the spot where the hash has two or more keys.

# Example:

decadere   -> deca
decorat    -> deco
deodorant  -> deo
jaguar     -> j
placere    -> pla
plecare    -> plecar
plecat     -> plecat

# Conclusion:

The described algorithm is very fast and very scalable. It can be used in pattern matching, natural language processing and some special dynamic systems, such as implementing the auto-complete feature in text entries.

# Implementations:

# Library:

Comments