Showing posts from 2013

Smart Word Wrap

Recursion? Oh, we all love it!

This post will try to illustrate how useful recursion is and how complicated it can sometimes get.
We programmers use recursion frequently to create small and elegant programs that can do complicated things.

Just think about how you would split the sequence "aaa bb cc ddddd" into individual rows with a maximum width of 6 characters per row.

Well, one solution is to go from the left side of the string and just take words, and when the next word would exceed the width limit, start a new line. This is called the 'greedy word wrap algorithm'. It is very fast, but the output is not always pretty.
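The greedy approach can be sketched in a few lines of Python. This is only an illustration, not the code from the post; the function name `greedy_wrap` is made up for the example.

```python
def greedy_wrap(text, width):
    """Fill each row with as many words as fit, then start a new row."""
    lines = []
    current = []   # words collected for the row being built
    length = 0     # length of that row, including single spaces
    for word in text.split():
        # Length of the row if this word were appended to it.
        needed = len(word) if not current else length + 1 + len(word)
        if current and needed > width:
            # The word does not fit: close the row and start a new one.
            lines.append(" ".join(current))
            current = [word]
            length = len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(" ".join(current))
    return lines


print("\n".join(greedy_wrap("aaa bb cc ddddd", 6)))
```

For the example sequence this gives the rows "aaa bb", "cc" and "ddddd": the first row is full, but the second one is left nearly empty.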

This is where another algorithm comes into play. We call it the 'smart word wrap algorithm'. Don't be fooled by the name: it's not smart at all! It can be, but not this one. For now it just tries all the possible combinations in its head, does some math and returns the prettiest result possible. It is much slower than the first algo…
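A minimal recursive sketch of the "try every combination" idea, in Python. The post does not say which math is used to rank the results, so the cost here (the sum of the squared number of unused columns on every row) is an assumption, and `smart_wrap` is a hypothetical name.

```python
def smart_wrap(text, width):
    """Try every way of breaking the words into rows and keep the cheapest.

    Assumed cost: sum over all rows of (unused columns) ** 2.
    """
    words = text.split()

    def best(start):
        # Returns (cost, rows) for the cheapest layout of words[start:].
        if start == len(words):
            return 0, []
        best_cost, best_rows = None, None
        length = -1  # cancels the space counted before the first word
        for end in range(start, len(words)):
            length += 1 + len(words[end])
            if length > width and end > start:
                break  # this row is too long, and so are all longer ones
            # A single word longer than the width still gets its own row.
            slack = max(0, width - length)
            rest_cost, rest_rows = best(end + 1)
            cost = slack ** 2 + rest_cost
            if best_cost is None or cost < best_cost:
                row = " ".join(words[start:end + 1])
                best_cost, best_rows = cost, [row] + rest_rows
        return best_cost, best_rows

    return best(0)[1]


print("\n".join(smart_wrap("aaa bb cc ddddd", 6)))
```

Under this cost the example sequence is split into "aaa", "bb cc" and "ddddd", which spreads the empty space more evenly than the greedy result. The plain recursion recomputes the same suffixes many times, which is why this kind of search is slow on long texts.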

tzip - file compressor

tzip is (probably) the simplest text-file compressor. It compresses a file by counting the number of different bytes and mapping those bytes to shorter sequences of bits.

Each byte is composed of 8 bits. A byte value can range from 0 to 255 (inclusive), which means that 8 bits can be arranged into exactly 256 different bit sequences. In a text file, however, not all of these byte values are used, and for a smaller number of patterns we can use fewer bits per pattern.

For example, the word 'youtube' is composed of 7 bytes, but contains only 6 different bytes: y, o, u, t, b, e.

To represent these bytes, we need sequences of only three bits. To calculate exactly how many bits are required in a sequence for a given number of different bytes (n), we have the following formulas:
   where the function next_power_of_two() is defined as:
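One way to compute this in Python is sketched below. It assumes that next_power_of_two(n) returns the smallest power of two that is greater than or equal to n, and that the number of bits is the base-2 logarithm of that value; both are assumptions that agree with the 'youtube' example (6 different bytes, 3 bits). The helpers `bits_needed`, `code_table` and `encode` are hypothetical names, not part of tzip.

```python
def next_power_of_two(n):
    """Smallest power of two >= n (assumed definition), for n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def bits_needed(n):
    """Bits per pattern for n different bytes: log2(next_power_of_two(n))."""
    return next_power_of_two(n).bit_length() - 1


def code_table(data):
    """Map every distinct byte in `data` to a fixed-width bit string."""
    distinct = sorted(set(data))
    # Use at least one bit so a file with a single distinct byte still works.
    width = max(1, bits_needed(len(distinct)))
    return {byte: format(index, "0{}b".format(width))
            for index, byte in enumerate(distinct)}


def encode(data):
    """Concatenate the bit strings of all bytes (as text, for illustration)."""
    table = code_table(data)
    return "".join(table[byte] for byte in data)


word = b"youtube"
print(bits_needed(len(set(word))))         # bits per pattern
print(len(encode(word)), "bits instead of", 8 * len(word))
```

A real compressor would also have to store the table next to the packed bits so that the file can be decoded again; that part is left out here.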

Finding similar file names

This is a story about the creation of a nice script...
A friend of mine needed a script to find files which are similar, without reading their content. She suggested comparing file attributes, such as the file size.
So, I gave her this script. It works great when there are not many files with the same size. In that case, you can safely say that files are similar if they have the same attributes.
But in some cases this may be a real disaster, because there may be lots of files that have the same attributes and are not actually similar. So, I came up with a new idea. I suggested grouping the files by size and comparing their names.
If the shorter name is contained in the longer name, then the files are similar. But not so fast. We need some rules, because we can have two files like 'A happy file name.txt' and 'a.txt'! They are not similar, even though the shorter name 'a' is found in the longer name 'A happy file name'.
So, how to s…
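The naive containment check, and the way it goes wrong, can be illustrated with a short Python sketch. It assumes that the extension is ignored and that the comparison is case-insensitive (both inferred from the 'a' versus 'A happy file name' example); `naive_similar` is a hypothetical name, and the extra rules of the real script are not reproduced here.

```python
import os


def naive_similar(name_a, name_b):
    """True if the shorter file name (without extension) occurs in the longer one."""
    stem_a = os.path.splitext(name_a)[0].lower()
    stem_b = os.path.splitext(name_b)[0].lower()
    shorter, longer = sorted((stem_a, stem_b), key=len)
    return shorter in longer


# A reasonable match:
print(naive_similar("report.txt", "report (copy).txt"))
# The false positive from the text: 'a' is found in 'a happy file name'.
print(naive_similar("A happy file name.txt", "a.txt"))
```

Both calls report a match, which is exactly why plain containment is not enough on its own.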

Find duplicate files

fdf - a pretty clever Perl script to find duplicate files across directories.

It is very simple and very fast. It doesn't use any checksum. It reads (through the File::Compare module) chunks of data from two files. If the chunks are not the same, we already know that the files can't be duplicates of each other.

That is one reason why it is fast. The other reason is this: if we have 3 duplicate files, let's say "A", "B" and "C", we compare "A" with "B" and they are the same, then we compare "A" with "C" and they are the same too. We don't have to compare "B" with "C", because both are duplicates of "A", which tells us that they must be exactly the same.

Yet another reason: files are grouped by file size, so we compare two files only if they have the same size. (Getting the size of a file is a very simple and fast check.)
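The three ideas above (group by size, compare contents directly instead of hashing, and compare each file only against one representative of every known group) can be sketched in Python. This is not the fdf script itself: it swaps Perl's File::Compare for Python's standard `filecmp` module, and `find_duplicates` is a hypothetical name.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Step 1: only files with the same size can be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Step 2: each cluster holds identical files; its first element is
        # the representative, so a new file is never compared against the
        # other members ("B" is never compared with "C").
        clusters = []
        for path in same_size:
            for cluster in clusters:
                # shallow=False forces a comparison of the file contents.
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

Calling `find_duplicates` with a list of file paths returns one list per set of identical files; files without a duplicate are left out.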

The script accepts one argument: either '-f&#…


It's a simple script to get and print the lyrics for the currently playing song in the moc player.

It uses the Google search engine to search for the song name on popular lyrics websites, tries to fetch the lyrics of the current song and prints them to the standard output.

One important feature is the easy way of adding support for more websites.

UPDATE: The project evolved into clyrics!


pview is a Perl source code viewer. It highlights the code using a pretty good Perl tokenizer.

The color scheme can be very easily customized inside the script.

Download pview

Based on the same concept, a new script called 'scgrep' has been created, which grabs from a Perl script only the elements that the reader wants (e.g. regular expressions, operators, numbers, etc.).

Download scgrep

Even better, these tools are now available as part of the Perl::Tokenizer CPAN module, which also includes another interesting tool, called pl2html, which highlights Perl code in HTML.


RAM and disk usage status.

A very simple command line tool which displays a status bar with the used space for each partition.
It uses the `df` command to get the information needed and automatically adjusts itself to the terminal width.
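The bar drawing itself can be sketched in Python as follows. Note that this sketch does not parse the output of `df`: it swaps in `shutil.disk_usage` for the numbers and `shutil.get_terminal_size` for the terminal width, and the names `render_bar` and `status_line` are made up for the example.

```python
import shutil


def render_bar(used, total, width):
    """Return a bar like [#####-----] that is exactly `width` columns wide."""
    inner = max(0, width - 2)            # columns between the brackets
    ratio = used / total if total else 0.0
    filled = min(inner, round(inner * ratio))
    return "[" + "#" * filled + "-" * (inner - filled) + "]"


def status_line(path, columns=None):
    """Label plus usage bar for the filesystem that contains `path`."""
    if columns is None:
        # Adjust to the current terminal width.
        columns = shutil.get_terminal_size().columns
    usage = shutil.disk_usage(path)
    percent = 100 * usage.used / usage.total if usage.total else 0
    label = "{} {:3.0f}% ".format(path, percent)
    return label + render_bar(usage.used, usage.total, columns - len(label))
```

For example, `print(status_line("/"))` prints one line for the root partition, stretched to the width of the terminal.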

It can also be easily adjusted to your own preferences.

Just for fun. :)