Perl code analyzer
In this post we're going to take a look at code analysis. It's an old and very hard problem. The task is to determine the quality of a piece of code and how skilled the programmer who wrote it is (that is, how well they know the language's features).
One solution would be to simplify the code so much that it looks as if it had been written by a newbie. For example, if the strings are enclosed in special quote-like operators, convert them to a single type of quotes; add or remove parentheses, even where they are unnecessary; transform hexadecimal and binary literals into decimal numbers; and so on.
We're going to implement such a program in Perl, which will analyze other Perl programs, but we're going to cheat here: Perl already has a deparser that does the simplification for us. All we need is a clever way to compare the original code with its transformation, and we want that comparison to be fast and reliable.
Here is the main picture of the algorithm:
- Read the code
- Deparse the code
- Tokenize the original and the deparsed code
- Get the number of common tokens
- Calculate the score using an arbitrary formula
- Output the results
- Done!
Pretty simple, right? Not really...
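Steps 3 and 4 of the list above (tokenizing both versions and counting their common tokens) can be sketched in a few lines of Perl. This is a minimal sketch under stated assumptions, not the author's script: the post says the real analyzer uses Perl::Tokenizer, while this sketch swaps in a crude regex tokenizer so it runs with core Perl only; the "deparsed" string is written by hand to mirror the post's qw example; and counting common tokens as a multiset intersection is an assumption. The helper names tokenize and common_tokens are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: crude stand-in for Perl::Tokenizer. It matches single-,
# double- and backtick-quoted strings, words, and single punctuation
# characters. It does not understand Perl's grammar.
sub tokenize {
    my ($code) = @_;
    return $code =~ /('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|`[^`]*`|\w+|[^\w\s])/g;
}

# ASSUMPTION: count the tokens of @$orig that also occur in @$other,
# respecting multiplicity (each token in @$other is matched at most once).
sub common_tokens {
    my ($orig, $other) = @_;
    my %available;
    $available{$_}++ for @$other;
    my $common = 0;
    for my $token (@$orig) {
        next unless $available{$token};
        $available{$token}--;
        $common++;
    }
    return $common;
}

# Hand-written example pair: the original uses qw(), the "deparsed"
# version uses the plain list of quoted strings instead.
my $original = q{my @n = qw(1 2 3);};
my $deparsed = q{my(@n) = ('1', '2', '3');};

my @orig_tokens = tokenize($original);
my @dep_tokens  = tokenize($deparsed);
my $common      = common_tokens(\@orig_tokens, \@dep_tokens);

print "original tokens: ", scalar(@orig_tokens), "\n";
print "common tokens:   $common\n";
```

On this pair the sketch finds 11 tokens in the original, 7 of which survive in the deparsed version: qw and the bare 1, 2, 3 have no counterpart among the quoted '1', '2', '3'.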
# How does it actually work?
It deparses the code using B::Deparse, tokenizes both the original and the deparsed code using Perl::Tokenizer, and then counts the number of tokens the two have in common.
# On what criteria do the tokens differ?
For example, deparsing "qx(ls)" gives "`ls`", which is a different token. Another example is "qw(1 2 3)", which is deparsed as "('1', '2', '3')". The more transformations the deparser has to make, the more likely it is that the code was written by a Perl expert.
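To see these transformations yourself, B::Deparse can be driven from a script. The sketch below compiles each snippet into an anonymous sub with a string eval (the body is compiled but never called, so qx(ls) does not actually run) and turns it back into source with B::Deparse's coderef2text method. The helper name deparse_snippet is hypothetical, and the exact layout of the output, including any use strict / use warnings lines, depends on your Perl version.

```perl
use strict;
use warnings;
use B::Deparse;

# Compile a snippet into an anonymous sub (without calling it) and ask
# B::Deparse to turn the compiled op tree back into Perl source.
sub deparse_snippet {
    my ($code) = @_;
    my $sub = eval "sub { $code\n}" or die "cannot compile: $@";
    return B::Deparse->new->coderef2text($sub);
}

# The two examples from the text: qx() and qw().
for my $snippet (q{my $out = qx(ls);}, q{my @n = qw(1 2 3);}) {
    print "$snippet\n", deparse_snippet($snippet), "\n\n";
}
```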
# How is the score computed?
The score is computed using the following formula:
100 - ((NumberOfCommonTokens - TokenDifferenceNumber) / NumberOfTokensInOriginalCode * 100)
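The formula translates directly into a small function. This is a sketch, not the script's code: the post does not define how TokenDifferenceNumber is obtained, so the function takes it as an input rather than guessing; code_score is a hypothetical name, and the numbers in the example are made up purely for illustration.

```perl
use strict;
use warnings;

# Score formula as stated in the post:
#   100 - ((NumberOfCommonTokens - TokenDifferenceNumber)
#          / NumberOfTokensInOriginalCode * 100)
# TokenDifferenceNumber is passed in because its definition is not given.
sub code_score {
    my ($common, $token_diff, $total) = @_;
    die "the original code has no tokens\n" if $total == 0;
    return 100 - (($common - $token_diff) / $total * 100);
}

# Illustrative numbers only: 40 common tokens, a difference of 5,
# and 50 tokens in the original code.
printf "%.1f\n", code_score(40, 5, 50);    # prints 30.0
```

Here (40 - 5) / 50 * 100 = 70, so the score is 100 - 70 = 30. If every token survived deparsing and the difference term were zero, the score would be 0, which fits the idea that code the deparser leaves untouched looks like newbie code.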
# Is a higher score better?
Yes! A higher score indicates that the person who wrote the code is very familiar with Perl and is not afraid to use that knowledge wherever possible.
P.S. This script is meant to be more of a joke than a serious tool. So, to all the employers out there: please don't fire your Perl programmers because this script claims they are Perl newbies. :)
Source code: https://github.com/trizen/perl-scripts/blob/master/Analyzers/perl_code_analyzer.pl