February 24, 2010

Groovy Globbing

Post moved to http://log2.kares.org/post/59891235623/groovy-globbing

Having Java as my first language, it always felt to me like dynamic languages were closer to the filesystem. Maybe that's because the standard Java API offers only the bare-minimum java.io.File. In Ruby, for example, there's a bunch of shell-like methods in the FileUtils module, which ships with the standard library.
Though maybe I shouldn't compare apples with oranges, so I'll get less Java-ish and more Groovy. Luckily, I've had the chance to go from Groovy to Ruby and back again. In retrospect I must say I enjoyed, and still enjoy, both, but sometimes I miss a feature that's available in the other.
Probably one of the most used methods for matching files, and my personal favorite besides, is Ruby's Dir.glob. I was truly missing it. To be fair, Groovy has a bunch of File additions that come in handy: there's eachFileMatch and eachFileRecurse, and AntBuilder is always within reach. Yet even the 3 Princes (Perl, Python and PHP) are all "globbing" positive, so Groovy should definitely not be the odd one out!

About 100 lines later, here it is: a File.glob implemented in (and for) Groovy.

The code reeks of Java and is a bit long for a method (and probably too long for a closure); that's because of my 2 side goals. First, keep it easily portable to plain Java in case I need it someday, so I avoided Groovy's File extensions. Second, have no "external" dependencies (e.g. helper methods), which made the resulting code less readable but certainly more "embeddable" for the extremely popular CPP (Copy & Paste Programming).

Now, there's no point in showing the code (it's damn ugly anyway), but I'll end with a simple example (for those who have not yet discovered the power of globbing) plus a few lines of documentation inspired by rubydoc:



/**
 * Returns the filenames found by expanding the passed pattern, which is either
 * a String or a List of patterns.
 * NOTE: this pattern is not a regexp (it’s closer to a shell glob).
 * NOTE: case sensitivity depends on your system.
 * 
 *   *                 Matches any file. Can be restricted by other values in
 *                     the glob pattern (same as .* in regexp).
 *                      *  will match all files,
 *                      c*  will match all files beginning with c,
 *                      *c  will match all files ending with c.
 * 
 *   **                Matches directories recursively.
 *
 *   ?                 Matches any one character. Equivalent to . in a regular 
 *                     expression.
 *
 *   [set]             Matches any one character in set. Behaves like character
 *                     sets in regex, including negation ([^a-z]).
 *                     
 *   {p,q}             Matches either literal p or literal q. Matching literals
 *                     may be more than one character in length. More than two
 *                     literals may be specified. Same as alternation in regexp.
 *
 * NOTE: when matching special characters an escape is required, for example:
 * "\\*" or "\\\\".
 *
 * NOTE: flags (e.g. case insensitive matching) are not supported.
 *
 * @see http://ruby-doc.org/core/classes/Dir.html
 * @see http://www.faqs.org/docs/abs/HTML/globbingref.html
 * @author Karol Bucek
 */
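
For readers who want to try the pattern syntax without the Groovy code at hand, here is a minimal sketch in plain Java. It does not use the File.glob described above; it uses the JDK's own glob matcher (java.nio.file.PathMatcher, available since Java 7), whose syntax is close to, but not identical with, the one documented in the comment. The class name GlobSyntaxSketch and the helper matches are made up for this illustration. One difference worth knowing: in the JDK matcher a single * never crosses a directory separator, and set negation is written [!a-z] rather than [^a-z].

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical demo class: exercises the JDK's built-in glob syntax,
// not the Groovy File.glob from this post.
class GlobSyntaxSketch {

    // Returns true when the whole path matches the glob pattern.
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // *  : any run of characters within one path segment
        System.out.println("*.txt vs notes.txt           -> " + matches("*.txt", "notes.txt"));
        System.out.println("c* vs cat                    -> " + matches("c*", "cat"));
        System.out.println("*.txt vs docs/notes.txt      -> " + matches("*.txt", "docs/notes.txt"));
        // ?  : exactly one character
        System.out.println("?.txt vs a.txt               -> " + matches("?.txt", "a.txt"));
        System.out.println("?.txt vs ab.txt              -> " + matches("?.txt", "ab.txt"));
        // [set] : one character out of the set
        System.out.println("[a-c]*.txt vs b1.txt         -> " + matches("[a-c]*.txt", "b1.txt"));
        // {p,q} : alternation between literals
        System.out.println("{foo,bar}.txt vs bar.txt     -> " + matches("{foo,bar}.txt", "bar.txt"));
        // ** : crosses directory boundaries
        System.out.println("**/*.txt vs docs/deep/n.txt  -> " + matches("**/*.txt", "docs/deep/n.txt"));
    }
}
```

Matching against path strings like this needs no files on disk, which makes it a handy way to check a pattern before pointing it at a real directory.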


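BTW, globbing is a built-in feature of decent command interpreters like bash; try ls **/*.txt next time (in bash the recursive ** needs version 4+ with shopt -s globstar enabled, otherwise it behaves like a plain *).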
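
A rough plain-Java counterpart of that recursive ls, as a sketch only: it walks a directory tree with Files.walk and filters the relative paths through the JDK's glob matcher. Again, this is not the Groovy File.glob from this post; the class RecursiveGlobSketch and its methods glob and createDemoTree are hypothetical names invented for the example. Note that with the JDK matcher **/*.txt requires at least one directory in the path, so **.txt is used here to pick up files at every depth, including the top level.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class: a recursive glob built on Files.walk and PathMatcher.
class RecursiveGlobSketch {

    // Returns the sorted, root-relative paths of regular files matching the pattern.
    static List<String> glob(Path root, String pattern) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(Files::isRegularFile)
                .map(root::relativize)                      // match relative to the root
                .filter(matcher::matches)
                .map(p -> p.toString().replace('\\', '/'))  // uniform separators in the output
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small throwaway tree: a.txt, sub/b.txt, sub/d.md, sub/deep/c.txt
    static Path createDemoTree() {
        try {
            Path root = Files.createTempDirectory("glob-demo");
            Files.createDirectories(root.resolve("sub").resolve("deep"));
            Files.createFile(root.resolve("a.txt"));
            Files.createFile(root.resolve("sub").resolve("b.txt"));
            Files.createFile(root.resolve("sub").resolve("d.md"));
            Files.createFile(root.resolve("sub").resolve("deep").resolve("c.txt"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createDemoTree();
        System.out.println(glob(root, "*.txt"));     // top-level files only
        System.out.println(glob(root, "**/*.txt"));  // files at least one directory deep
        System.out.println(glob(root, "**.txt"));    // files at any depth
    }
}
```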