Nux 1.6

nux.xom.pool
Class FileUtil

java.lang.Object
  extended by nux.xom.pool.FileUtil

public class FileUtil
extends Object

Various file related utilities.

Author:
whoschek.AT.lbl.DOT.gov, $Author: hoschek3 $

Method Summary
static URI[] listFiles(String directory, boolean recurse, String includes, String excludes)
          Returns the URIs of all files whose path matches at least one of the given inclusion wildcard or regular expressions but none of the given exclusion wildcard or regular expressions, starting from the given directory, optionally with recursive directory traversal, and insensitive to underlying operating system conventions.
static byte[] toByteArray(InputStream input)
          Reads until end-of-stream, returns all bytes read, and finally closes the stream.
static String toString(InputStream input, Charset charset)
          Reads until end-of-stream and returns all bytes read as a string, converting the data with the given charset encoding, or with the platform's default encoding if charset == null; finally closes the stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

listFiles

public static URI[] listFiles(String directory,
                              boolean recurse,
                              String includes,
                              String excludes)
Returns the URIs of all files whose path matches at least one of the given inclusion wildcard or regular expressions but none of the given exclusion wildcard or regular expressions, starting from the given directory, optionally with recursive directory traversal, and insensitive to underlying operating system conventions.

An inclusion or exclusion pattern can contain zero or more expressions, each separated by one or more ':', ';', ',' or whitespace characters, for example "*.xml *.xsl" or "*.xml, *.xsl".
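
The separator rule above can be illustrated with a minimal sketch. The class and method names (PatternListSketch, splitExpressions) are hypothetical and not part of Nux; the sketch only assumes that splitting happens on runs of the listed separator characters, and the actual implementation may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Nux API.
final class PatternListSketch {

    // Splits a pattern into its expressions, treating one or more
    // ':', ';', ',' or whitespace characters as a single separator.
    static List<String> splitExpressions(String patterns) {
        List<String> result = new ArrayList<String>();
        if (patterns == null) {
            return result; // null means "no expressions"
        }
        for (String expr : patterns.split("[:;,\\s]+")) {
            if (!expr.isEmpty()) { // skip the empty token produced by a leading separator
                result.add(expr);
            }
        }
        return result;
    }
}
```

Under this assumption, "*.xml *.xsl" and "*.xml, *.xsl" both yield the same two expressions.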

A wildcard expression can contain zero or more "match any char sequence" wildcards, denoted by the '*' character. Wildcard expressions (and the directory parameter) behave identically whether they contain Unix '/' or Windows '\' file separator characters, on any host operating system. In other words, path/to/file and path\to\file and path/to\file will always work fine, irrespective of the underlying OS. This is not the case for regular expressions (because those characters can have multiple meanings, and thus cannot be safely substituted).

Wildcard expressions are simple and intuitive, whereas regular expressions are more complex and powerful. A wildcard expression is indicated by the absence of a leading '#' character. Otherwise the expression is treated as a normal Java regular expression, with the leading '#' character stripped off.
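
The distinction between the two expression kinds can be sketched as follows. This is only an illustration under stated assumptions, not the Nux implementation: the names ExpressionSketch and compile are hypothetical, and the sketch assumes that wildcard expressions are normalized to '/' separators and that the paths being matched are normalized the same way.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Nux API.
final class ExpressionSketch {

    // Compiles one expression: a leading '#' marks a regular expression,
    // anything else is treated as a wildcard expression.
    static Pattern compile(String expr) {
        if (expr.startsWith("#")) {
            // Regular expression: strip the marker and use the rest as is.
            return Pattern.compile(expr.substring(1));
        }
        // Wildcard expression: treat '\' and '/' alike (assumption: matched
        // paths are normalized to '/' as well).
        String normalized = expr.replace('\\', '/');
        StringBuilder regex = new StringBuilder();
        int start = 0;
        for (int i = 0; i < normalized.length(); i++) {
            if (normalized.charAt(i) == '*') {
                // Quote the literal text before the wildcard, then match any char sequence.
                regex.append(Pattern.quote(normalized.substring(start, i))).append(".*");
                start = i + 1;
            }
        }
        regex.append(Pattern.quote(normalized.substring(start)));
        return Pattern.compile(regex.toString());
    }
}
```

Quoting the literal parts keeps characters such as '.' from acting as regex metacharacters inside a wildcard expression.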

Direct or indirect infinite cycles in recursive directory traversal (cyclic symbolic links etc.) are detected and avoided via File.getCanonicalPath() duplicate checks.
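
A minimal sketch of such a duplicate check is shown below. The names TraversalSketch and walk are hypothetical, and the real traversal additionally applies the inclusion and exclusion patterns; the sketch only shows how remembering canonical paths keeps a cyclic link from being followed twice.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Nux API.
final class TraversalSketch {

    // Collects all plain files below dir, descending into subdirectories if recurse is true.
    static List<File> walk(File dir, boolean recurse) throws IOException {
        List<File> files = new ArrayList<File>();
        walk(dir, recurse, new HashSet<String>(), files);
        return files;
    }

    private static void walk(File dir, boolean recurse, Set<String> visited,
                             List<File> files) throws IOException {
        // A directory reached again via a symbolic link has the same canonical path,
        // so add() returns false and the cycle is cut here.
        if (!visited.add(dir.getCanonicalPath())) {
            return;
        }
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, does not exist, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                if (recurse) {
                    walk(child, true, visited, files);
                }
            } else {
                files.add(child);
            }
        }
    }
}
```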

Example usage:

 // Simple wildcard expressions
 // all files ending with ".xml" or ".xsl" or ".svg" in the "/tmp" dir: 
 listFiles("/tmp", false, "*.xml *.xsl *.svg", null); 
 
 // Simple wildcard expressions
 // all files in the current working dir and descendants,  
 // excluding hidden dot files and files ending with "~" or ".bak", 
 // and excluding files in CVS directories,
 // and excluding files starting with "error" or of the form "bugXYZreport-XYZ.xml" 
 // where XYZ can be any character sequence (including zero chars)
 listFiles(".", true, null, ".*, *~, *.bak, CVS/*, error*, bug*report-*.xml"); 
 
 
 
 // Advanced regular expressions
 // Note that javadoc renders regexes in buggy ways: for the correct regexes 
 // check the source code rather than the javadoc HTML output. 
 
 // all files ending with ".xml" or ".xsl" or ".svg" in the "/tmp" dir: 
 listFiles("/tmp", false, "#.*\\.xml, #.*\\.xsl, #.*\\.svg", null); 
 
 // Advanced regular expressions
 // all files in the current working dir and descendants,  
 // excluding files in CVS directories, hidden dot files and files ending with "~" or ".bak":
 String dotFiles = "#.*"  +  "/\\.[^/]*";
 listFiles(".", true, null, "#.*CVS/.*, " + dotFiles + ", #.*~, #.*\\.bak"); 
 

Note: The returned URIs can be converted to valid files via new File(uri) or to strings via uri.toString(). These can be passed to, say, new Builder().build(...) or be used to load documents in XPath/XQuery via

     declare namespace util = "java:nux.xom.pool.FileUtil"; 
     for $uri in util:listFiles(".", false(), "*.xml", "") 
     return count(doc(string($uri))//*);
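
On the Java side, the two conversions mentioned in the note might look like the following sketch. The class and method names (UriConversionSketch, toFiles, toStrings) are hypothetical; the only API calls relied on are new File(URI) and URI.toString() from the JDK.

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Nux API.
final class UriConversionSketch {

    // Converts "file:" URIs, such as those returned by listFiles, to File objects.
    static List<File> toFiles(URI[] uris) {
        List<File> files = new ArrayList<File>();
        for (URI uri : uris) {
            files.add(new File(uri));
        }
        return files;
    }

    // Converts URIs to their string form, e.g. for use as a system ID.
    static List<String> toStrings(URI[] uris) {
        List<String> strings = new ArrayList<String>();
        for (URI uri : uris) {
            strings.add(uri.toString());
        }
        return strings;
    }
}
```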
 
This method is thread-safe.

Parameters:
directory - the path or URI of the directory to start at. Leading "file://" and "file:" URI prefixes are stripped off if present. If null or "" or ".", defaults to the current working directory. If the directory does not exist, an empty result set is returned.

Absolute examples: "/tmp/lib", "file:/tmp/lib", "file:///tmp\lib", "C:\tmp\lib", "file://C:\tmp\lib", Windows UNC "\\server\share\tmp\lib", "file:\\server\share\tmp\lib", "file://\\server\share\tmp\lib", etc.

Relative examples: ".", "nux/lib/CVS", "nux/lib\CVS"

recurse - whether or not to recursively traverse the subdirectories of the given directory.
includes - zero or more wildcard or regular expressions to match for result set inclusion; as in File.getPath().matches(regex). If null or an empty string defaults to matching all files. Example: "*.xml *.xsl". Example: "*.xml, *.xsl".
excludes - zero or more wildcard or regular expressions to match for result set exclusion; as in File.getPath().matches(regex). If null or an empty string defaults to matching (i.e. excluding) no files. Example: "*.xml *.xsl". Example: "*.xml, *.xsl".
Returns:
the URIs of all matching files (omitting directories)
See Also:
Pattern, File, String.matches(java.lang.String)

toByteArray

public static byte[] toByteArray(InputStream input)
                          throws IOException
Reads until end-of-stream, returns all bytes read, and finally closes the stream.

Parameters:
input - the input stream
Returns:
the bytes read from the input stream
Throws:
IOException - if an I/O error occurs while reading the stream
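
The documented contract (read to end-of-stream, then close the stream) could be implemented along the following lines. This is a sketch under assumptions, not the Nux source: the names ReadAllSketch and readFully and the 4096-byte buffer size are chosen for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the Nux API.
final class ReadAllSketch {

    // Reads all remaining bytes from the stream and closes it,
    // even if reading fails part way through.
    static byte[] readFully(InputStream input) throws IOException {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096]; // arbitrary buffer size
            int n;
            while ((n = input.read(buffer)) >= 0) { // read returns -1 at end-of-stream
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            input.close();
        }
    }
}
```

Because the stream is closed in a finally block, the caller does not need to close it again.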

toString

public static String toString(InputStream input,
                              Charset charset)
                       throws IOException
Reads until end-of-stream and returns all bytes read as a string, converting the data with the given charset encoding, or with the platform's default encoding if charset == null; finally closes the stream.

Parameters:
input - the input stream
charset - the charset to convert with, e.g. Charset.forName("UTF-8")
Returns:
the bytes read from the input stream, as a string
Throws:
IOException - if an I/O error occurs while reading the stream
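
The charset fallback described above can be sketched as follows. The names ReadStringSketch and readString are hypothetical, and the sketch assumes that the platform default is obtained via Charset.defaultCharset(); the actual implementation may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the Nux API.
final class ReadStringSketch {

    // Reads all remaining bytes, decodes them with the given charset
    // (or the platform default if charset is null), and closes the stream.
    static String readString(InputStream input, Charset charset) throws IOException {
        Charset cs = (charset != null) ? charset : Charset.defaultCharset();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096]; // arbitrary buffer size
            int n;
            while ((n = input.read(buffer)) >= 0) { // read returns -1 at end-of-stream
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), cs);
        } finally {
            input.close();
        }
    }
}
```

Passing an explicit charset such as Charset.forName("UTF-8") avoids results that vary between platforms.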
