Nux 1.6
java.lang.Object
  extended by nux.xom.pool.FullTextUtil
public class FullTextUtil
Thread-safe XQuery/XPath fulltext search utilities; implemented with the Lucene engine and a custom high-performance adapter for on-the-fly main memory indexing with smart caching for indexes, queries and results.
Complementing the standard XPath string and regular expression matching functionality, Lucene has a powerful query syntax with support for word stemming, fuzzy searches, similarity searches, approximate searches, boolean operators, wildcards, grouping, range searches, term boosting, etc. For details, see the Lucene Query Syntax and Examples. Also see MemoryIndex and PatternAnalyzer for detailed documentation.
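The fuzzy (`~`) and prefix wildcard (`*`) operators are what let the query `+salmon~ +fish* manual~` used below match "Salmons", "fishing" and "Manuals". The class below is a minimal, JDK-only sketch of those two term tests. It rests on two assumptions: the text and query terms have already been lowercased by the analyzers, and the fuzzy rule is the one used by classic Lucene `FuzzyQuery` in older releases, where similarity = 1 - editDistance / (length of the shorter term) and a term is accepted above a default minimum similarity of 0.5. Newer Lucene versions compute this differently. `FuzzySketch` and its methods are hypothetical names and are not part of Nux or Lucene.

```java
// Hypothetical sketch: approximates classic Lucene fuzzy ("term~") and
// prefix wildcard ("term*") matching for already-lowercased terms.
// Assumption: similarity = 1 - editDistance / min(len1, len2), accepted when > 0.5.
final class FuzzySketch {

    static final float DEFAULT_MIN_SIMILARITY = 0.5f;

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in the style of older Lucene FuzzyQuery (assumption, see above).
    static float similarity(String queryTerm, String indexTerm) {
        int shorter = Math.min(queryTerm.length(), indexTerm.length());
        if (shorter == 0) {
            return queryTerm.equals(indexTerm) ? 1.0f : 0.0f;
        }
        return 1.0f - (float) editDistance(queryTerm, indexTerm) / shorter;
    }

    // "salmon~" style fuzzy term test.
    static boolean fuzzyMatches(String queryTerm, String indexTerm) {
        return similarity(queryTerm, indexTerm) > DEFAULT_MIN_SIMILARITY;
    }

    // "fish*" style prefix wildcard: matches any term starting with the prefix.
    static boolean prefixMatches(String prefix, String indexTerm) {
        return indexTerm.startsWith(prefix);
    }
}
```

Under these assumptions, `salmon~` accepts `salmons` (one edit over six characters gives a similarity of about 0.83), and `fish*` accepts `fishing` but not `finish`.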
Example Java usage:
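```java
import nux.xom.pool.FullTextUtil;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.memory.PatternAnalyzer;

// Use the same analyzer for both the text and the query.
Analyzer analyzer = PatternAnalyzer.DEFAULT_ANALYZER;
float score = FullTextUtil.match(
    "Readings about Salmons and other select Alaska fishing Manuals",
    "+salmon~ +fish* manual~",
    analyzer, analyzer);
if (score > 0.0f) {
    // query matches text
} else {
    // query does not match text
}
```

Example XQuery/XPath usage: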
```xquery
declare namespace lucene = "java:nux.xom.pool.FullTextUtil";

lucene:match(
  "Readings about Salmons and other select Alaska fishing Manuals",
  "+salmon~ +fish* manual~")
```

Example XQuery/XPath usage to find all books that have a title about salmon fishing:
```xquery
declare namespace lucene = "java:nux.xom.pool.FullTextUtil";

/books/book[lucene:match(title, "+salmon~ +fish* manual~") > 0.0]
```

An XQuery that finds all books authored by "James" that have something to do with "salmon fishing manuals", sorted by relevance:
```xquery
declare namespace lucene = "java:nux.xom.pool.FullTextUtil";
declare variable $query := "+salmon~ +fish* manual~"; (: any arbitrary Lucene query can go here :)
(: declare variable $query as xs:string external; :)

for $book in /books/book[author="James" and lucene:match(abstract, $query) > 0.0]
let $score := lucene:match($book/abstract, $query)
order by $score descending
return $book
```

Extracting sentences:
```xquery
declare namespace lucene = "java:nux.xom.pool.FullTextUtil";

for $book in /books/book
for $s in lucene:sentences($book/abstract, 0)
return
  if (lucene:match($s, "+salmon~ +fish* manual~") > 0.0)
  then normalize-space($s)
  else ()
```

Using a custom text tokenizer/analyzer that limits indexing to the first 100 words, with debug logging:
```xquery
declare namespace lucene = "java:nux.xom.pool.FullTextUtil";
declare namespace analyzerUtil = "java:org.apache.lucene.index.memory.AnalyzerUtil";
declare namespace patternAnalyzer = "java:org.apache.lucene.index.memory.PatternAnalyzer";
declare namespace system = "java:java.lang.System";

lucene:match(
  "Readings about Salmons and other select Alaska fishing Manuals",
  "+salmon~ +fish* manual~",
  (: text analyzer: keep only the first 100 tokens and log them to System.err :)
  analyzerUtil:getLoggingAnalyzer(
    analyzerUtil:getMaxTokenAnalyzer(patternAnalyzer:DEFAULT_ANALYZER(), 100),
    system:err(),
    "log"),
  (: query analyzer :)
  patternAnalyzer:DEFAULT_ANALYZER()
)
```
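To make the "first 100 words" text analyzer above more concrete, here is a rough JDK-only sketch of what a pattern-based tokenizer with a token limit does conceptually: split the text on runs of non-letter, non-digit characters, lowercase each token, and stop after `maxTokens` tokens. This is an illustration under simplifying assumptions, not Lucene's `PatternAnalyzer` or `AnalyzerUtil`; the real `DEFAULT_ANALYZER`, for example, also removes stop words. `TokenSketch.tokenize` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of "pattern tokenizer + max-token limit".
// Not Lucene's PatternAnalyzer: no stop-word removal, no token offsets.
final class TokenSketch {

    static List<String> tokenize(String text, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator yields an empty first element
            }
            if (tokens.size() >= maxTokens) {
                break; // ignore everything after the first maxTokens tokens
            }
            tokens.add(raw.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }
}
```

With `maxTokens = 3`, the sample title yields only `readings`, `about` and `salmons`, so a query term that first appears later in the text, such as `manual~`, would find nothing to match. This is the trade-off that capping the text analyzer makes for speed on long inputs.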
Method Summary

| Modifier and type | Method and description |
|---|---|
| static float | match(String text, String query): Lucene fulltext search convenience method; equivalent to match(text, query, null, null). |
| static float | match(String text, String query, Analyzer textAnalyzer, Analyzer queryAnalyzer): Lucene fulltext search convenience method; returns the relevance score obtained by matching the given text string against the given Lucene query expression. |
| static String[] | paragraphs(String text, int limit): Returns at most the first N paragraphs of the given text, where N is given by limit. |
| static String[] | sentences(String text, int limit): Returns at most the first N sentences of the given text, where N is given by limit. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail
public static float match(String text, String query) throws ParseException

Lucene fulltext search convenience method; equivalent to match(text, query, null, null).

Parameters:
text - the string to match the query against
query - the Lucene fulltext query expression
Returns:
the relevance score; a score greater than 0.0 indicates that the query matches the text
Throws:
ParseException - if the query expression has a syntax error

public static float match(String text, String query, Analyzer textAnalyzer, Analyzer queryAnalyzer) throws ParseException
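Lucene fulltext search convenience method; returns the relevance score obtained by matching the given text string against the given Lucene query expression. Typically, both analyzers are identical, but this need not be the case; for example, the text analyzer may stop after the first 100 words while the query analyzer does not.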
Parameters:
text - the string to match the query against
query - the Lucene fulltext query expression
textAnalyzer - stream tokenizer that extracts index terms from text according to some policy. May be null, in which case a default is used.
queryAnalyzer - stream tokenizer that extracts query terms from query according to some policy. May be null, in which case a default is used.
Returns:
the relevance score; a score greater than 0.0 indicates that the query matches the text
Throws:
ParseException - if the query expression has a syntax error

public static String[] paragraphs(String text, int limit)
Returns at most the first N paragraphs of the given text, where N is given by limit.

Parameters:
text - the text to tokenize into paragraphs
limit - the maximum number of paragraphs to return; zero indicates "as many as possible".
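The limit contract above can be illustrated with a small JDK-only sketch. It assumes that a paragraph boundary is one or more blank lines; Nux's actual boundary rules are not documented here and may differ. As documented, a limit of zero means "as many as possible". `ParagraphSketch.paragraphs` is a hypothetical stand-in, not the Nux implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the paragraphs(text, limit) contract.
// Assumption: paragraphs are separated by one or more blank lines.
final class ParagraphSketch {

    static String[] paragraphs(String text, int limit) {
        List<String> result = new ArrayList<>();
        // A blank line: a line break, optional spaces/tabs, then another line break.
        for (String p : text.split("\\r?\\n[ \\t]*\\r?\\n")) {
            String trimmed = p.trim();
            if (trimmed.isEmpty()) {
                continue; // skip the empty pieces left by runs of extra blank lines
            }
            if (limit > 0 && result.size() >= limit) {
                break; // limit == 0 means no limit
            }
            result.add(trimmed);
        }
        return result.toArray(new String[0]);
    }
}
```

A single line break stays inside a paragraph, so `"First para.\nStill first.\n\nSecond para."` yields two paragraphs, and passing a limit of 1 returns only the first.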
public static String[] sentences(String text, int limit)

Returns at most the first N sentences of the given text, where N is given by limit.

Parameters:
text - the text to tokenize into sentences
limit - the maximum number of sentences to return; zero indicates "as many as possible".
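For comparison, the same contract can be sketched with the JDK's own sentence segmentation in `java.text.BreakIterator`. This replaces Nux's boundary rules with the JDK's English sentence rules, so results on tricky input (abbreviations, decimals) may differ from `FullTextUtil.sentences`. `SentenceSketch` is a hypothetical name; the zero-means-unlimited behavior follows the documentation above.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the sentences(text, limit) contract using the JDK's
// BreakIterator; not necessarily how Nux detects sentence boundaries.
final class SentenceSketch {

    static String[] sentences(String text, int limit) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundary pairs [start, end).
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (limit > 0 && result.size() >= limit) {
                break; // limit == 0 means no limit
            }
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result.toArray(new String[0]);
    }
}
```

Feeding each returned sentence to `match`, as the "Extracting sentences" XQuery example does, yields sentence-level hits instead of whole-abstract hits.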