Lucene has a custom query syntax for querying its indexes. Unless you explicitly specify an alternative query parser such as DisMax or eDisMax, you're using the standard Lucene query parser by default.
Here are some query examples demonstrating the query syntax.
Search for word "foo" in the title field.
title:foo
Search for phrase "foo bar" in the title field.
title:"foo bar"
Search for phrase "foo bar" in the title field AND the phrase "quick fox" in the body field.
title:"foo bar" AND body:"quick fox"
Search for either the phrase "foo bar" in the title field AND the phrase "quick fox" in the body field, or the word "fox" in the title field.
(title:"foo bar" AND body:"quick fox") OR title:fox
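The fielded term and phrase clauses above can also be assembled in code. This is a minimal Python sketch, not part of Lucene or Solr: the helper name field_clause is hypothetical, and it assumes the only characters needing escaping inside a phrase are backslashes and double quotes.

```python
def field_clause(field, value):
    """Return field:value, quoting the value as a phrase if it contains whitespace."""
    if any(ch.isspace() for ch in value):
        # Assumption: only backslashes and double quotes need escaping in a phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


# Rebuild the grouped example: two phrase clauses joined with AND, then OR a term.
both = " AND ".join([field_clause("title", "foo bar"), field_clause("body", "quick fox")])
query = f"({both}) OR {field_clause('title', 'fox')}"
print(query)
```

The parentheses are added explicitly so that the AND clause is grouped before the OR is applied.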
Search for word "foo" and not "bar" in the title field.
title:foo -title:bar
Search for any word that starts with "foo" in the title field.
title:foo*
Search for any word that starts with "foo" and ends with "bar" in the title field.
title:foo*bar
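Note that, by default, Lucene's query parser doesn't accept a * symbol as the first character of a search term. Leading wildcards have to be enabled explicitly (QueryParser.setAllowLeadingWildcard) and can be slow on large indexes.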
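As a rough analogy for what the two wildcard patterns match, the sketch below uses Python's fnmatch module on a hypothetical list of words. This is not how Lucene evaluates wildcards (Lucene matches against the indexed terms after analysis); it only illustrates that * stands for zero or more characters.

```python
from fnmatch import fnmatchcase

# Hypothetical terms, standing in for the indexed terms of a title field.
words = ["foo", "food", "foobar", "fooxbar", "barfoo"]

# title:foo*  -> terms that start with "foo"
starts_with_foo = [w for w in words if fnmatchcase(w, "foo*")]

# title:foo*bar -> terms that start with "foo" and end with "bar"
foo_then_bar = [w for w in words if fnmatchcase(w, "foo*bar")]

print(starts_with_foo)
print(foo_then_bar)
```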
Lucene supports finding words that are within a specific distance of each other.
Search for "foo bar" within 4 words from each other.
"foo bar"~4
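Note that for proximity searches, exact matches are proximity zero. The proximity value is an edit distance counted in term moves, so a transposition of two adjacent words (bar foo) needs a proximity of at least 2, per the Lucene PhraseQuery documentation.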
A query such as "foo bar"~10000000 is an interesting alternative to foo AND bar.
Whilst both queries are effectively equivalent with respect to the documents that are returned, the proximity query assigns a higher score to documents for which the terms foo and bar are closer together.
The trade-off is that the proximity query is slower to perform and requires more CPU.
Solr DisMax and eDisMax query parsers can add phrase proximity matches to a user query.
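To make the comparison concrete, here is a small Python sketch that builds both forms of the query from the same terms, plus a set of eDisMax request parameters. The helper name sloppy_phrase and the field names are hypothetical; qf, pf and ps are the eDisMax query-fields, phrase-fields and phrase-slop parameters, and the values chosen here are illustrative assumptions.

```python
from urllib.parse import urlencode


def sloppy_phrase(terms, slop):
    """Quote the terms as one phrase and append the ~slop proximity suffix."""
    if slop < 0:
        raise ValueError("slop must be non-negative")
    return '"{}"~{}'.format(" ".join(terms), slop)


terms = ["foo", "bar"]
conjunction = " AND ".join(terms)           # plain boolean form
proximity = sloppy_phrase(terms, 10000000)  # proximity form with a very large slop

# Hypothetical eDisMax parameters: search title and body (qf), and additionally
# boost documents where the terms appear as a phrase in title (pf) within a
# slop of 4 (ps).
params = {"defType": "edismax", "q": " ".join(terms),
          "qf": "title body", "pf": "title", "ps": 4}
encoded = urlencode(params)

print(conjunction)
print(proximity)
print(encoded)
```

With pf and ps the user can keep typing a plain keyword query, and the parser adds the proximity clause on their behalf.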
Range queries match documents whose field values fall between the lower and upper bound specified by the query. A range can be inclusive of its bounds (square brackets) or exclusive of them (curly brackets). Values are compared lexicographically.
mod_date:[20020101 TO 20030101]
Solr's built-in field types are very convenient for performing range queries on numbers without requiring padding.
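Why padding matters under lexicographic comparison can be seen in a few lines of Python. This is only an illustration of string ordering, under the assumption that numbers are indexed as plain strings; the pad helper, the width of 8 and the price field are hypothetical.

```python
def pad(number, width=8):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    return str(number).zfill(width)


values = [9, 10, 200]

# As plain strings, "9" sorts after "10" and "200", so a range such as
# [10 TO 200] would miss or wrongly include values.
unpadded = sorted(str(v) for v in values)

# With zero-padding, string order and numeric order agree.
padded = sorted(pad(v) for v in values)

range_query = f"price:[{pad(9)} TO {pad(200)}]"

print(unpadded)
print(padded)
print(range_query)
```

Dates written as yyyymmdd, such as 20020101, are already fixed-width, which is why the mod_date example works without extra padding.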
Query-time boosts allow one to specify which terms/clauses are "more important". The higher the boost factor, the more relevant the term will be, and therefore the higher the corresponding document scores.
A typical boosting technique is assigning higher boosts to title matches than to body content matches:
(title:foo OR title:bar)^1.5 (body:foo OR body:bar)
You should carefully examine explain output to determine the appropriate boost weights.
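When experimenting with boost weights, it can help to generate the query from a list of terms rather than editing it by hand. The sketch below is one way to do that in Python; boosted_group is a hypothetical helper, and it assumes the terms need no escaping.

```python
def boosted_group(field, terms, boost=None):
    """OR the terms together on one field and optionally append a ^boost."""
    clause = "(" + " OR ".join(f"{field}:{t}" for t in terms) + ")"
    return clause if boost is None else f"{clause}^{boost}"


terms = ["foo", "bar"]

# Title matches get a 1.5 boost; body matches keep the default weight.
query = " ".join([boosted_group("title", terms, 1.5), boosted_group("body", terms)])
print(query)
```

Changing the 1.5 in one place and comparing the explain output for each variant makes it easier to see what a given boost actually does to the scores.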
The official docs for the query parser syntax are here: http://lucene.apache.org/java/3_5_0/queryparsersyntax.html
The query syntax has not changed significantly since Lucene 1.3 (it is now 3.5.0).
Here is a list of differences between the Solr Query Parser and the standard Lucene query syntax (from the Solr wiki):
Range queries [a TO z], prefix queries a*, and wildcard queries a*b are constant-scoring (all matching documents get an equal score). The scoring factors tf, idf, index boost, and coord are not used. There is no limitation on the number of terms that match (as there was in past versions of Lucene).
Lucene 2.1 has also switched to use ConstantScoreRangeQuery for its range queries.
A * may be used for either or both endpoints to specify an open-ended range query.
field:[* TO 100] finds all field values less than or equal to 100
field:[100 TO *] finds all field values greater than or equal to 100
field:[* TO *] matches all documents with the field
-inStock:false finds all documents where inStock is not false
-field:[* TO *] finds all documents without a value for field
A hook into FunctionQuery syntax. Quotes will be necessary to encapsulate the function when it includes parentheses.
Example: _val_:myfield
Example: _val_:"recip(rord(myfield),1,2,3)"
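Nested query support for any type of query parser. Quotes will often be necessary to encapsulate the nested query if it contains reserved characters.
Example: _query_:"{!dismax qf=myfield}how now brown cow"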
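The open-ended ranges and the quoted _val_ and _query_ hooks listed above are easy to get wrong by hand, so here is a small Python sketch that builds them. Both helper names are hypothetical, and the escaping rule (backslash before embedded backslashes and double quotes) is an assumption about what the quoted form needs.

```python
def range_query(field, lower=None, upper=None):
    """Inclusive range; a missing endpoint becomes the open-ended *."""
    lo = "*" if lower is None else str(lower)
    hi = "*" if upper is None else str(upper)
    return f"{field}:[{lo} TO {hi}]"


def quoted_hook(hook, body):
    """Wrap body in double quotes for a hook such as _val_ or _query_."""
    # Assumption: embedded backslashes and double quotes are backslash-escaped.
    escaped = body.replace("\\", "\\\\").replace('"', '\\"')
    return f'{hook}:"{escaped}"'


print(range_query("field", upper=100))   # values <= 100
print(range_query("field", lower=100))   # values >= 100
print("-" + range_query("field"))        # documents without a value for field
print(quoted_hook("_val_", "recip(rord(myfield),1,2,3)"))
print(quoted_hook("_query_", "{!dismax qf=myfield}how now brown cow"))
```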
© Copyright 2024 Kelvin Tan - Solr and Elasticsearch consultant