Formulating queries correctly

Simple terms and phrases


In a search you can use simple terms, that is, single words, and phrases, that is, several words enclosed in quotation marks, e.g. "Nicolaus Copernicus University". If you use quotation marks, only documents containing the entire phrase are returned.

You can combine search terms with Boolean operators. You can also use wildcard characters, which stand in for any letter or digit or any string of them, search for similarly spelled terms, search for terms that occur within a given distance of each other, and assign priorities to search terms.

Logical operators


  • AND - also written as &&
    - means that the terms joined by the operator must both appear in a document. For example, the query Copernicus && Chopin selects only documents in which both surnames occur. AND is the search engine's default behavior when more than one word is entered, so entering Copernicus Chopin gives the same result.
  • OR - also written as ||
    - requires that at least one of the terms appears in a document. For example, the query Kopernik || Copernicus selects documents in which the astronomer's name appears in at least one of the given forms.
  • NOT - also written as !
    - excludes documents containing the negated term from the results. For example, the query "Nicolaus Copernicus" NOT university finds documents that contain the phrase "Nicolaus Copernicus" but not the word university. This operator cannot be used on its own: a query such as NOT "Nicolaus Copernicus" will not return correct results.
  • + (required term operator)
    - requires that the term immediately following the '+' appears in a document; other terms are optional. For example, +university library selects documents that must contain the word university and may or may not contain the word library.
  • - (prohibited term operator)
    - works like the NOT operator. The query "Nicolaus Copernicus" -"Nicolaus Copernicus University" finds documents that contain the name "Nicolaus Copernicus" but not the name "Nicolaus Copernicus University".
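
The effect of AND, OR and NOT can be pictured as set operations on the lists of documents that contain each term. The sketch below is only an illustration: the index contents and document numbers are made up, and a real search engine does considerably more work.

```python
# Toy inverted index: term -> ids of documents containing it (hypothetical data).
index = {
    "copernicus": {1, 2, 4},
    "kopernik": {3, 4},
    "chopin": {2, 5},
    "university": {1, 4},
}

def docs(term):
    """Return the ids of documents containing the term (empty set if none)."""
    return index.get(term.lower(), set())

# Copernicus AND Chopin: documents containing both terms
both = docs("Copernicus") & docs("Chopin")

# Kopernik OR Copernicus: documents containing at least one of the forms
either = docs("Kopernik") | docs("Copernicus")

# Copernicus NOT university: documents with the first term but not the second
without = docs("Copernicus") - docs("university")

print(both, either, without)
```

This also suggests why NOT cannot stand alone: there is no first set to subtract from.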

Wildcard characters


  • ?
    - matches any single character. For example, the query Kowalsk? matches both Kowalski and Kowalska.
  • *
    - matches any string of characters. For example, bu*a finds words such as buda, budda, buddysta and butonierka. A wildcard must not be placed at the beginning of a search term.
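
The two wildcards behave much like shell-style patterns, so their effect can be tried out with Python's standard fnmatch module. This is a rough sketch on a made-up word list, not the search engine's own matching code; the helper name matches is hypothetical.

```python
from fnmatch import fnmatchcase

def matches(pattern, word):
    """Check a word against a pattern where ? is one character and * is any string."""
    # fnmatch also gives [ ] a special meaning; the patterns here do not use brackets.
    return fnmatchcase(word.lower(), pattern.lower())

words = ["Kowalski", "Kowalska", "Kowalskiego", "buda", "budda", "butonierka", "bud"]

# ? stands for exactly one character, so Kowalskiego is too long to match
print([w for w in words if matches("Kowalsk?", w)])

# * stands for any string, so everything starting with bu and ending with a matches
print([w for w in words if matches("bu*a", w)])
```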

Fuzzy search


Fuzzy search finds simple terms that are spelled similarly, e.g. Copernicus, Copernikus, Kopernikus. To find documents containing such terms, append a tilde to the term: copernicus~.

The required degree of similarity can be set with a coefficient between 0 and 1. The closer the value is to 1, the more similar a term must be in order to match. The default coefficient is 0.5. To change it, follow the tilde with an explicit value, e.g. kopernik~0.4.

= Copernicus / Copernikus / Kopernikus
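
One way to picture the similarity coefficient is through edit distance, the number of single-character changes needed to turn one term into another. The sketch below assumes a similarity defined as 1 minus the edit distance divided by the length of the shorter term; this formula, the function names and the word list are assumptions made for illustration, and the search engine's actual calculation may differ.

```python
def edit_distance(a, b):
    """Levenshtein distance: single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute (or keep)
        previous = current
    return previous[-1]

def similarity(a, b):
    """Assumed measure: 1 minus edit distance over the length of the shorter term."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 1.0 if a == b else 0.0
    return 1 - edit_distance(a, b) / min(len(a), len(b))

def fuzzy_match(query, candidates, threshold=0.5):
    """Keep the candidates whose similarity to the query reaches the threshold."""
    return [c for c in candidates if similarity(query, c) >= threshold]

terms = ["Copernicus", "Copernikus", "Kopernikus", "Chopin"]
print(fuzzy_match("copernicus", terms))
```

Under this assumed measure, Copernikus differs from Copernicus by one letter and Kopernikus by two, so both pass the default threshold of 0.5, while Chopin does not.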

Proximity search


You can also specify how far apart the search terms may be (so-called proximity search). For example, if you remember that Choral-buch and Westpreussen appeared close to each other in a document, you can use the query "Choral-buch Westpreussen"~6, which finds the two terms within 6 words of each other.

= Address book of the city of Toruń / or / Book of debts of the city of Toruń from the Thirteen Years' War
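
The idea behind proximity search can be sketched as a check on word positions. The example below is a simplification: it counts the words lying between the two terms, whereas the search engine's own distance measure may be defined differently. The function name within_distance and the sample sentence are made up for illustration.

```python
def within_distance(text, first, second, max_gap):
    """True if the two words occur with at most max_gap other words between them."""
    # Split on whitespace and strip surrounding punctuation from each word.
    words = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(i != j and abs(i - j) - 1 <= max_gap
               for i in first_positions for j in second_positions)

# Made-up sample text: five words separate the two search terms.
sample = "Choral-buch zum Gebrauch in Ost- und Westpreussen"

print(within_distance(sample, "Choral-buch", "Westpreussen", 6))
print(within_distance(sample, "Choral-buch", "Westpreussen", 2))
```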

Setting the priority of a term


You can set the priority of a search term by appending a '^' followed by a number (greater than 1). For example, the query stempowski^4 grydzewski returns documents in which both surnames appear, but documents in which the higher-priority surname occurs more often are placed at the top of the list. The default priority is 1.
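
The effect of a priority on the order of results can be illustrated with a deliberately simplified score: the number of occurrences of each term multiplied by its priority. The search engine's real ranking formula is more elaborate; the scoring rule, the function name score and the two sample documents are assumptions made for this sketch.

```python
def score(document, boosts):
    """Simplified score: occurrences of each term multiplied by its priority."""
    words = document.lower().split()
    return sum(words.count(term) * boost for term, boost in boosts.items())

# Priorities for the query stempowski^4 grydzewski (the default priority is 1).
boosts = {"stempowski": 4, "grydzewski": 1}

# Two made-up documents, both containing both surnames.
documents = {
    "A": "grydzewski grydzewski stempowski",
    "B": "stempowski stempowski grydzewski",
}

ranked = sorted(documents, key=lambda name: score(documents[name], boosts), reverse=True)
print(ranked)
```

Document B comes first because the surname with the higher priority occurs in it more often.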

Grouping queries


You can group expressions in complex queries using parentheses. As in arithmetic, this gives a complex query the intended, unambiguous meaning.

The expressions inside the parentheses are processed first, and then the query as a whole. A query of the form "De revolutionibus orbium coelestium" AND (Kopernik OR Copernicus) finds documents that contain the title of Copernicus's work and his name in at least one of the two forms.

= 'De revolutionibus orbium coelestium' Mikołaj Kopernik / or / 'De revolutionibus orbium coelestium' Nicholas Copernicus
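
Why the parentheses matter can be seen by treating each term as the set of documents that contain it. In the sketch below the document numbers are made up, and the second reading is only one possible way an ungrouped query might be interpreted.

```python
# Hypothetical ids of documents containing each term.
title = {1, 2, 3}        # the phrase "De revolutionibus orbium coelestium"
kopernik = {2, 5}
copernicus = {3, 6}

# title AND (Kopernik OR Copernicus): the title plus either form of the name
grouped = title & (kopernik | copernicus)

# (title AND Kopernik) OR Copernicus: a different grouping, a different result
regrouped = (title & kopernik) | copernicus

print(grouped, regrouped)
```

The second grouping also returns document 6, which mentions Copernicus but not the title.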

Special characters


The characters used to build complex queries (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \) are treated differently from the rest of the query: they act as elements of the query syntax, not as parts of the search phrase. To have a special character taken literally, place the escape character '\' in front of it. For example, to search for the phrase (2+2)*2 you must enter \(2\+2\)\*2. Note, however, that only letters and digits are indexed, so other characters do not affect the search results.
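
If queries are assembled by a program, the escaping can be automated. The helper below is a minimal sketch that puts a backslash in front of every character from the list above; the function name escape_query is made up for this example.

```python
# Every character that has a special meaning in the query syntax.
SPECIAL_CHARACTERS = set('+-&|!(){}[]^"~*?:\\')

def escape_query(text):
    """Put a backslash in front of each special character in the text."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)

print(escape_query("(2+2)*2"))  # prints \(2\+2\)\*2
```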

Description source


Full description of how to formulate queries: Jakarta Lucene Query Parser Syntax.
The text was originally posted on the website of the Kujawsko-Pomorska Digital Library.
This work is licensed under the Creative Commons Attribution-Share Alike 2.5 Poland license.