schema.xml

schema.xml is usually the first file you configure when setting up a new Solr installation.

The schema declares:

  • what kinds of fields there are
  • which field should be used as the unique/primary key
  • which fields are required
  • how to index and search each field

The XML consists of a number of parts. We'll look at these in turn:

Field Types


<types>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
...
</types>

The example Solr schema.xml comes with a number of pre-defined field types, and they're quite well-documented. You can also use them as templates for creating new field types.
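As a minimal sketch of using the predefined types as templates, a new type can be declared inside <types> in the same way. The name "text_lower" is hypothetical. This type only splits on whitespace and lower-cases each token. Because there is a single <analyzer> with no type attribute, the same chain applies at both index and query time:

```xml
<types>
  <!-- Hypothetical type: whitespace tokenization plus lower-casing, no stemming -->
  <fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- Split the input on whitespace -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- Lower-case each token so matching is case-insensitive -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
```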

The commonly used ones are:

text

A general-purpose text field. It's described in the documentation as:

A text field that uses WordDelimiterFilter to enable splitting and matching of words on case-change, alpha numeric boundaries, and non-alphanumeric chars, so that a query of "wifi" or "wi fi" could match a document containing "Wi-Fi". Synonyms and stopwords are customized by external files, and stemming is enabled.
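The following abridged sketch is modeled on the example schema's "text" type. The exact attribute values and the choice of PorterStemFilterFactory for stemming are assumptions, so compare them with the schema.xml shipped with your Solr version. It shows where the behaviour described above comes from. Stopwords and synonyms are loaded from external files. WordDelimiterFilter splits "Wi-Fi" into "Wi" and "Fi" and, at index time, also emits the catenated "WiFi", so after lower-casing both "wifi" and "wi fi" can match:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Stopwords come from an external file -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Split on case changes, letter/digit boundaries and punctuation; catenateWords="1" also emits "WiFi" for "Wi-Fi" -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stemming -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Synonyms come from an external file and are expanded at query time -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```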

string

Useful for text you don't want tokenized, such as IDs. It's described in the documentation as:

The StrField type is not analyzed, but indexed/stored verbatim. StrField and TextField support an optional compressThreshold which limits compression (if enabled in the derived fields) to values which exceed a certain size (in characters).
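A sketch of a string type declaration, modeled on the example schema, followed by a field that uses it. The field name "sku" is hypothetical. Because the value is not analyzed, a query only matches the whole value exactly as it was indexed:

```xml
<!-- In <types>: indexed and stored verbatim, with no analysis -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- In <fields>: hypothetical identifier field that should only match as an exact value -->
<field name="sku" type="string" indexed="true" stored="true"/>
```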

date

Useful for dates. It's described in the documentation as:

The format for this date field is of the form 1995-12-31T23:59:59Z, and is a more restricted form of the canonical representation of dateTime http://www.w3.org/TR/xmlschema-2/#dateTime
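A sketch of a date type, modeled on the example schema, followed by an XML update message that sets a date value in the required form. The field name "published" is hypothetical and assumed to use this "date" type. Note the trailing "Z", which marks the value as UTC:

```xml
<!-- In <types>: a Trie-based date type -->
<fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

<!-- In an update message: the value uses the restricted ISO 8601 form -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="published">1995-12-31T23:59:59Z</field>
  </doc>
</add>
```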

float and int

Numeric field types for single-precision floating-point and 32-bit integer values, respectively.
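For reference, a sketch of both declarations in the same style as the int type shown at the top of this section (the int line repeats it). These are followed by two fields that use them. The field names "price" and "popularity" are illustrative:

```xml
<!-- In <types> -->
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

<!-- In <fields>: illustrative numeric fields -->
<field name="price" type="float" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true"/>
```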

A list of the Java classes that implement FieldType can be found in the Solr API documentation (Javadocs).

The Solr Wiki also has some information on field types.

Fields


<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" /> 
  <field name="name" type="textgen" indexed="true" stored="true"/>
...
</fields>

The documentation provides a list of valid attributes:

  • name: mandatory - the name for the field
  • type: mandatory - the name of a previously defined type from the <types> section
  • indexed: true if this field should be indexed (searchable or sortable)
  • stored: true if this field should be retrievable
  • compressed: [false] if this field should be stored using gzip compression (this will only apply if the field type is compressible; among the standard field types, only TextField and StrField are)
  • multiValued: true if this field may contain multiple values per document
  • omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
  • termVectors: [false] set to true to store the term vector for a given field. When using MoreLikeThis, fields used for similarity should have their term vectors stored for best performance.
  • termPositions: Store position information with the term vector. This will increase storage costs.
  • termOffsets: Store offset information with the term vector. This will increase storage costs.
  • docValues: store the field's values in a column-oriented docValues structure. Enable this for any field you facet or sort on. This will increase storage costs.
  • default: a value that should be used if no value is specified when adding a document.
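A sketch of several of these attributes in use. All field names are hypothetical, and the sketch assumes the "string", "text" and "date" types discussed above are defined in <types>. docValues also assumes a Solr release that supports it (4.2 or later). default="NOW" fills a date field with the indexing time:

```xml
<fields>
  <!-- Several values per document, e.g. a list of tags -->
  <field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
  <!-- Full-text field with term vectors, positions and offsets, e.g. for MoreLikeThis -->
  <field name="body" type="text" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
  <!-- Used for faceting/sorting only: not stored, but with docValues -->
  <field name="category" type="string" indexed="true" stored="false" docValues="true"/>
  <!-- Set to the indexing time when a document supplies no value -->
  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>
</fields>
```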

The Solr Wiki has more information on fields, including dynamic fields.
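Dynamic fields match field names by pattern instead of declaring each field individually. The following minimal sketch uses suffix patterns similar to those in the example schema. It assumes the "int" and "string" types are defined. A document can then contain, for instance, a hypothetical "rating_i" field without any further schema change:

```xml
<fields>
  <!-- Any field whose name ends in "_i" is treated as an int -->
  <dynamicField name="*_i" type="int" indexed="true" stored="true"/>
  <!-- Any field whose name ends in "_s" is indexed verbatim as a string -->
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
</fields>
```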

uniqueKey


<uniqueKey>id</uniqueKey>

Equivalent to the primary key of the document.

Field to use to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field.
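To illustrate uniqueness enforcement, here is a sketch of an update message containing two documents with the same id. The values are illustrative and assume Solr's default overwrite behaviour. The later document replaces the earlier one rather than creating a duplicate:

```xml
<add>
  <doc>
    <field name="id">book-1</field>
    <field name="name">First edition</field>
  </doc>
  <doc>
    <!-- Same uniqueKey value: this document replaces the one above -->
    <field name="id">book-1</field>
    <field name="name">Second edition</field>
  </doc>
</add>
<!-- After a commit, the index holds one document with id "book-1", named "Second edition" -->
```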


Note: We've only covered the most commonly-used configuration elements. The Solr Wiki has an extensive list of config elements in schema.xml.