As mentioned in a previous blog post, using Lucene.Net to create and search an index is quick and easy. Here I will show you how to do it in these 4 steps:
- Create an index
- Build the query
- Perform the search
- Display the results
Before we get started, I want to mention that Lucene was originally written in Java, and Lucene.Net is a port of it. Because of this, I think, some class names in Lucene.Net collide with classes that already exist in the .NET Framework (Directory, for example, also exists in System.IO). Therefore, we need to use the full path to those classes and methods instead of relying on a using directive to shorten it for us.
Meaning, instead of adding a using directive and then referencing the class or method directly in our code like this:
using Lucene.Net.Store;
Directory directory = FSDirectory.Open(directoryPath);
we write out the full path:
Lucene.Net.Store.Directory directory =
    Lucene.Net.Store.FSDirectory.Open(directoryPath);
Or we define an alias for the namespace:
using LuceneStore = Lucene.Net.Store;
LuceneStore.Directory directory =
LuceneStore.FSDirectory.Open(directoryPath);
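To make the name clash concrete, here is a minimal sketch that uses both Directory classes in the same file. The class name AliasDemo and the method OpenIndex are hypothetical, and I am assuming the Lucene.Net 2.9-style overload of FSDirectory.Open that takes a DirectoryInfo. If you added both `using System.IO;` and `using Lucene.Net.Store;` instead of the alias, the bare name Directory would be ambiguous and the compiler would reject it.

```csharp
using System.IO;
using LuceneStore = Lucene.Net.Store;

// Hypothetical helper, for illustration only.
public static class AliasDemo
{
    public static LuceneStore.Directory OpenIndex(string indexPath)
    {
        // Here "Directory" is System.IO.Directory (a folder on disk),
        // because only System.IO is imported with a plain using directive.
        if (!Directory.Exists(indexPath))
        {
            Directory.CreateDirectory(indexPath);
        }

        // The Lucene class with the same short name is reached through the alias.
        return LuceneStore.FSDirectory.Open(new DirectoryInfo(indexPath));
    }
}
```

The same trick is why the later snippets write LuceneUtil.Version: with `using System;` in scope, a bare Version would collide with System.Version.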
Create an index
The classes needed to create an index are Directory, Analyzer, IndexWriter, Document and Field.
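using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneStore = Lucene.Net.Store;
using LuceneUtil = Lucene.Net.Util;

// Where the index files will be written
LuceneStore.Directory directory =
    LuceneStore.FSDirectory.Open(directoryPath);
// Splits text into terms and drops English noise words
Analyzer analyzer =
    new StandardAnalyzer(LuceneUtil.Version.LUCENE_29);
// true = create a new index, replacing any existing one
IndexWriter writer =
    new IndexWriter(directory, analyzer, true,
        IndexWriter.MaxFieldLength.UNLIMITED);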
…………
Document doc = new Document();
doc.Add(new Field("TERM",
"TERMVALUE",
Field.Store.NO,
Field.Index.ANALYZED));
writer.AddDocument(doc);
…………
writer.Optimize();
writer.Close();
The directoryPath variable identifies the directory where the index files will be written. The analyzer breaks the text into terms and removes ‘noise words’ like and, the, of, but, etc. You can pass in a language-specific analyzer if needed; the StandardAnalyzer uses English noise words by default. The IndexWriter is the class that will write your index. The ‘true’ parameter here is saying that I want a new index created, replacing any existing one, instead of updating it. The writer adds each document to the index, which will later be searched. The index consists of a group of documents, which contain fields, which in turn contain terms, as you can see in the image below.
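If you would rather add to an existing index than rebuild it every time, the ‘true’ parameter can be computed instead of hard-coded. The following is only a sketch under my own assumptions: the class name IndexWriterFactory and the method Open are hypothetical, and it assumes the Lucene.Net 2.9-style API, where IndexReader.IndexExists reports whether an index is already present in a directory.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using LuceneStore = Lucene.Net.Store;
using LuceneUtil = Lucene.Net.Util;

// Hypothetical helper, for illustration only.
public static class IndexWriterFactory
{
    public static IndexWriter Open(string indexPath)
    {
        LuceneStore.Directory directory =
            LuceneStore.FSDirectory.Open(new DirectoryInfo(indexPath));
        Analyzer analyzer = new StandardAnalyzer(LuceneUtil.Version.LUCENE_29);

        // Create a fresh index only when none exists yet; otherwise
        // open the existing one so new documents are appended to it.
        bool create = !IndexReader.IndexExists(directory);

        return new IndexWriter(directory, analyzer, create,
            IndexWriter.MaxFieldLength.UNLIMITED);
    }
}
```

Passing true unconditionally, as in the snippet above, is the simpler choice when the whole index is rebuilt on each run.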
The parameters for adding a Field are available in the source. Below is my flavor of them.
Field.Store.NO | Not stored in the index, will not show in results |
Field.Store.YES | Stored in the index and can be viewed in results |
Field.Store.COMPRESS | Store original field in a compressed format |
Field.Index.ANALYZED | Field will be searchable and will use the Analyzer |
Field.Index.ANALYZED_NO_NORMS | Field will be searchable and will use the Analyzer, no document boosting |
Field.Index.NO | Field will not be searchable |
Field.Index.NOT_ANALYZED | Field will be searchable but no Analyzer used |
Field.Index.NOT_ANALYZED_NO_NORMS | Field will be searchable, no Analyzer and no boosting |
Field.TermVector.NO | Do not store list of terms and number of occurrences |
Field.TermVector.WITH_OFFSETS | Store list of terms plus the character offsets of each occurrence |
Field.TermVector.WITH_POSITIONS | Find relative position of term in document |
Field.TermVector.WITH_POSITIONS_OFFSETS | Combine definition for OFFSETS and POSITIONS |
Field.TermVector.YES | Store list of terms and number of occurrences |
TermVector interested me a lot, but I won’t try to repeat an already cool description of it here.
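To show how these options combine in practice, here is a small sketch of one document with three differently configured fields. The class ArticleDocuments, the method Build and the field names "id", "title" and "body" are all made up for this example; only the Field constructor and the option values come from Lucene.Net.

```csharp
using Lucene.Net.Documents;

// Hypothetical helper; the field names are examples only.
public static class ArticleDocuments
{
    public static Document Build(string id, string title, string body)
    {
        Document doc = new Document();

        // Exact-match key: stored so it comes back with results,
        // indexed as a single term without running the Analyzer.
        doc.Add(new Field("id", id,
            Field.Store.YES, Field.Index.NOT_ANALYZED));

        // Searchable through the Analyzer and also viewable in results.
        doc.Add(new Field("title", title,
            Field.Store.YES, Field.Index.ANALYZED));

        // Searchable but not stored, which keeps the index smaller;
        // the term vector keeps the list of terms and their counts.
        doc.Add(new Field("body", body,
            Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES));

        return doc;
    }
}
```

A document built this way can be handed straight to writer.AddDocument.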
Referring to the code where the documents are being written, you will notice some dots (…………). You would probably want to put the code segment between the dots in a for or foreach block to add each of your fields to the document before adding it to the index. You may also have multiple documents to add to an index, and therefore need a parent for or foreach block to write all the documents, for example when you want to index all the files within a directory or all the rows returned from a database query. Something like this:
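// Assumed here: preparedToIndex holds one IList of column values per row,
// and selectStatement holds the column names in the same order.
foreach (IList value in preparedToIndex)
{
    Document doc = new Document();
    int cntr = 0;
    foreach (string name in selectStatement)
    {
        // One field per column: the column name is the field name
        doc.Add(new Field(name, value[cntr].ToString(),
            Field.Store.NO, Field.Index.ANALYZED));
        cntr++;
    }
    doc.Add(new Field("TERM", "TERMVALUE", Field.Store.YES,
        Field.Index.ANALYZED, Field.TermVector.YES));
    writer.AddDocument(doc);
}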
writer.Optimize();
writer.Close();
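The same pattern works for the other case mentioned above, indexing all the files within a directory. This is a rough sketch, not a tested implementation: the class FolderIndexer, the method IndexTextFiles, the "*.txt" filter and the field names "path" and "contents" are my own choices, and it assumes the Lucene.Net 2.9-style API used throughout this post.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneStore = Lucene.Net.Store;
using LuceneUtil = Lucene.Net.Util;

// Hypothetical helper, for illustration only.
public static class FolderIndexer
{
    public static void IndexTextFiles(string sourceFolder, string indexPath)
    {
        LuceneStore.Directory directory =
            LuceneStore.FSDirectory.Open(new DirectoryInfo(indexPath));
        IndexWriter writer = new IndexWriter(directory,
            new StandardAnalyzer(LuceneUtil.Version.LUCENE_29), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // One document per file; System.IO.Directory is spelled out
            // to keep it apart from the Lucene Directory class.
            foreach (string path in System.IO.Directory.GetFiles(sourceFolder, "*.txt"))
            {
                Document doc = new Document();
                // The path is stored so a search result can point back to the file.
                doc.Add(new Field("path", path,
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
                // The file text is searchable but not stored in the index.
                doc.Add(new Field("contents", File.ReadAllText(path),
                    Field.Store.NO, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
            writer.Optimize();
        }
        finally
        {
            // Close the writer even if reading a file fails.
            writer.Close();
        }
    }
}
```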
Once you have done the above, you have built a searchable index using Lucene.Net.