How to create and search a Lucene.Net index in 4 simple steps using C#, Step 1

As mentioned in a previous post, using Lucene.Net to create and search an index is quick and easy. Here I will show you how to do it in these 4 steps.

  • Create an index
  • Build the query
  • Perform the search
  • Display the results

Before we get started I want to mention that Lucene.Net is a port of Lucene, which was originally written for Java. Because of this, some class names in Lucene.Net (Directory, for example) collide with classes that already exist in the .NET Framework. Therefore, we need to use the full namespace path to those classes, or a namespace alias, instead of a plain using directive to shorten it for us.

Meaning, instead of using a directive and then referencing the method or class directly in our code like this:

using Lucene.Net.Store;
Directory directory = FSDirectory.Open(directoryPath);

we write out the full path like this:

Lucene.Net.Store.Directory directory = 
    Lucene.Net.Store.FSDirectory.Open(directoryPath);

Or, with a namespace alias:

using LuceneStore = Lucene.Net.Store;
LuceneStore.Directory directory = 
        LuceneStore.FSDirectory.Open(directoryPath);

Create an index

The classes needed to create an index are Directory, Analyzer, IndexWriter, Document and Field.

LuceneStore.Directory directory = 
        LuceneStore.FSDirectory.Open(directoryPath);
Analyzer analyzer = 
        new StandardAnalyzer(LuceneUtil.Version.LUCENE_29);
IndexWriter writer = 
        new IndexWriter(directory, analyzer, true, 
                        IndexWriter.MaxFieldLength.UNLIMITED);
…………
Document doc = new Document();
doc.Add(new Field("TERM", 
                  "TERMVALUE", 
                  Field.Store.NO, 
                  Field.Index.ANALYZED));
writer.AddDocument(doc);
…………
writer.Optimize();
writer.Close();

The directoryPath variable identifies the directory where the index files will be stored. The analyzer breaks the text into terms and removes ‘noise words’ like and, the, of, but, etc. You can pass in a language-specific analyzer if needed; the StandardAnalyzer uses English noise words by default. The IndexWriter is the class that writes your index. The ‘true’ parameter here says that I want a new index created, replacing any existing one, instead of updating the existing one. The writer writes each document to the index, which will later be searched. The index consists of a group of documents, which contain fields, which contain terms, as you see in the image below.

image
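To make the snippet above easier to try, here is a minimal sketch of the same steps as a complete class. It assumes Lucene.Net 2.9.x and that directoryPath is a plain string path. The IndexBuilder class, the BuildIndex method and its parameters are names I made up for illustration, and the LuceneUtil alias is my guess at how the alias used above was declared.

```csharp
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
// Aliases avoid clashes with System.IO.Directory and System.Version.
using LuceneStore = Lucene.Net.Store;
using LuceneUtil = Lucene.Net.Util;

// Hypothetical helper, not part of Lucene.Net.
public static class IndexBuilder
{
    // Creates a new index at indexPath holding one document with one field.
    // Returns the number of documents in the index just before closing.
    public static int BuildIndex(string indexPath, string fieldName, string fieldValue)
    {
        LuceneStore.Directory directory =
            LuceneStore.FSDirectory.Open(new DirectoryInfo(indexPath));
        Analyzer analyzer = new StandardAnalyzer(LuceneUtil.Version.LUCENE_29);

        // 'true' means create a new index, replacing any existing one.
        IndexWriter writer = new IndexWriter(directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            Document doc = new Document();
            doc.Add(new Field(fieldName, fieldValue,
                Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            writer.Optimize();
            return writer.NumDocs();
        }
        finally
        {
            // Always release the index lock, even if indexing fails.
            writer.Close();
        }
    }
}
```

Calling IndexBuilder.BuildIndex(@"C:\temp\index", "TERM", "TERMVALUE") should leave a small set of index files in that folder; the path is only an example.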

The parameters for adding a Field are available in the source. Below is my flavor of them.

Field.Store.NO Not stored in the index, will not show in results
Field.Store.YES Stored in the index and can be viewed in results
Field.Store.COMPRESS Store original field in a compressed format

Field.Index.ANALYZED Field will be searchable and will use the Analyzer
Field.Index.ANALYZED_NO_NORMS Field will be searchable and will use the Analyzer, no document boosting
Field.Index.NO Field will not be searchable
Field.Index.NOT_ANALYZED Field will be searchable but no Analyzer used
Field.Index.NOT_ANALYZED_NO_NORMS Field will be searchable, no Analyzer and no boosting

Field.TermVector.NO Do not store list of terms and number of occurrences
Field.TermVector.YES Store list of terms and number of occurrences
Field.TermVector.WITH_OFFSETS Also store the start and end character offsets of each term
Field.TermVector.WITH_POSITIONS Also store the relative position of each term in the document
Field.TermVector.WITH_POSITIONS_OFFSETS Combine definition for OFFSETS and POSITIONS

TermVector interested me a lot, but I won’t try and repeat an already cool description of it here.
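To see how these options combine in practice, here is a small sketch that builds one document with three differently configured fields. It assumes Lucene.Net 2.9.x. The ArticleDocuments class and the field names id, title and body are hypothetical, picked only to illustrate the choices.

```csharp
using Lucene.Net.Documents;

// Hypothetical helper, not part of Lucene.Net.
public static class ArticleDocuments
{
    public static Document Create(string id, string title, string body)
    {
        Document doc = new Document();

        // Key field: stored so it comes back with results, indexed as one
        // exact token (no Analyzer), and no norms since boosting is not needed.
        doc.Add(new Field("id", id,
            Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));

        // Title: stored for display and run through the Analyzer for searching.
        doc.Add(new Field("title", title,
            Field.Store.YES, Field.Index.ANALYZED));

        // Body: searchable but not stored, which keeps the index smaller.
        // Term vectors with positions and offsets are kept for later use.
        doc.Add(new Field("body", body,
            Field.Store.NO, Field.Index.ANALYZED,
            Field.TermVector.WITH_POSITIONS_OFFSETS));

        return doc;
    }
}
```

The returned document can be passed straight to writer.AddDocument. Storing only what you need to display and analyzing only what you need to search is the usual trade-off between index size and convenience.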

Referring to the code where the documents are being written, you will notice some dots (…………). You would probably want to put the code segment between the dots in a for or foreach block to add each of your fields to the document before adding it to the index. You may also have multiple documents to add to an index and therefore need a parent for or foreach block to write all the documents, for example when you want to index all the files within a directory or all the rows returned from a database query. Something like below:




foreach (IList value in preparedToIndex)
{
    Document doc = new Document();
    int cntr = 0;   // position of the current column within the row
    foreach (string name in selectStatement)
    {
        // one field per selected column, named after the column
        doc.Add(new Field(name, value[cntr].ToString(), 
                    Field.Store.NO, Field.Index.ANALYZED));
        cntr++;
    }
    doc.Add(new Field("TERM", "TERMVALUE", Field.Store.YES, 
                Field.Index.ANALYZED, Field.TermVector.YES));
    writer.AddDocument(doc);
}
writer.Optimize();
writer.Close();

Once you have done the above, you have built a searchable index using Lucene.Net.
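For the other case mentioned above, indexing all the files within a directory, a rough sketch could look like the following. It assumes Lucene.Net 2.9.x and plain-text files with a .txt extension. The FolderIndexer class and the field names path and contents are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
// Alias keeps Lucene's Directory apart from System.IO.Directory used below.
using LuceneStore = Lucene.Net.Store;
using LuceneUtil = Lucene.Net.Util;

// Hypothetical helper, not part of Lucene.Net.
public static class FolderIndexer
{
    // Indexes every *.txt file directly inside sourceFolder into a new index
    // at indexPath. Returns the number of files that were indexed.
    public static int IndexTextFiles(string sourceFolder, string indexPath)
    {
        LuceneStore.Directory directory =
            LuceneStore.FSDirectory.Open(new DirectoryInfo(indexPath));
        IndexWriter writer = new IndexWriter(directory,
            new StandardAnalyzer(LuceneUtil.Version.LUCENE_29), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
        int count = 0;
        try
        {
            // Here Directory is System.IO.Directory, not Lucene's.
            foreach (string file in Directory.GetFiles(sourceFolder, "*.txt"))
            {
                Document doc = new Document();
                // Store the path so a search hit can point back to the file.
                doc.Add(new Field("path", file,
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
                // Index the text for searching without storing a copy of it.
                doc.Add(new Field("contents", File.ReadAllText(file),
                    Field.Store.NO, Field.Index.ANALYZED));
                writer.AddDocument(doc);
                count++;
            }
            writer.Optimize();
        }
        finally
        {
            writer.Close();
        }
        return count;
    }
}
```

Keeping the index in a different folder from the files being indexed avoids mixing index files with source files.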



