How to create and search a Lucene.Net index in 4 simple steps using C#, Step 1

As mentioned in a previous post, using Lucene.Net to create and search an index is quick and easy. Here I will show you how to do it in these 4 steps.

Create an index
Build the query
Perform the search
Display the results

Before we get started, I want to mention that Lucene was originally written in Java and Lucene.Net is a port of it to .NET. Because of this, some Lucene.Net class names, Directory for example, are the same as classes that already exist in the .NET Framework. Therefore, we need to use the entire path to those classes and methods, or a namespace alias, instead of a plain using directive to shorten it for us.

Meaning, instead of adding a using directive and then referencing the method or class directly in our code like this:
using Lucene.Net.Store;
Directory directory = FSDirectory.Open(directoryPath);

We write out the full path:
Lucene.Net.Store.Directory directory = Lucene.Net.Store.FSDirectory.Open(directoryPath);
Or use an alias:
using LuceneStore = Lucene.Net.Store;
LuceneStore.Directory directory = LuceneStore.FSDirectory.Open(directoryPath);

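
To make the name clash concrete, here is a minimal sketch, assuming Lucene.Net 2.9 is referenced and the file also imports System.IO. The AliasDemo class and OpenIndexDirectory method are names I made up for illustration. With a plain "using Lucene.Net.Store;" next to "using System.IO;", the bare name Directory would be ambiguous and the compiler would reject it; the alias avoids that.

```csharp
using System.IO;                        // brings System.IO.Directory into scope
using LuceneStore = Lucene.Net.Store;   // alias instead of "using Lucene.Net.Store;"

public static class AliasDemo
{
    // Hypothetical helper: makes sure the folder exists, then opens it as a Lucene directory.
    public static LuceneStore.Directory OpenIndexDirectory(string path)
    {
        // Unqualified "Directory" can only mean System.IO.Directory here.
        if (!Directory.Exists(path))
        {
            Directory.CreateDirectory(path);
        }

        // The alias picks out the Lucene.Net class with the same short name.
        return LuceneStore.FSDirectory.Open(new DirectoryInfo(path));
    }
}
```

The alias keeps the code short while still making it obvious which Directory is meant on each line.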
Create an index

The classes needed to create an index are Directory, Analyzer, IndexWriter, Document and Field.
LuceneStore.Directory directory = LuceneStore.FSDirectory.Open(directoryPath);
Analyzer analyzer = new StandardAnalyzer(LuceneUtil.Version.LUCENE_29);
IndexWriter writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
…………
Document doc = new Document();
doc.Add(new Field("TERM", "TERMVALUE", Field.Store.NO, Field.Index.ANALYZED));
writer.AddDocument(doc);
…………
writer.Optimize();
writer.Close();
The directoryPath variable identifies the directory where the index files are written. The analyzer splits the text into terms and removes ‘noise words’ like and, the, of, but, etc. You can pass in a language-specific analyzer if needed; StandardAnalyzer uses English noise words by default. The IndexWriter is the class that will write your index. The ‘true’ parameter here says that I want a new index created instead of updating the existing one. The writer writes each document to the index, which will later be searched. The index consists of a group of documents, which contain fields, which contain terms.
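
For something you can compile and run as-is, here is a minimal sketch of the same steps. It assumes Lucene.Net 2.9 (later releases rename some members, for example Close) and swaps FSDirectory for RAMDirectory so nothing is written to disk. The CreateIndexDemo class, the "title" field and the sample text are made up for illustration.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneStore = Lucene.Net.Store;
using LuceneUtil = Lucene.Net.Util;

public static class CreateIndexDemo
{
    // Builds a one-document index in memory and returns how many documents it holds.
    public static int BuildIndex()
    {
        // Assumption: an in-memory directory stands in for FSDirectory.Open(directoryPath).
        LuceneStore.Directory directory = new LuceneStore.RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer(LuceneUtil.Version.LUCENE_29);

        // true = create a new index, replacing whatever is already in the directory.
        IndexWriter writer = new IndexWriter(directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();
        doc.Add(new Field("title", "The quick brown fox", Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);

        writer.Optimize();
        writer.Close();

        // Re-open the index read-only to confirm the document was written.
        IndexReader reader = IndexReader.Open(directory, true);
        int count = reader.NumDocs();
        reader.Close();
        return count;
    }
}
```

Calling CreateIndexDemo.BuildIndex() should return 1, the single document that was added.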

The parameters for adding a Field are available in the source. Below is my flavor of them.

Field.Store.NO	Not stored in the index, will not show in results
Field.Store.YES	Stored in the index and can be viewed in results
Field.Store.COMPRESS	Store original field in a compressed format

Field.Index.ANALYZED	Field will be searchable and will use the Analyzer
Field.Index.ANALYZED_NO_NORMS	Field will be searchable, no document boosting
Field.Index.NO	Field will not be searchable
Field.Index.NOT_ANALYZED	Field will be searchable but no Analyzer used
Field.Index.NOT_ANALYZED_NO_NORMS	Field will be searchable, no Analyzer and no boosting

Field.TermVector.NO	Do not store list of terms and number of occurrences
Field.TermVector.WITH_OFFSETS	Find difference between terms in similar documents
Field.TermVector.WITH_POSITIONS	Find relative position of term in document
Field.TermVector.WITH_POSITIONS_OFFSETS	Combine definition for OFFSETS and POSITIONS
Field.TermVector.YES	Store list of terms and number of occurrences
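
As a rough illustration of how these options combine, here is a minimal sketch assuming Lucene.Net 2.9. The FieldOptionsDemo class and the field names "id", "title" and "body" are assumptions of mine, not part of any real schema.

```csharp
using Lucene.Net.Documents;

public static class FieldOptionsDemo
{
    // Hypothetical document layout, chosen only to show different option combinations.
    public static Document BuildDocument(string id, string title, string body)
    {
        Document doc = new Document();

        // Exact-match key: stored so it can be shown in results, indexed without the Analyzer.
        doc.Add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));

        // Searchable through the Analyzer and also returned with each result.
        doc.Add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));

        // Searchable but not stored, with the terms and their counts kept as a term vector.
        doc.Add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES));

        return doc;
    }
}
```

A common pattern is to store only the small fields you need to display and leave large text searchable but unstored, which keeps the index smaller.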

TermVector interested me a lot, but I won’t try to repeat an already cool description of it here.
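
Without repeating that description, a small sketch of what Field.TermVector.YES makes available may still help. It assumes Lucene.Net 2.9, where IndexReader.GetTermFreqVector returns a TermFreqVector (later releases rename some of these types and members), and it uses an in-memory index. The TermVectorDemo class and the "body" field are made up for illustration.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneStore = Lucene.Net.Store;
using LuceneUtil = Lucene.Net.Util;

public static class TermVectorDemo
{
    // Indexes one document and returns how often 'term' occurs in its "body" field.
    // StandardAnalyzer lowercases text, so 'term' should be passed in lowercase.
    public static int TermFrequency(string text, string term)
    {
        LuceneStore.Directory directory = new LuceneStore.RAMDirectory();
        IndexWriter writer = new IndexWriter(directory,
            new StandardAnalyzer(LuceneUtil.Version.LUCENE_29), true,
            IndexWriter.MaxFieldLength.UNLIMITED);

        Document doc = new Document();
        doc.Add(new Field("body", text, Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES));
        writer.AddDocument(doc);
        writer.Close();

        IndexReader reader = IndexReader.Open(directory, true);
        try
        {
            // Document 0 is the only document in this freshly created index.
            TermFreqVector vector = reader.GetTermFreqVector(0, "body");
            if (vector == null)
            {
                return 0; // no term vector, e.g. the text held only noise words
            }

            int position = vector.IndexOf(term);
            return position < 0 ? 0 : vector.GetTermFrequencies()[position];
        }
        finally
        {
            reader.Close();
        }
    }
}
```

For example, TermVectorDemo.TermFrequency("lucene index lucene search", "lucene") should return 2.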

Referring to the code where the documents are being written, you will notice some dots (…………). You would probably want to put the code segment between the dots in a for or foreach block to add each of your fields to the document before adding it to the index. You may also have multiple documents to add to an index and therefore need a parent for or foreach block to write all the documents, for example when you want to index all the files within a directory or all the rows returned from a database query. Something like below:
foreach (IList value in preparedToIndex)
{
    Document doc = new Document();
    int cntr = 0; // position of the current column within the row
    foreach (string name in selectStatement)
    {
        doc.Add(new Field(name, value[cntr].ToString(), Field.Store.NO, Field.Index.ANALYZED));
        cntr++;
    }
    doc.Add(new Field("TERM", "TERMVALUE", Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
    writer.AddDocument(doc);
}
writer.Optimize();
writer.Close();
Once you have done the above, you have built a searchable index using Lucene.Net.
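
For the other case mentioned above, indexing all the files within a directory, here is a minimal sketch assuming Lucene.Net 2.9 and plain *.txt files. The FileIndexer class and the "path" and "contents" field names are hypothetical, chosen only for illustration.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using LuceneStore = Lucene.Net.Store;
using LuceneUtil = Lucene.Net.Util;

public static class FileIndexer
{
    // Adds one document per *.txt file in sourceFolder and returns how many were indexed.
    public static int IndexTextFiles(string sourceFolder, LuceneStore.Directory indexDirectory)
    {
        IndexWriter writer = new IndexWriter(indexDirectory,
            new StandardAnalyzer(LuceneUtil.Version.LUCENE_29), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
        int indexed = 0;
        try
        {
            // "Directory" here is System.IO.Directory; the Lucene one is reached via the alias.
            foreach (string path in Directory.GetFiles(sourceFolder, "*.txt"))
            {
                Document doc = new Document();
                // Stored and not analyzed, so each result can report which file matched.
                doc.Add(new Field("path", path, Field.Store.YES, Field.Index.NOT_ANALYZED));
                // Analyzed but not stored: searchable without copying the text into the index.
                doc.Add(new Field("contents", File.ReadAllText(path), Field.Store.NO, Field.Index.ANALYZED));
                writer.AddDocument(doc);
                indexed++;
            }
            writer.Optimize();
        }
        finally
        {
            // Close the writer even if reading one of the files fails.
            writer.Close();
        }
        return indexed;
    }
}
```

You could call it with LuceneStore.FSDirectory.Open(...) for an index on disk, or with a RAMDirectory while experimenting.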

The Best C# Programmer In The World - Benjamin Perkins
