Radu Gheorghe


Software Engineer
Sematext Group Inc.

Radu Gheorghe is a search consultant, software engineer and trainer at Sematext, working mainly with Solr, Elasticsearch and logging-related projects.

Radu Gheorghe is speaking at the following session

Tweaking the Base Score: Lucene/Solr Similarities Explained

Thursday | 1:30PM - 2:10PM | Jefferson East

Lucene has a lot of options for configuring similarity, and Solr inherits them. Similarity forms the base of your relevance score: how similar is this document to the query? The default similarity (BM25) is a good start, but you may need to tweak it for your use-case. In this session, you will learn how BM25 works and how you may want to change its parameters. Then, we'll move to other similarity classes: DFR, DFI, IB and LM. You will learn the thinking behind them, how that thinking translates to the similarity score, and which parameters allow you to tweak how the score evolves based on things like term frequency or document length. By the end, you'll have a good understanding of which similarity options are likely to work well for your use-case. You'll know which tunables are available and whether you need to implement a custom similarity class. As an example, we'll focus on e-commerce, where you often end up ignoring term frequency altogether.
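
As a rough illustration of the BM25 tunables mentioned in the abstract, here is a minimal Python sketch of a single-term BM25 score. It is not taken from the session; the function and parameter names are made up for this example, and it follows the commonly documented shape of the formula rather than Lucene's exact implementation (which, among other details, works with encoded length norms). The defaults k1=1.2 and b=0.75 match Lucene's defaults.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency: rarer terms get a higher weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(
    freq: float,         # occurrences of the term in this document's field
    doc_len: float,      # length of the field in this document (in terms)
    avg_doc_len: float,  # average field length across the index
    doc_count: int,      # number of documents in the index
    doc_freq: int,       # number of documents containing the term
    k1: float = 1.2,     # term-frequency saturation; 0 ignores frequency
    b: float = 0.75,     # length normalization; 0 ignores document length
) -> float:
    """Illustrative (hypothetical) single-term BM25 score."""
    if freq <= 0:
        return 0.0
    # Longer-than-average documents get a larger penalty when b > 0.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Saturating term-frequency component: grows with freq but stays below 1.
    tf_part = freq / (freq + norm)
    return bm25_idf(doc_count, doc_freq) * tf_part


if __name__ == "__main__":
    for freq in (1, 2, 5, 20):
        default = bm25_term_score(freq, 10, 10, 1000, 10)
        no_tf = bm25_term_score(freq, 10, 10, 1000, 10, k1=0)
        print(f"freq={freq:2d}  default={default:.3f}  k1=0: {no_tf:.3f}")
```

With k1 set to 0, every matching document gets the same score for the term no matter how often the term occurs, which is one way to get the "ignore term frequency" behavior the abstract mentions for e-commerce.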

Attendee Takeaway
1) What the built-in Lucene/Solr similarities are and what they do
2) Which similarity to use for which use-case
3) How to use a custom similarity class in Solr

Intended Audience 
Lucene/Solr users interested in how scoring works, the ideas behind default scoring options and how to configure them

Level
All Levels