Bruno Roustant

Principal Software Engineer

Bruno is a principal software engineer at Salesforce, with more than 18 years of experience in the information retrieval software industry. He joined the Salesforce Search team at the Grenoble (France) R&D office in 2013. Since then, he has dived deep into Apache Lucene and Solr internals to tackle challenging performance and scalability issues. He co-authored the Activate 2018 talk “Query Hundreds of Fields at scale”. He is the author of search patents on federated search, result clustering, semi-supervised machine learning, data structures for hash collision resolution, and subset matching. He has also contributed open-source improvements to Solr's query elevation component.

Bruno Roustant is speaking at the following session

A Journey to Write a New Lucene PostingsFormat

Thursday | 11:35AM - 12:15PM | Columbia 8

Some hard technical challenges are better solved if you are willing to change the foundations. We had a use case that required searching many fields under strong memory and performance constraints. A custom PostingsFormat gave us a more efficient solution than our previous one. We developed a new Lucene PostingsFormat (a low-level, technical part of Lucene), deployed it at very large scale, open-sourced it, and learned a lot in the process! Custom PostingsFormats for other use cases have also been shared with the open-source community, such as one optimized for both leading and trailing wildcard queries. This presentation shares our experience and showcases ours: “UniformSplit”. A postings format that efficiently supports a massive number of fields opens the path to machine-learned ranking models that draw on the influence of numerous fields at massive scale.

Attendee Takeaway
We learned a lot on the journey to develop this postings format, especially about micro-benchmarking, Java memory consumption, compact data representation, and high-performance Lucene indices. This presentation is a good medium to share what we learned, along with the tips and pitfalls we encountered.

Intended Audience
This is a very technical, low-level topic, so we will share our experience while also stepping back to draw broader lessons about Lucene's internal mechanisms and the performance of index access in general.

All Levels