Juan Camilo Rodriguez Duran


Software Engineer

Juan Camilo Rodriguez Duran is speaking at the following session:

A Journey to Write a New Lucene PostingsFormat

Thursday | 11:35AM - 12:15PM | Columbia 8

Some hard technical challenges are better solved if you are willing to change the foundations. We had a use case that required searching many fields under strong memory and performance constraints. A custom PostingsFormat delivered greater efficiency than our prior solution. We developed a new Lucene PostingsFormat (a low-level, highly technical part of Lucene), deployed it at very large scale, open-sourced it, and learned a lot in the process! Other custom PostingsFormat use cases have also been shared with the open-source community, such as one optimized for both leading and trailing wildcard queries. This presentation shares our experience and showcases our own PostingsFormat, “UniformSplit”. A postings format that efficiently supports a massive number of fields opens the path to machine-learned ranking models that weigh the influence of numerous fields at massive scale.

Attendee Takeaway
We learned a lot during the journey to develop the postings format, especially about micro-benchmarking, Java memory consumption, compact data representation, and high-performance Lucene indices. This presentation is a good medium to share what we learned, along with the tips and pitfalls we encountered.

Intended Audience
This is a very technical, low-level topic, so we will share our experience while stepping back to discuss what we learned about Lucene's internal mechanisms and about index access performance in general.

