Juan Camilo Rodriguez Duran
Juan Camilo Rodriguez Duran is speaking at the following session/s
A Journey to Write a New Lucene PostingsFormat
Some hard technical challenges are better solved if you are willing to change the foundations. We had a use case of searching many fields under strong constraints on memory and performance. A custom PostingsFormat gave us a more efficient solution than our previous one. We developed a new Lucene PostingsFormat (a low-level, technical part of Lucene), deployed it at a very large scale, open-sourced it, and learned a lot in the process! We've seen other use cases for a custom PostingsFormat shared with the open-source community, such as one optimized for both leading and trailing wildcard queries. This presentation shares our experience and showcases our own format, “UniformSplit”. A postings format that efficiently supports a massive number of fields opens the path to machine-learned ranking models based on the influence of numerous fields at a massive scale.
We learned a lot during the journey to develop the postings format, especially about micro-benchmarking, Java memory consumption, compact data representation, and high-performance Lucene indices. This presentation is a good medium to share what we learned, along with the tips and the pitfalls we encountered.
It is indeed a very technical, low-level theme, so we will also take a step back and share what we learned about Lucene's mechanisms and about the performance of index accesses in general.