Staff Software Engineer
Suyash has been working in search for the last six years. He currently leads the search data platform team at Wayfair. His areas of interest include NLP, search relevance, scaling SolrCloud, Solr performance tuning, and ingesting data at large scale.
Suyash Sonawane is speaking at the following session
Reduce Update Latency and Build a Robust Pipeline Architecture for Ingesting at Scale
These days it is not uncommon for search engines to ingest billions of data updates per day. As our data scales, it becomes challenging to keep update delays in Solr to a minimum at such volume without affecting search accuracy. At Wayfair, facing this same problem, we came up with an architecture that cut our update latency from hours to seconds. We are replacing our traditional ETL pipeline by building a new data ingestion platform for Solr based on an event-driven streaming architecture. In this session, we will explain how the data pipelines built on this new platform are helping us push updates to SolrCloud much faster than before. This new platform has fundamentally changed the way we look at the data ingestion challenge in Solr. However, it did not come without its own challenges. We will share best practices and lessons learned from building this platform to help you understand the practical aspects of implementing it.
We will introduce the interesting problem of updating multiple search indexes at scale when multiple data sources are changing at different rates. We will discuss the different approaches we considered to solve this problem, the challenges we faced, and why we at Wayfair picked this architecture.
This session is for anyone using Solr who cares about data update latency at scale. Are you ingesting data from different sources that produce at different rates? Are you pulling data from a monolithic source using a traditional ETL pipeline? Chances are, you may be playing catch-up, as we did at one point, with constantly changing data in order to make it searchable instantaneously. If you can relate to any of these situations, this session will have many takeaways, including useful ideas and architectural patterns that you can apply to your own situation.