Search Principal Architect
Yonik Seeley is speaking at the following session:
SolrCloud in Public Cloud: Scaling Compute Independently from Storage
To support elastic, cost-effective operation of SolrCloud at scale in the public cloud, Salesforce is implementing a separation of Compute (Solr servers) from Storage (search indexes persisted in S3 or GCS). Under low load, a SolrCloud cluster can then be shrunk by shutting down servers while maintaining access to all indexes. When load is high, newly started servers can spin up additional replicas on demand, even for shards not present on any other server, by fetching the index from Storage. Indexing updates are pushed to the shared storage, and from there they become available to other replicas.
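The replica-hydration idea described above can be sketched as follows. This is a hypothetical illustration, not the actual Salesforce implementation: the local filesystem stands in for S3/GCS shared storage, and `pull_shard_index` is an invented helper name; a real setup would use the S3 or GCS client APIs to fetch segment files.

```python
import shutil
from pathlib import Path

# Sketch only: a local directory stands in for S3/GCS shared storage.
# A new Solr server hosting no replica of a shard can "hydrate" one by
# copying that shard's index files down from shared storage.

def pull_shard_index(shared_root: str, shard: str, local_data_dir: str) -> Path:
    """Copy a shard's index files from shared storage into a local
    replica directory, returning the local index path."""
    src = Path(shared_root) / shard
    dest = Path(local_data_dir) / shard / "index"
    dest.mkdir(parents=True, exist_ok=True)
    for f in src.iterdir():
        if f.is_file():
            shutil.copy2(f, dest / f.name)  # fetch each segment file
    return dest
```

Because the index in shared storage is the source of truth, a server shut down under low load loses nothing: any server brought up later can rebuild the same replica from storage.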
This talk describes the work needed for SolrCloud to support shared replica storage and highlights a few major challenges in the implementation, including guaranteeing writes and overwrites to shared storage by multiple writers, dealing with large numbers of collections, shards, and replicas, and implementing efficient indexing without data loss.
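One way to reason about the multiple-writers challenge mentioned above is a compare-and-set publish step: a writer may publish version N+1 only if it read version N and nobody else has published since. The sketch below is an assumption for illustration, not the talk's actual mechanism; it uses an exclusive-create marker file on a local filesystem, whereas object stores would need a different primitive (e.g. conditional puts or a lease).

```python
import json
import os
from pathlib import Path

# Hypothetical sketch of conditional overwrite to shared storage.
# A local directory stands in for S3/GCS; `push_if_current` and the
# version-marker scheme are invented names for illustration.

def push_if_current(shared_root: str, shard: str,
                    expected_version: int, segment_files: list) -> bool:
    """Try to publish a new shard version. Succeeds only if no other
    writer has already published a successor to expected_version."""
    shard_dir = Path(shared_root) / shard
    shard_dir.mkdir(parents=True, exist_ok=True)
    marker = shard_dir / f"v{expected_version + 1}.marker"
    try:
        # O_EXCL makes creation atomic: exactly one writer wins.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # a concurrent writer already claimed this version
    with os.fdopen(fd, "w") as fh:
        # Record which segment files make up this published version.
        json.dump(sorted(segment_files), fh)
    return True
```

Under this scheme a writer that loses the race re-reads the latest version and retries, so stale writers can never silently overwrite a newer index.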
Running SolrCloud in the public cloud is the future. This presentation, and the code that will be contributed back to the community, will allow such clusters to be highly efficient, scalable, and elastic. Attendees will understand the challenges and potential of sharing index data between servers.
A technical audience and Solr developers will be interested in the details of the approach. Users of SolrCloud may be interested in the opportunities such a setup opens, and will come away understanding what it would mean for them (and can also contribute use cases).