Search and AI at Bloomberg
Bloomberg is building the world's most trusted information network for financial professionals. With this vast network comes the challenge of making that information easily accessible to readers through an efficient search engine. Built with Apache Solr, Bloomberg’s search must unite information spread over a thousand Solr clouds across hundreds of machines.
In advance of Bloomberg’s talk at ACTIVATE, we spoke with Ken LaPorte, Team Lead, Search Infrastructure, on how overcoming these challenges at Bloomberg has led to innovations for the Solr community.
Why is search important at Bloomberg?
Bloomberg’s ability to quickly and accurately deliver data, news, and analytics to our clients -- finance professionals around the globe -- is critical to the company, as well as to the broader global capital markets. Using Apache Solr to create highly performant and accurate information retrieval systems in a low-latency environment, where time really equals money, is a crucial piece of our overall strategy.
How is the search team at Bloomberg growing?
There are a number of search teams at Bloomberg. The Search Infrastructure team that I run has quadrupled in size over the past few years. In addition, a variety of new search application teams were created to handle the complexities of specific use cases. For example, our News Search team in London has very challenging indexing SLAs that they have deployed Solr to meet, while our New York-based Bloomberg Law (BLAW) team is using NLP and query intent classification to improve the user experience for legal professionals conducting legal research.
What new technologies are you most excited about?
Our team has been focusing on two challenges. The first is how to run Solr effectively within Kubernetes. This includes how to stand up Solr Clouds, manage persistent storage, collect metrics, and ensure effective security. On a separate plane, we’re working with our Communication Channels engineering team to index Bloomberg’s email and Instant Bloomberg (IB) chat messages in a secure and efficient manner.
Can you tell us about search-related innovations that you have implemented?
A lot of our focus this year has been on how to run Solr in Kubernetes. To that end, we published an open source Solr Operator in June that helps with the management and deployment of Solr Clouds within Kubernetes clusters. We’ve also contributed to the Kubernetes operator for Apache ZooKeeper. Separately, we’ve rewritten the Solr metrics collector, as we found it didn’t work well at our scale. Of course, these are in addition to the numerous contributions Bloomberg has made to Solr over the years, including the Learning-to-Rank (LTR) plugin, streaming expressions, and the distributed analytics component.
Can you tell us your plans, if any, to implement AI technologies?
AI and machine learning are a core focus for many engineering teams at Bloomberg. From a search perspective, we frequently use entity recognition and extraction on the indexing side, along with NLP-based query intent classification and Learning-to-Rank on the query side. My Search Infrastructure team is working with our Data Science team to expose our Solr datasets to our engineers and data scientists via Jupyter notebooks. From there, they will be able to do all sorts of interesting manipulations and analyses, including merging disparate data sets across different data stores.
How do you engage with and support the open source community?
Our chief engagement is through our open source commitments. Bloomberg has made a major commitment to the open source community, especially with regard to Solr. Our goal is to listen to the ideas and business requirements coming from our clients across Engineering and turn them into Solr features. Some of those features include Solr streaming expressions, the LTR plugin, and the distributed analytics component -- all of which are serving live traffic to Bloomberg’s clients around the globe. We also make it a point to share our experiences through technical talks and presentations at various meetups and conferences across the search community.
How have you seen the ACTIVATE community evolve over the years?
This is an interesting question. Considering how Solr itself has evolved over the years, I think ACTIVATE & Lucene/Solr Revolution evolved in parallel. When I first started attending this conference, there were a lot of presentations about new features or use cases which were especially challenging to tackle. More recently, it seems those lower-level problems are less interesting, and talks are more centered around higher-level concepts and technologies, such as Kubernetes.
What are you excited to learn about at ACTIVATE this year?
One area we are really excited about is running Solr on Kubernetes. We created the Kubernetes Operator for Solr and published it as an open source tool back in June, and we’ve received some great feedback about it so far. I’d love to hear from other people about what their experiences have been, what technical challenges they’ve run into, and how they overcame them, so we can bring those lessons back to Bloomberg.
Want to learn more from Bloomberg? Join Software Engineer Houston Putman’s talk at ACTIVATE, "Running Solr within Kubernetes at Scale," on Wednesday, Sept 11 at 2:20pm.
Ken LaPorte, Team Lead, Search Infrastructure, Bloomberg
Ken leads the search infrastructure team at Bloomberg, providing a search as a service platform for 400+ applications. Ken has been active in the search domain for 8 years and has worked on a wide variety of search problems including e-commerce, geospatial, analytical, and free text.