Thursday, October 13 • 2:20pm - 3:00pm
Tuning Solr and its Pipeline for Logs
This is an updated talk about how to use Solr for logs and other time-series data, like metrics and social media. In 2016, Solr, its ecosystem, and the operating systems it runs on have evolved quite a lot, so we can now show new techniques to scale and new knobs to tune.

We'll start by looking at how to scale SolrCloud through a hybrid approach that combines time- and size-based indices, and at how to divide the cluster into tiers in order to handle potentially spiky load in real time. Then, we'll look at tuning individual nodes. We'll cover everything from commits, buffers, merge policies, and doc values to OS settings like the disk scheduler, SSD caching, and huge pages.
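
As a rough illustration of what combining time- and size-based indices could look like, here is a minimal Python sketch. It is one possible reading of the idea, not the method presented in the talk: the `Collection` class, the `pick_collection` function, the `logs_<date>_<seq>` naming scheme, and the byte threshold are all hypothetical, and no Solr API is called.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Collection:
    """Hypothetical bookkeeping record for one log collection."""
    day: date
    seq: int
    size_bytes: int = 0

    @property
    def name(self) -> str:
        # Assumed naming scheme: logs_<YYYY-MM-DD>_<sequence number>
        return f"logs_{self.day:%Y-%m-%d}_{self.seq}"


def pick_collection(current: Optional[Collection], today: date,
                    incoming_bytes: int, max_bytes: int) -> Collection:
    """Return the collection that should receive the next batch."""
    if current is None or current.day != today:
        # Time-based rotation: a new day starts a new collection at sequence 0.
        return Collection(day=today, seq=0)
    if current.size_bytes + incoming_bytes > max_bytes:
        # Size-based rotation: same day, but the batch would exceed the cap.
        return Collection(day=today, seq=current.seq + 1)
    # Otherwise keep writing to the current collection.
    return current
```

The caller would be responsible for updating `size_bytes` after each batch and for creating the collection in SolrCloud when a new `Collection` object comes back. A spike in traffic then produces more collections for that day instead of one oversized collection.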

Finally, we'll take a look at the pipeline that gets the logs to Solr and how to make it fast and reliable: where buffers should live, which protocols to use, where the heavy processing (like parsing unstructured data) should be done, and which tools from the ecosystem can help.

Radu Gheorghe

Software Engineer, Sematext Group
Search consultant and software engineer at Sematext. On the consulting side, he works mainly on Solr, Elasticsearch, and logging-related projects. His engineering work goes mostly into Logsene, Sematext's logging SaaS. He authored the Working with Elasticsearch video course and co-authored Elasticsearch in Action.
Rafał Kuć

Software Engineer, Sematext Group
In his professional life, Rafał is a Sematext trainer, consultant, and software engineer, a co-founder of http://solr.pl, and the author of the Solr Cookbook and Elasticsearch Server books. In his personal life, Rafał is a father and a husband.

Thursday October 13, 2016 2:20pm - 3:00pm
Commonwealth Sheraton Boston