Tuesday, October 11
 

9:00am

Preconference Training: Solr and Big Data
NOTE: Only those registered for the "Two-Day Solr Training plus Conference" registration option will be permitted to attend this course. Training course must be selected at time of registration.

Solr and Big Data

Solr has been integrated into the stack of almost every major Hadoop and NoSQL vendor, including Hortonworks, Cloudera, MapR, DataStax, and Couchbase. This two-day course provides hands-on experience with the interplay of Solr and big data technologies through specific use cases such as enterprise search over large document sizes and volumes, and log analytics with near real-time requirements. Topics covered include:

  • High-level introductions to Solr, Hadoop, and NoSQL
  • Indexing to Solr from Hadoop using the Lucidworks connector
  • The MapReduceIndexerTool, which creates and stores Solr/Lucene indexes on HDFS
  • High-volume indexing using Flume
  • The Couchbase connector
  • Applications involving Solr and other NoSQL databases
Pre-requisites

Solr Unleashed or equivalent experience, as well as some high-level familiarity with Hadoop and NoSQL. Hands-on labs will abstract away the setup of Hadoop and NoSQL databases and focus on how they connect and interact with Solr.

Tuesday October 11, 2016 9:00am - Wednesday October 12, 2016 5:00pm
TBA

9:00am

Preconference Training: Solr Under the Hood
NOTE: Only those registered for the "Two-Day Solr Training plus Conference" registration option will be permitted to attend this course. Training course must be selected at time of registration.

Solr Under the Hood: Everything You Always Wanted to Know About Solr

To build and troubleshoot large-scale search applications, developers need to understand topics of increasing complexity within Solr. This two-day course covers topics such as:

  • Underlying architecture
  • Performance, sizing, and scaling of a Solr cluster
  • Garbage collection
  • Sharding
  • Caching, latency, and near real-time requirements
  • Field-level analysis chains
  • Commonly encountered scenarios with troubleshooting guidelines
Who Should Attend?

This hands-on course is for advanced professionals who want to go from Solr developer to Solr rock star.

Pre-requisite

Solr Unleashed or at least 1-2 years of hands-on experience developing applications in Solr.


Tuesday October 11, 2016 9:00am - Wednesday October 12, 2016 5:00pm
TBA

9:00am

Preconference Training: Solr Unleashed
NOTE: Only those registered for the "Two-Day Solr Training plus Conference" registration option will be permitted to attend this course. Training course must be selected at time of registration.

Solr Unleashed: A Hands-On Workshop for Building Killer Search Apps

Starting with the fundamentals of search and Apache Solr, this two-day course covers:

  • Setting up a Solr cluster
  • Creating and updating schemas
  • Indexing data
  • Searching
  • Tuning relevance
  • Extended features such as geospatial search, spell checking, highlighting, etc.
  • Analytics and visualizations using Solr

Solr Unleashed is intended to provide the grounding required to build rock-solid, scalable search applications that return relevant results.

Who Should Attend?

This hands-on course is intended for developers, and is primarily designed for people who have experience developing web applications. However, by providing an excellent introduction to Solr, it brings value to all professionals who will be involved in architecting, evaluating, deploying, or managing applications based on Solr. It serves as a pre-requisite for most of our other courses.


Tuesday October 11, 2016 9:00am - Wednesday October 12, 2016 5:00pm
TBA
 
Wednesday, October 12
 

9:00am

Preconference Training: Getting Started with Lucidworks Fusion
NOTE: Only those registered for the "One-Day Fusion Training plus Conference" registration option will be permitted to attend this course.

Getting Started With Lucidworks Fusion

This one-day course provides a technical overview of Fusion, Lucidworks' new product that makes it that much easier to get the most out of Solr. Attendees will learn:

  • How Fusion sits on top of Solr
  • How to set up collections and data connectors using Fusion
  • How to modify the data stream going into Solr via Fusion index pipelines
  • How queries can be improved and marked up prior to being sent to Solr using Fusion’s query pipelines
  • How to set up security and group authentication using Fusion’s built-in security infrastructure
  • How to use and interpret Fusion’s Signals feature to close the feedback loop between search users and the search engine
  • How to use the Relevancy Workbench, designed to give business users an easy entry into search engine customization
Pre-requisites

Solr Unleashed or equivalent experience. This course is targeted at developers and power users who know Solr, but want to learn Fusion.



Wednesday October 12, 2016 9:00am - 5:00pm
TBA

5:00pm

Welcome Reception
Join us in the Grand Ballroom Prefunction area to check out our sponsors, network with committers and attendees, and enjoy hors d'oeuvres and drinks to kick off the conference.

Wednesday October 12, 2016 5:00pm - 7:00pm
Grand Ballroom Prefunction
 
Thursday, October 13
 

8:30am

Opening Remarks
Speakers

Will Hayes

CEO, Lucidworks
Will Hayes is CEO of Lucidworks. Previously, Will worked at the big data analysis company Splunk and the biotech firm Genentech.


Thursday October 13, 2016 8:30am - 9:00am
Grand Ballroom Sheraton Boston

9:00am

Search++: Cognitive transformation of human-system interaction
As developers, we build and optimize search-based applications using Solr at scale on a daily basis. Users demand that we think differently about how they want to engage with their favorite applications, browsers, and embodied systems to retrieve information that is relevant and personalized. In this session, we'll discuss approaches we've taken using IBM Watson to filter signal from noise, tailor the ranking of results, influence search results through learning, and close explicit and implicit feedback loops through the use of cognitive services.

Speakers

Sridhar Sudarsan

CTO, IBM Watson Product Management and Partnerships, IBM Watson
Sridhar Sudarsan is a Distinguished Engineer and CTO, IBM Watson Product Management and Partnerships. He is responsible for the technology strategy of the IBM Watson platform, solutions, and partnerships. Prior to this, he launched and drove the technical strategy for the Watson Ecosystem, a partnership program bringing Watson services, tools, and expertise to ISVs, startups, universities, and other partners across industries. He provides...


Thursday October 13, 2016 9:00am - 9:45am
Grand Ballroom Sheraton Boston

9:45am

Break
Thursday October 13, 2016 9:45am - 10:00am
TBA

10:00am

Leveraging the power of Solr with Spark
Solr is a distributed NoSQL database with impressive search capabilities. Spark is the new megastar in the distributed computing universe. In this code-intensive session, we show you how to combine the two to solve real-time search and processing problems. We will show you how to set up a Solr/Spark combination from scratch and develop your first jobs, which run distributed on shared Solr data. We will also show you how to use this combination for your next-generation BI platform.

Speakers

Johannes Weigend

CTO, QAware GmbH
Johannes has worked as a software architect with Java since 1999 and was honoured as a "Java Rockstar" at JavaOne 2015. He is a lecturer at the University of Applied Sciences in Rosenheim, Germany, and technical director at QAware, a decorated software engineering company located in Munich, Germany. QAware works for enterprises like BMW, Allianz, German Telecom, and others.


Thursday October 13, 2016 10:00am - 10:40am
Commonwealth Sheraton Boston

10:00am

Why is my Solr slow?! (An HTrace Case Study)
Attempting to diagnose slowness in a distributed system can be one of the most challenging and headache-inducing activities for an operations team. Armed with a suite of low-level metrics and host monitoring, often the first and only option available is a lengthy process of elimination across each component in the hardware and software stack.

In this session we will learn how to instrument Solr to send distributed tracing data to HTrace. We will then look at some sample traces and learn how to identify slowness in different parts of the entire stack, looking for trends and outliers in the Solr operation. We will complete the session by discussing how to add tracing to your existing client applications for true end-to-end visibility into the performance of your cluster.
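As a rough illustration of the "look for trends and outliers" step, the Python sketch below groups trace spans by operation name and flags any span that runs much longer than the typical (median) span of the same kind. The span dictionaries with `name`, `start_ms`, and `end_ms` keys and the 3x threshold are hypothetical simplifications; this is not HTrace's API or span format.

```python
from collections import defaultdict
from statistics import median


def find_slow_spans(spans, factor=3.0):
    """Return (name, duration, typical) for every span slower than `factor`
    times the median duration of spans with the same name.

    Spans are hypothetical dicts with 'name', 'start_ms' and 'end_ms' keys."""
    durations = defaultdict(list)
    for span in spans:
        durations[span["name"]].append(span["end_ms"] - span["start_ms"])

    slow = []
    for span in spans:
        duration = span["end_ms"] - span["start_ms"]
        typical = median(durations[span["name"]])
        if typical > 0 and duration > factor * typical:
            slow.append((span["name"], duration, typical))
    return slow


spans = [
    {"name": "solr.query", "start_ms": 0, "end_ms": 20},
    {"name": "solr.query", "start_ms": 100, "end_ms": 122},
    {"name": "solr.query", "start_ms": 200, "end_ms": 218},
    {"name": "solr.query", "start_ms": 300, "end_ms": 420},  # the outlier
    {"name": "hdfs.read", "start_ms": 5, "end_ms": 10},
]
print(find_slow_spans(spans))  # prints [('solr.query', 120, 21.0)]
```

Comparing each span against its own operation's median keeps a naturally slow operation from being flagged just for being slow; with only a handful of samples, though, the median is a noisy baseline.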

Speakers

Mike Drob

Software Engineer, Cloudera
Mike has been immersed in Big Data for over 5 years, previously with the US Government and now with Cloudera. His current role is to provide operational support for Apache Solr, a world-class search engine built on top of Apache Lucene. He is also a hobbyist contributor to several other open source projects, including Apache Curator, Apache Accumulo, Apache NiFi, JUnit, JLine, and JCommander. When not coding, he likes to mentor...


Thursday October 13, 2016 10:00am - 10:40am
Gardner

10:00am

Hidden Gems of Apache Solr
Every day billions of documents are searched, sorted, faceted and highlighted by millions of users who have no idea that behind the scenes, Apache Solr is hard at work, making life simple for developers like you. But what else can Solr do for you?

In this session, we'll dive into some of the lesser-known and less-understood features of Apache Solr that even seasoned Solr developers may not be aware of, features that can be useful in ways you might not have considered even if you do know about them, so you can take your Solr-powered applications to the next level.

Speakers

Chris Hostetter

Software Engineer, Lucidworks
Chris 'Hoss' Hostetter is a Member of the Apache Software Foundation, and a committer on the Lucene/Solr Project. Prior to joining Lucidworks in 2010 to work full time on Solr development, he spent 11 years as a Principal Software Engineer for CNET Networks thinking about searching "structured data" that was never as structured as it should have been. Hoss has presented on Apache Solr numerous times over the last 10 years including local MeetUps...


Thursday October 13, 2016 10:00am - 10:40am
Back Bay B Sheraton Boston

10:00am

Smart Facets at Rakuten
Facets are critical for the user shopping experience. Many sites sort facets only by count or name, not by which facet is actually best for the query. The Rakuten Big Data search team has implemented a solution that gives users the best possible facets for their queries. We will step through our solution, as well as other solutions we have tried or could try, and go over the pros and cons of each.

Speakers

Mike Pellegrini

Software Developer, Rakuten
Mike is a software developer at Rakuten who works on core search platform development. His work involves supporting & extending Solr to meet the needs of multinational e-commerce. He also works on tools to support the management of large Solr clusters. Previously, Mike worked as a consultant for companies interested in implementing custom search solutions based on Solr. In that role, he helped guide companies in creating a search...

Keith Thoma

Senior Software and Delivery Engineer, Rakuten
Keith Thoma is a Senior Software and Delivery Engineer at Rakuten USA. He has been working with Solr for over three years, search for 6 years, and big data for over 8 years. His primary role is to develop search and data solutions for Rakuten subsidiaries as part of the Americas Big Data team. This includes tasks such as relevancy tuning, NLP, and platform migrations. The team has successfully launched search and data projects in the United...


Thursday October 13, 2016 10:00am - 10:40am
Back Bay A Sheraton Boston

10:00am

Solr Highlighting at Full Speed
Searching over a large corpus of legal documents brings a number of unique challenges. In legal search, recall matters. Users often enter broad queries and rely on digests to help them judge the relevancy of a result before committing to reading a long document. This has made highlighting quality and speed, with minimal memory use, a key requirement for Bloomberg Law. In this talk, attendees will learn about Bloomberg Law's efforts to dramatically improve highlighting performance through a new highlighter for Solr that makes the best possible use of your index.

Speakers

Timothy Rodriguez

Team Lead, Bloomberg
Timothy Rodriguez leads the Verticals Search Platform team at Bloomberg which provides the underlying search platform for several Bloomberg products in the areas of law, government, and new energy finance. He works on a number of areas related to search such as query grammars, distributed search, large scale indexing, scoring models, text analysis, and more.

David Smiley

Search Developer & Consultant, D W Smiley LLC
David Smiley is a well-recognized Apache Lucene/Solr expert. He wrote the first book on Solr (now in its 3rd edition), is a Lucene/Solr committer and PMC member, speaks at conferences, delivers training, and offers part-time independent consulting and development services. Much of Lucene's spatial-extras module was developed by David.


Thursday October 13, 2016 10:00am - 10:40am
Independence Sheraton Boston

10:50am

How Verizon Listens to Its Customers: Trends Dashboard on VoC Data
At Verizon we get millions of calls, chat sessions, NPS feedback responses, surveys, online feedback, product reviews, emails, and complaints. This session will go into the details of how we built a highly scalable, searchable, real-time dashboard on sentiment-scored and categorized customer feedback verbatims, using Hadoop, Lucidworks Fusion, Clarabridge, Cloud Foundry, and real-time data flow. The dashboards provide valuable insight into customer needs, pain points, intent, broken processes, brand insights, product insights, social insights, etc. They help us react to issues faster and provide the best possible service to our clients by listening to the customer.

Speakers

Kaustubh Mishra

IT Systems Architect for Voice of Customer, Verizon Wireless
Responsible for building the data architecture for Verizon corporate initiatives in Voice of Customer, consumer, and social insights. Also an architect on web analytics for the Verizon Wireless commerce team. Previously worked on numerous reporting and data warehousing projects at Verizon. Advanced specialization in statistics and data mining for business and marketing.


Thursday October 13, 2016 10:50am - 11:30am
Commonwealth Sheraton Boston

10:50am

Rebalance API for SolrCloud
SolrCloud deployments with large data sets and collections usually end up as unevenly balanced clusters, which skews data and replica distribution across SolrCloud nodes. Automatic node discovery in an existing cluster does not cause data to be redistributed. Dynamically scaling collections up or down based on index/config size or cluster size is non-trivial and adds operational overhead.

The Rebalance API offers a flexible way of redistributing data in SolrCloud while guaranteeing zero downtime. It offers multiple scaling strategies that aid in smarter, faster index manipulation, and multiple allocation strategies for smarter collection placement in the cluster. It also offers a platform for users to write their own scaling strategies as they see fit. The API has been built in an open-source-friendly fashion.
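To make the idea of an allocation strategy concrete, here is a minimal Python sketch of one plausible strategy, "least-loaded node first", using a heap keyed on each node's core count. The node names and the `allocate_replicas` helper are hypothetical and are not part of the Rebalance API described in the talk.

```python
import heapq


def allocate_replicas(node_loads, num_replicas):
    """Choose a node for each of num_replicas new cores, always picking the
    node that currently hosts the fewest cores (ties broken by node name).

    node_loads maps a (hypothetical) node name to its current core count."""
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    placements = []
    for _ in range(num_replicas):
        load, node = heapq.heappop(heap)         # least-loaded node
        placements.append(node)
        heapq.heappush(heap, (load + 1, node))   # it now hosts one more core
    return placements


print(allocate_replicas({"node1": 4, "node2": 1, "node3": 2}, 3))
# prints ['node2', 'node2', 'node3']
```

The sketch only balances core counts. A real placement strategy would also keep replicas of the same shard on different nodes and weigh index size, not just the number of cores.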

Speakers

Suruchi Shah

Engineer, BloomReach
Suruchi is currently working on BloomReach's Commerce Search Platform. She recently completed her master's program at Carnegie Mellon University, focusing on big data machine learning problems, information retrieval, and natural language processing. Her past experience includes working as a software developer at Bank of America.

Nitin Sharma

Senior Software Engineer, Netflix
Nitin has been working on Solr-based search platforms for over 6 years. His specializations include distributed systems, performance engineering, and search platforms.


Thursday October 13, 2016 10:50am - 11:30am
Gardner

10:50am

Anyone Can Build a Recommendation Engine with Solr
You don't need a PhD to get started with recommenders! You just need Solr! In this talk, you'll get several examples of building different recommendation strategies on top of Solr. You'll see how to deliver recommendations using user behavior, and how to combine that with content-specific signals. We'll cover:

- Folks who purchased this product also purchased
- Personalized recommendations based on past browsing history
- How Solr makes tuning relevance and scaling straightforward

We'll also touch on many of the classic problems with recommenders, including the cold start problem and the Oprah Book Club problem. Come if you've got some Solr experience and would like to learn to build a recommender!
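The "purchased this, also purchased" idea boils down to item co-occurrence. The plain-Python sketch below counts, for a given item, which other items show up in the same orders. It does not use Solr, and the order data and the `also_purchased` helper are illustrative assumptions rather than the speaker's approach.

```python
from collections import Counter


def also_purchased(orders, item, top_n=3):
    """Return up to top_n items most often bought in the same order as `item`."""
    counts = Counter()
    for order in orders:
        basket = set(order)              # ignore duplicates within one order
        if item in basket:
            counts.update(basket - {item})
    # Highest co-occurrence first; ties broken alphabetically for stable output
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


orders = [["book", "lamp", "pen"], ["book", "pen"], ["book", "mug"], ["lamp", "mug"]]
print(also_purchased(orders, "book"))  # prints ['pen', 'lamp', 'mug']
```

Raw co-occurrence counts favor items that are popular across all baskets; a production recommender would normalize for overall popularity.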

Speakers

Doug Turnbull

Lead Relevance Consultant, OpenSource Connections
Lead search relevance consultant at OpenSource Connections. Author of Relevant Search. Doug impacts businesses' bottom lines through better search, discovery, and recommendations. Doug wants to humanize search and recommendations, making it less intimidating for organizations to make impactful relevance investments. To do this, Doug leads a team of Solr, NLP, and machine learning experts that optimize relevance for clients. He also loves writing...


Thursday October 13, 2016 10:50am - 11:30am
Back Bay B Sheraton Boston

10:50am

It's Just Search
Think *inside* the box. Inside the *search* box, that is.

The "best" search results incorporate many more factors than (just) textual matching and relevancy. Search experience owners manage query context _rules_, signals automatically feed back machine-learned factors, and users' implicit and explicit behaviors filter and weight future interactions. Synergy emerges from several cooperating (just) searches.

This talk will showcase and detail several (just) search examples including rules, typeahead/suggest, signals, and location awareness, bringing them all together into a cohesive search experience.
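Typeahead/suggest is one of the "(just) search" examples above. A minimal, generic version keeps terms sorted, uses binary search to find the first term carrying the typed prefix, and ranks the matches by a weight such as query frequency. The `PrefixSuggester` class and its weights are hypothetical; this is not Solr's Suggester component.

```python
import bisect


class PrefixSuggester:
    """Suggest lowercase terms that start with a typed prefix, ranked by weight."""

    def __init__(self, weighted_terms):
        # weighted_terms: dict of lowercase term -> weight (e.g. query count)
        self.weights = dict(weighted_terms)
        self.terms = sorted(self.weights)

    def suggest(self, prefix, limit=3):
        prefix = prefix.lower()
        # In sorted order, all terms with this prefix sit in one contiguous run
        # that begins at the first term >= prefix.
        start = bisect.bisect_left(self.terms, prefix)
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break
            matches.append(term)
        # Heaviest first; ties broken alphabetically
        matches.sort(key=lambda t: (-self.weights[t], t))
        return matches[:limit]


suggester = PrefixSuggester({"solr": 50, "solrcloud": 30, "solr cell": 5, "spark": 40})
print(suggester.suggest("sol"))  # prints ['solr', 'solrcloud', 'solr cell']
```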

Speakers

Erik Hatcher

Lucidworks
Erik Hatcher is the co-author of “Lucene in Action” as well as co-author of “Java Development with Ant”. Erik is a cofounder of Lucidworks, an active member of the Lucene community, a leading Lucene/Solr committer, member of the Lucene/Solr Project Management Committee, member of the Apache Software Foundation as well as a frequent invited speaker at various industry events. Erik earned his B.S. in Computer Science from...


Thursday October 13, 2016 10:50am - 11:30am
Independence Sheraton Boston

10:50am

Loading 350M documents into a large Solr cluster in 8 hours or less
This session is a case study that shows how a large set of XML documents can be loaded into a multi-collection Solr cluster in a fast, efficient, and controlled way.

The presenter will show how Solr is used within his organization, and then explain how his team started out loading content into their SolrCloud using the standard post.jar tool, which has some hidden limitations.

You will see how this led to their current solution, which consists of multiple cloud-aware "content posting" worker processes, controlled by a clever master-less queuing system in ZooKeeper. The presenter will also cover how to load content into a busy Solr cluster without affecting the response times of running queries too much.
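The headline target implies a concrete sustained indexing rate. The small helper below works out that arithmetic; the worker count of 10 is a hypothetical example, not a figure from the talk.

```python
def required_rate(num_docs, hours, workers=1):
    """Return (overall docs/sec, docs/sec per worker) needed to index
    num_docs within the given number of hours."""
    seconds = hours * 3600
    overall = num_docs / seconds
    return overall, overall / workers


overall, per_worker = required_rate(350_000_000, 8, workers=10)
print(f"{overall:,.0f} docs/s overall, {per_worker:,.0f} docs/s per worker")
# prints: 12,153 docs/s overall, 1,215 docs/s per worker
```

In other words, the cluster has to absorb roughly 12,000 documents per second for the full eight hours, while still serving queries.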

Speakers

Dion Olsthoorn

Senior Software Engineer, Wolters Kluwer
Dion Olsthoorn works as a Software Engineer for Wolters Kluwer, a publisher of professional content. He’s currently working on Ovid®, an online information delivery platform for medical research, where he and his team are responsible for building, enhancing, and maintaining a large Solr cluster. Dion has 20+ years of experience in software development (mostly web) and specializes in enterprise search systems.


Thursday October 13, 2016 10:50am - 11:30am
Back Bay A Sheraton Boston

11:40am

Building a Solr Continuous Delivery Pipeline With Jenkins
In this session, I will demonstrate how to build a secure continuous delivery pipeline for Solr with Jenkins and various Jenkins plugins, using the installation scripts that are packaged with Solr. I'll cover how to (optionally) build Solr and deploy it using Solr's own scripts, and why one might want to do this. I'll cover the fundamentals of continuous delivery and what a CD pipeline looks like using Jenkins plugins. Finally, I'll discuss which files in the Solr configuration should be version controlled separately from Solr itself, and how to configure various environments using core properties.

Speakers

James Strassburg

Senior Software Architect, Direct Supply
Jim Strassburg is an experienced software engineer, architect, and researcher. He has been building distributed software systems for over 15 years. In late 2012 he replaced the search engine for his company's e-commerce application with Apache Solr and got bit by the search bug. Lately he has been doing research and development in search, artificial intelligence, and big data.


Thursday October 13, 2016 11:40am - 12:20pm
Gardner

11:40am

Creating New Streaming Expressions
Streaming Expressions can be used in a wide range of applications from order tracking to bioinformatics. They can be used to do massively parallel processing across huge datasets or simple joins between sharded collections. They can be used to update collections or alert when certain documents are seen. And streaming expressions are fully extensible.

In this talk, attendees will learn the ins and outs of creating new Stream classes and how to make use of those in their own searches. We'll cover the differences between stream sources and stream decorators. We'll discuss the minimum set of functionality a stream must provide. And we'll see how to add new streaming classes to your collections' configurations. In the end, attendees will see how simple it is to create and use new Streaming classes.
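Solr's own streams are Java classes, but the source/decorator distinction can be sketched with Python generators: a source produces tuples, while a decorator wraps another stream and transforms what flows through it. The `unique` decorator below mirrors the documented behavior of Solr's `unique` expression (emit the first tuple for each run of equal values, relying on the input being sorted on that field); everything else here is a hypothetical illustration, not Solr's API.

```python
from typing import Dict, Iterable, Iterator

Record = Dict[str, object]


def list_source(docs: Iterable[Record]) -> Iterator[Record]:
    """Source: emits tuples from an in-memory list (standing in for a search)."""
    for doc in docs:
        yield dict(doc)


def unique(stream: Iterator[Record], over: str) -> Iterator[Record]:
    """Decorator: wraps another stream and emits the first tuple of each run
    of equal `over` values; assumes the incoming stream is sorted on `over`."""
    sentinel = object()
    last = sentinel
    for record in stream:
        if last is sentinel or record[over] != last:
            last = record[over]
            yield record


docs = [{"id": 1, "k": "a"}, {"id": 2, "k": "a"}, {"id": 3, "k": "b"}]
print([r["id"] for r in unique(list_source(docs), over="k")])  # prints [1, 3]
```

Because a decorator only consumes and yields tuples, decorators can be nested around any source, which is what makes the composition model extensible.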

Speakers

Dennis Gove

Senior Software Engineer, Bloomberg LP
Dennis Gove is a member of the Search Infrastructure team at Bloomberg LP in New York. He is a Lucene/Solr Committer and lives in Massachusetts with his wife and two kids.


Thursday October 13, 2016 11:40am - 12:20pm
Back Bay B Sheraton Boston

11:40am

SolrCloud: High Availability and Fault Tolerance
Committer Mark Miller will discuss the current SolrCloud architecture for handling disaster and recovery. This talk will cover how SolrCloud was designed to protect your data in the face of failure, some of the growing pains the system has gone through, and what is left to do in the near future when it comes to fault tolerance and recovery. Learn about the low level details that help keep your data safe as well as what choices and decisions you should make as a SolrCloud user that cares about data integrity.

Speakers

Mark Miller

Software Engineer, Cloudera
Mark Miller is a Lucene/Solr committer and Apache member. After starting with Lucene in 2006, Mark has spent most of his time getting paid to work on the open source software projects that he loves. Mark has given many talks on Lucene/Solr at various conferences and meetups around the world and is currently learning all about Hadoop as a software engineer at Cloudera.


Thursday October 13, 2016 11:40am - 12:20pm
Independence Sheraton Boston

11:40am

PlayStation and Lucene: Indexing 1 Million documents per second on 18 servers
What if I tell you that the PlayStation 4 is not just a gaming console? What if I tell you that the PlayStation Network is a system that handles more than 70 million active users? What if I tell you that in order to create an awesome gaming experience, we support personalized search at scale? Finally, what if I tell you that the system that provides this personalized experience currently indexes up to 1 million documents per second using Lucene and uses only 18 mid-sized Amazon instances?

Intrigued? Join the talk to learn how it is possible!
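Assuming the indexing load is spread evenly across the instances (the abstract does not say so), the per-instance rate works out to:

```latex
\frac{10^{6}\ \text{docs/s}}{18\ \text{instances}} \approx 5.6 \times 10^{4}\ \text{docs/s per instance}
```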

Speakers

Alexander Filipchik

Principal Software Engineer, Sony Interactive Entertainment
Alex spent the last 4 years of his life building the next generation of the PlayStation Network. He is honored to be a part of the small team of engineers who managed to build a platform that scaled from 0 to 1 million users in just 1 day. This platform has been adding 1.5 million new devices per month to reach tens of millions of active users. Alex is passionate about technology, innovations, walking his dog and building scalable software...


Thursday October 13, 2016 11:40am - 12:20pm
Back Bay A Sheraton Boston

12:20pm

Lunch
Thursday October 13, 2016 12:20pm - 1:20pm
TBA

1:30pm

Searching the Enterprise Data Lake with Solr - Watch us do it!
People talk a lot about building enterprise 'data lakes' or data hubs to knock down data silos and democratize data access for different types of users. These are abstract topics. Maybe it's time to stop talking and see, practically, how this can be done!

We are currently processing and searching data from a 'data lake' for a large life sciences customer, and in this demo we'll show you, step by step, how this is accomplished. We'll take disparate data sources like document files and data tables; we'll show how these records can be combined, processed, prepared, and indexed; and then we'll show search and visualizations on this content to provide business insight into this 'data lake'. All of this will be done with SolrCloud.

Speakers

Paul Nelson

Chief Architect, Search Technologies
Paul was an early pioneer in the field of text retrieval and has worked on search engines for over 25 years. He was the architect and inventor of RetrievalWare, a ground-breaking natural-language based statistical text search engine which he started in 1989 and grew to $50 million in annual sales worldwide. RetrievalWare is now owned by Microsoft Corporation. During his many years in the industry, Paul has been involved in hundreds of text search...


Thursday October 13, 2016 1:30pm - 2:10pm
Commonwealth Sheraton Boston

1:30pm

Microsoft's Use of Solr to Deliver a Multitenant Log Analytics SAAS Service
We will present the architecture of the search service behind the Log Analytics solution in Microsoft Operations Management Suite. With Microsoft Operations Management Suite, you can now empower operations teams to effortlessly collect, store, and analyze log data from virtually any Windows Server and Linux source, regardless of volume, format, or location. Separate the signal from the noise with simple, powerful log management tools, and access real-time operational intelligence with improved troubleshooting, operational visibility, and fast search to explore, investigate, and fix incidents quickly.

Join us as we share our experience and lessons learned in resolving issues across the spectrum, such as scalability, COGS, compliance requirements, customer data isolation, data persistence, and query response streaming. Learn what it takes to run Solr on commodity hardware as well as on scaled-up architecture.

Speakers

Chirag Gupta

Software Engineer, Microsoft Corporation
Chirag Gupta is a software engineer at Microsoft Corporation. In his current role, he is responsible for building and monitoring a scalable, performant, reliable, multitenant, and COGS-efficient platform for the Microsoft Log Analytics SaaS service (Microsoft OMS). He has 15+ years of experience working on management in areas ranging from mobile (3G) customer management and embedded devices to server management, including extensive experience in...

Srivatsan Parthasarathy

Partner Software Engineer, Microsoft Corporation
Srivatsan is a Partner Software Engineer at Microsoft Corporation. In his current role, he is responsible for the architecture of Operations Management Suite, a SaaS management offering that enables customers to manage their Linux or Windows assets on any cloud.


Thursday October 13, 2016 1:30pm - 2:10pm
Gardner

1:30pm

Cross Data Center Replication for the Enterprise
This presentation explores the use of cross data center replication, now available in Solr 6, through a real-world example running in production. Iron Mountain has now been running cross data center replication (CDCR) for over a year. We have over 100,000 users and indexes supporting 26 clouds (5 billion documents) with rapid, continuous indexing. We rely on CDCR for disaster recovery and backups, which allows us to maintain a 'hot' standby environment for failover. We spent considerable effort performance testing and tuning CDCR, as well as determining the hardware and storage required to support the system. There are some gotchas to consider, such as how much disk space to allow for backups, network performance, configuration adjustments, and monitoring.

CDCR works much like mirroring approaches for databases, yet there are some distinct differences in how it works for Solr. CDCR was implemented by several committers whom Iron Mountain engaged to develop the capability. Representatives from that team will speak to the technical approach CDCR uses, including the CdcrRequestHandler and its versioning approach.

Lastly, we will cover some possible future enhancements for CDCR, including improving throughput and extending the current code base to support active/active replication between multiple data centers.

Speakers

Adam Williams

Search Lead, Iron Mountain
Adam Williams has 17 years of experience as a software developer. Three years ago Adam began his journey with Solr after spending 10 years working on DOD modeling and simulation as well as Digital Asset Management projects for global pharmaceutical companies. Adam is currently working at Iron Mountain on their Record Center project, which allows users to locate and order their assets. The SolrCloud search system Adam architected runs on all...


Thursday October 13, 2016 1:30pm - 2:10pm
Back Bay B Sheraton Boston

1:30pm

Automotive Information Research driven by Apache Solr
We are searching the unknown. How can you find hidden and unknown relationships in unrelated relational data silos? How can you search for relevant information in a 10^56-dimensional space? How do you create a consistent yet up-to-date information network for over 20 languages on a daily basis? And how on earth do you convince IT governance to let you use Solr for this kind of job? Does all this sound impossible? This talk will give the answers and present a detailed case study and success story of how we used Apache Solr to build a search-based business intelligence and automotive information research application for a major German car manufacturer.

Speakers

Mario-Leander Reimer

Chief Technologist, QAware GmbH
M.-Leander Reimer studied computer science at Rosenheim and Staffordshire University and now works as a chief technologist for QAware GmbH. He is a senior Java developer with several years of experience in designing complex and large-scale system architectures. He is continuously looking for innovations and ways to combine state-of-the-art technology and open source software components to be successfully applied in real world customer...


Thursday October 13, 2016 1:30pm - 2:10pm
Back Bay A Sheraton Boston

1:30pm

SearchHub or How to Spend Your Summer Keeping it Real
Dogfooding. Cobbler’s shoes. Whatever you want to call it, there’s nothing like building a real application on your own product to see the good, bad, and ugly of your own code. In this talk, we’ll walk through SearchHub, Lucidworks’ community-powered site for Apache and other open source projects, which indexes hundreds of public data sources to showcase Fusion and Solr capabilities ranging from the simple (search, faceting) to the complex (Word2Vec, Recommenders, Random Forests). The talk will highlight key integration points between Spark and Solr and how they are leveraged to do search, recommendations, and machine learning on email and user feedback. We’ll also cover some interesting crawling use cases, as well as how to leverage Fusion’s experiment management framework to run multi-armed bandit tests.

Speakers

Grant Ingersoll

CTO, Lucidworks
Grant is the CTO and co-founder of Lucidworks, co-author of Taming Text, co-founder of Apache Mahout and a long-standing committer on the Apache Lucene and Solr open source projects. Grant’s experience includes engineering a variety of search, question answering, and natural language processing applications for many domains and languages. He earned his B.S. from Amherst College in Math and Computer Science and his M.S. in Computer...


Thursday October 13, 2016 1:30pm - 2:10pm
Independence Sheraton Boston

2:20pm

Evolving the Optimal Relevancy Scoring Model at Dice.com
A popular conference topic in recent years has been using machine-learned ranking (MLR) to re-rank the top results of a Solr query to improve relevancy. However, such approaches fail to first ensure that the search engine has the optimal query configuration; without it, the re-ranked results may not contain the most relevant items for each query (lowering recall). Solr offers many configuration options that control how documents are ranked and scored for relevancy to a user's query, including the boost assigned to each field and how strongly phrase matches are boosted. Companies commonly tune these parameters manually to optimize relevancy, but this process is highly subjective and not guaranteed to produce optimal results. We will show a data-driven approach to relevancy tuning that uses optimization algorithms, such as evolutionary algorithms, to evolve a query configuration that optimizes the relevancy of the returned results, using data captured from our query logs. We will also discuss how we experimented with evolving a custom similarity algorithm to outperform BM25 and tf.idf similarity on our dataset. Finally, we'll discuss the dangers of positive feedback loops when training machine-learned ranking models.
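To make the idea of "evolving a query configuration" concrete, here is a minimal sketch of a simple elitist evolutionary loop (mutation only, no crossover) that searches over per-field boosts. Everything in it is an assumption for illustration, not the speaker's method: the field names, the toy judgment data standing in for query-log data, and the choice of mean reciprocal rank (MRR) as the fitness function. A real setup would run each candidate configuration against Solr instead of scoring toy field matches locally.

```python
import random
from typing import Dict

# Hypothetical field names; in Solr these would be the fields listed in edismax `qf`.
FIELDS = ["title", "skills", "description"]

# Toy stand-in for judgments mined from query logs (hypothetical data):
# per query, each candidate doc's per-field match score and the docs users engaged with.
QUERIES = {
    "java developer": {
        "docs": {
            "d1": {"title": 1.0, "skills": 0.2, "description": 0.9},
            "d2": {"title": 0.1, "skills": 1.0, "description": 0.3},
            "d3": {"title": 0.0, "skills": 0.1, "description": 1.0},
        },
        "relevant": {"d2"},
    },
    "data scientist": {
        "docs": {
            "d4": {"title": 0.9, "skills": 0.1, "description": 0.2},
            "d5": {"title": 0.3, "skills": 0.9, "description": 0.4},
            "d6": {"title": 0.2, "skills": 0.2, "description": 0.9},
        },
        "relevant": {"d5"},
    },
}


def fitness(boosts: Dict[str, float]) -> float:
    """Mean reciprocal rank of the first relevant doc when ranking by boosted field scores."""
    total = 0.0
    for q in QUERIES.values():
        docs = q["docs"]
        ranked = sorted(docs, key=lambda d: sum(boosts[f] * docs[d][f] for f in FIELDS), reverse=True)
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in q["relevant"]:
                total += 1.0 / rank
                break
    return total / len(QUERIES)


def mutate(boosts: Dict[str, float], rng: random.Random, sigma: float = 0.5) -> Dict[str, float]:
    """Gaussian perturbation of every boost, clamped at zero."""
    return {f: max(0.0, w + rng.gauss(0.0, sigma)) for f, w in boosts.items()}


def evolve(generations: int = 30, pop_size: int = 20, elite: int = 5, seed: int = 42):
    rng = random.Random(seed)
    baseline = {f: 1.0 for f in FIELDS}  # untuned starting point: equal boosts
    population = [baseline] + [mutate(baseline, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]  # elitism: the best configs survive unchanged
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - elite)]
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


def to_qf(boosts: Dict[str, float]) -> str:
    """Render boosts in edismax `qf` syntax, e.g. 'title^1.00 skills^2.50'."""
    return " ".join(f"{f}^{w:.2f}" for f, w in boosts.items())


if __name__ == "__main__":
    best, score = evolve()
    print(to_qf(best), score)
```

Because the best configurations are carried over unchanged each generation, the best fitness can never drop below the equal-boost baseline. Tuning phrase boosts (edismax `pf`) would just add more genes to the same dictionary.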

Speakers

Simon Hughes

Chief Data Scientist, Dice.com
I am currently the Chief Data Scientist at Dice.com, the technology professional recruiting site. I am also a PhD candidate at DePaul University, working toward a PhD in machine learning and natural language processing. At Dice, I have developed multiple recommender engines using Solr for recommending jobs and candidates that are currently live on our site, and have optimized the accuracy and relevancy of our job and candidate search. I have also...


Thursday October 13, 2016 2:20pm - 3:00pm
Gardner

2:20pm

Tuning Solr and its Pipeline for Logs
This is an updated talk about how to use Solr for logs and other time-series data, like metrics and social media. In 2016, Solr, its ecosystem, and the operating systems it runs on have evolved quite a lot, so we can now show new techniques to scale and new knobs to tune.

We'll start by looking at how to scale SolrCloud through a hybrid approach using a combination of time- and size-based indices, and also how to divide the cluster into tiers in order to handle potentially spiky load in real time. Then, we'll look at tuning individual nodes. We'll cover everything from commits, buffers, merge policies and doc values to OS settings like disk scheduler, SSD caching, and huge pages.
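One way to picture a hybrid time- and size-based scheme is a router that picks a collection per day and rolls over to a new numbered collection once the current one fills up. The sketch below is only an illustration under stated assumptions: the `logs_<date>_<n>` naming scheme and the `max_docs` threshold are hypothetical, and document counts are tracked in memory rather than read from Solr. In SolrCloud, the per-day collections could then be grouped behind a collection alias so queries for a time range hit only the relevant collections.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class RollingCollectionRouter:
    """Hypothetical router: one collection per UTC day, rolled over to a new
    numbered collection whenever the current one reaches max_docs."""

    prefix: str = "logs"
    max_docs: int = 1_000_000
    # In-memory doc counts per collection name (a real pipeline would ask Solr).
    _counts: Dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def route(self, ts: datetime) -> str:
        day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
        seq = 0
        # Skip past collections for this day that are already full.
        while self._counts.get(f"{self.prefix}_{day}_{seq}", 0) >= self.max_docs:
            seq += 1
        name = f"{self.prefix}_{day}_{seq}"
        self._counts[name] = self._counts.get(name, 0) + 1
        return name

    def collections_for_day(self, day: str) -> List[str]:
        """Collections a query for one day would need to hit, in rollover order."""
        names = [n for n in self._counts if n.startswith(f"{self.prefix}_{day}_")]
        return sorted(names, key=lambda n: int(n.rsplit("_", 1)[1]))


if __name__ == "__main__":
    router = RollingCollectionRouter(max_docs=2)
    ts = datetime(2016, 10, 13, 12, 0, tzinfo=timezone.utc)
    print([router.route(ts) for _ in range(5)])
```

Time-based naming keeps old data cheap to drop (delete a whole collection), while the size cap keeps any single day's collection from growing unbounded on a spiky day.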

Finally, we'll take a look at the pipeline of getting the logs to Solr and how to make it fast and reliable: where should buffers live, which protocols to use, where should the heavy processing be done (like parsing unstructured data), and which tools from the ecosystem can help.

Speakers

Radu Gheorghe

Software Engineer, Sematext Group
Search consultant and software engineer at Sematext. On the consulting side, works mainly with Solr, Elasticsearch, and logging-related projects. Engineering work goes mostly to Logsene, our logging SaaS. Authored the Working with Elasticsearch video course and co-authored Elasticsearch in Action.

Rafał Kuć

Software Engineer, Sematext Group
In his professional life, Rafał is a Sematext trainer, consultant, and software engineer, a co-founder of http://solr.pl, and the author of the Solr Cookbook and Elasticsearch Server books. In his personal life, Rafał is a father and a husband.


Thursday October 13, 2016 2:20pm - 3:00pm
Commonwealth Sheraton Boston

2:20pm

State of Solr Security 2016
Over the past 1-2 years, Apache Solr has gained many security-related features. This talk explores all the features available to help Solr users secure their installations, including authentication, authorization, storage-level security, ZooKeeper security, protection against eavesdropping on network packets, and document-level security. The talk will include simple examples of how to use these security features, and will also explore the current challenges users (especially enterprise users) face in securing their Solr clusters, as well as the future needs of Solr users and the road ahead.

Speakers

Ishan Chattopadhyaya

Software Engineer, Lucidworks
Ishan Chattopadhyaya is an engineer at Lucidworks and a contributor to the Apache Solr project. Prior to joining Lucidworks, Ishan worked at Yahoo! Search on the Multimedia Search and Shopping Vertical Search teams. Ishan started his career working on search at MapQuest (AOL), building their single-line search backend with Apache Lucene. Ishan has contributed to the development of Apache Solr's authentication and authorization framework.


Thursday October 13, 2016 2:20pm - 3:00pm
Back Bay B Sheraton Boston

2:20pm

Near Real Time Indexing in Search
Imagine the frustration of users who find exactly what they want while browsing, only to realize later (when they click it) that it is out of stock, its price has changed, or it cannot be delivered to their location. This happens when the search index doesn't have real-time availability, price, and seller information. Keeping this information current is therefore a core challenge that an e-commerce marketplace search engine has to solve. Regular document search index technologies (like Solr/Lucene) have trouble dealing with attributes that are in constant flux (like availability and price), which are typically seller- or listing-specific. In this talk, we present these challenges and our solution: a customized search index for e-commerce that addresses them.

Speakers

Thejus V M

Data Architect, Flipkart
Thejus is a software engineer working on the search systems at Flipkart. His work has spanned multiple aspects of search, such as high-throughput indexing, managing large-scale distributed infrastructure, semantic identification, auto-suggestion, scoring models, and more.

Umesh Prasad

SDE III, Flipkart
Umesh is an SDE III at Flipkart. He is the resident Solr/Lucene expert at Flipkart and has been instrumental in building critical frameworks and solutions for the search team. Previously he built and evolved the vertical search and content aggregation service for Verse Innovation. He is currently busy building Flipkart's Data Platform.


Thursday October 13, 2016 2:20pm - 3:00pm
Back Bay A Sheraton Boston

2:20pm

Solr Cross-Datacenter Replication and Consistency at Scale
Replicating a SolrCloud index to multiple availability zones serves two primary purposes: redundancy for rapid disaster recovery and data locality for minimizing request latencies across different regions. These features are particularly interesting for services where SolrCloud is used as the primary data store or when availability of the search index is directly linked to uptime. In our view, a reliable cross-availability zone replication system possesses two qualities. First, it should be fault-tolerant and provide certain guarantees about data loss and availability, even in the event of a complete datacenter outage. Second, it should be capable of detecting and resolving any eventual inconsistencies. To the first of these ends, we developed a Solr plugin which uses a distributed queue to achieve non-blocking, failure-tolerant writes without compromising local indexing performance. To the second, we are currently working on a time-based Merkle tree comparison technique to detect and resolve inconsistencies during online indexing. In this talk, we will present the design of these components as well as the overall system architecture, and discuss their guarantees and limitations in the context of similar efforts in the community.
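For readers unfamiliar with the Merkle tree idea, the sketch below shows how two replicas could compare time-bucketed hashes and descend only into subtrees that differ, so that in-sync regions cost a single hash comparison. It is an illustrative assumption, not the speaker's plugin: the per-document `(timestamp, version)` pairs, fixed-width time buckets, SHA-256, and the `id:version` leaf encoding are all hypothetical choices.

```python
import hashlib
from typing import Dict, List, Tuple

Docs = Dict[str, Tuple[int, int]]  # doc id -> (timestamp in seconds, version)


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hashes(docs: Docs, start: int, bucket_secs: int, num_buckets: int) -> List[bytes]:
    """Hash each time bucket over its sorted 'id:version' entries."""
    buckets: List[List[str]] = [[] for _ in range(num_buckets)]
    for doc_id, (ts, version) in docs.items():
        i = (ts - start) // bucket_secs
        if 0 <= i < num_buckets:
            buckets[i].append(f"{doc_id}:{version}")
    return [_h("\n".join(sorted(b)).encode()) for b in buckets]


def build_tree(leaves: List[bytes]) -> List[List[bytes]]:
    """Return tree levels bottom-up; the root is levels[-1][0]."""
    n = len(leaves)
    assert n > 0 and (n & (n - 1)) == 0, "leaf count must be a power of two"
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_buckets(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Descend only into subtrees whose hashes differ; return mismatched bucket indices."""
    frontier = [0]
    for level in range(len(a) - 1, 0, -1):
        frontier = [c for i in frontier if a[level][i] != b[level][i] for c in (2 * i, 2 * i + 1)]
    return [i for i in frontier if a[0][i] != b[0][i]]


if __name__ == "__main__":
    primary = {"a": (0, 1), "b": (70, 1), "c": (130, 1), "d": (250, 1)}
    replica = dict(primary)
    replica["c"] = (130, 2)  # stale version on one side
    del replica["d"]  # missing document on one side
    tp = build_tree(leaf_hashes(primary, 0, 60, 8))
    tr = build_tree(leaf_hashes(replica, 0, 60, 8))
    print(diff_buckets(tp, tr))
```

Only the documents in the mismatched buckets then need to be exchanged and reconciled, which is what makes the comparison practical during online indexing.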

Speakers

Oliver Bates

Software Engineer, Cloud Infrastructure, Apple Inc.
Oliver started his career in biomedical engineering, working on numerical modeling of complex biological systems. After several years in academia, collecting and crunching huge swaths of data, he turned his attention to distributed systems.


Thursday October 13, 2016 2:20pm - 3:10pm
Independence Sheraton Boston

3:10pm

Improving Enterprise find-ability with custom relevance models
On the surface, web search and enterprise search share some common characteristics. However, key differences make enterprise search a specialized domain. For each of Salesforce's 150,000 customers, we enable search over highly diverse, custom data sets spanning three distinct forms: CRM data in relational systems, unstructured data in content management systems, and enterprise social data.

This talk shares some of the insights we’ve gleaned from building a relevance engine for the enterprise from the ground up. Specifically, we lay out the components that enable us to machine-learn our ranking function, from training to evaluation. We showcase the customizations we apply to the various boosts and query functions Solr provides, based on the data type being searched. Finally, we touch on some of the metrics we use to measure and optimize search relevance by document type.

Speakers

Jayesh Govindarajan

Senior Director, Search Relevance, Salesforce
Jayesh is the Senior Director of Search and Data Science at Salesforce. He joined Salesforce through the acquisition of MinHash, a data science startup he founded to focus on solving problems in entity extraction, topic classification, and trend detection on an enterprise platform that brings together machine learning, search and large scale data processing. He is also the creator of AILA, an AI-driven marketing assistant that detects fast...


Thursday October 13, 2016 3:10pm - 3:50pm
Commonwealth Sheraton Boston

3:10pm

Parallel SQL and Analytics with Solr
Analytics has increasingly become a major focus for Apache Solr, the primary search engine in the Hadoop stack. This talk will cover recent Solr developments in the areas of faceting and analytics, including parallel SQL, streaming expressions, distributed join, and distributed graph queries. Given the increasing number of APIs and techniques that can be brought to bear, we'll also cover which approach should be preferred in different situations, including how to maximize scalability.

Speakers

Yonik Seeley

Solr Dude, Cloudera
Yonik Seeley is the creator of Solr. He works at Cloudera integrating and leveraging "Big Search" technologies into the many components comprising the Cloudera enterprise data hub (EDH). Yonik was previously a co-founder of LucidWorks, and he holds a master's degree from Stanford University.


Thursday October 13, 2016 3:10pm - 3:50pm
Independence Sheraton Boston

3:10pm

Aggregations: SolrCloud/Elasticsearch, Druid or HBase
You need to build a highly scalable system for executing aggregation queries on big data in real time, but you do not have several weeks to try every available technology that supports such queries, and you are not sure which one to pick. We have taken the time to build fully functional prototypes and have learned important lessons that can serve as time-saving guidelines when deciding on the architecture of your system.

To ensure an unbiased comparison, we installed each prototype on a cluster of machines with exactly the same hardware configuration. We estimated ingestion performance by measuring how long each prototype needs to make imported records available for querying. We executed real user aggregation queries to measure response time while simulating various ingestion loads. By increasing the number of machines running each prototype, we were able to estimate how well each technology scales. Finally, as a bonus, we will also share our subjective opinion on the ease of use, flexibility, customizability, and available community support of each evaluated technology.

Speakers

Dragan Milosevic

Chief Search Architect, Zanox AG
Dr. Dragan Milosevic is a certified Solr/Lucene, Hadoop, and HBase developer and currently works as Chief Search Architect at Zanox AG. The firm has successfully implemented several Apache open-source projects for building a world-class reporting framework. He is also the author of the book "Beyond Centralized Search Engines: An Agent-Based Filtering Framework," which describes the application of various machine-learning techniques for solving...


Thursday October 13, 2016 3:10pm - 3:50pm
Gardner

3:10pm

Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries
Autocomplete presents particular challenges for search because users' search intent must be matched from incomplete query tokens. Languages written in non-Latin scripts add further complications. The following are some examples of language-specific issues that search systems must address in order to support these languages:

- Japanese and Chinese multiple scripts (Hiragana, Katakana, Romaji, Zhuyin, Paoding)
- No token delimiters for Japanese and Chinese
- Korean character composition
- Arabic spelling variations of transliterated foreign words

I will talk about these challenges in detail, describe our approaches to solving them, and share some tools (a query testing framework) we used to help address these issues.
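The core trick behind n-gram autocomplete is to index the leading prefixes (edge n-grams) of each token so that a partially typed token matches directly. In Solr this is typically configured at index time with `EdgeNGramFilterFactory`; the in-memory sketch below only illustrates the idea and is not the speaker's implementation. The `PrefixIndex` class and its AND-of-prefixes matching are hypothetical, it uses whitespace tokenization (so it does not solve the Japanese/Chinese delimiter problem listed above), and it does not reproduce the EDismax phrase queries from the title.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, Set


def normalize(text: str) -> str:
    """NFKC folds full-width/half-width variants (common in Japanese input), then lowercases."""
    return unicodedata.normalize("NFKC", text).lower()


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> List[str]:
    """Leading prefixes of a token, similar in spirit to Lucene's edge n-gram filter."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


class PrefixIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, title: str) -> None:
        # Whitespace tokenization only: Japanese/Chinese text would need a real
        # segmenter (or character n-grams) before this step.
        for token in normalize(title).split():
            for gram in edge_ngrams(token):
                self.postings[gram].add(doc_id)

    def suggest(self, partial_query: str) -> Set[int]:
        """Docs in which every query token matches the start of some title token."""
        tokens = normalize(partial_query).split()
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


if __name__ == "__main__":
    idx = PrefixIndex()
    idx.add(1, "Stranger Things")
    idx.add(2, "Strange Days")
    idx.add(3, "House of Cards")
    print(idx.suggest("stranger th"))
```

Trading index size for query speed is the usual design choice here: every prefix is stored once at index time, so lookups at typing speed are simple term matches.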

Speakers

Ivan Provalov

Software Engineer, Netflix
Software engineer specializing in information retrieval. Currently on the search team at Netflix; previously worked on various search systems at Lucidworks and Cengage Learning.


Thursday October 13, 2016 3:10pm - 3:50pm
Back Bay B Sheraton Boston

3:10pm

Solr at the Core of a Search Engine for the Cuban Web
In this talk we'll cover the transition of Solr from "just the inverted index for search" into the core technology of a web search engine for the Cuban Web. The main purpose is to show how some of the more common features of today's web search engines can be fulfilled using Apache Solr, which makes Solr the heart of our system. We'll cover integration with several Apache projects and how these systems work together to build a basic web search engine, an image search engine, and a real-time news search engine with alert capabilities, all powered by the features Solr offers. We'll also discuss how Solr itself is used to help monitor and run the different components of the system.

Speakers

Jorge Luis Betancourt Gonzalez

Developer, University of Informatic Sciences
Software Engineer with more than 5 years of experience using Java. Has worked with search engines for over 3 years, especially Apache Solr. Has done some consultancy work in the fields of web crawling and NLP/text processing. Currently building a search engine for the Cuban Web. Interested in information retrieval, Solr/Elasticsearch/Lucene, relevance tuning, and web mining.


Thursday October 13, 2016 3:10pm - 3:50pm
Back Bay A Sheraton Boston

3:50pm

Happy Hour
Thursday October 13, 2016 3:50pm - 4:30pm
TBA

4:30pm

Stump the Chump
Got a tough problem with your Solr based application? Facing challenges that you'd like some advice on? Looking for new approaches to overcome a Lucene/Solr issue? Not sure how to get the results you expected? Don't know where to get started? Then this session is for you.

Now, you can get your questions answered live, in front of an audience of hundreds of Lucene/Solr Revolution attendees! Back again by popular demand, "Stump the Chump" at Lucene Revolution 2016 finds Chris "Hoss" Hostetter in the hot seat once again to tackle questions live.

Questions can be submitted to stump@lucenerevolution.org at any time until October 12, even if you won’t be able to attend the conference. Please describe in detail the challenge you have faced and any approaches you have taken to solve the problem. Anything related to Solr/Lucene is fair game.

Our moderator, Cassandra Targett, will read the questions, and Hoss will have to formulate a solution on the spot. A panel of judges will decide if he has provided an effective answer. Prizes will be awarded by the panel for the best question - and for those deemed to have "Stumped the Chump".

Moderators

Cassandra Targett

Director of Engineering, Lucidworks
Cassandra has nearly 20 years of experience in search and knowledge management, and became a Lucene/Solr committer in 2013 and a member of the PMC in 2016. As Director of Engineering at Lucidworks, she manages partner and open source development.

Speakers

Chris Hostetter

Software Engineer, Lucidworks
Chris 'Hoss' Hostetter is a Member of the Apache Software Foundation, and a committer on the Lucene/Solr Project. Prior to joining Lucidworks in 2010 to work full time on Solr development, he spent 11 years as a Principal Software Engineer for CNET Networks thinking about searching "structured data" that was never as structured as it should have been. Hoss has presented on Apache Solr numerous times over the last 10 years, including local MeetUps...


Thursday October 13, 2016 4:30pm - 5:30pm
Grand Ballroom Sheraton Boston

6:00pm

Conference Party
Experience breathtaking 360-degree panoramic views of the Greater Boston area and beyond accompanied by drinks, appetizers, games, music, and of course great conversations with conference attendees, speakers, sponsors, and Solr committers.

Thursday October 13, 2016 6:00pm - 8:30pm
Skywalk Observatory
 
Friday, October 14
 

9:00am

Day 2 Opening Remarks
Speakers

Grant Ingersoll

CTO, Lucidworks
Grant is the CTO and co-founder of Lucidworks, co-author of Taming Text, co-founder of Apache Mahout and a long-standing committer on the Apache Lucene and Solr open source projects. Grant’s experience includes engineering a variety of search, question answering, and natural language processing applications for many domains and languages. He earned his B.S. from Amherst College in Math and Computer Science and his M.S. in Computer...


Friday October 14, 2016 9:00am - 9:10am
Grand Ballroom Sheraton Boston

9:10am

Using Solr to Activate Data
Speakers

Rajiv Kottomtharayil

Head of Development, Commvault
Rajiv is the vice president of Commvault Engineering. At Commvault, he has been architecting the Commvault Data Management product.


Friday October 14, 2016 9:10am - 9:40am
Grand Ballroom Sheraton Boston

9:30am

Challenges and Thrills of Enterprise Search, Cathy Polinsky, Salesforce
You may understand the complexity of web search, social networking search, or database search, but what if you had to do all three? And not for just one company, but for hundreds of thousands of companies? Welcome to the world of enterprise search. From multi-tenancy requirements, complex permission models, and real-time search expectations to ranking-model challenges, learn about the scale and the deep technical challenges that Salesforce is tackling to build a world-class enterprise search architecture on top of Solr.

Speakers

Cathy Polinsky

SVP of Engineering for Search, Salesforce
Cathy Polinsky is SVP of Engineering for Search at Salesforce. She is responsible for the large scale search infrastructure that powers the world’s #1 CRM company. Polinsky joined Salesforce in 2009 and brings more than a dozen years of advertising technology, platform and search engineering experience in consumer and enterprise software. Prior to Salesforce, Polinsky held positions at Yahoo!, Oracle, and Amazon.


Friday October 14, 2016 9:30am - 10:15am
Grand Ballroom Sheraton Boston

10:15am

Break
Friday October 14, 2016 10:15am - 10:30am
TBA

10:30am

Using a Query Classifier to dynamically boost Solr ranking
About 40% of our queries are ambiguous and can match products from many categories. For example, the query "red apple" can match the following products:

1. red Apple iPod (electronics)
2. red apple fruit (fresh produce)
3. red Apple iPhone case (accessories)

It is desirable to have a classifier that instructs Solr to boost items from the desired category. In addition, for a search engine with a small index, a good percentage of queries may return few or no results. Is it possible to use the classifier to solve both problems?

This talk discusses a classifier built from behavior data which can dynamically re-classify the query to solve both problems.
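As a rough illustration of "a classifier built from behavior data" feeding Solr boosts, the sketch below estimates each query's category distribution from click logs and turns the likely categories into edismax `bq` (boost query) clauses weighted by probability. The click-log format, the `category` field name, and the `min_prob`/`max_boost` thresholds are hypothetical, and simple click counting stands in for whatever model the speaker actually uses. Category names containing quotes would also need escaping, which this sketch omits.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def train(click_log: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count clicked categories per normalized query (hypothetical log format)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for query, category in click_log:
        counts[query.lower().strip()][category] += 1
    return counts


def category_probs(counts: Dict[str, Counter], query: str) -> Dict[str, float]:
    """Share of clicks per category for this query; empty if the query is unseen."""
    c = counts.get(query.lower().strip())
    if not c:
        return {}
    total = sum(c.values())
    return {cat: n / total for cat, n in c.items()}


def boost_params(counts: Dict[str, Counter], query: str, field: str = "category",
                 min_prob: float = 0.2, max_boost: float = 10.0) -> dict:
    """Build edismax params with one `bq` clause per sufficiently likely category."""
    probs = category_probs(counts, query)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    bq = [f'{field}:"{cat}"^{max_boost * p:.1f}' for cat, p in ranked if p >= min_prob]
    params = {"defType": "edismax", "q": query}
    if bq:
        params["bq"] = bq  # multi-valued: each clause adds to the score independently
    return params


if __name__ == "__main__":
    log = ([("red apple", "electronics")] * 6
           + [("red apple", "fresh produce")] * 3
           + [("Red Apple", "accessories")])
    print(boost_params(train(log), "red apple"))
```

For the "red apple" example, electronics and fresh produce get proportional boosts while the rarely clicked accessories category falls below the threshold; unseen queries simply fall back to an unboosted edismax query.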

Speakers

Howard Wan

Dev Manager, Target
Howard is the Dev Manager of the Query Intent team at Target. His prior experience includes search engine development at eBay and Microsoft. He has filed several search patent applications across all three companies. His passion is search relevance development and monetization prediction models.


Friday October 14, 2016 10:30am - 11:10am
Commonwealth Sheraton Boston

10:30am

Building a Search UI with Lucidworks View
Learn how to create a compelling branded search application with Lucidworks View. This session will be a tutorial on how to create a simple search UI, including the challenges that may be encountered.

Speakers

Josh Ellinger

Senior UX Engineer, Lucidworks
Josh is a Senior UX Engineer at Lucidworks, where he also works on Lucidworks Fusion. In the past he has worked on complex problems such as building a tool to fact-check the internet.


Friday October 14, 2016 10:30am - 11:10am
Gardner

10:30am

Solr JDBC
One of the new features of Solr 6 is a JDBC driver that can be hooked up to various SQL clients and database visualization tools. Solr JDBC opens up a whole new set of use cases and lowers the barrier to entry for many users. I'll highlight the Solr JDBC feature, explain some use cases, and demonstrate connecting SQL clients. You will learn how Solr JDBC can unlock more potential from your Solr environment.

Speakers

Kevin Risden

Apache Lucene/Solr Committer; Hadoop and Search Tech Lead, Avalon Consulting, LLC
Kevin Risden, an Apache Lucene/Solr committer, has been consulting on search and Hadoop for over 3 years at Avalon Consulting, LLC. He has helped organizations successfully transform their big data into business results.


Friday October 14, 2016 10:30am - 11:10am
Back Bay B Sheraton Boston

10:30am

Building and running a Solr-as-a-Service for IBM Watson
Running a managed Solr service brings interesting challenges, both for the users and for the service itself. Users typically do not have access to all components of the Solr system (e.g. the ZK ensemble or the nodes that Solr actually runs on). On the other hand, the service must ensure high availability at all times and handle what are often user-driven tasks, such as version upgrades and taking nodes offline for maintenance. In this talk I will describe how we tackle these challenges to build a managed Solr service on the cloud, which currently hosts a few thousand Solr clusters. I will focus on the infrastructure we chose to run the Solr clusters on, as well as how we ensure high availability, cluster balancing, and version upgrades.

Speakers

Shai Erera

STSM, Social Analytics & Technologies, IBM
Shai Erera is a Researcher at IBM Research, Haifa, Israel. Shai earned his M.Sc in Computer Science from the University of Haifa in 2007. Shai’s work experience includes the development of search-based systems over Lucene and Solr and he is also a Lucene/Solr committer.


Friday October 14, 2016 10:30am - 11:10am
Back Bay A Sheraton Boston

11:20am

Working with deeply nested documents in SolrCloud
Until recently, Solr did not support deeply nested documents, but that has changed over the past few releases. Although it is still not a common use case, Solr can now handle deeply nested documents, such as nested email threads or comments and replies on social media, and perform search and faceting on them.

This talk will cover tips on pre-processing data so that Solr can not only consume it but also perform complex search and statistical aggregations on top of it. It will also cover query formation for sample use cases of nested data, along with the options and features Solr provides for faceting and aggregating documents. By the end of this talk, Solr users will have a better understanding of the features Solr provides, how to use them to answer interesting questions about deeply nested documents, and the limitations that currently exist and how to work around them.

Speakers

Anshum Gupta

Sr. Software Engineer, IBM Watson
Anshum Gupta is a Lucene/Solr committer and PMC member with over 10 years of experience with search. He is part of the search team at IBM Watson, where he works on extending the limits of and improving SolrCloud. Prior to this, he was part of the open source team at Lucidworks and also the co-creator of AWS CloudSearch, the first search-as-a-service offering from AWS. He has spoken at multiple international conferences, including Apache Big Data...

Alisa Zhila

Software Engineer, IBM Watson
Alisa Zhila is a software engineer in IBM Watson Core Technology. She graduated from the Moscow Institute of Physics and Technology, Russia, and received a PhD in Computer Science from the National Polytechnic Institute, Mexico. At IBM she has been working on the transition of linguistically enriched text data storage from a proprietary format to Solr format. For her PhD, she worked on open information extraction from text and analysis of the extracted...


Friday October 14, 2016 11:20am - 12:00pm
Commonwealth Sheraton Boston

11:20am

Time Series Processing with Solr and Spark
A lot of data is best represented as time series: operational data, financial data, and even in data warehouses the dominant dimension is often time. We present Chronix, a time series database based on Apache Solr and Spark that can handle trillions of time series data points and perform interactive queries. Chronix Spark is open source software and battle-proven at a German car manufacturer and an international telco.

We demonstrate several real-life use cases of Chronix. Afterwards we lift the curtain and deep-dive into the Chronix architecture, especially how we use Solr to store time series data and how we've hooked up Solr with Spark. We provide some benchmarks showing how Chronix has outperformed other time series databases in both performance and storage efficiency.

Chronix is open source under the Apache License (http://chronix.io).

Speakers

Josef Adersberger

CTO, QAware GmbH
Josef Adersberger has been a software engineering fanatic for over 10 years. He studied computer science in Rosenheim and Munich and holds a doctoral degree in software engineering. He's the founder and CTO of QAware, a German software development company, and is a lecturer at several German universities. His main area of interest is cloud computing.


Friday October 14, 2016 11:20am - 12:00pm
Gardner

11:20am

The Evolution of Lucene & Solr Numerics from Strings to Points
Numeric representations in Lucene and Solr have evolved to be more efficient, performant, and capable as greater demands are made on indexes, from sorting, filtering and faceting to complex geo-spatial queries. In this presentation I’ll show how this evolution has impacted performance and resource usage, thereby enabling use cases that previously were better suited to other software. The major focus will be on the multi-dimensional Points field support added in Lucene 6.0.

Speakers

Steve Rowe

Senior Software Engineer, Lucidworks
Steve is a Lucene/Solr committer and works on Lucene and Solr at Lucidworks. Steve is also a committer on the JFlex lexical scanner generator project.


Friday October 14, 2016 11:20am - 12:00pm
Back Bay B Sheraton Boston

11:20am

Building a Vibrant Search Ecosystem at Bloomberg
Search is a core technology that allows Bloomberg to deliver financial news and information quickly and reliably to our clients. The Search Infrastructure team has created a high performance, stable and scalable search ecosystem to support a large, complex and diverse set of search applications.

Providing search as a service to thousands of developers in this demanding environment required us to take a holistic approach. In this talk we'll discuss both the organizational and technical challenges we've encountered and the approach we've taken to solve them. We'll dive into the details of our platform, from the way we engage with our tenants and interact with the Solr community to the infrastructure and tools we use to manage, monitor, and scale our platform.

Speakers

Steven Bower

Team Lead, Search Infrastructure, Bloomberg LP
Steven has worked in the search industry for 16 years, first as part of the R&D/Services teams at FAST Search & Transfer and then as a principal engineer at Attivio, Inc. He has participated in or led the delivery of hundreds of search applications and now leads the Search Infrastructure team at Bloomberg LP, providing a search-as-a-service platform for 200+ applications.

Ken LaPorte

Senior Software Engineer, Search Infrastructure, Bloomberg LP
Ken is a senior software engineer in the Search Infrastructure department at Bloomberg, where he works with client teams to leverage Solr to solve business problems. Ken has been active in the search domain for 7 years and has worked on a wide variety of search problems, including e-commerce, geospatial, analytical & free text. He currently resides in Brooklyn with his wife & son, where he can be found constantly working on his...


Friday October 14, 2016 11:20am - 12:00pm
Independence Sheraton Boston

11:20am

Challenges of e-commerce product search and the case study of the Home Depot enterprise search
This talk discusses three independent but connected topics: how The Home Depot extends the baseline Solr and Fusion engine to solve its unique business-rule requirements, implements an intelligent type-ahead system, and tunes search relevancy scientifically instead of relying on black magic. Along with many other unique solutions, The Home Depot built its enterprise search engine on Solr.

The Home Depot used to be one of the biggest Endeca customers. As the team developed extensions to the Endeca system, the proprietary platform became more and more of a barrier to ever-increasing business demands. Solr and Fusion offered flexibility and a certain amount of out-of-the-box functionality, but The Home Depot's search and personalization team still needed to extend the platform to build a comprehensive solution that met its technical requirements. This talk presents that case study in the hope that attendees can use the material to take their own search engines to the next level.

Speakers

Senthil Murugan

Principal Consultant, Mindtree Ltd.
Senthil Murugan received a B.Tech degree from Anna University, India, in 2005 and is currently a Principal Consultant at Mindtree Ltd. He is working with The Home Depot's Search and Personalization team as a contractor on the Enterprise Search Implementation project. Senthil's interests are building B2B/B2C e-commerce applications, microservices development, and Endeca/Solr search engine integration.

Rongkai (Alfred) Zhao

IT Architect, The Home Depot
Rongkai (Alfred) Zhao received a Ph.D. in computer science from the University of Illinois at Urbana-Champaign in 2005 and is currently an architect at The Home Depot. Rongkai works on the search and personalization team, where he is responsible for providing architectural solutions and leading the team's data science efforts. Rongkai’s research interests are information retrieval, deep learning, natural language processing, and computer vision.


Friday October 14, 2016 11:20am - 12:00pm
Back Bay A Sheraton Boston

12:00pm

Lunch
Friday October 14, 2016 12:00pm - 1:00pm
TBA

1:10pm

Large Scale Solr at FullStory
Come see how we're using Solr to make search FullStory's central feature. Learn about some of the problems we've run into scaling up a large Solr cluster at FullStory, and how we've solved them. Finally, I'll briefly introduce Solrman, the open source service we've released that monitors a Solr cluster and automatically optimizes how data is distributed across it.

Speakers

Scott Blum

Staff Software Engineer, FullStory, Inc.
Scott Blum is a committer on Apache Solr and Apache Curator. His background includes compiler work on Google Web Toolkit and distributed systems experience at Square and most recently FullStory.


Friday October 14, 2016 1:10pm - 1:40pm
Back Bay A Sheraton Boston

1:10pm

Combining Content and Collaboration in Recommenders
Recommender systems are typically built on two different types of training data: historical user engagement, and the textual content of the items themselves (descriptive text, tags, structured metadata, or the actual raw content of text items on their own). This talk is an introductory overview of how to use both types of input to build a “mixed-mode” recommender, where you can parameterize (at request time, in some cases!) how much you want to rely on content and how much on collaborative filtering. We’ll walk through building a horizontally scalable parameterized recommender service from just three components: Solr, Spark, and of course: training data.
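The request-time knob between content-based and collaborative signals can be pictured as a weighted blend of two similarity scores. The Python sketch below assumes a simple linear blend with a weight `alpha`, cosine similarity over sparse vectors, and tiny hypothetical data; the function names are our own, and it is not the Solr/Spark service described in the talk.

```python
from math import sqrt


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mixed_mode_recommend(seed, content, engagement, alpha):
    """Rank items similar to `seed`, blending two similarity signals.

    content:    item -> term-weight vector (text, tags, metadata)
    engagement: item -> user-interaction vector (collaborative filtering)
    alpha:      request-time weight; 1.0 = content only, 0.0 = collaborative only
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scores = {}
    for item in content:
        if item == seed:
            continue
        content_sim = cosine(content[seed], content[item])
        collab_sim = cosine(engagement.get(seed, {}), engagement.get(item, {}))
        scores[item] = alpha * content_sim + (1.0 - alpha) * collab_sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: "b" shares text with "a", while "c" shares users with "a".
content = {
    "a": {"solr": 1.0, "search": 1.0},
    "b": {"solr": 1.0, "search": 1.0},
    "c": {"spark": 1.0, "ml": 1.0},
}
engagement = {
    "a": {"u1": 1.0, "u2": 1.0},
    "b": {"u3": 1.0},
    "c": {"u1": 1.0, "u2": 1.0},
}

print(mixed_mode_recommend("a", content, engagement, alpha=1.0))  # "b" ranks first
print(mixed_mode_recommend("a", content, engagement, alpha=0.0))  # "c" ranks first
```

Because `alpha` is just a function argument, it can be supplied per request; sliding it between 0 and 1 moves the ranking from purely collaborative to purely content-based.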

Speakers

Jake Mannix

Lead Data Engineer, Lucidworks
Living at the intersection of IR and applied ML, Jake likes to build “data-driven products” like personalized search engines, recommender systems, and relevance-enhancing substrate like user-interest classifiers. Currently the lead data engineer at Lucidworks, doing relevance R&D in the office of the CTO, Jake previously was a tech lead and founding member of the user interest modeling and account search teams at Twitter, and...


Friday October 14, 2016 1:10pm - 1:50pm
Commonwealth Sheraton Boston

1:10pm

Build a great application in minutes
This talk covers the importance of user experience in the context of search applications and why UX should not be left until last. It looks at how the use of search technologies like Solr has undergone a paradigm shift from simple keyword search to advanced analytics and discovery, and shows how an application that works on mobile devices and meets these requirements can be built in minutes.

Speakers

Stefan Olafsson

CEO, Twigkit
Stefan is the founder and CEO of Twigkit, which is actively changing the game for its partners by allowing beautiful, user-centric, search-driven applications to be built in a fraction of the time of traditional methods. Specialising in search applications, he has been creating and bringing to market enterprise-scale search software products for over 15 years. Prior to Twigkit, he held the position of Principal Engineer at FAST (now a Microsoft...


Friday October 14, 2016 1:10pm - 1:50pm
Gardner

1:10pm

Customizing Ranking Models in Solr to improve relevance for Enterprise Search
Solr provides a suite of built-in capabilities for tuning a wide variety of relevance-related parameters. Index- and/or query-time boosts, along with function queries, are a great way to tweak relevance-related parameters and improve search result ranking. In the enterprise space, however, given the diversity of customers and documents, there is a much greater need for more control over the ranking models and for the ability to run multiple custom ranking models.

At Salesforce, we have a multi-level ranking pipeline: the first ranker (L1) is basic Lucene scoring based on tf-idf, and the second ranker (L2) implements more complex ranking models, ranging from something as simple as a linear regression to more complex models such as a boosted decision tree. Running the L2 ranker inside Solr enables us to extract features for every document from within the Solr index and use them during ranking model execution. This talk discusses the motivation behind creating an L2 ranker and the use of a Solr search component for running different types of ranking models.
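The general shape of such a two-stage (L1/L2) pipeline can be sketched in Python as below. This is a minimal illustration, not Salesforce's code or a Solr search component: the `Doc` type, the feature names, and the linear model weights are hypothetical, and the first-stage scores are simply given rather than computed by Lucene.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    l1_score: float                               # first-stage score, e.g. tf-idf
    features: dict = field(default_factory=dict)  # hypothetical per-document features


def l2_rerank(docs, weights, bias=0.0, top_k=100):
    """Two-stage ranking: order by L1 score, then re-rank only the top_k with a linear L2 model.

    weights maps feature names to coefficients; the special key "l1_score" lets the
    L2 model reuse the first-stage score as one of its features.
    """
    by_l1 = sorted(docs, key=lambda d: d.l1_score, reverse=True)
    head, tail = by_l1[:top_k], by_l1[top_k:]

    def l2_score(d):
        score = bias + weights.get("l1_score", 0.0) * d.l1_score
        for name, w in weights.items():
            if name != "l1_score":
                score += w * d.features.get(name, 0.0)
        return score

    # Documents outside the top_k keep their L1 order, after the re-ranked head.
    return sorted(head, key=l2_score, reverse=True) + tail


docs = [
    Doc("d1", 9.0, {"recency": 0.1, "clicks": 0.0}),
    Doc("d2", 7.0, {"recency": 0.9, "clicks": 0.8}),
    Doc("d3", 5.0, {"recency": 0.5, "clicks": 0.2}),
]
weights = {"l1_score": 0.1, "recency": 1.0, "clicks": 2.0}

print([d.doc_id for d in l2_rerank(docs, weights, top_k=2)])  # ['d2', 'd1', 'd3']
print([d.doc_id for d in l2_rerank(docs, weights, top_k=3)])  # ['d2', 'd3', 'd1']
```

Re-ranking only the top_k results keeps the more expensive L2 model off the long tail. In the example, d3 overtakes d1 only when top_k is large enough to include it; swapping the linear scorer for a boosted tree would change `l2_score` but not the pipeline.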

Speakers

Ammar Harris

Lead Member Technical Staff, Search Relevance, Salesforce
Ammar has been a member of the Search Relevance team at Salesforce for over two years. He has been working with the team to build out a new framework for Salesforce Search that enables the team to train/test, experiment with, and ship multiple relevance models to production. Prior to joining Salesforce, he worked for about six years at Microsoft, primarily on the Bing Ads and Xbox Games Studio teams.

Joe Zeimen

Senior Member of Technical Staff, Salesforce
Joe currently works on the Search Relevance team at Salesforce. He is helping to build out a new framework for Salesforce Search that enables the team to train, test, and ship different relevance models to production. Before joining Salesforce over 3 years ago, he earned his BS/MS in Computer Science at the Colorado School of Mines.


Friday October 14, 2016 1:10pm - 1:50pm
Back Bay B Sheraton Boston

1:10pm

HHypermap: Heatmap Analytics of a Billion Tweets
The Harvard Center for Geographic Analysis has established the HHypermap (Harvard Hypermap) system, composed of multiple open-source projects aimed at searching vast amounts of spatial data. This talk centers on a SolrCloud-based system that can do real-time search on a billion tweets, with heatmap analytics over sentiment-analysis results. The open-source system is designed to be suitable for social media or sensor data sets.

Harvard CGA commissioned Apache Lucene/Solr's heatmap faceting capability in 2015, and this work continues in 2016. The first new part is computing numeric stats per cell (not just document counts), which can be used for a variety of applications. The second part is improving Lucene's grid cell indexing scheme to cater to heatmaps, making heatmap generation very fast for large data sets.
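The difference between a plain heatmap (document counts per grid cell) and per-cell numeric stats can be illustrated in a few lines of Python. This is a conceptual sketch under our own assumptions, not Lucene's implementation, which derives heatmaps from its indexed spatial grid: the `heatmap_stats` function, the fixed longitude/latitude grid, and the sentiment values are all hypothetical.

```python
from collections import defaultdict


def heatmap_stats(points, cols, rows,
                  min_lon=-180.0, max_lon=180.0, min_lat=-90.0, max_lat=90.0):
    """Bucket (lon, lat, value) points into a cols x rows grid.

    Returns {(col, row): {"count", "sum", "mean"}} for non-empty cells, i.e. a
    numeric statistic per cell rather than only a document count.
    """
    cell_w = (max_lon - min_lon) / cols
    cell_h = (max_lat - min_lat) / rows
    counts = defaultdict(int)
    sums = defaultdict(float)
    for lon, lat, value in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # outside the requested region
        # Clamp so points exactly on the max edge land in the last cell.
        col = min(int((lon - min_lon) / cell_w), cols - 1)
        row = min(int((lat - min_lat) / cell_h), rows - 1)
        counts[(col, row)] += 1
        sums[(col, row)] += value
    return {
        cell: {"count": counts[cell], "sum": sums[cell], "mean": sums[cell] / counts[cell]}
        for cell in counts
    }


# Hypothetical tweets as (longitude, latitude, sentiment score in [-1, 1]).
tweets = [(-71.06, 42.36, 0.8), (-71.10, 42.30, 0.2), (-0.13, 51.50, -0.5)]
for cell, stats in sorted(heatmap_stats(tweets, cols=8, rows=4).items()):
    print(cell, stats)
```

A count-only heatmap would report 2 and 1 for the two non-empty cells; tracking a running sum alongside the count is what makes the per-cell mean sentiment available.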

This talk discusses the system design/architecture as well as the spatial details on how Lucene/Solr was improved.

Speakers

David Smiley

Search Developer & Consultant, D W Smiley LLC
David Smiley is a well-recognized Apache Lucene/Solr expert. He wrote the first book on Solr (currently in its 3rd edition) and is a Lucene/Solr committer and PMC member who improves Lucene and Solr. He speaks at conferences about it, does training, and offers part-time independent consulting and development services. Much of Lucene's spatial-extras module was developed by David.


Friday October 14, 2016 1:10pm - 1:50pm
Independence Sheraton Boston

2:00pm

Reflected Intelligence: Lucene/Solr as a self-learning data system
What if your search engine could automatically tune its own domain-specific relevancy model? What if it could learn the important phrases and topics within your domain, automatically identify alternate spellings (synonyms, acronyms, and related phrases) and disambiguate multiple meanings of those phrases, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain?

In this presentation, you’ll learn how to do just that: evolve Lucene/Solr implementations into self-learning data systems that are able to accept user queries, deliver relevance-ranked results, and automatically learn from your users’ subsequent interactions to continually deliver a more relevant experience for each keyword, category, and group of users.

Such a self-learning system leverages reflected intelligence to consistently improve its understanding of the content (documents and queries), the context of specific users, and the relevance signals present in the collective feedback from every prior user interaction with the system. Come learn how to move beyond manual relevancy tuning and toward a closed-loop system leveraging both the embedded meaning within your content and the wisdom of the crowds to automatically generate search relevancy algorithms optimized for your domain.

Speakers

Trey Grainger

SVP of Engineering, Lucidworks
Trey is the SVP of Engineering at Lucidworks, where he leads their engineering efforts around both Apache Lucene/Solr and Lucidworks’ commercial product offerings. Trey is also the co-author of the book Solr in Action, as well as a published researcher and frequent public speaker on topics related to search, analytics, recommendation systems, and natural language processing. Trey previously served as Director of Engineering at...


Friday October 14, 2016 2:00pm - 2:40pm
Independence Sheraton Boston

2:00pm

How to run Solr on Docker. And why.
Docker is all the rage these days. While one doesn't hear much about Solr on Docker, we're here to tell you not only that it can be done but also how it's done.

We'll quickly go over the basic Docker ideas (containers are lighter than VMs, and they solve "but it worked on my laptop" issues) so we can dive into the specifics of running Solr on Docker.

We'll do a live demo showing you how to run Solr master-slave as well as SolrCloud using containers, and how to manage CPU assignments, constrain memory, and use Docker data volumes when running Solr in containers. We will also show you how to create your own containers with custom configurations.

Finally, we'll address one of the core Solr questions: which deployment type should I use? We will demonstrate performance differences between the following deployment types:

- Single Solr instance running on a bare metal machine
- Multiple Solr instances running on a single bare metal machine
- Solr running in containers
- Solr running on a virtual machine
- Solr running on a virtual machine using a unikernel

For each deployment type we'll address how it impacts performance, operational flexibility and all other key pros and cons you ought to keep in mind.

Speakers

Radu Gheorghe

Software Engineer, Sematext Group
Search consultant and software engineer at Sematext. On the consulting side, working mainly with Solr, Elasticsearch and logging-related projects. Engineering work goes mostly to Logsene, our logging SaaS. Authored the Working with Elasticsearch video course and co-authored Elasticsearch in Action.

Rafał Kuć

Software Engineer, Sematext Group
In his professional life, Rafał is a Sematext trainer, consultant, and software engineer, a co-founder of http://solr.pl, and the author of the Solr Cookbook and Elasticsearch Server books. In his personal life, Rafał is a father and a husband.


Friday October 14, 2016 2:00pm - 2:40pm
Commonwealth Sheraton Boston

2:00pm

Participating in the Community: Beyond Code
So, you've learned a bit about how to use Solr, just enough to be dangerous (as they say), but you're not really a Java programmer. Can you still participate in the community? Absolutely!

Much of the work we see going on in the Solr community is in the form of patches to the core codebase (or in contribs!), but any good open source community is larger than that. Good documentation, a professional website, a solid UI, and a strong base of users who help each other are all factors in a robust community.

Speakers

Cassandra Targett

Director of Engineering, Lucidworks
Cassandra has nearly 20 years of experience in search and knowledge management. She became a Lucene/Solr committer in 2013 and a member of the PMC in 2016. As Director of Engineering at Lucidworks, she manages partner and open source development.


Friday October 14, 2016 2:00pm - 2:40pm
Back Bay B Sheraton Boston

2:00pm

Coffee, Danish & Search: How to build a Solr-powered news search engine
We'll show how we worked with Denmark's leading media analysis company on a successful project to migrate their entire search framework from Autonomy IDOL & Verity to one based on SolrCloud and our own Luwak stored search library, itself based on Lucene. We'll describe how we helped the client translate thousands of existing queries to their own query language, enhanced Solr wildcard search performance, built custom highlighting, extended Solr logging, and developed a framework to handle multiple languages (including one spoken by only 66,000 people). We'll show how the migration achieved practically zero negative change in precision/recall and how the continuing partnership with our client enables further feature development as necessary.

Speakers

Charlie Hull

Managing Director, Flax
Charlie is the co-founder of Flax, the UK's leading specialists in open source search. The Flax team have decades of experience in delivering accurate, fast and scalable solutions to a wide range of UK and international clients. Charlie runs the London Lucene/Solr Meetup, is known as a regular commentator on the search ecosystem and regularly speaks at search conferences worldwide.

Alan Woodward

Director, Flax
Alan Woodward worked for many years at Proquest on a large-scale multinode installation of the FAST ESP search engine, and gained skills in managing search applications over hundreds of millions of documents. Alan is a Lucene/Solr committer. At Flax, Alan has worked on the development of new search technology for a leading UK newspaper publisher and on media monitoring applications. Alan has experience with FAST ESP, Apache Lucene/Solr...


Friday October 14, 2016 2:00pm - 2:40pm
Back Bay A Sheraton Boston

2:50pm

Your Big Data Stack is Too Big
While technologies such as Spark, Hadoop, and Solr have come a long way over the past couple of years, companies continue to struggle to convert all this innovation into successful business outcomes. Too often, big data projects run over budget and fail to deliver ROI. Instead, companies are left with a bloated stack of complex technologies that are cumbersome to maintain and are slow to adapt to new business requirements. Once the consultants have left the building, the big data platform fails to keep up with demands for better access to larger and more complex enterprise data sets.

In this talk, Tim presents a better way to go about big data analytics using Lucidworks Fusion. Attendees will come away with actionable insights into solving common big data problems such as scaling data ingest from any source, providing both full-text search and SQL query capabilities for the same data set, and leveraging machine learning. The goal of this talk is to cut through the hype of big data and show how a lean, tightly integrated stack built on Solr and Spark provides all you need to do big data right.

Speakers

Timothy Potter

Senior Software Engineer, Lucidworks
Timothy Potter is a senior member of the engineering team at Lucidworks and PMC member of the Apache Lucene/Solr project. At Lucidworks, Tim leads a team that builds tools to empower business analysts and data scientists to search, analyze, and visualize large-scale enterprise data sets using Fusion. Tim is the original designer of the Spark-Solr open source project and actively contributes to a number of open source projects for integrating Solr...


Friday October 14, 2016 2:50pm - 3:30pm
Back Bay B Sheraton Boston

2:50pm

Rebuilding Solr 6 Examples - layer by layer
Did you know that Solr 6 ships with 10 (!) different examples? Do you know where they are and how best to learn from each one of them? This session will give a quick yet thorough explanation of all the shipped examples. Then we will take one of the examples and rebuild it from a basic schema up, so you understand what makes it tick. At the end, you will know how to read the examples that ship with Solr and how to transfer relevant parts of those examples to your own schema and configuration.

Speakers

Alexandre Rafalovitch

Founder, Search Stack Solutions
Alexandre is a full-stack IT specialist with more than 20 years of industry and non-profit experience, including in Java, C# and HTML/CSS/JavaScript. He develops projects on Windows, Mac and Linux. His current focus is a consultancy specialized on popularizing Apache Solr. Alex has written one book about Solr already (Apache Solr for Indexing Data How-to). He has presented at Lucene/Solr Revolution 2014 and 2015, as well as multiple times at...


Friday October 14, 2016 2:50pm - 3:30pm
Gardner

2:50pm

Solr Graph Query
This is an overview of the new Solr Graph Query. We will discuss the semantics of this new query operator in Lucene/Solr and how it can be used to solve real-world knowledge graph problems. We will discuss how to handle data that is graph-like in nature and cover items such as social networking search, recommendation engines, security filtering, and how to use knowledge graphs and ontologies to draw conclusions, all using the Solr Graph query.
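To make the traversal idea concrete, here is a minimal Python sketch of a breadth-first, cycle-aware graph traversal from a set of root nodes, the kind of reachability a graph query answers (for example, "every group this user belongs to, transitively" for security filtering). It is an illustration only, not Solr's implementation: Solr reads edges from indexed fields named in the query's parameters, whereas here a plain dict of hypothetical data stands in for those fields, and `graph_traverse` and `max_depth` are our own names.

```python
from collections import deque


def graph_traverse(edges, roots, max_depth=None):
    """Breadth-first, cycle-aware traversal: every node reachable from `roots`.

    edges:     node id -> list of neighbor ids (stand-in for a multi-valued edge field)
    max_depth: None for unlimited, otherwise the maximum number of hops to follow
    """
    seen = set(roots)
    frontier = deque((node, 0) for node in roots)
    while frontier:
        node, depth = frontier.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand past the depth limit
        for neighbor in edges.get(node, []):
            if neighbor not in seen:  # the `seen` set is what makes cycles safe
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Hypothetical graph: A -> B -> C -> D with a cycle B -> A; E points at A but is not reachable from A.
edges = {"A": ["B"], "B": ["C", "A"], "C": ["D"], "E": ["A"]}
print(sorted(graph_traverse(edges, ["A"])))               # ['A', 'B', 'C', 'D']
print(sorted(graph_traverse(edges, ["A"], max_depth=1)))  # ['A', 'B']
```

Each node is visited at most once, so the B -> A cycle cannot cause an infinite loop, and the depth limit bounds how far a traversal fans out.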

Speakers

Kevin Watters

Founder, KMW Technology
Kevin Watters has been working with search engines since 2002, starting as a Senior Solution Architect at FAST Search & Transfer. Since 2010, Kevin has been running KMW Technology, a professional services organization based in Boston that provides search and big data consulting services. Kevin is the author of the Solr Graph query, and he also contributes to an open source robotics platform called MyRobotLab.


Friday October 14, 2016 2:50pm - 3:30pm
Back Bay A Sheraton Boston

2:50pm

The Path to Universal Search
Too many search boxes? You don’t remember which one to use? You know the document you are looking for exists, but you just can’t find it. Join us to learn how Lucidworks Fusion helped Allstate traverse disparate data sources and consolidate scattered search boxes to create a better user experience.

Speakers

Nery Encarnacion Padro

Lead Systems Analyst (search jedi), Allstate Insurance Company
Lead Systems Analyst at Allstate Insurance Company with 11 years of experience in Enterprise Search and related technologies. I’m a photography aficionado and avid cyclist when I’m not behind a keyboard.

Sean Rasmussen

Systems Analyst, Allstate Insurance Company
Systems Analyst and Search Engineer at Allstate Insurance Company. I have four years of enterprise search experience, two of them working with Solr. I’m also a father, husband and all-around technologist!


Friday October 14, 2016 2:50pm - 3:30pm
Commonwealth Sheraton Boston

2:50pm

Using Apache Solr for Images As Big Data: A Case Study
'Images as big data' is an especially interesting topic in the era of high-performance systems based on Solr, Hadoop, and Apache Spark. Machine learning and image analysis packages are readily available to apply to this problem, and high-quality industrial applications may be built from off-the-shelf third-party components. In this talk, we will discuss a case study based on an 'image as big data' analytical system: the Image as Big Data Toolkit (IABDT). IABDT uses Lucene, distributed Lucene, Solr, and Hadoop as key component technologies. We will present examples of IABDT in action, using Solr as a key search technology in its implementation, show a medical image case study, and discuss future work and extensions of the IABDT system.

Speakers

Kerry Koitzsch

Project Lead / Principal Software Engineer, Kildane Software Technologies Inc.
Kerry Koitzsch has had more than twenty years of experience in the computer science, image processing, and software engineering fields, and has worked extensively with Apache Lucene and Solr technologies in particular. Kerry specializes in software consulting involving customized big data applications including distributed search, image analysis, stereo vision, and intelligent image retrieval systems. Kerry currently works for Kildane Software...


Friday October 14, 2016 2:50pm - 3:30pm
Independence Sheraton Boston

3:45pm

Closing Keynote
Speakers

Grant Ingersoll

CTO, Lucidworks
Grant is the CTO and co-founder of Lucidworks, co-author of Taming Text, co-founder of Apache Mahout and a long-standing committer on the Apache Lucene and Solr open source projects. Grant’s experience includes engineering a variety of search, question answering, and natural language processing applications for a variety of domains and languages. He earned his B.S. from Amherst College in Math and Computer Science and his M.S. in Computer...


Friday October 14, 2016 3:45pm - 4:30pm
Grand Ballroom Sheraton Boston