Thursday, October 13 • 3:10pm - 3:50pm
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries
Autocomplete is challenging for search because the user's intent must be inferred from incomplete query tokens. Many languages written in non-Latin scripts bring additional complications. The following are examples of language-specific issues that a search system must address in order to support these languages:

- Japanese and Chinese multiple scripts (Hiragana, Katakana, Romaji, Zhuyin, Paoding)
- No token delimiters for Japanese and Chinese
- Korean character composition
- Arabic spelling variations of transliterated foreign words

I will talk about these challenges in detail, describe our approaches to solving them, and share some tools (a query testing framework) we used to help address these issues.
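
To make the n-gram idea in the title concrete, here is a minimal sketch of prefix (edge n-gram) matching over raw characters, which needs no token delimiters. It is not the speaker's implementation: the function names `edge_ngrams`, `build_index`, and `suggest`, the length limits, and the sample titles are all hypothetical, and a production system would rely on the search engine's own analyzers rather than an in-memory dictionary.

```python
def edge_ngrams(text, min_len=1, max_len=10):
    """Return the prefixes of text with lengths min_len..max_len (hypothetical helper)."""
    upper = min(len(text), max_len)
    return [text[:n] for n in range(min_len, upper + 1)]


def build_index(titles, min_len=1, max_len=10):
    """Map every character prefix of every title to the set of titles that start with it."""
    index = {}
    for title in titles:
        # Lowercasing only affects cased scripts; Japanese text passes through unchanged.
        for gram in edge_ngrams(title.lower(), min_len, max_len):
            index.setdefault(gram, set()).add(title)
    return index


def suggest(index, partial):
    """Look up an incomplete query as-is; no tokenization of the query is required."""
    return sorted(index.get(partial.lower(), set()))


# Sample data is illustrative only.
index = build_index(["Stranger Things", "Star Trek", "となりのトトロ"])
latin_hits = suggest(index, "st")
japanese_hits = suggest(index, "とな")
```

Because the prefixes are cut by character rather than by word, the same lookup serves both the Latin and the Japanese titles. In this sketch a query longer than `max_len` characters finds nothing, so the limit has to cover the longest prefix users are expected to type.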

Speakers
Ivan Provalov

Software Engineer, Netflix
Software engineer specializing in information retrieval. Currently on the search team at Netflix; previously worked on various search systems at Lucidworks and Cengage Learning.


Thursday October 13, 2016 3:10pm - 3:50pm
Back Bay B Sheraton Boston