AI-Powered Search

By Trey Grainger

(author of AI-Powered Search and co-author of Solr in Action)

Time and Location

April 18, 2020, 9:00am - 5:00pm EDT

Cobb Galleria Center

Workshop Summary

The search box has become the de facto “user interface” for interacting with data in the modern era. Virtually every website, app, and modern software interface relies upon or could benefit heavily from relevant search, and the Apache Lucene/Solr open source search project is the most widely deployed solution for delivering search across today’s top companies.
 
Come learn how to add automatically learning, highly relevant search to your applications from Trey Grainger, author of the books AI-Powered Search and Solr in Action and a recognized industry expert in building intelligent search applications. You’ll learn how to use the Apache Lucene/Solr project to implement AI-powered search, though most of the techniques covered in this workshop also apply to other modern search engines (Elasticsearch, Lucidworks Fusion, Open Distro for ES, Vespa, etc.).

This workshop assumes no prior knowledge of Lucene/Solr (we’ll provide an overview in the morning session), and we will approach learning from a capabilities and relevancy standpoint that is ideally suited for data scientists and application developers looking to implement intelligent features in their software (keyword matching, machine-learned ranking/learning to rank, word embeddings and dense vector search, personalized search, recommendations, semantic search, smart autocomplete, etc.). More technical product managers interested in these capabilities will also find this workshop quite helpful.
 

Agenda

Morning Session

● Introduction to Search

○ Search and the Inverted Index

○ Text-based relevancy ranking

○ Apache Lucene: High-performance text search engine library

○ Apache Solr: Modern, Distributed, Horizontally-Scalable, Search-first NoSQL database optimized for Information Retrieval

○ Overview of Key Features: Multilingual text analysis, Faceting & Analytics, Highlighting, Spelling Correction, Autocomplete, Sorting and Grouping, Geospatial Search, Complex Function Queries, Recommendations, Graph Queries and Traversals, Streaming Aggregations and SQL Query Support, Plug-ins, and many more
 

● Getting started with Solr

○ Installing and Running Solr

○ Overview of Admin UI, APIs, and Documentation

○ Indexing Data into Solr

○ Lab
 

● Text Analysis

○ Analyzers, Tokenization, and Token Filters

○ Natural Language Processing

○ Handling Language-specific and Multilingual Content
 

● Querying Basics

○ Lucene/Solr Query Syntax

○ Keyword, Boolean, Phrase and Proximity Queries

○ Range Queries

○ Filter Queries

○ Faceted Search

 

● Ranking Functions

○ Sparse vs. Dense Vector Search

○ Text Similarity Scoring with Cosine Similarity, TF-IDF, and BM25

○ Function Queries

○ Complex Ranking Functions

○ Domain and User-specific Ranking Functions

○ Lab 


Lunch Break

 

Afternoon Session

● Balancing the Dimensions of User Intent

○ Content, User, and Domain Relevance

○ Semantic Search

○ Recommendations

○ Personalized Search

○ Knowledge Graphs

 

● Reflected Intelligence

○ Capturing and using user signals for relevance tuning

○ Collaborative Filtering for Recommendations

○ Learning to Rank (Machine-learned Ranking)

○ Automated Learning to Rank with Click Models

○ Lab

 

● Semantic Search

○ NLP for Search

○ Semantic Knowledge Graphs

○ Content-based Recommendations

○ Semantic Query Parsing and Entity Extraction

○ Concept Expansion and Disambiguation (term embeddings)

○ Natural Language Search with Knowledge Graphs

○ Machine learning strategies

■ Spelling corrections

■ Phrase detection 

■ Head-tail analysis 

■ Synonym detection

○ Lab

 

● Thought Vectors and Word Embeddings

○ Working with Embeddings

○ Implementing Dense Vector Scoring

○ Quantized Vectors and Hash Functions

○ Using BERT and Deep-learning-based Encoders

○ Lab
 

● Taking AI-powered Search to Production

○ Intro to Lucidworks Fusion

○ Search Relevancy Testing: Techniques and Metrics

○ The Continuous Learning Cycle

○ Lab

 

● Question / Answer Session on topics relevant to the attendees

Who Should Attend?

●    Data Scientists, Software Developers, and Product Managers interested in learning about information retrieval and how ML and AI can drive contextual, domain-aware, personalized results to users of search-driven applications
●    People looking for an introduction to Apache Lucene/Solr, the most widely used search engine technology on the planet. Note that many of the concepts in the training also apply to other search engines, such as Elasticsearch, Lucidworks Fusion, Open Distro for ES, and Vespa, but this training and associated labs will target Lucene/Solr.
●    Attendees will not be required to write code, but the workshop will contain labs that the instructor will walk through while attendees follow along. Programming knowledge, as well as familiarity with REST and HTTP, is needed if you want to go beyond the concepts and get hands-on experience with the labs yourself. The labs will primarily be implemented in Python in Jupyter Notebooks in order to appeal to both software developers and data scientists and to enable all attendees to easily follow along.

Who Should NOT Attend?

  • Those looking for detailed training on DevOps with Lucene/Solr or a deep dive into Lucene/Solr under the hood. This course is instead primarily focused on relevant search and on automating the process of creating relevant search through machine learning and feedback loops.

About the Instructor

Trey Grainger is the Chief Algorithms Officer at Lucidworks, where he drives the vision and practical application of intelligent data science algorithms to power relevant search experiences for hundreds of the world's biggest and brightest companies.

He is the author of AI-Powered Search and the co-author of Solr in Action, plus more than a dozen additional books, journal articles, and research publications covering industry-leading approaches to semantic search, recommendation systems, and intelligent information retrieval systems. Trey received his Master's in Management of Technology from Georgia Tech, studied Computer Science, Business, and Philosophy at Furman University, and studied Information Retrieval and Web Search at Stanford University.
 


© All rights reserved 2018, Southern Data Science, LLC
