The search box has become the de facto “user interface” for interacting with data in the modern era. Virtually every website, app, and modern software interface relies upon or could benefit heavily from relevant search.
Come learn how to add automatically learning, highly relevant search to your applications from Trey Grainger, author of the books AI-Powered Search and Solr in Action and a recognized industry expert in building intelligent search applications. You’ll learn how to use the open source Apache Lucene/Solr ecosystem to implement AI-powered search, though most of the techniques learned in this workshop will also be applicable to other modern search engines (Elasticsearch, OpenSearch, Vespa, etc.).
This workshop assumes no prior knowledge of Lucene/Solr (we’ll provide an overview in the morning session), and we will approach learning from a capabilities and relevancy standpoint that is ideally suited for data scientists and application developers looking to implement intelligent features in their software (keyword matching, machine-learned ranking/learning to rank, semantic knowledge graphs, word embeddings and dense vector search, personalized search, recommendations, semantic search, etc.). More technical product managers interested in these capabilities will also find this workshop
Introduction to Search
○ Search and the Inverted Index
○ Text-based relevancy ranking
Search Engine Frameworks: Apache Lucene, Apache Solr, OpenSearch, Vespa, vector databases, and commercial options.
○ Overview of key search engine features
Getting started with Solr
○ Installing and Running Solr
○ Indexing Data into Solr
○ Analyzers, Tokenization, and Token Filters
○ Natural Language Processing and Handling Multilingual Content
○ Lucene/Solr Query Syntax
○ Keyword, Boolean, Phrase, Range, and Proximity Queries
○ User Queries vs. Filter Queries
○ Specialized query needs (autocomplete, geospatial queries, etc.)
○ Faceted Search
○ Sparse vs. Dense Vector Search
○ Text Similarity Scoring with Cosine Similarity, TF-IDF, and BM25
○ Function Queries
○ Complex Ranking Functions
○ Domain and User-specific Ranking Functions
Balancing the Dimensions of User Intent
○ Content, User, and Domain Relevance
○ Semantic Search
○ Personalized Search
○ Knowledge Graphs
Signals Boosting (Popularized Relevance)
○ Normalizing signals
○ Fighting Signal Spam
○ Combining multiple signal types
○ Time decays and short-lived signals
○ Index-time vs. query-time boosting
Semantic Search (content-based)
○ NLP for Search
○ Semantic Knowledge Graphs
○ Content-based Recommendations
○ Concept Expansion and Disambiguation (term embeddings)
Semantic Search (user-signals-based)
○ Spelling corrections
○ Phrase detection
○ Synonym detection
○ Semantic Query Parsing and Entity Extraction
○ Natural Language Search with Knowledge Graphs
Learning to Rank Intro
○ Generating judgements
○ Implicit Judgements from user signals
○ Training a model
○ Deploying a model
○ Overcoming Bias
Thought Vectors and Word Embeddings
○ Working with Embeddings
○ Implementing Dense Vector Scoring
○ Quantized Vectors and Hash Functions
○ Using Deep-learning-based Transformers/Encoders
○ Question / Answering as a search problem
○ Multi-modal search
Q & A for other topics relevant to attendees
Who should Attend?
Data Scientists, Software Developers, and Product Managers interested in learning about information retrieval and how ML and AI can drive contextual, domain-aware, personalized results to users of search-driven applications
People looking for an introduction to Apache Solr, but with a focus on the relevance side of search as opposed to the operational side.
Attendees will not be required to write code, but the workshop will contain labs that the instructor will walk through while attendees follow along. Programming knowledge, as well as familiarity with REST and HTTP, is needed if you want to go beyond the concepts and get hands-on experience with the labs yourself. The labs will primarily be implemented in Python in Jupyter Notebooks in order to appeal to both software developers and data scientists and to enable all attendees to easily follow along.
Who should NOT Attend?
Those looking for detailed training on DevOps with Lucene/Solr or a deep dive into Lucene/Solr under the hood. This course is instead primarily focused on relevant search and on automating the process of creating relevant search through machine learning and feedback loops.
About the Instructor
Trey Grainger is the CTO at Presearch, where he leads the development of their decentralized, Web3 search engine. He is also the Founder of Searchkernel, a search consulting company focused on building the next generation of algorithms for intelligent search.
He is an author of AI-Powered Search and Solr in Action, plus more than a dozen additional books, journal articles, and research publications covering industry-leading approaches to semantic search, recommendation systems, and intelligent information retrieval systems. Trey received his Master’s in Management of Technology from Georgia Tech, studied Computer Science, Business, and Philosophy at Furman University, and studied Information Retrieval and Web Search at Stanford University.