How AI Is Built – Details, episodes & analysis

Podcast details

Technical and general information from the podcast's RSS feed.

How AI Is Built

Nicolay Gerold

Technology

Frequency: 1 episode every 7 days. Total episodes: 37

How AI is Built dives into the different building blocks necessary to develop AI applications: how they work, how you can get started, and how you can master them. Build on the breakthroughs of others. Follow along, as Nicolay learns from the best data engineers, ML engineers, solution architects, and tech founders.

Recent rankings

Latest chart positions across Apple Podcasts and Spotify rankings.

Apple Podcasts

🇫🇷 France - technology
3 December 2024
#95

Spotify

No recent rankings available

Shared links between episodes and podcasts

Links found in episode descriptions and other podcasts that share them.

https://www.linkedin.com/search/results/all/?fetchDeterministicClustersOnly=true&amp
427 shares
https://www.linkedin.com/in/nicolay-gerold/
36 shares
https://www.linkedin.com/in/kirkmarple/
14 shares

https://twitter.com/nicolaygerold
36 shares
https://twitter.com/KirkMarple
4 shares
https://twitter.com/lancedb
3 shares

RSS feed quality and score

Technical evaluation of the podcast's RSS feed quality and structure.

RSS feed quality

Good

Overall score: 89%

Publication history

Monthly episode publishing history over the past years.

Latest published episodes

Recent episodes with titles, durations, and descriptions.

RAG's Biggest Problems & How to Fix It (ft. Synthetic Data) | S2 E16

Season 2 · Episode 16

Thursday, 28 November 2024 • Duration 51:26

RAG isn't a magic fix for search problems. While it works well at first, most teams find it's not good enough for production out of the box. The key is to make it better step by step, using good testing and smart data creation.

Today, we are talking to Saahil Ognawala from Jina AI to start to understand RAG.

To build a good RAG system, you need three things: ways to test it, methods to create training data, and plans to make it better over time. Testing starts with a set of example searches that users might make. These should include common searches that happen often, medium-rare searches, and rare searches that only happen now and then. This mix helps you measure if changes make your system better or worse.
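A test set like the one described above can be sketched as follows. This is a minimal illustration, not the guest's setup: the bucket names, the example queries and document IDs, the `toy_search` stand-in, and the choice of recall@k as the metric are all assumptions.

```python
from collections import defaultdict

# Hypothetical test set: each query carries a frequency bucket and the set of
# document IDs judged relevant. All values are illustrative.
TEST_SET = [
    {"query": "chocolate cake", "bucket": "common", "relevant": {"d1", "d2"}},
    {"query": "gluten-free chocolate cake", "bucket": "medium", "relevant": {"d3"}},
    {"query": "vegan gluten-free lava cake", "bucket": "rare", "relevant": {"d4"}},
]


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(search_fn, test_set, k=5):
    """Average recall@k per frequency bucket for a given search function."""
    scores = defaultdict(list)
    for case in test_set:
        retrieved = search_fn(case["query"])
        scores[case["bucket"]].append(recall_at_k(retrieved, case["relevant"], k))
    return {bucket: sum(vals) / len(vals) for bucket, vals in scores.items()}


def toy_search(query):
    """Stand-in for a real retriever: returns a fixed ranking per query."""
    rankings = {
        "chocolate cake": ["d1", "d9", "d2"],
        "gluten-free chocolate cake": ["d8", "d3"],
        "vegan gluten-free lava cake": ["d7", "d6"],
    }
    return rankings.get(query, [])


# Re-run this after every change to see which bucket got better or worse.
report = evaluate(toy_search, TEST_SET, k=3)
```

Reporting the metric per bucket rather than as one average makes it visible when a change helps common queries but hurts rare ones.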

Creating synthetic data helps make the system stronger, especially in spotting wrong answers that look right. Think of someone searching for a "gluten-free chocolate cake." A "sugar-free chocolate cake" might look like a good answer because it shares many words, but it's wrong.

These tricky examples help the system learn the difference between similar but different things.
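The cake example can be turned into a small sketch of hard-negative selection. It assumes plain token overlap (Jaccard similarity) as the "looks similar" signal, which is a simplification chosen for illustration; a real pipeline would more likely use an embedding model or LLM-generated negatives. All function names are hypothetical.

```python
def tokens(text):
    """Lowercase whitespace tokenization; deliberately simplistic."""
    return set(text.lower().split())


def jaccard(a, b):
    """Token-set overlap between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def hard_negatives(query, candidates, positives, top_n=1):
    """Return the non-relevant candidates that look most like the query."""
    negatives = [c for c in candidates if c not in positives]
    return sorted(negatives, key=lambda c: jaccard(query, c), reverse=True)[:top_n]


query = "gluten-free chocolate cake"
candidates = [
    "gluten-free chocolate cake",
    "sugar-free chocolate cake",
    "grilled salmon with rice",
]
positives = {"gluten-free chocolate cake"}

# One training triplet: (query, positive, hard negative).
negative = hard_negatives(query, candidates, positives)[0]
triplet = (query, "gluten-free chocolate cake", negative)
```

The salmon dish is an easy negative the model already separates; the sugar-free cake is the one worth training on.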

When creating synthetic data, you need rules. The best way is to show the AI a few real examples and give it a list of topics to work with. Most teams find that using half real data and half synthetic data works best. This gives you enough variety while keeping things real.

Getting user feedback is hard with RAG. In normal search, you can see if users click on results. But with RAG, the system creates an answer from many pieces. A good answer might come from both good and bad pieces, making it hard to know which parts helped. This means you need smart ways to track which pieces of information actually helped make good answers.

One key rule: don't make things harder than they need to be. If simple keyword search (called BM25) works well enough, adding fancy AI search might not be worth the extra work.
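For a sense of how little machinery a keyword baseline needs, here is a minimal sketch of Okapi BM25 scoring in plain Python. The whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the example documents are assumptions for illustration; a production system would rely on a search engine's built-in implementation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (the form used by Lucene).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "gluten-free chocolate cake recipe",
    "sugar-free chocolate cake recipe",
    "grilled salmon with rice",
]
scores = bm25_scores("gluten-free chocolate cake", docs)
best = docs[scores.index(max(scores))]
```

Note that the sugar-free cake still scores well here because it shares two of three query terms, which is the same lookalike problem described earlier.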

Success with RAG comes from good testing, careful data creation, and steady improvements based on real use. It's not about using the newest AI models. It's about building good systems and processes that work reliably.

"It isn’t a magic wand you can place on your catalog and expect results you didn’t get before."

“Most of our users are enterprise users who have seen the most success in their RAG systems are the ones that very early implemented a continuous feedback mechanism.”

“If you can't tell in real time usage whether an answer is a bad answer or a right answer because the LLM just makes it look like the right answer then you only have your retrieval dataset to blame”

00:00 Introduction to Retrieval Augmented Generation (RAG)
00:29 Interview with Saahil Ognawala
00:52 Synthetic Data in Language Generation
01:14 Understanding the E5 Mistral Instructor Embeddings Paper
03:15 Challenges and Evolution in Synthetic Data
05:03 User Intent and Retrieval Systems
11:26 Evaluating RAG Systems
14:46 Setting Up Evaluation Frameworks
20:37 Fine-Tuning and Embedding Models
22:25 Negative and Positive Examples in Retrieval
26:10 Synthetic Data for Hard Negatives
29:20 Case Study: Marine Biology Project
29:54 Addressing Errors in Marine Biology Queries
31:28 Ensuring Query Relevance with Human Intervention
31:47 Few Shot Prompting vs Zero Shot Prompting
35:09 Balancing Synthetic and Real World Data
37:17 Improving RAG Systems with User Feedback
39:15 Future Directions for Jina and Synthetic Data
40:44 Building and Evaluating Embedding Models
41:24 Getting Started with Jina and Open Source Tools
51:25 The Importance of Hard Negatives in Embedding Models

From Ambiguous to AI-Ready: Improving Documentation Quality for RAG Systems | S2 E15

Season 2 · Episode 15

Thursday, 21 November 2024 • Duration 46:37

Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's finding them.

Today we are talking to Max Buckley on how to find and fix these errors.

Max works at Google and has run many interesting experiments on using LLMs to improve knowledge bases for generation.

We talk about identifying ambiguities, fixing errors, creating improvement loops in the documents and a lot more.

Some Insights:

A single ambiguous sentence can systematically corrupt an entire knowledge base's responses. Fixing these "documentation poisons" often requires minimal changes but identifying them is challenging.
Large organizations develop their own linguistic ecosystems that evolve over time. This creates unique challenges for both embedding models and retrieval systems that need to bridge external and internal vocabularies.
Multiple feedback loops are crucial - expert testing, user feedback, and system monitoring each catch different types of issues.

Max Buckley: (All opinions are his own and not those of Google)

00:00 Understanding LLM Hallucinations
00:02 Challenges with Temporal Inconsistencies
00:43 Issues with Document Structure and Terminology
01:05 Introduction to Retrieval Augmented Generation (RAG)
01:49 Interview with Max Buckley
02:27 Anthropic's Approach to Document Chunking
02:55 Contextualizing Chunks for Better Retrieval
06:29 Challenges in Chunking and Search
07:35 LLMs in Internal Knowledge Management
08:45 Identifying and Fixing Documentation Errors
10:58 Using LLMs for Error Detection
15:35 Improving Documentation with User Feedback
24:42 Running Processes on Retrieved Context
25:19 Challenges of Terminology Consistency
26:07 Handling Definitions and Glossaries
30:10 Addressing Context Misinterpretation
31:13 Improving Documentation Quality
36:00 Future of AI and Search Technologies
42:29 Ensuring Documentation Readiness for AI

Beyond Embeddings: The Power of Rerankers in Modern Search | S2 E6

Season 2 · Episode 6

Thursday, 26 September 2024 • Duration 42:29

Today, we're talking to Aamir Shakir, the founder and baker at mixedbread.ai, where he's building some of the best embedding and re-ranking models out there. We go into the world of rerankers, looking at how they can classify and deduplicate documents and prioritize LLM outputs, and we dig into models like ColBERT.

We discuss:

The role of rerankers in retrieval pipelines
Advantages of late interaction models like ColBERT for interpretability
Training rerankers vs. embedding models and their impact on performance
Incorporating metadata and context into rerankers for enhanced relevance
Creative applications of rerankers beyond traditional search
Challenges and future directions in the retrieval space

Still not sure whether to listen? Here are some teasers:

Rerankers can significantly boost your retrieval system's performance without overhauling your existing setup.
Late interaction models like ColBERT offer greater explainability by allowing token-level comparisons between queries and documents.
Training a reranker often yields a higher impact on retrieval performance than training an embedding model.
Incorporating metadata directly into rerankers enables nuanced search results based on factors like recency and pricing.
Rerankers aren't just for search—they can be used for zero-shot classification, deduplication, and prioritizing outputs from large language models.
The future of retrieval may involve compound models capable of handling multiple modalities, offering a more unified approach to search.
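The token-level comparison behind late interaction models can be sketched with the MaxSim operator: for every query token, keep the similarity of its best-matching document token, then sum those maxima. The 2-d vectors below are toy values standing in for real token embeddings, and the function names are hypothetical, so this is an illustration of the scoring rule rather than ColBERT itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score plus the per-token matches behind it.

    Returns (score, matches), where each match is
    (query_token_index, best_doc_token_index, similarity).
    """
    total, matches = 0.0, []
    for qi, q in enumerate(query_vecs):
        sims = [cosine(q, d) for d in doc_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        total += sims[best]
        matches.append((qi, best, sims[best]))
    return total, matches


# Toy 2-d token embeddings; a real model would produce these.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

score, matches = maxsim(query_vecs, doc_vecs)
```

The `matches` list is where the explainability comes from: it shows which document token each query token was matched to, something a single pooled embedding cannot provide.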

00:00 Introduction and Overview
00:25 Understanding Rerankers
01:46 Maxsim and Token-Level Embeddings
02:40 Setting Thresholds and Similarity
03:19 Guest Introduction: Aamir Shakir
03:50 Training and Using Rerankers (Episode Start)
04:50 Challenges and Solutions in Reranking
08:03 Future of Retrieval and Recommendation
26:05 Multimodal Retrieval and Reranking
38:04 Conclusion and Takeaways

Limits of Embeddings: Out-of-Domain Data, Long Context, Finetuning (and How We're Fixing It) | S2 E5

Season 2 · Episode 5

Thursday, 19 September 2024 • Duration 46:06

Text embeddings have limitations when it comes to handling long documents and out-of-domain data.

Today, we are talking to Nils Reimers. He is one of the researchers who kickstarted the field of dense embeddings, developed sentence transformers, started HuggingFace’s Neural Search team and now leads the development of search foundational models at Cohere. Tbh, he has too many accolades to count off here.

We talk about the main limitations of embeddings:

Failing out of domain
Struggling with long documents
Very hard to debug
Hard to formalize what actually counts as similar

Are you still not sure whether to listen? Here are some teasers:

Interpreting embeddings can be challenging, and current models are not easily explainable.
Fine-tuning is necessary to adapt embeddings to specific domains, but it requires careful consideration of the data and objectives.
Re-ranking is an effective approach to handle long documents and incorporate additional factors like recency and trustworthiness.
The future of embeddings lies in addressing scalability issues and exploring new research directions.

text embeddings, limitations, long documents, interpretation, fine-tuning, re-ranking, future research

00:00 Introduction and Guest Introduction
00:43 Early Work with BERT and Argument Mining
02:24 Evolution and Innovations in Embeddings
03:39 Contrastive Learning and Hard Negatives
05:17 Training and Fine-Tuning Embedding Models
12:48 Challenges and Limitations of Embeddings
18:16 Adapting Embeddings to New Domains
22:41 Handling Long Documents and Re-Ranking
31:08 Combining Embeddings with Traditional ML
45:16 Conclusion and Upcoming Episodes

RAG at Scale: The problems you will encounter and how to prevent (or fix) them | S2 E4

Season 2 · Episode 4

Thursday, 12 September 2024 • Duration 50:09

Hey! Welcome back.

Today we look at how we can get our RAG system ready for scale.

We discuss the common problems that appear when you introduce more users and more requests to your system, and how to solve them.

For this we are joined by Nirant Kasliwal, the author of fastembed.

Nirant shares practical insights on metadata extraction, evaluation strategies, and emerging technologies like ColPali. This episode is a must-listen for anyone looking to level up their RAG implementations.

"Naive RAG has a lot of problems on the retrieval end and then there's a lot of problems on how LLMs look at these data points as well."

"The first 30 to 50% of gains are relatively quick. The rest 50% takes forever."

"You do not want to give the same answer about company's history to the co-founding CEO and the intern who has just joined."

"Embedding similarity is the signal on which you want to build your entire search is just not quite complete."

Key insights:

Naive RAG often fails due to limitations of embeddings and LLMs' sensitivity to input ordering.
Query profiling and expansion:
- Use clustering and tools like Latent Scope to identify problematic query types
- Expand queries offline and use parallel searches for better results
Metadata extraction:
- Extract temporal, entity, and other relevant information from queries
- Use LLMs for extraction, with checks against libraries like Stanford NLP
User personalization:
- Include user role, access privileges, and conversation history
- Adapt responses based on user expertise and readability scores
Evaluation and improvement:
- Create synthetic datasets and use real user feedback
- Employ tools like DSPy for prompt engineering
Advanced techniques:
- Query routing based on type and urgency
- Use smaller models (1-3B parameters) for easier iteration and error spotting
- Implement error handling and cross-validation for extracted metadata

query understanding, AI-powered search, Lambda Mart, e-commerce ranking, networking, experts, recommendation, search

From Keywords to AI (to GAR): The Evolution of Search, Finding Search Signals | S2 E3

Season 2 · Episode 3

Thursday, 5 September 2024 • Duration 52:16

In this episode of How AI is Built, Nicolay Gerold interviews Doug Turnbull, a search engineer at Reddit and author of “Relevant Search”. They discuss how methods and technologies, including large language models (LLMs) and semantic search, contribute to relevant search results.

Key Highlights:

Defining relevance is challenging and depends heavily on user intent and context
Combining multiple search techniques (keyword, semantic, etc.) in tiers can improve results
LLMs are emerging as a powerful tool for augmenting traditional search approaches
Operational concerns often drive architectural decisions in large-scale search systems
Underappreciated techniques like LambdaMART may see a resurgence
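One possible reading of "combining search techniques in tiers" is sketched below: results from a precise tier (keyword) fill the list first, and a broader tier (semantic) only fills the remaining slots. This interpretation, the function names, and the two stub retrievers are assumptions for illustration, not the architecture described in the episode.

```python
def tiered_search(query, tiers, limit=5):
    """Fill the result list tier by tier, skipping duplicates.

    `tiers` is an ordered list of (name, search_fn) pairs; each search_fn
    takes a query string and returns a ranked list of document IDs.
    """
    results, seen = [], set()
    for name, search_fn in tiers:
        for doc_id in search_fn(query):
            if doc_id not in seen:
                seen.add(doc_id)
                # Keep the tier name so each hit's origin can be debugged.
                results.append((doc_id, name))
            if len(results) == limit:
                return results
    return results


def keyword_search(query):
    """Stub for an exact-match retriever."""
    return ["d1", "d2"]


def semantic_search(query):
    """Stub for an embedding-based retriever."""
    return ["d2", "d3", "d4"]


results = tiered_search(
    "chocolate cake",
    [("keyword", keyword_search), ("semantic", semantic_search)],
    limit=3,
)
```

Treating each retriever as an independent function mirrors the quote below about using each retrieval source as its own piece of infrastructure.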

Key Quotes:

"There's not like a perfect measure or definition of what a relevant search result is for a given application. There are a lot of really good proxies, and a lot of really good like things, but you can't just like blindly follow the one objective, if you want to build a good search product." - Doug Turnbull

"I think 10 years ago, what people would do is they would just put everything in Solr, Elasticsearch or whatever, and they would make the query to Elasticsearch pretty complicated to rank what they wanted... What I see people doing more and more these days is that they'll use each retrieval source as like an independent piece of infrastructure." - Doug Turnbull on the evolution of search architecture

"Honestly, I feel like that's a very practical and underappreciated thing. People talk about RAG and I talk, I call this GAR - generative AI augmented retrieval, so you're making search smarter with generative AI." - Doug Turnbull on using LLMs to enhance search

"LambdaMART and gradient boosted decision trees are really powerful, especially for when you're expressing your re-ranking as some kind of structured learning problem... I feel like we'll see that and like you're seeing papers now where people are like finding new ways of making BM25 better." - Doug Turnbull on underappreciated techniques

Chapters

00:00 Introduction and Guest Introduction
00:52 Understanding Relevant Search Results
01:18 Search Behavior on Social Media
02:14 Challenges in Defining Relevance
05:12 Query Understanding and Ranking Signals
10:57 Evolution of Search Technologies
15:15 Combining Search Techniques
21:49 Leveraging LLMs and Embeddings
25:49 Operational Considerations in Search Systems
39:09 Concluding Thoughts and Future Directions

Data-driven Search Optimization, Analysing Relevance | S2 E2

Season 2 · Episode 2

Friday, 30 August 2024 • Duration 51:14

In this episode, we talk data-driven search optimizations with Charlie Hull.

Charlie is a search expert from OpenSource Connections. He has built Flax, one of the leading open source search companies in the UK, has written “Searching the Enterprise”, and is one of the main voices on data-driven search.

We discuss strategies to improve search systems quantitatively and much more.

Key Points:

Relevance in search is subjective and context-dependent, making it challenging to measure consistently.
Common mistakes in assessing search systems include overemphasizing processing speed and relying solely on user complaints.
Three main methods to measure search system performance:
- Human evaluation
- User interaction data analysis
- AI-assisted judgment (with caution)
Importance of balancing business objectives with user needs when optimizing search results.
Technical components for assessing search systems:
- Query logs analysis
- Source data quality examination
- Test queries and cases setup

Resources mentioned:

Quepid: Open-source tool for search quality testing
Haystack conference: Upcoming event in Berlin (September 30 - October 1)
Relevance Slack community
OpenSource Connections

search results, search systems, assessing, evaluation, improvement, data quality, user behavior, proactive, test dataset, search engine optimization, SEO, search quality, metadata, query classification, user intent, search results, metrics, business objectives, user objectives, experimentation, continuous improvement, data modeling, embeddings, machine learning, information retrieval

00:00 Introduction
01:35 Challenges in Measuring Search Relevance
02:19 Common Mistakes in Search System Assessment
03:22 Methods to Measure Search System Performance
04:28 Human Evaluation in Search Systems
05:18 Leveraging User Interaction Data
06:04 Implementing AI for Search Evaluation
09:14 Technical Components for Assessing Search Systems
12:07 Improving Search Quality Through Data Analysis
17:16 Proactive Search System Monitoring
24:26 Balancing Business and User Objectives in Search
25:08 Search Metrics and KPIs: A Contract Between Teams
26:56 The Role of Recency and Popularity in Search Algorithms
28:56 Experimentation: The Key to Optimizing Search
30:57 Offline Search Labs and A/B Testing
34:05 Simple Levers to Improve Search
37:38 Data Modeling and Its Importance in Search
43:29 Combining Keyword and Vector Search
44:24 Bridging the Gap Between Machine Learning and Information Retrieval
47:13 Closing Remarks and Contact Information

Query Understanding: Doing The Work Before The Query Hits The Database | S2 E1

Season 2 · Episode 1

Thursday, 15 August 2024 • Duration 53:02

Welcome back to How AI Is Built.

We have got a very special episode to kick off season two.

Daniel Tunkelang is a search consultant currently working with Algolia. He is a leader in the field of information retrieval, recommender systems, and AI-powered search. He has worked for Canva, Algolia, Cisco, Gartner, and Handshake, to name a few.

His core focus is query understanding.

**Query understanding is about focusing less on the results and more on the query.** The query of the user is the first-class citizen. It is about figuring out what the user wants and then finding, scoring, and ranking results based on it. So most of the work happens before you hit the database.

**Key Takeaways:**

- The "bag of documents" model for queries and "bag of queries" model for documents are useful approaches for representing queries and documents in search systems.
- Query specificity is an important factor in query understanding. It can be measured using cosine similarity between query vectors and document vectors.
- Query classification into broad categories (e.g., product taxonomy) is a high-leverage technique for improving search relevance and can act as a guardrail for query expansion and relaxation.
- Large Language Models (LLMs) can be useful for search, but simpler techniques like query similarity using embeddings can often solve many problems without the complexity and cost of full LLM implementations.
- Offline processing to enhance document representations (e.g., filling in missing metadata, inferring categories) can significantly improve search quality.
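
The specificity measurement from the takeaways can be sketched as follows. The sketch assumes a "bag of documents" query vector built as the centroid of the embeddings of documents associated with the query, and defines specificity as the mean cosine similarity between that centroid and each document; the toy 2-d vectors and function names are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def specificity(doc_vectors):
    """Mean cosine similarity between the query centroid and its documents.

    Values near 1 mean the documents agree (a specific query);
    lower values mean they point in different directions (a broad query).
    """
    query_vec = mean_vector(doc_vectors)
    return sum(cosine(query_vec, d) for d in doc_vectors) / len(doc_vectors)


# Toy 2-d embeddings of the documents associated with each query.
specific_query_docs = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1]]  # tightly clustered
broad_query_docs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # scattered

narrow = specificity(specific_query_docs)
broad = specificity(broad_query_docs)
```

A low score could then act as a signal to ask for clarification or to avoid aggressive query expansion, though where to set that threshold depends on the embedding model and the catalog.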

**Daniel Tunkelang**

- [LinkedIn](https://www.linkedin.com/in/dtunkelang/)
- [Medium](https://queryunderstanding.com/)

**Nicolay Gerold:**

- [LinkedIn](https://www.linkedin.com/in/nicolay-gerold/)
- [X (Twitter)](https://twitter.com/nicolaygerold)
- [Substack](https://nicolaygerold.substack.com/)

Query understanding, search relevance, bag of documents, bag of queries, query specificity, query classification, named entity recognition, pre-retrieval processing, caching, large language models (LLMs), embeddings, offline processing, metadata enhancement, FastText, MiniLM, sentence transformers, visualization, precision, recall

[00:00:00] 1. Introduction to Query Understanding

Definition and importance in search systems
Evolution of query understanding techniques

[00:05:30] 2. Query Representation Models

The "bag of documents" model for queries
The "bag of queries" model for documents
Advantages of holistic query representation

[00:12:00] 3. Query Specificity and Classification

Measuring query specificity using cosine similarity
Importance of query classification in search relevance
Implementing and leveraging query classifiers

[00:19:30] 4. Named Entity Recognition in Query Understanding

Role of NER in query processing
Challenges with unique or tail entities

[00:24:00] 5. Pre-Retrieval Query Processing

Importance of early-stage query analysis
Balancing computational resources and impact

[00:28:30] 6. Performance Optimization Techniques

Caching strategies for query understanding
Offline processing for document enhancement

[00:33:00] 7. Advanced Techniques: Embeddings and Language Models

Using embeddings for query similarity
Role of Large Language Models (LLMs) in search
When to use simpler techniques vs. complex models

[00:39:00] 8. Practical Implementation Strategies

Starting points for engineers new to query understanding
Tools and libraries for query understanding (FastText, MiniLM, etc.)
Balancing precision and recall in search systems

[00:44:00] 9. Visualization and Analysis of Query Spaces

Discussion on t-SNE, UMAP, and other visualization techniques
Limitations and alternatives to embedding visualizations

[00:47:00] 10. Future Directions and Closing Thoughts

Emerging trends in query understanding
Key takeaways for search system engineers

[00:53:00] End of Episode

Season 2 Trailer: Mastering Search

Season 2 · Episode 1

Thursday, 8 August 2024 • Duration 04:16

Today we are launching season 2 of How AI Is Built.

Over the last few weeks, we spoke to a lot of regular listeners and past guests, collected feedback, and analyzed our episode data. We will be applying what we learned to season 2.

This season will be all about search.

We are trying to make it better, more actionable, and more in-depth. The goal is that at the end of this season, you have a fully fledged course on search in podcast form, with mini-courses on specific elements like RAG.

We will be talking to experts from information retrieval, information architecture, recommendation systems, and RAG, from both academia and industry: fields that do not really talk to each other.

We will try to unify and transfer the knowledge and give you a full tour of search, so you can build your next search application or feature with confidence.

We will be talking to Charlie Hull on how to systematically improve search systems, with Nils Reimers on the fundamental flaws of embeddings and how to fix them, with Daniel Tunkelang on how to actually understand the queries of the user, and many more.

We will try to bridge the gaps. How to use decades of research and practice in iteratively improving traditional search and apply it to RAG. How to take new methods from recommendation systems and vector databases and bring them into traditional search systems. How to use all of the different methods as search signals and combine them to deliver the results your user actually wants.

We will be using two types of episodes:

Traditional deep dives, like the ones we have done so far. Each one will dive into one specific topic within search by interviewing an expert on that topic.
Supplementary episodes, which answer one additional question that we did not get to in the deep dive, often complementary or precursory knowledge for the episode.

We will be starting with episodes next week, looking at the first, last, and overarching action in search: understanding user intent and understanding the queries with Daniel Tunkelang.

I am really excited to kick this off.

I would love to hear from you:

What would you love to learn in this season?
What guest should I have on?
What topics should I make a deep dive on (try to be specific)?

Yeah, let me know in the comments or just slide into my DMs on Twitter or LinkedIn.

I am looking forward to hearing from you guys.

I want to try to be more interactive. So anytime you encounter anything unclear or a question pops up in one of the episodes, give me a shout and I will try to answer it for you and for everyone.

Enough of me rambling. Let’s kick this off. I will see you next Thursday, when we start with query understanding.

Unlocking Value from Unstructured Data, Real-World Applications of Generative AI | ep 17

Season 1 · Episode 17

Tuesday, 16 July 2024 • Duration 36:28

In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured data using AI, discussing challenging projects, the impact of ChatGPT, and the future of generative AI. From weather prediction to legal tech, Jonathan provides valuable insights into the practical applications of AI across various industries.

Key Takeaways

Generative AI projects often require less data cleaning due to the models' tolerance for "dirty" data, allowing for faster implementation in some cases.
The success of AI projects post-delivery is ensured through monitoring, but automatic retraining of generative AI applications is not yet common due to evaluation challenges.
Industries ripe for AI disruption include text-heavy fields like legal, education, software engineering, and marketing, as well as biotech and entertainment.
The adoption of AI is expected to occur in waves, with 2024 likely focusing on internal use cases and 2025 potentially seeing more customer-facing applications as models improve.
Synthetic data generation, using models like GPT-4, can be a valuable approach for training AI systems when real data is scarce or sensitive.
Evaluation frameworks like RAGAS and custom metrics are essential for assessing the quality of synthetic data and AI model outputs.
Jonathan’s ideal tech stack for generative AI projects includes tools like Instructor, Guardrails, Semantic Routing, DSPy, LangChain, and LlamaIndex, with a growing emphasis on evaluation stacks.

Key Quotes

"I think we're going to see another wave in 2024 and another one in 2025. And people are familiarized. That's kind of the wave of 2023. 2024 is probably still going to be a lot of internal use cases because it's a low risk environment and there was a lot of opportunity to be had."

"To really get to production reliably, we have to have these tools evolve further and get more standardized so people can still use the old ways of doing production with the new technology."

Chapters

00:00 Introduction: Extracting Value from Unstructured Data
03:16 Flexible Tailoring Solutions to Client Needs
05:39 Monitoring and Retraining Models in the Evolving AI Landscape
09:15 Generative AI: Disrupting Industries and Unlocking New Possibilities
17:47 Balancing Immediate Results and Cutting-Edge Solutions in AI Development
28:29 Dream Tech Stack for Generative AI

unstructured data, textual data, automation, weather prediction, data cleaning, ChatGPT, AI disruption, legal, education, software engineering, marketing, biotech, immediate results, cutting-edge solutions, tech stack