chapter ten

10 Building a Voice-Enabled AI Chat Application

This chapter covers

  • The architecture of a voice AI pipeline: microphone to LLM response
  • Connecting MLX Whisper to Streamlit using st.audio_input()
  • Refactoring the app into clean, reusable functions
  • Adding a text fallback so the app works with keyboard input too
  • Testing the full voice conversation loop end to end
  • Understanding what makes this application challenging and how to extend it

This is the chapter where everything comes together. Over the previous chapters, you learned how to use the terminal, install Ollama, pull AI models, write Python, call the Ollama API, build web interfaces with Streamlit, and transcribe speech with MLX Whisper. In this chapter, you combine all of those skills into a voice-enabled AI chat application: speak into your microphone, watch your words become text, and receive a streaming AI response -- all running locally on your Mac, with complete privacy. Continue working in the same my-ai-chatbot folder you used earlier; this chapter adds voice_chat.py beside the voice_input.py file from Chapter 9.
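
Before any real models are involved, the shape of one conversation turn can be sketched in plain Python. The example below is only an illustration of the flow described above (audio in, text out, streamed reply out). The names voice_chat_turn, fake_transcribe, and fake_generate are hypothetical and are not the functions built later in this chapter; the two stand-in stages take the place of MLX Whisper and Ollama so that the sketch runs without either installed.

```python
from typing import Callable, Iterable, Iterator


def voice_chat_turn(
    audio_bytes: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], Iterable[str]],
) -> Iterator[str]:
    """Run one turn of the pipeline: audio -> text -> streamed reply chunks.

    Both stages are passed in as plain callables (an assumption made for
    this sketch) so the flow can be tried without a microphone or a model.
    """
    text = transcribe(audio_bytes).strip()
    if not text:
        return  # nothing was transcribed, so there is nothing to answer
    yield from generate(text)


# Stand-in for the speech-to-text stage (hypothetical; always "hears" the same words).
def fake_transcribe(audio: bytes) -> str:
    return "hello there"


# Stand-in for the LLM stage (hypothetical; yields the reply one word at a time).
def fake_generate(prompt: str) -> Iterator[str]:
    for word in f"You said: {prompt}".split():
        yield word + " "


# Join the streamed chunks into one string, as a chat window would display them.
reply = "".join(voice_chat_turn(b"\x00\x01", fake_transcribe, fake_generate))
print(reply)
```

Passing the stages in as arguments keeps the turn logic independent of any particular model, which is the same separation the reusable functions later in the chapter aim for.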

10.1 The Voice AI Pipeline

10.2 Setting Up the Project

10.3 Building the Application Step by Step

10.3.1 Stage 1: Transcription in Streamlit

10.3.2 Why a Temporary File?

10.3.3 Stage 2: Connecting Transcription to the LLM

10.3.4 Stage 3: The Complete Application

10.4 The Complete Voice Chat Application

10.4.1 Running the Application

10.4.2 Updating the Ollama Model Menu

10.5 Understanding the Complete Code

10.5.1 Page Configuration

10.5.2 The Sidebar: Controlling Two Models

10.5.3 `transcribe_audio()` -- The Bridge Between Streamlit and MLX Whisper

10.5.4 `stream_response()` -- A Reusable Streaming Function

10.5.5 `handle_user_message()` -- The Unified Entry Point

10.5.6 Dual Input: Voice and Text

10.6 Running and Testing the Application

10.6.1 Start-up Checklist

10.6.2 Testing Checklist

10.6.3 Troubleshooting

10.7 Understanding the Latency Budget

10.8 Summary

10.9 Exercises