4 LLM-driven data science
This chapter covers
- Find and explore business data using AI
- Avoid the pitfalls of synthetic data in analytics
- Build data notebooks with zero coding
- Direct prompting vs. open exploration with LLMs
- Solve business problems using simplified machine learning
- The trade-off between precision and recall
Let’s imagine that we need to answer a simple question: Which manufacturer dominates a given market? To answer this question, we first need to identify the datasets that provide the necessary information.
The internet is full of data sources, both synthetic and non-synthetic, but selecting the right one is challenging. Once we’ve met that challenge and have the right dataset, we need to perform data cleaning, parsing, and normalization. Only then can we conduct a simple statistical analysis to answer our original question. Before LLMs, such a task would have required a data analyst trained in the appropriate technology for the problem (most often Python and Jupyter Notebook) and a substantial amount of time. Fortunately, with the power of LLMs, anyone with basic programming knowledge (you don’t even need to know Python) can create an analytics notebook that answers virtually any question about a given dataset. You only need to focus on what matters most: the actual data and the business domain.
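To make the clean, parse, normalize, and analyze steps concrete, here is a minimal hand-written sketch of the kind of logic such a notebook ends up containing. It is not the chapter's LLM-generated notebook. It uses a tiny hypothetical list of sales records, a hypothetical alias table for manufacturer names, and the assumption that "dominates" means the largest share of units sold. Real datasets will need their own cleaning rules.

```python
from collections import Counter
from typing import Optional

# Hypothetical raw records (illustration only): inconsistent casing,
# stray whitespace, a name variant, and an unparseable unit count.
raw_sales = [
    {"manufacturer": " Acme ", "units": "120"},
    {"manufacturer": "ACME", "units": "80"},
    {"manufacturer": "Globex", "units": "150"},
    {"manufacturer": "globex corp", "units": "n/a"},
    {"manufacturer": "Initech", "units": "50"},
]

# Hypothetical alias table mapping name variants to one canonical name.
ALIASES = {"globex corp": "globex"}


def normalize_name(name: str) -> str:
    """Normalize: trim whitespace, lowercase, and resolve known aliases."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def parse_units(value: str) -> Optional[int]:
    """Parse a unit count; return None when the value is not an integer."""
    try:
        return int(value)
    except ValueError:
        return None


def market_shares(records):
    """Clean and aggregate records, then return each manufacturer's unit share."""
    totals = Counter()
    for row in records:
        units = parse_units(row["units"])
        if units is None:
            continue  # cleaning step: drop rows we cannot parse
        totals[normalize_name(row["manufacturer"])] += units
    grand_total = sum(totals.values())
    return {name: count / grand_total for name, count in totals.items()}


shares = market_shares(raw_sales)
leader = max(shares, key=shares.get)  # manufacturer with the largest share
print(leader, round(shares[leader], 3))
```

Note that without the normalization step, "Acme" and "ACME" would be counted as two separate manufacturers and the answer could change. Errors of this kind are why the data preparation work matters as much as the final statistic.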