12 Using LLMs to Query Your Local Data
This chapter covers
- Using GPT4All to query your own private data
- Querying PDF documents with an LLM
- Loading CSV and JSON files for querying
- Using LLMs to analyze your own data files
Up to this point, you've explored the capabilities of LLMs and their usage through platforms like OpenAI and Hugging Face. While these services relieve you of hosting models yourself, they come at a cost; and the alternative, running powerful models locally, demands significant setup effort and hardware expense.
A common challenge developers face is using LLMs to answer questions about their own data, while businesses insist that data privacy be maintained. In Chapter 8, we discussed sending data to OpenAI for embedding and querying with LangChain and LlamaIndex.
In this chapter, we'll delve deeper into the topic, focusing on querying local, private documents without compromising data privacy. We'll take two approaches:
- Local LLM querying for text-based data: We'll use a model from GPT4All to embed and query your text-based data entirely on your own machine. This approach is particularly useful for content such as PDF documents.
- LLM querying for structured tabular data: Whether running locally or hosted by third parties like OpenAI or Hugging Face, LLMs can answer questions about tabular data (e.g., CSV or JSON files). Instead of feeding the tabular data to the LLM directly, we'll instruct it to generate queries that we then run programmatically for analysis.
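To preview the second approach, here is a minimal sketch of the pattern: only the table's schema and the question go into the prompt, the LLM returns a query expression, and that expression is executed locally against the data. The sample DataFrame and the hardcoded `llm_reply` string are assumptions standing in for a real CSV file and a real model call.

```python
import pandas as pd

# Hypothetical sample data standing in for a local CSV file.
df = pd.DataFrame({
    "region": ["East", "West", "East", "South"],
    "sales": [120, 340, 210, 95],
})

def build_prompt(question: str, frame: pd.DataFrame) -> str:
    """Describe the table's schema (not its rows) and ask for a pandas expression."""
    return (
        f"Columns: {list(frame.columns)}\n"
        f"Write a single pandas expression over a DataFrame named `df` "
        f"that answers: {question}"
    )

prompt = build_prompt("What are the total sales per region?", df)

# In a real setup, `prompt` would be sent to an LLM (local or hosted).
# Here we assume it replied with the expression below.
llm_reply = "df.groupby('region')['sales'].sum()"

# Execute the returned expression locally -- the raw data never left
# the machine; only the column names and the question were in the prompt.
result = eval(llm_reply, {"df": df})
print(result.to_dict())  # {'East': 330, 'South': 95, 'West': 340}
```

Note that `eval` on model output is fine for a sketch but should be sandboxed or validated in production. We'll flesh out both approaches in the sections that follow.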