12 Using LLMs to Query Your Local Data

 

This chapter covers

  • Using GPT4All to query your own private data
  • Using an LLM to query PDF documents
  • Loading CSV and JSON files for querying
  • Using LLMs to analyze your own data files

Up to this point, you've explored the capabilities of LLMs through hosted platforms like OpenAI and Hugging Face. While these services ease the burden of hosting models yourself, they come at a cost. Conversely, running powerful models locally avoids usage fees but requires significant setup effort and hardware resources.

Developers commonly face the challenge of using LLMs to answer questions about their own data, while businesses emphasize the need to keep that data private. In Chapter 8, we discussed sending data to OpenAI for embedding and querying with LangChain and LlamaIndex.

Let's delve deeper into this topic, focusing on querying local, private documents without compromising data privacy. We'll take two approaches:

  • Local LLM Querying for Text-based Data: We'll use a model from GPT4All to embed your text-based data locally and query it. This approach is particularly useful for content such as PDF documents.
  • LLM Querying for Structured Tabular Data: Whether running locally or hosted by third parties such as OpenAI or Hugging Face, LLMs can answer questions about tabular data (e.g., CSV or JSON). Instead of feeding the tabular data to the LLM directly, we'll instruct it to generate code that performs the analysis programmatically.
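The second approach can be sketched in a few lines. In the real workflow you would send the LLM a description of your file's schema and receive generated analysis code back; here the model's response is hard-coded as a plausible stand-in (the dataset, field names, and generated snippet are all hypothetical) so the pattern stays self-contained and runnable:

```python
import json
import tempfile
import textwrap

# Toy dataset standing in for your private JSON file (hypothetical data).
records = [
    {"product": "widget", "units": 120, "price": 9.5},
    {"product": "gadget", "units": 40, "price": 24.0},
    {"product": "widget", "units": 80, "price": 9.5},
]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(records, f)
    path = f.name

# Stand-in for the code an LLM might return when asked:
# "Given a JSON list of {product, units, price}, total the units sold for widgets."
generated_code = textwrap.dedent("""
    import json
    with open(path) as fh:
        rows = json.load(fh)
    total_widget_units = sum(r["units"] for r in rows if r["product"] == "widget")
""")

# Execute the generated snippet in its own namespace and read the result.
# Only the file path crosses over -- the raw data never leaves your machine.
namespace = {"path": path}
exec(generated_code, namespace)
print(namespace["total_widget_units"])  # → 200
```

The key privacy property is that only the schema (field names and types) is shared with the model; the records themselves are read and processed locally by the generated code.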

12.1 Using GPT4All to Query with Your Own Data

12.1.1 Installing the Required Packages

12.1.2 Importing the Various Modules from the LangChain Package

12.1.3 Loading the PDF Documents

12.1.4 Splitting the Text into Chunks

12.1.5 Embedding

12.1.6 Loading the Embeddings

12.1.7 Downloading the Model

12.1.8 Asking Questions

12.1.9 Loading Multiple Documents

12.1.10 Loading CSV Files

12.1.11 Loading JSON Files

12.2 Using LLMs to Write Code to Analyze Your Own Data

12.2.1 Preparing the JSON File

12.2.2 Loading the JSON File

12.2.3 Asking Questions Using the Mistral 7B Model

12.2.4 Asking Questions Using OpenAI

12.3 Summary