This book is a collection of blog posts I’ve written about how you can use Large Language Models (LLMs) from within R. You know, so that you can keep up with the cool AI stuff.
But all kidding aside, LLMs are an incredible tool that can help you with all kinds of things. Personally, I’ve found LLMs to be useful for
drafting or cleaning up texts,
extracting data from unstructured PDF reports,
running named entity recognition on text data,
categorizing text snippets into pre-defined categories, or
making data queries accessible to non-programmers via chat interfaces.
And the point of this whole book is that you don’t have to interact with LLMs through chat interfaces in your web browser. Relying on the browser alone makes it incredibly hard to use LLMs at scale. Think about it:
If you had to categorize 1,000 customer reviews by their sentiment (positive or negative), you’d spend A LOT of time pasting them into ChatGPT manually. But if you can talk to ChatGPT from within your R session, this task becomes a simple for-loop.
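To make that concrete, here’s a minimal sketch of such a loop using the {ellmer} package (which we cover in detail later). It assumes you have an OpenAI API key stored in the OPENAI_API_KEY environment variable; the model name and the prompt wording are just illustrative choices.

```r
library(ellmer)

reviews <- c(
  "Absolutely love it. Five stars!",
  "Broke after two days. Never again."
)

sentiments <- character(length(reviews))
for (i in seq_along(reviews)) {
  # Start a fresh chat per review so earlier answers don't leak into the context
  chat <- chat_openai(model = "gpt-4o-mini")
  sentiments[i] <- chat$chat(
    "Classify the sentiment of this customer review as positive or negative.",
    "Reply with a single word.",
    reviews[i]
  )
}
sentiments
```

With 1,000 reviews instead of two, the only thing that changes is the length of the vector.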
Of course, this is just one way to use LLMs for data analysis. Ultimately, the choice of how to use LLMs is yours. The aim of this book is to equip you with lots of tools and strategies to use them however you want.
Structure
This book is still in development, so things might change as I make progress. Nevertheless, here’s the structure I envision for this project:
1 Sending data.frames to LLMs with {mall}: In this chapter, we talk about the {mall} package, which focuses on passing data.frames through LLMs. Its main purpose is to run analyses like text classification, data extraction, or sentiment analysis. We start here because it’s a nice way to get your feet wet with AI using data.frames, which all data scientists are familiar with. We will also talk about Ollama, a tool that lets you run smaller LLMs locally on your computer. That way, you can start with LLMs without the monetary costs of talking to LLM vendors like OpenAI or Anthropic. The trade-off is that you might get worse results, or your computer might take a long time to run these intensive LLM calculations. You can find a small teaser of {mall} right after this outline.
2 Getting Started With {ellmer}: This chapter explains how to set up the {ellmer} package to talk to LLMs from your R console. In contrast to {mall}, this package isn’t concerned with sending data.frames to LLMs. Instead, it gives you a programmatic interface to LLM APIs. It’s like talking to an AI in a web browser, just through the R console instead. A short preview follows below.
3 Using AI Functions: We learn how to enhance our chats with so-called “tools”. This is a great way to spice up your LLM with traditional R functions. There’s a small example of this below as well.
4 Structured Output: In this chapter, we learn how to make an LLM extract specific data from a given text. This uses the {ellmer} package to enforce the format of the output. A quick preview follows after this outline.
5 Building Shiny Chat Bots: In this chapter, we talk about how {ellmer} can be combined with Shiny to create a nice chat bot. You’ll find a minimal sketch of such an app below.
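To give you a first taste of chapter 1, here’s a minimal {mall} sketch. It assumes Ollama is installed and running locally and that you have already pulled a small model; the model name “llama3.2” is just an illustrative choice.

```r
library(mall)

# Point {mall} at a locally running Ollama model
llm_use("ollama", "llama3.2")

reviews <- data.frame(
  review = c("This works great!", "Total waste of money.")
)

# Adds a new column with the LLM's sentiment classification for each row
llm_sentiment(reviews, review)
```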
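Chapter 2 then switches to {ellmer}. Assuming you have an API key in the OPENAI_API_KEY environment variable, a conversation from the console is as simple as this:

```r
library(ellmer)

chat <- chat_openai(system_prompt = "You are a helpful R tutor.")
chat$chat("What does the base pipe |> do in R?")
```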
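For chapter 3, a tool is nothing more than a regular R function that the LLM may call. Here’s a minimal sketch based on {ellmer}’s tool-calling vignette; note that the exact tool() signature has changed between {ellmer} versions, so check the docs of your installed release.

```r
library(ellmer)

# A regular R function that the LLM is allowed to call
get_current_time <- function(tz = "UTC") {
  format(Sys.time(), tz = tz, usetz = TRUE)
}

chat <- chat_openai()
chat$register_tool(tool(
  get_current_time,
  "Gets the current time in the given time zone.",
  tz = type_string("A time zone name, e.g. 'Europe/Berlin'.", required = FALSE)
))

# The model can now answer questions it has no built-in knowledge of
chat$chat("What time is it in New York right now?")
```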
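For chapter 4, structured output means describing the shape of the data you want back and letting {ellmer} enforce it. Here’s a minimal sketch; newer {ellmer} releases have renamed extract_data() to chat_structured(), so the method name depends on your version.

```r
library(ellmer)

chat <- chat_openai()

# Describe the shape of the data we want back
person <- type_object(
  name = type_string("The person's name"),
  age  = type_number("The person's age in years")
)

# Returns an R list like list(name = "Susan", age = 13)
chat$extract_data("My name is Susan and I'm 13 years old.", type = person)
```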
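And for chapter 5, here’s a minimal chat app. This sketch uses the {shinychat} package, which pairs naturally with {ellmer}; whether the chapter ends up using this exact package is my assumption.

```r
library(shiny)
library(shinychat)

ui <- bslib::page_fluid(
  chat_ui("chat")  # renders the chat interface
)

server <- function(input, output, session) {
  chat <- ellmer::chat_openai(system_prompt = "You are a friendly R assistant.")

  # Whenever the user submits a message, stream the LLM's reply into the UI
  observeEvent(input$chat_user_input, {
    stream <- chat$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)
```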
In the future, I might also extend this book with some theoretical aspects of LLMs, things like chunking strategies for RAG applications, or how to use LLMs in combination with other web services like AWS Textract. But for now, that part of the book is uncertain.
Session Info
In case you’re wondering which package versions and R version I’m using, you can check out this info box.