From 2b3842cd51fa4668b424f3f210322411fd26a7d3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Olaf=20G=C3=B3rski?=
Date: Sun, 29 Oct 2023 10:40:56 +0100
Subject: [PATCH] app(basic-rag): add advanced app idea - basic RAG

---
 .../Basic-Retrieval-Augmented-Generation.md | 60 +++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 Projects/3-Advanced/Basic-Retrieval-Augmented-Generation.md

diff --git a/Projects/3-Advanced/Basic-Retrieval-Augmented-Generation.md b/Projects/3-Advanced/Basic-Retrieval-Augmented-Generation.md
new file mode 100644
index 00000000..540ca37f
--- /dev/null
+++ b/Projects/3-Advanced/Basic-Retrieval-Augmented-Generation.md
@@ -0,0 +1,60 @@
+# Basic Retrieval Augmented Generation API
+
+**Tier:** 3-Advanced
+
+bRAG is an example of how to build a basic Retrieval Augmented Generation API from scratch. This project should teach you how to interact with LLMs and Vector Databases.
+All LLMs have limited knowledge: their training data is finite and has a cutoff point.
+
+To give the model access to more recent data, or to data that is private or unique to you, a technique called
+RAG - Retrieval Augmented Generation - was developed. It means that during either a Completion or a Chat Completion, inside your request,
+you provide some additional information that is Retrieved from somewhere, and then make the model generate the answer based on that particular
+information, which we put inside the request's context - simply the string that we send to the model.
+
+On a basic level, this is all there is to it: we Retrieve some information to Augment our prompt/inquiry/request before the Generation.
+More often than not, for practical reasons, the Retrieval part happens through some kind of Vector Database.
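+
+To make the flow concrete, here is a minimal sketch of a single RAG completion in Python, assuming the `openai` and `qdrant-client` packages (the stack used in the linked bRAG example), a local Qdrant instance, and a `documents` collection that already holds embedded text chunks - the collection name, model choices and prompt are illustrative, not prescribed:
+
+```python
+from openai import OpenAI
+from qdrant_client import QdrantClient
+
+llm = OpenAI()  # reads OPENAI_API_KEY from the environment
+qdrant = QdrantClient(url="http://localhost:6333")
+
+
+def rag_completion(question: str) -> str:
+    # 1. Retrieve: embed the question and find the most similar chunks.
+    vector = llm.embeddings.create(
+        model="text-embedding-ada-002", input=question
+    ).data[0].embedding
+    hits = qdrant.search(collection_name="documents", query_vector=vector, limit=3)
+
+    # 2. Augment: put the retrieved chunks into the prompt.
+    context = "\n".join(hit.payload["text"] for hit in hits)
+    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
+
+    # 3. Generate: have the model answer based on the provided context.
+    response = llm.chat.completions.create(
+        model="gpt-3.5-turbo",
+        messages=[{"role": "user", "content": prompt}],
+    )
+    return response.choices[0].message.content
+```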
+
+When we talk about LLM-related applications, there is a plethora of different usages and examples.
+Most cases and applications, however, orbit around or consist of the following concepts:
+
+1. **Completions** - a completion is a simple one-off message: you provide an input and the model generates a completion based on it. A Completion is, as mentioned, one-off in its basic form, so we assume there is no context and no past messages - just one single string and its completion.
+2. **Chat Completions** - similar to a plain Completion, except that here you usually have contextual conversation enabled: you keep the history of the conversation in the context, and since it is a conversation, the messages have different roles/authors - a conversation needs at least two different sides. In the LLM world these are usually user, system and assistant.
+
+Both of these are plain ways of interacting with LLMs.
+
+Among RAG-type services, we can single out the following types:
+
+**Document Q&A** - usually a simple Q&A service, akin to Completions. It can have contextual conversation capabilities, but it can also be a one-off question just like a completion - usually the latter. The data over which we do the Q&A is not persisted; it is just a temporary Q&A.
+
+**Knowledge Base Q&A** - basically Document Q&A, but the data we query is properly ingested and persisted somewhere. In other words: Document Q&A without contextual conversation, but over persisted data, as opposed to the temporary data in Document Q&A.
+
+**Contextual Knowledge Base Q&A** - same as above, but rather than being a one-off completion, we keep the context of the conversation in mind and have proper Chat Completions with contextual conversations. Not only can we ask questions over some persisted data, but the app also remembers our previous questions and answers (up to a certain point) and takes them into account.
+
+What we want to do in this project is to create an API that implements Completions.
+That means we need an API that accepts a simple query/question and returns an answer generated by the LLM, augmented with the information we retrieved from our Vector Database.
+
+The two main endpoints will be /completions and /ingest (a minimal skeleton of both is sketched at the end of this document).
+
+You will need to create an API service and set up a database and a Vector Database locally.
+
+## User Stories
+
+- [ ] User can send a request to an API endpoint with a question and get back an LLM-generated answer, augmented with information retrieved from the Vector Database using Semantic Search.
+- [ ] User can list the questions they have asked and the answers they got.
+- [ ] User can ask new questions and get new answers.
+- [ ] User can add .txt/.md data through an endpoint that ingests it and indexes it in the Vector Database, overwriting previously ingested data.
+
+## Bonus features
+
+- [ ] User can register & log in.
+- [ ] User can only view their own questions/answers.
+- [ ] Answers are generated with the context of previous messages - we implement Chat Message history context.
+- [ ] User can add .pdf/.docx/.pptx data to be ingested into the Vector Database.
+
+## Useful links and resources
+
+[How to bRAG - building a basic Retrieval Augmented Generation API from scratch using openai, qdrant](https://grski.pl/pdf-brag)
+
+## Example projects
+
+[bRAG](https://github.com/grski/brag)
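+
+To tie the pieces together, here is a minimal, illustrative skeleton of the two endpoints described above. It assumes FastAPI on top of the same `openai` and `qdrant-client` setup, reuses the hypothetical `rag_completion` helper sketched earlier, chunks text naively, and recreates the collection on each ingest so that previously ingested data is overwritten:
+
+```python
+from fastapi import FastAPI, UploadFile
+from pydantic import BaseModel
+from openai import OpenAI
+from qdrant_client import QdrantClient
+from qdrant_client.models import Distance, PointStruct, VectorParams
+
+app = FastAPI()
+llm = OpenAI()
+qdrant = QdrantClient(url="http://localhost:6333")
+
+
+class Question(BaseModel):
+    question: str
+
+
+@app.post("/completions")
+def completions(body: Question) -> dict:
+    # Retrieve-augment-generate; `rag_completion` is the helper from the
+    # earlier sketch.
+    return {"answer": rag_completion(body.question)}
+
+
+@app.post("/ingest")
+async def ingest(file: UploadFile) -> dict:
+    # Accept a .txt/.md upload, chunk it naively, embed and index it.
+    text = (await file.read()).decode("utf-8")
+    chunks = [text[i : i + 1000] for i in range(0, len(text), 1000)]
+    embeddings = llm.embeddings.create(
+        model="text-embedding-ada-002", input=chunks
+    ).data
+    # Recreating the collection overwrites previously ingested data.
+    qdrant.recreate_collection(
+        collection_name="documents",
+        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
+    )
+    qdrant.upsert(
+        collection_name="documents",
+        points=[
+            PointStruct(id=i, vector=e.embedding, payload={"text": chunk})
+            for i, (chunk, e) in enumerate(zip(chunks, embeddings))
+        ],
+    )
+    return {"ingested_chunks": len(chunks)}
+```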