
OpenAI Vector Embeddings – Talk to any book or document; Retrieval-Augmented Generation!



With ChatGPT becoming increasingly useful and able to interface with external tools, vector databases can be the key to performing memory-augmented search. This enables applications such as giving ChatGPT more domain-specific knowledge, or chatting with a document such as a PDF or book.

This week, we have a special guest. Manas, an undergraduate from Nanyang Technological University (NTU) – Singapore – will be showcasing his Talk to Book applications, such as Atomic Habits ( https://www.gptbook.club/atomic-habits ).

We will also be discussing how vector embeddings are generated and how they can be compared to one another.
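
As a rough illustration of how embeddings can be compared, here is a minimal Python sketch of cosine-similarity retrieval. The three-dimensional vectors and passage texts are hand-made toy values (an assumption for illustration, not real OpenAI embeddings, which have far more dimensions), and `cosine_similarity` and `top_k` are hypothetical helper names, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, passages, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in passages.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]

# Toy "embeddings": made-up numbers standing in for real model output.
passages = {
    "Habits compound over time": [0.9, 0.1, 0.0],
    "Make the cue obvious": [0.7, 0.3, 0.1],
    "Recipe for chicken rice": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]  # pretend this is the embedded user question

results = top_k(query, passages, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In a Talk to Book style application, the retrieved passages would then be placed into the prompt so that the model answers from the book's text. A vector database plays the role of `top_k` here, at a much larger scale.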

Also, we have a special appearance by Tim Scarfe! He is the main creator of the Machine Learning Street Talk channel, which features insightful commentaries and interviews on popular Machine Learning topics. Check it out here: https://www.youtube.com/@MachineLearningStreetTalk

~~~~~~~~~~~~~~~~~~~~~~~

References:

GPT Book Club (Talk to any book by Manas): https://www.gptbook.club/atomic-habits
My own tutorial on using vector embeddings and retrieval from Pinecone: https://www.youtube.com/watch?v=rh-WNG4yJag
Slides: https://github.com/tanchongmin/TensorFlow-Implementations/blob/main/Paper_Reviews/OpenAI%20Vector%20Embeddings.pdf

OpenAI Embedding Paper: https://arxiv.org/abs/2201.10005
OpenAI Embeddings Page: https://platform.openai.com/docs/guides/embeddings

BERT Paper: https://arxiv.org/abs/1810.04805

Generative Agents Paper (Tim Scarfe and I were talking about agents being able to interact with one another with different personalities): https://arxiv.org/abs/2304.03442

Pinecone Vector Database: https://www.pinecone.io/

~~~~~~~~~~~~~~~~~~~~~~~

0:00 Introduction
1:03 Sharing by Manas on Atomic Habits GPT
8:38 Free-flow Discussion between Manas, Tim Scarfe, Mehul and me
28:50 Embedding Space
32:02 Traditional Approach: TF-IDF
35:43 Modern Approach: Vector Embeddings
37:03 Token Embeddings
39:58 3D Embedding Visualization
41:44 OpenAI Embedding Paper + my opinion of how embeddings can be trained
57:13 Issues with Contrastive Learning
1:04:36 Distance Metrics
1:09:04 Will we lose any meaning by normalizing vectors? (Note: Cosine Similarity is not affected)
1:14:13 External Vector Database (e.g. Pinecone)
1:15:22 Use Cases
1:16:06 Discussion
1:37:15 Conclusion
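
On the note at 1:09:04 about normalization: the small Python sketch below, using made-up two-dimensional vectors, checks that cosine similarity stays the same after scaling vectors to unit length, while magnitude information is lost. The helper names (`dot`, `norm`, `normalize`, `cosine`) are assumptions chosen for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length
c = [4.0, 3.0]  # different direction

# Cosine similarity only depends on direction, so it survives normalization.
cos_before = cosine(a, c)
cos_after = cosine(normalize(a), normalize(c))

# What is lost is magnitude: a and b collapse to the same unit vector.
a_unit, b_unit, c_unit = normalize(a), normalize(b), normalize(c)

# For unit vectors, the dot product equals the cosine similarity, and
# squared Euclidean distance equals 2 - 2 * cosine.
sq_dist = sum((x - y) ** 2 for x, y in zip(a_unit, c_unit))

print(cos_before, cos_after, dot(a_unit, c_unit), sq_dist)
```

So once vectors are normalized, ranking by dot product, cosine similarity, or Euclidean distance gives the same order of results.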

~~~~~~~~~~~~~~~~~~~~~~~~~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.

Discord: https://discord.gg/fXCZCPYs
Online AI blog: https://delvingintotech.wordpress.com/
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Twitch: https://www.twitch.tv/johncm99
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin
