OpenAI’s CLIP is a “multi-modal” model capable of understanding the relationships between text and images. As we’ll see, CLIP is very capable and, when used via the Hugging Face transformers library, could not be easier to work with.
Article:
https://towardsdatascience.com/quick-fire-guide-to-multi-modal-ml-with-openais-clip-2dad7e398ac0
Friend Link (free access):
https://towardsdatascience.com/quick-fire-guide-to-multi-modal-ml-with-openais-clip-2dad7e398ac0?sk=89bb2d8b8e583ed109d8a05e00366645
70% Discount on the NLP With Transformers in Python course:
https://bit.ly/3DFvvY5
Subscribe for Article and Video Updates!
https://jamescalam.medium.com/subscribe
https://medium.com/@jamescalam/membership
Discord:
https://discord.gg/c5QtDB9RAP
00:00 Intro
00:15 What is CLIP?
02:13 Getting started
05:38 Creating text embeddings
07:23 Creating image embeddings
10:26 Embedding a lot of images
15:08 Text-image similarity search
21:38 Alternative image and text search
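The text-image similarity search covered in the video comes down to cosine similarity between CLIP embeddings. A minimal sketch of just that search step, using random NumPy vectors in place of real CLIP outputs (the embeddings, their dimensionality, and the model are assumptions for illustration):

```python
import numpy as np

# Hypothetical embeddings: in practice these come from CLIP's text and
# vision encoders; random vectors stand in to show the search step only.
rng = np.random.default_rng(0)
image_embeds = rng.normal(size=(5, 512))   # 5 images, 512-dim embeddings
text_embed = rng.normal(size=(512,))       # one text query embedding

# Cosine similarity: L2-normalize both sides, then take dot products.
image_embeds /= np.linalg.norm(image_embeds, axis=1, keepdims=True)
text_embed /= np.linalg.norm(text_embed)

scores = image_embeds @ text_embed         # one similarity score per image
ranking = np.argsort(-scores)              # indices sorted best-match first
print("best match: image", ranking[0])
```

The same ranking logic works in the other direction (one image query against many text embeddings), which is the idea behind the alternative search at the end of the video.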