Meta AI’s Another Revolutionary Large Scale Model: DINOv2 for Image Feature Extraction | by Gurami Keretchashvili | Jun, 2023

In this part, I will try to demonstrate how DINOv2 works in a real-case scenario. I will create a fine-grained image classification task.

Classification workflow:

  • Download the Food101 dataset from PyTorch datasets.
  • Extract features from the train and test datasets using small DINOv2.
  • Train ML classifier models (SVM, XGBoost and KNN) using the features extracted from the training dataset.
  • Make predictions on the features extracted from the test dataset.
  • Evaluate each ML model’s accuracy and F1 score.

Data: Food 101 is a challenging data set of 101 food categories with 101,000 images. For each class, 250 manually reviewed test images are provided as well as 750 training images.

Model: small DINOv2 model (ViT-S/14 distilled)

ML models: SVM, XGBoost, KNN.

Step 1: Set up (you can use Google Colab to run the code and turn the GPU on)

import torch
import numpy as np
import torchvision
from torchvision import transforms
from torch.utils.data import Subset, DataLoader
import matplotlib.pyplot as plt
import time
import os
import random
from tqdm import tqdm

from xgboost import XGBClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, f1_score
import pandas as pd

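def set_seed(no):
    # Seed every random number generator used in this notebook
    torch.manual_seed(no)
    random.seed(no)
    np.random.seed(no)
    os.environ['PYTHONHASHSEED'] = str(no)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

set_seed(42)  # assumed value; any fixed integer works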


Step 2: Create the transformation, download and create the Food101 PyTorch datasets, and create the train and test dataloader objects.

batch_size = 8

# Assumed preprocessing: the resize, crop and tensor conversion below are a
# standard ImageNet-style pipeline; 224 is a multiple of the ViT patch size (14)
transformation = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

trainset = torchvision.datasets.Food101(root='./data', split='train',
                                        download=True, transform=transformation)

testset = torchvision.datasets.Food101(root='./data', split='test',
                                       download=True, transform=transformation)

# Optional: work on a random subset to save time
# train_indices = random.sample(range(len(trainset)), 20000)
# test_indices = random.sample(range(len(testset)), 5000)

# trainset = Subset(trainset, train_indices)
# testset = Subset(testset, test_indices)

# Shuffling is an assumption here: on for training, off for testing
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True)

testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False)

classes = trainset.classes

print(len(trainset), len(testset))
print(len(trainloader), len(testloader))

[out] 75750 25250

[out] 9469 3157
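
The printed sizes follow directly from the dataset description: 101 classes with 750 training and 250 test images each, loaded in batches of 8. A quick arithmetic check, using only the numbers quoted in this article:

```python
import math

num_classes = 101
train_per_class, test_per_class = 750, 250
batch_size = 8

train_size = num_classes * train_per_class
test_size = num_classes * test_per_class

# A DataLoader keeps the final partial batch by default, hence the ceiling.
train_batches = math.ceil(train_size / batch_size)
test_batches = math.ceil(test_size / batch_size)

print(train_size, test_size)        # dataset lengths
print(train_batches, test_batches)  # dataloader lengths
```

The last batch of each loader is smaller than 8 because neither dataset size is a multiple of the batch size.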

Step 3 (Optional): Visualize a training dataloader batch

# Get a batch of images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Plot the images
fig, axes = plt.subplots(1, len(images), figsize=(12, 12))
for i, ax in enumerate(axes):
    # Convert the tensor image to numpy format
    image = images[i].numpy()
    image = image.transpose((1, 2, 0))  # Transpose to (height, width, channels)

    # Undo the normalization so the colors look natural
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    normalized_image = (image * std) + mean
    # Display the image
    ax.imshow(np.clip(normalized_image, 0, 1))
    ax.axis('off')
    ax.set_title(f'Label: {labels[i]}')

# Show the plot
plt.show()

A batch of images
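
The plotting code multiplies by the standard deviation and adds the mean before displaying each image. This reverses transforms.Normalize, which computes (x - mean) / std per channel. A minimal pure-Python sketch for a single RGB pixel (the pixel value is made up for illustration):

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize(pixel):
    """Per-channel (x - mean) / std, the operation Normalize applies."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]

def denormalize(pixel):
    """Inverse operation: x * std + mean."""
    return [x * s + m for x, m, s in zip(pixel, mean, std)]

pixel = [0.2, 0.5, 0.9]  # hypothetical RGB values in [0, 1]
restored = denormalize(normalize(pixel))

# The round trip should recover the original pixel up to float error.
round_trip_ok = all(abs(a - b) < 1e-9 for a, b in zip(pixel, restored))
print(round_trip_ok)
```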

Step 4: Load the small DINOv2 model and extract features from the training and test dataloaders.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
dinov2_vits14 = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').to(device)
dinov2_vits14.eval()  # inference only, no training

train_embeddings = []
train_labels = []

with torch.no_grad():
    for data, labels in tqdm(trainloader):
        # One 384-dimensional feature vector per image
        image_embeddings_batch = dinov2_vits14(data.to(device))
        train_embeddings.append(image_embeddings_batch.cpu().numpy())
        train_labels.append(labels.numpy())

test_embeddings = []
test_labels = []

with torch.no_grad():
    for data, labels in tqdm(testloader):
        image_embeddings_batch = dinov2_vits14(data.to(device))
        test_embeddings.append(image_embeddings_batch.cpu().numpy())
        test_labels.append(labels.numpy())

# Concatenate the per-batch results
train_embeddings_f = np.vstack(train_embeddings)
train_labels_f = np.concatenate(train_labels).flatten()

test_embeddings_f = np.vstack(test_embeddings)
test_labels_f = np.concatenate(test_labels).flatten()

train_embeddings_f.shape, train_labels_f.shape, test_embeddings_f.shape, test_labels_f.shape

[out] ((75750, 384), (75750,), (25250, 384), (25250,))
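
Each image ends up as one 384-dimensional vector, which is the feature width of ViT-S/14; the "/14" is the patch size in pixels. The sketch below works out how many patches the model sees per image and how much memory the extracted training features need. The 224-pixel input size is an assumption for illustration; the other numbers come from the output above.

```python
patch_size = 14    # the "/14" in ViT-S/14
embed_dim = 384    # feature width reported in the output above
image_size = 224   # assumption: input side length, a multiple of 14

patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2

# Memory needed to hold all training features as float32
train_images = 75750
bytes_per_float32 = 4
train_mb = train_images * embed_dim * bytes_per_float32 / 1e6

print(patches_per_side, num_patches)
print(round(train_mb, 1), "MB")
```

At roughly a hundred megabytes, the full training feature matrix fits comfortably in memory, which is what makes classical classifiers such as SVM or KNN practical here.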

Step 5: Build a function for the SVM, XGBoost and KNN classifiers.

def evaluate_classifiers(X_train, y_train, X_test, y_test):
    # Support Vector Machine (SVM)
    svm_classifier = SVC()
    svm_classifier.fit(X_train, y_train)
    svm_predictions = svm_classifier.predict(X_test)

    # XGBoost Classifier
    # 'gpu_hist' needs a GPU; XGBoost 2.0+ prefers tree_method='hist' with device='cuda'
    xgb_classifier = XGBClassifier(tree_method='gpu_hist')
    xgb_classifier.fit(X_train, y_train)
    xgb_predictions = xgb_classifier.predict(X_test)

    # K-Nearest Neighbors (KNN) Classifier
    knn_classifier = KNeighborsClassifier()
    knn_classifier.fit(X_train, y_train)
    knn_predictions = knn_classifier.predict(X_test)

    # Calculating Top-1 accuracy
    top1_svm = accuracy_score(y_test, svm_predictions)
    top1_xgb = accuracy_score(y_test, xgb_predictions)
    top1_knn = accuracy_score(y_test, knn_predictions)

    # Calculating F1 score
    f1_svm = f1_score(y_test, svm_predictions, average='weighted')
    f1_xgb = f1_score(y_test, xgb_predictions, average='weighted')
    f1_knn = f1_score(y_test, knn_predictions, average='weighted')

    return pd.DataFrame({
        'SVM': {'Top-1 Accuracy': top1_svm, 'F1 Score': f1_svm},
        'XGBoost': {'Top-1 Accuracy': top1_xgb, 'F1 Score': f1_xgb},
        'KNN': {'Top-1 Accuracy': top1_knn, 'F1 Score': f1_knn}
    })

X_train = train_embeddings_f  # Training data features
y_train = train_labels_f      # Training data labels
X_test = test_embeddings_f    # Test data features
y_test = test_labels_f        # Test data labels

results = evaluate_classifiers(X_train, y_train, X_test, y_test)
print(results)


Result of small DINOv2 + SVM/XGBoost/KNN (image by the author)

Wow, the results are great! As demonstrated, the SVM model trained on features extracted by small DINOv2 outperformed the other ML models and achieved almost 90% accuracy.

Even though we used the small DINOv2 model to extract features, the ML models (especially SVM) trained on the extracted features demonstrated great performance on the fine-grained classification task. The model can classify objects with almost 90% accuracy out of 101 different classes.
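
To make the two reported metrics concrete, here is a minimal pure-Python sketch of top-1 accuracy and support-weighted F1, the quantity that average='weighted' requests from scikit-learn. The toy labels are made up for illustration and are not taken from Food101.

```python
from collections import Counter

def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)

y_true = [0, 0, 1, 1, 2, 2]  # hypothetical ground-truth labels
y_pred = [0, 1, 1, 1, 2, 0]  # hypothetical predictions

acc = top1_accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
print(round(acc, 4), round(f1, 4))
```

Because the Food101 test set has the same number of images in every class, the weighted F1 reduces to a plain average of the per-class F1 scores.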

The accuracy would likely improve if the base, large or giant DINOv2 models were used. You just need to replace dinov2_vits14 in step 4 with dinov2_vitb14, dinov2_vitl14 or dinov2_vitg14. Give it a try and feel free to share your accuracy result in the comment section 🙂
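
Switching model size only changes the hub entry-point name and the width of the feature vectors. A small sketch of that mapping follows; load_dinov2 is a hypothetical helper written for this example, and the feature widths are the standard ViT-S/B/L/g dimensions rather than values measured in this article.

```python
# Hub entry-point names mapped to feature widths (standard ViT dimensions).
DINOV2_VARIANTS = {
    "small": ("dinov2_vits14", 384),
    "base": ("dinov2_vitb14", 768),
    "large": ("dinov2_vitl14", 1024),
    "giant": ("dinov2_vitg14", 1536),
}

def load_dinov2(size="small", device="cpu"):
    """Hypothetical helper: load one DINOv2 variant through torch.hub."""
    import torch  # imported lazily so the table is usable without PyTorch
    name, _ = DINOV2_VARIANTS[size]
    return torch.hub.load("facebookresearch/dinov2", name).to(device).eval()

for size, (name, dim) in DINOV2_VARIANTS.items():
    print(f"{size:>5}: {name} -> {dim}-dim features")
```

Larger variants produce wider feature matrices, so extraction and classifier training both take longer and need more memory.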
