AI-Generated Synthetic Data. Explained the best way: with… | by Cassie Kozyrkov | Jul, 2023

Explained the best way: with cats!

Why is AI-generated synthetic data all the rage these days? In this article, I’ll explain it my favorite way: with cats!

Let’s say I want to train a cat/not-cat classifier from scratch, but I only have one photo to work with:

The author’s cat, Huxley.

(Everything that follows is an analogy for what people do with tabular data and text data, so it applies beyond image data.)

Ideally, I’m going to need a dataset consisting of hundreds of cat and not-cat photos. If I have a camera and plentiful access to cats, I can take a bunch of photos like the one I already have, ensuring that I get exactly the dataset I designed:

A photograph I took in a park in Istanbul.

But what if I don’t have a camera and I live catless on the moon? I could get the photos I need from a vendor, though I have to be careful, since inherited data is more dangerous than primary data.

Thanks, Pixabay, for being a superb (free) vendor of cat images.

But what if there’s no vendor who’ll sell me some cat photos? (Yes, running out of cat photos on the internet is a scenario that’s more sci-fi than living on the moon, but bear with me.)

Well, if I can’t collect them and I can’t buy them, then I’ll have to make them myself. Behold, my creation:

Your author is a veritable Michelangelo.

No good? Yeah, drawing was never my strong suit. Another way to make fake data is to copy existing datapoints, except this isn’t going to be much use for providing instructional variety.

This approach fools no one. I’ve still effectively only got one datapoint.
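
If you'd like to see that in code, here's a minimal sketch. It assumes a made-up four-number tuple standing in for the photo, and the names `huxley`, `duplicate`, and `count_distinct` are hypothetical, chosen only for illustration. The copy count is arbitrary too.

```python
# Hypothetical stand-in for the single photo: a tiny tuple of pixel values.
huxley = (0.2, 0.7, 0.5, 0.9)


def duplicate(datapoint, n_copies):
    """Build a 'dataset' by repeating one datapoint n_copies times."""
    return [datapoint] * n_copies


def count_distinct(dataset):
    """Count how many different examples the dataset actually contains."""
    return len(set(dataset))


dataset = duplicate(huxley, 30_000)

print(len(dataset))             # number of rows: 30000
print(count_distinct(dataset))  # number of distinct examples: 1
```

The row count grows, but the number of distinct examples stays at one, so a learner has nothing new to pick up from the extra rows.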

It’ll be like teaching a human student by giving them the same example over and over, so all they learn is that one thing. If my dataset is 30,000 copies of this Huxley photo…

