Cats Vs Dogs? Let’s make an AI to settle this: Crash Course AI #19

Jabril: Hey, John-Green-bot. I’ve been thinking really hard about a HUGE life decision. I want to adopt a pet, and I’ve narrowed it down to either a cat or a dog. But there are so many great cats and dogs on adoption websites.
John-Green-bot: The Grey Parrot (Psittacus erithacus) has an average lifespan in captivity of 40 to 60 years.
Jabril: Yeah, birds are great and all, but I was thinking maybe a cat or a dog.
John-Green-bot: Turtles will need a tank approximately 7.5 to 15 times their shell length in centimeters.
Jabril: Yeah, you’re no help. Come on, Spot and Mr. Cuddles. It looks like I’m going to have to figure this out myself, and by myself I mean make an AI figure it out. Today we’re going to train an AI to go through the list of pets and make the best decision for me based on data! That’ll make things less stressful… surely, nothing will go wrong with this… right?
INTRO
Hey, I’m Jabril and welcome to Crash Course AI. Today we’re going to build a fairly simple
AI program to find out if adopting a cat or a dog will make me happier. This is a pretty subjective question, and
if I use data from the internet, I’ll have a lot of strong opinions. So, I’ll conduct my own survey where I collect
data about people’s cats and dogs and their happiness. I don’t care what pet I get, as long as it
makes me happy, so I won’t even include cat and dog labels in the model. Like in previous labs, I’ll be writing all
of my code using a language called Python in a tool called Google Colaboratory. And as you watch this video, you can follow
along with the code in your browser from the link we put in the description. In these Colaboratory files, there’s some
regular text explaining what I’m trying to do, and pieces of code that you can run
by pushing the play button. These pieces of code build on each other,
so keep in mind that you have to run them in order from top to bottom, otherwise you
might get an error. To actually run the code or make changes to
it, you’ll have to either click “open in playground” at the top of the page or
open the File menu and click “Save a Copy to Drive”. And one last time, I’ll give you this FYI:
you’ll need a Google account for this. Creating this AI to help me decide between
a cat and a dog should be pretty simple, so there are only a couple of steps:
First, I have to gather the data. I have to decide on a few features that could
predict if a cat or dog makes people happy. Then, I’ll make a survey that asks about
these features, and go out in the world and ask people if their pet fits these features
and makes them happy. It might be a little biased or imperfect,
but I think it’ll be juuust finnne to help me make my decision. Second, I have to build an AI model to predict
if a specific pet makes people happy. Because I’m not collecting a massive amount
of data, it’s helpful to use a small model to prevent overfitting. So I’ll plan on using a neural network with
just one hidden layer. And for our final step, I can go through an
adoption website of adorable cats and dogs, put in their features, and let the AI decide
which pet will make me happy. No more stressing about this tough decision,
the machines have my back!
Step 1. Instead of importing a dataset this time,
we’ve got to create our own! So browsing through some adoption websites,
the most common features I saw that are important to me are cuddly, soft,
quiet (especially when I’m trying to sleep), and energetic (because playing with an energetic
pet might remind me to get up from my computer a little more). In the AI I’m programming, I’ll use these
four values to predict each owner’s answer to “Does your pet make you happy most of the time:
yes or no?” For the data collection part of this process,
I gave this five-question survey of yes/no questions to 30 people who own one cat or
one dog. I want to avoid bias based on the kind of
pet, so I put everyone’s answers into one big list. Every row is one person’s response, and
yes’s are represented as 1 and no’s as 0. By representing the answers as numbers, I
can use them directly as features in my model. The first four questions are my input features
and the last question about happiness is my label. And I’m not using cat or dog labels anywhere
in my model. I also have to split this dataset into the
training set and the testing set. The training set is used to train the neural
network, and the testing set is kept hidden from the neural network during training, so
I can use it to check the network’s accuracy later.
Step 2. Now that I have a dataset, I need to build
a neural network to help make predictions. And if you did episode 5’s Neural Networks
Lab (when I digitized John-Green-bot’s handwriting), this step will sound familiar because I’m
using the same tools. I’m going to use a multi-layer perceptron
neural network or MLP. As a refresher, this neural network has an
input layer for features, some number of hidden layers to learn representations, and a final
output layer to make a prediction. The hidden layers find relationships between
the features that help it make accurate predictions. Like in the Neural Networks Lab, we’re going
to import a library called SKLearn (which is short for scikit-learn). SKLearn includes a bunch of different machine
learning algorithms, but I’ll just be using its Multi-Layer Perceptron algorithm. You can easily change the number of hidden
layers and other parts of the model, but I’ll start with something simple: four input features,
one hidden layer, and two outputs. We’ll set our hidden layer to four neurons,
the same size as our input. SKLearn will actually take care of counting
the size of my input and output automatically, so I only have to specify the size of the
hidden layer. Over the span of one epoch of training this
neural network, the hidden layer will pick up on patterns in the input features, and
pass a prediction to one of two output neurons: yes, happiness OR no, unhappiness. The code in our Colab notebook calls this
an “iteration” because an iteration and an epoch are the same thing in the algorithm
we’re using. As the model loops through the data, it predicts
happiness based on the features, compares its guess to the actual survey results, and
updates its weights and biases to give a better prediction in the future. And over multiple epochs of the same training
dataset, the neural network’s predictions should keep getting better! We’ll just go with 1000 epochs for now. Now, I can test my AI on my original training
data to see how well it captured that information, and on the testing data I set aside. The output here lets us know how good our
neural network is at guessing if these pet features predict owner happiness. And it looks like our model got 100% correct
on the testing data and 85% correct on the training data! Well guys, thanks for tuning in, but I think
this project is almost over! Everything was easy to do, performance looks great. I’ll just put in some pet features and let
it help me with this big life decision! Man, AI really is awesome.
Step 3. Let’s see… here’s a pet I could adopt. The description says it’s cuddly, soft,
quiet at night, and isn’t that energetic. Let’s put in those features and see what
the model says. What? Why not? It seemed nice… But I guess that’s why I programmed an AI,
so I wouldn’t be swayed by my FLAWED human judgment! Let’s move on to the next one. Let’s see, this pet isn’t cuddly, isn’t
soft, isn’t quiet, and is really energetic … but let’s see what my AI says. Yes?! I’m not so sure that pet would’ve made
me happy, but my AI model had 100% accuracy on the testing set! I think I’m gonna test a few more… Ok, so I’ve tested a bunch of animals and
something weird is happening. The AI rarely told me that adopting a cat
would make me happy, but it almost always said a dog would make me happy. Maybe everyone I surveyed hates their cats? But, that seems unlikely. Besides, I never even told my AI what a cat
is! I combined all the surveys into one big dataset
without “cat” or “dog” labels! And I only taught the model about whether a pet
is soft, cuddly, quiet, or energetic. Both cats and dogs can have all of those traits,
right? Is there a war between cats and AIs that I
don’t know about, and THAT’S why it’s biased? Hey John-Green-bot… Do you guys hate cats?!
John-Green-bot: No, Jabril. We love hairy babies…
Jabril: Ugh, I don’t understand!!!! So, obviously, AI doesn’t have a grudge
against cats. I collected the survey data and I built the
AI, so if something went wrong and introduced an anti-cat bias… it’s on me, and I can
figure out what it is. So I should go back to analyze the data and
my model design. First, I’ll look for patterns and correlations
in my data by hand and make sure there’s nothing fishy going on. This means a new step!
Step 4. What’s weird is that the model’s predictions
don’t seem to make sense to me despite the high performance. Specifically, I’m noticing a bias towards
dogs. So there might be something strange about
the data. Earlier, I decided to just pool all the survey results together, but now I’ll split them apart. Now I can create plots that compare the percentage
of dog owners I surveyed who are happy, the percentage of cat owners who are happy, and
the percentage of all the people who are happy with their pet (no matter what kind). To do this, I just need to compute the number
of happy dog owners divided by the total number of dog owners, the same for cat owners, and
the same for everyone I surveyed. Interesting. According to my survey results, cats make
people really happy. But when I put in the features for a cat,
my AI usually says it won’t make the owner happy. How can I have such good accuracy at predicting
happiness and always be wrong about cats?! I still don’t have answers about why the
data is skewed towards dogs… so I guess I should look at who even filled out my survey? Let’s make a plot that compares the total
number of dog owners and the total number of cat owners in my dataset. Yikes! Why are there so few cat responses in here?! I guess when I surveyed random people to make
my dataset bigger, I was at a park, and… that’s where I might have accidentally biased
my data collection. A lot of people who responded to my survey in the park must have been dog owners. So the first mistake I made is that my data
doesn’t actually have the same distributions as the real world. Instead of collecting the true frequencies
of each feature from a large random group of pet owners, I sampled from a dog-biased
set. That’s definitely something that should
be fixed… but it still doesn’t answer why the model seems so biased against cats. Both cats and dogs can be energetic, cuddly,
quiet, and soft, or not. That’s why I chose those features, they
seemed like they’d be common for both pets. But we can test this. I’ll make a plot where I divide the number
of times each feature is true for each animal by the total number of survey responses I
have for each animal. It looks like there are lots of different types of dogs in my dataset. Some are energetic and some are cuddly, but none of the cats are energetic. So this is a correlated feature, which is
a feature that is (unintentionally) correlated with a specific prediction or hidden category. In this case, knowing if something is energetic
is a cheat for knowing it’s a dog even though I didn’t tell the model about dogs. My model might have then learned that if a
pet is energetic, it makes owners happy, just because there was no data to tell it otherwise. We can see this correlation if we plot pet
energy vs owner happiness. In my data, if a pet is energetic, a person
is likely to be happy with it… no matter what other features are true. But if the pet isn’t energetic, it’s a
mixed bag of happiness. This is my second mistake: the data had a
correlated feature, so my AI found patterns that I didn’t want. To fix the first mistake, I need to collect
new data and make sure I balance the number of cat owners and dog owners. So I’ll go to the park, the pet store, the
grocery store… you get the idea. And I’ll keep track if I end up with too
much of one pet or the other. To fix the second mistake, I should make sure
the features are actually the most important things I care about when it comes to happiness. Honestly, I don’t NEED my pet to be energetic. So I could just cut it out of my dataset,
and not worry about it becoming a correlated feature as I train my AI. Although, I will be more careful and make
sure the other three features don’t get biased either. It’s important to note that not every problem
is this easy. For some AI, we can’t just remove features
that don’t have a clear meaning, or we might need to keep features because they’re the
only measurable values. In either case it’s usually EXTRA important
to have a human checking the results and asking a few important questions to avoid bias: Does the data match my goals? Does the AI have the right features? And am I really optimizing the right thing? And these questions aren’t that easy to
answer… So far in our labs we’ve demonstrated the
amazing abilities that AI can grant you, but as you can see, it’s important to be cautious. As far as my dog-or-cat decision goes… I’m going to have to do more work on this
algorithm. And collect a lot more survey data. So I guess the main takeaway for this episode
(and the last of our labs) is that when building AI systems, there aren’t always straightforward
and foolproof solutions. You have to iterate on your designs and account
for biases whenever possible. So, our next and final episode for Crash Course AI is all about the future and our role in shaping where AI is headed. I’ll see ya then.
Crash Course AI is produced in association
with PBS Digital Studios! If you want to help keep all Crash Course
free for everybody, forever, you can join our community on Patreon. And if you want to learn more about research
methods to build good surveys and datasets, check out this episode of Crash Course Sociology.
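
Steps 1 and 2 above (encode the yes/no answers as 1s and 0s, split them into training and testing sets, and fit a small multi-layer perceptron) might look roughly like the sketch below. This is not the episode's notebook. The survey rows are invented for illustration, and the variable names, the 75/25 split, and the random_state values are assumptions. Only the four features, the happiness label, the single hidden layer of four neurons, and the 1000 iterations come from the episode.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# HYPOTHETICAL survey rows, invented for illustration (not the episode's data).
# Columns: cuddly, soft, quiet, energetic, happy  (1 = yes, 0 = no).
survey = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
])

X = survey[:, :4]  # the four input features
y = survey[:, 4]   # the label: does the pet make its owner happy?

# Hold out a quarter of the rows for testing (the split size is an assumption).
# stratify=y keeps both answers present in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# One hidden layer with four neurons. scikit-learn infers the input and
# output sizes from the data, so only the hidden layer is specified.
model = MLPClassifier(hidden_layer_sizes=(4,), max_iter=1000, random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"training accuracy: {train_acc:.2f}, testing accuracy: {test_acc:.2f}")

# Step 3: ask the model about a pet that is cuddly, soft, quiet, not energetic.
candidate = np.array([[1, 1, 1, 0]])
prediction = int(model.predict(candidate)[0])
print("happy" if prediction == 1 else "not happy")
```

With only a handful of rows, the accuracy numbers can swing a lot depending on which rows land in the testing set, so a high score says little about whether the model learned what you wanted.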
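
The checks from Step 4 (who answered the survey, how happy each group is, how often each feature is true for each pet, and happiness split by the energetic feature) can be sketched in plain Python. The responses below are invented and deliberately skewed toward dogs to mimic the problem described in the episode. The "pet" field is kept only for auditing and would not be given to the model. All names here are hypothetical.

```python
FEATURES = ("cuddly", "soft", "quiet", "energetic")
COLUMNS = ("pet",) + FEATURES + ("happy",)

# HYPOTHETICAL responses, invented for illustration and skewed toward dogs.
RAW = [
    ("dog", 1, 1, 0, 1, 1),
    ("dog", 0, 1, 0, 1, 1),
    ("dog", 1, 0, 0, 1, 1),
    ("dog", 0, 0, 0, 1, 1),
    ("dog", 1, 1, 1, 1, 1),
    ("dog", 1, 1, 1, 0, 1),
    ("dog", 0, 1, 1, 0, 0),
    ("dog", 0, 0, 0, 0, 0),
    ("cat", 1, 1, 1, 0, 1),
    ("cat", 0, 1, 1, 0, 1),
]
responses = [dict(zip(COLUMNS, row)) for row in RAW]


def rate(rows, key):
    """Fraction of rows where `key` is 1 (None if there are no rows)."""
    return sum(r[key] for r in rows) / len(rows) if rows else None


# Group the responses by kind of pet.
by_pet = {}
for r in responses:
    by_pet.setdefault(r["pet"], []).append(r)

# How many owners of each pet answered the survey?
counts = {pet: len(rows) for pet, rows in by_pet.items()}

# Happy owners divided by total owners, per pet and overall.
happy_rate = {pet: rate(rows, "happy") for pet, rows in by_pet.items()}
happy_rate["all"] = rate(responses, "happy")

# How often each feature is true for each pet.
feature_rate = {
    pet: {f: rate(rows, f) for f in FEATURES} for pet, rows in by_pet.items()
}

# Owner happiness split by the energetic feature, ignoring the kind of pet.
energetic_rows = [r for r in responses if r["energetic"] == 1]
calm_rows = [r for r in responses if r["energetic"] == 0]
happy_given_energetic = rate(energetic_rows, "happy")
happy_given_calm = rate(calm_rows, "happy")

print("responses per pet:", counts)
print("happy rate:", happy_rate)
print("feature rate:", feature_rate)
print("happy if energetic:", happy_given_energetic)
print("happy if not energetic:", happy_given_calm)
```

In these made-up rows no cat is energetic and every energetic pet has a happy owner, so "energetic" works as a stand-in for "dog" even though the model never sees a pet label.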

41 thoughts on “Cats Vs Dogs? Let’s make an AI to settle this: Crash Course AI #19”

  1. Dog-lovers base their whole case on these commonplace, servile, and plebeian qualities, and amusingly judge the intelligence of a pet by its degree of conformity to their own wishes. The dog appeals to cheap and facile emotions; the cat to the deepest founts of imagination and cosmic perception in the human mind.

  2. If you ask around at a park you’ll get a majority of your stats from dog people, if you ask on the internet you’ll get a majority of stats from cat people. Different kinds of people do different things and own different pets to enhance their life happiness.

  3. Dogs

    -Usually more friendly and playful which can make it very easy to boost your happy hormones (endorphins)

    -Walks and playing make good exercise for both owner and dog so you both feel great after a play session.

    A lot more maintenance; grooming, walking, bathing, etc

    Needy, many dogs get depressed when left alone for a while and constantly want attention.

    Cats

    -Self maintained, they self groom and clean themselves. Litter boxes make it so they can go to the bathroom on their own and they can fare well alone.
    -They are not as noisy for the most part and you can play indoors due to their size (yes there are dogs that are small but there’s never going to be a domestic cat the size of a medium to big dog)


    -Their introverted nature often means they’re not available 24/7 so if you catch them on a bad day they might want to bite and scratch

    -Hard to get em to go on walks with you. They really like their independence.

    Summary: With dogs you have a clingy needy friend.
    With cats you ARE the clingy needy friend.

  4. Gawd, i thought this was about cute cuddly cats and dogs and AI that was already developed. you lost me at input, output. ???

  5. Umm. That's not what AI is.

    Using a simple predictive program from a data set to completely eliminate one of your original criteria (changing your entire success standard) is called, "HUMAN LEARNING."

    So congratulations on showing how the human brain and non-artificial intelligence works.

  6. You’ll never get a group of cats to pull a stupid sled across the snowy Alaskan tundra in below zero weather.

  7. Jabril needs to include a poop variable in addition to energetic, cuddly, etc. “Do you enjoy taking walks several times each day regardless of weather conditions?” “Are you willing to scoop clumps of pee and poop from boxes distributed around your residence on a daily basis?”
