Previously, we built a program that distinguishes cats from dogs. Next, we wanted to teach it a new species: ants.
The problem with our attempt is that, compared with the number of cat and dog pictures (12,500 each), we did not have enough pictures of ants (fewer than 100). After we ran the program several times, it still identified the ant as a dog. When we reviewed the dog pictures in the dataset, we noticed that they closely resemble the shape of the ant in the testing dataset (a zoomed-in image of an ant). What we learned here is that the model only identifies patterns: when it tried to identify the ant from a zoomed-in photo, the patterns it found were similar to those in the dog pictures. Another factor is that we did not give it enough ant data to train on.
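To put a number on the imbalance, here is a minimal sketch of inverse-frequency class weighting, a common way to make rare classes count more during training. We did not use this in our experiment. The function name `balanced_class_weights` and the ant count of 100 are assumptions for illustration (our real ant count was lower).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: total / (n_classes * count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# Hypothetical label list mirroring the dataset sizes described above.
# The ant count of 100 is an assumed upper bound, not the exact number.
labels = ["cat"] * 12500 + ["dog"] * 12500 + ["ant"] * 100

weights = balanced_class_weights(labels)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these counts, each ant picture would be weighted 125 times more heavily than a cat or dog picture. Weighting alone cannot add variety that the ant pictures do not contain, so it would complement collecting more data, not replace it.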
After trying to get the model to identify an ant, we tried a different species: the butterfly. We did not have 12,500 pictures of butterflies for training either. However, because the pattern of a butterfly is very different from that of dogs and cats, the program was able to identify the butterfly correctly.
From these attempts, we learned that a model is only as smart as the training data we give it. It is not only the quantity of the training data that matters, but also the quality.