What it covers
A benchmarked comparison of four deep learning approaches on a 74-class fine-grained dog breed dataset (12,891 images):
- CNN from scratch — VGG-style stacked convolutions + max pooling
- Tuned CNN — adds Dropout regularization and an additional conv block
- Transfer learning with InceptionV3 — frozen ImageNet backbone with a trainable head
- Data augmentation — flips and contrast on top of the scratch CNN
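The transfer-learning variant above can be sketched in a few lines of Keras. This is a minimal illustration, not the project's exact code: the 74-class head comes from the dataset description, while the 299×299 input size (InceptionV3's native resolution), the pooling layer, and the Dropout rate are assumptions.

```python
import tensorflow as tf

NUM_CLASSES = 74          # breed classes in the dataset
IMG_SIZE = (299, 299)     # InceptionV3's native input size (assumed here)

# Frozen ImageNet backbone: only the new classification head trains.
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,)
)
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),  # rate is illustrative
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Freezing the backbone means gradient updates touch only the head, which is what makes training tractable on a ~13k-image dataset.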
Headline result
- Scratch CNN: ~5-7% validation accuracy (barely above chance on 74 classes)
- Tuned CNN with Dropout: ~11% validation accuracy
- InceptionV3 transfer learning (frozen backbone, trainable head): ~96% validation accuracy
Takeaway
On a 13k-image dataset with 74 classes, transfer learning isn't an optimization: it's the only viable approach. With only ~175 images per class, the scratch CNN simply doesn't see enough data to learn useful visual representations on its own. ImageNet pretraining gives InceptionV3 the visual prior it needs to adapt quickly to the target task.
Data augmentation on the scratch CNN actually made things worse: the model was too shallow to benefit from the added distributional variety, so augmentation only made the training distribution harder to fit. A cautionary tale about the "always augment" reflex.
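The flips-and-contrast augmentation described above maps onto Keras preprocessing layers. A minimal sketch; the contrast factor and the random batch standing in for real images are assumptions:

```python
import tensorflow as tf

# Augmentation stage matching the description: random flips + contrast.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomContrast(0.2),  # factor is illustrative
])

# Dummy batch standing in for real dog images.
images = tf.random.uniform((4, 180, 180, 3))
out = augment(images, training=True)  # augmentation is active only in training
```

Note `training=True`: these layers are identity functions at inference time, so augmentation cost is paid only during training.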
Stack
- Framework: TensorFlow / Keras
- Data pipeline: tf.data.Dataset
- Environment: Google Colab with GPU
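The tf.data pipeline pattern typically looks like the sketch below. The shuffle/batch/prefetch chain is the standard idiom; the synthetic tensors, batch size, and image size are assumptions standing in for the project's on-disk loading:

```python
import tensorflow as tf

# Synthetic stand-ins for decoded images and integer breed labels.
images = tf.random.uniform((32, 299, 299, 3))
labels = tf.random.uniform((32,), maxval=74, dtype=tf.int32)

ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(32)                      # shuffle within a buffer
    .batch(8)                         # batch size is illustrative
    .prefetch(tf.data.AUTOTUNE)       # overlap preprocessing with GPU compute
)
```

`prefetch(AUTOTUNE)` matters most on Colab GPUs: it keeps the input pipeline from starving the accelerator between steps.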