The goal of laconic image classification is for a model to correctly classify an image using the smallest amount of information (entropy) possible. We compare four machine classification models and humans to see which can classify images with the least information. We consider four types of information reduction: crop, colour, resolution, and all three combined. You can browse and download the minimal images below.
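To make the three reduction types concrete, the following is a minimal sketch using Pillow; the function names and the exact reduction operators and parameters are illustrative assumptions, not necessarily those used in our pipeline. PNG-compressed size serves as the entropy estimate throughout.

```python
import io
from PIL import Image

def png_size(img: Image.Image) -> int:
    """Estimated entropy: size of the image in bytes after PNG compression."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getbuffer().nbytes

def reduce_resolution(img: Image.Image, factor: int) -> Image.Image:
    """Downsample by `factor`, then upsample back to the original dimensions."""
    w, h = img.size
    small = img.resize((max(1, w // factor), max(1, h // factor)), Image.BILINEAR)
    return small.resize((w, h), Image.NEAREST)

def reduce_colour(img: Image.Image, n_colours: int) -> Image.Image:
    """Quantise the image down to `n_colours` colours."""
    return img.quantize(colors=n_colours).convert("RGB")

def reduce_crop(img: Image.Image, box: tuple) -> Image.Image:
    """Keep only the (left, upper, right, lower) region of the image."""
    return img.crop(box)
```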
For state-of-the-art image classification models, we extract "minimal entropy positive images" (or simply "minimal images"): intuitively, the smallest images for which the model gives a correct classification. We use PNG-compressed file sizes as an estimate of entropy. We use off-the-shelf models trained on the ILSVRC 2012 training set, and compute minimal images from the corresponding test set. The information in an image is gradually reduced until any further reduction would cause an incorrect classification. To allow for non-expert human classification, we consider a simplified set of 20 classes. Human minimal images were computed in the other direction: starting from a fully degraded (void) image, the human user could choose either to enhance the image or to guess the label. If the wrong label was guessed, that image was skipped and a new (void) image was presented.
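The machine-side procedure can be sketched as a greedy search along one degradation dimension; `classify` and `degrade` below are hypothetical stand-ins for the real model and reduction pipeline, and the actual search over crop, colour, and resolution combinations is more involved.

```python
def minimal_image(img, classify, true_label, degrade):
    """
    Greedy search for a minimal image along one degradation dimension.
    `classify` maps an image to a predicted label; `degrade` maps an image
    to a strictly lower-entropy version, or None when no further reduction
    is possible.
    """
    assert classify(img) == true_label, "start from a correctly classified image"
    current = img
    while True:
        candidate = degrade(current)
        if candidate is None or classify(candidate) != true_label:
            # Any further reduction breaks the classification (or is
            # impossible), so `current` is minimal along this dimension.
            return current
        current = candidate
```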
The human minimal images shown above serve as a benchmark of the robustness of image classifiers under partial information, and as a yardstick for comparing human and machine performance on the task. The images are based on the ILSVRC 2012 test dataset. We propose the following simple challenge: using the ILSVRC 2012 training set for training, classify the human images from the above set into the 20 classes shown with the lowest possible (top-1) classification error.
The mapping of the 20 simplified classes to the original ImageNet classes is available here in TSV format.
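A short sketch of how the mapping might be used to score a submission follows; the assumption that the TSV has two columns (original ImageNet class, simplified class) is ours, so adjust to the actual file layout.

```python
import csv

def load_mapping(tsv_path: str) -> dict:
    """Load the class mapping, assuming two tab-separated columns:
    original ImageNet class, simplified class."""
    with open(tsv_path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f, delimiter="\t")}

def top1_error(predicted, gold, mapping) -> float:
    """Top-1 error over the 20 simplified classes, given ImageNet-level
    predictions and gold simplified labels."""
    wrong = sum(mapping[p] != g for p, g in zip(predicted, gold))
    return wrong / len(gold)
```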
The pre-trained deep neural network models are available from the following locations.
We use images from the ILSVRC challenge: