The vast majority of image searches are metadata-based. Searching text-based metadata is efficient, but such systems require humans to enter a description of each image in the repository, and the consistency and completeness of that metadata are persistent problems.
For the past two decades, researchers have been working on algorithms that retrieve desired images from a large collection by analyzing the contents of the images. This research area is generally called content-based image retrieval (CBIR). The basic idea of CBIR is to extract features from each image and compare them with those of a target item or search reference. CBIR tools then rank the images in the repository by similarity and present a few of the closest ones. The most commonly used features for comparison are color, texture, and shape, all of which are computed in pixel space. Going a step further, other researchers have tried to advance search into semantic space.
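The extract, compare, and rank steps described above can be sketched in a few lines. This is only a minimal illustration, not the method of any particular CBIR tool: it assumes images are already decoded into lists of (r, g, b) tuples, uses a coarse color histogram as the only feature, and measures similarity with a plain L1 distance. The function names and the toy repository are hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized, coarsely quantized RGB histogram of (r, g, b) pixels (0-255)."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_distance(h1, h2):
    """L1 distance between two sparse histograms (0 means identical)."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)


def rank_images(query_pixels, repository, top_k=3):
    """Return the names of the top_k repository images closest to the query."""
    query_hist = color_histogram(query_pixels)
    scored = sorted(
        (histogram_distance(query_hist, color_histogram(pixels)), name)
        for name, pixels in repository.items()
    )
    return [name for _, name in scored[:top_k]]


# Toy repository (assumed data): each "image" is just a list of pixels.
repository = {
    "sunset": [(250, 120, 30)] * 90 + [(40, 20, 60)] * 10,
    "forest": [(20, 130, 40)] * 80 + [(90, 60, 30)] * 20,
    "ocean": [(10, 60, 200)] * 70 + [(20, 40, 100)] * 30,
}
query = [(240, 110, 20)] * 85 + [(30, 30, 70)] * 15

print(rank_images(query, repository, top_k=2))
```

In this toy case the query is dominated by the same orange bin as "sunset", so that image is ranked first. A real system would precompute and index the histograms rather than recompute them for every query.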
Color is the lowest-level physical characteristic that can be extracted from an image, and it is relatively robust to measure and analyze. Texture refers to visual patterns in images. Any surface, such as trees or clouds, can be modeled by its texture, which carries essential information about the structure of the surface and its relationship to the surrounding environment. Shape refers to the boundary and region of a geometric object.
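To make the texture and shape features more concrete, the sketch below computes one very crude descriptor for each. Both are stand-ins chosen for brevity, not the descriptors used by production CBIR systems: texture is approximated by the mean absolute difference between neighboring gray values, and shape by the area and perimeter of a binary region. The inputs are assumed to be small nested lists, and the function names are hypothetical.

```python
def texture_contrast(gray):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    rows, cols = len(gray), len(gray[0])
    diffs = []
    for y in range(rows):
        for x in range(cols):
            if x + 1 < cols:
                diffs.append(abs(gray[y][x] - gray[y][x + 1]))
            if y + 1 < rows:
                diffs.append(abs(gray[y][x] - gray[y + 1][x]))
    return sum(diffs) / len(diffs) if diffs else 0.0


def shape_area_perimeter(mask):
    """Area (foreground pixel count) and perimeter (exposed 4-neighbor edges) of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x]:
                continue
            area += 1
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < rows and 0 <= nx < cols)
                if outside or not mask[ny][nx]:
                    perimeter += 1
    return area, perimeter


# A flat patch has no texture; a checkerboard has the maximum possible contrast.
flat = [[100] * 4 for _ in range(4)]
checker = [[0 if (x + y) % 2 == 0 else 255 for x in range(4)] for y in range(4)]

# A 2x2 square region inside a 4x4 mask.
square = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

print(texture_contrast(flat), texture_contrast(checker))
print(shape_area_perimeter(square))
```

Descriptors like these are attractive because they reduce an image to a handful of numbers that can be compared quickly, at the cost of discarding most of the detail a human would use to recognize the content.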
If CBIR is a major aim of digitization, post-scan processing is important because it facilitates the extraction of color, texture, and shape features. It is therefore appropriate to make the following adjustments, which are recommended by the NARA “Technical Guidelines for Digitizing Archival Materials for Electronic Access”:
1. Color and gamma correction
2. Tonal scale adaptation
3. Texture filtering to compensate for variations in originals
4. Sharpening to match appearance of the original
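Three of the adjustments listed above (gamma correction, tonal scale adaptation, and sharpening) can be sketched as simple per-pixel operations on a grayscale image. The formulas below are generic textbook versions, not procedures taken from the NARA guidelines, and the default gamma of 2.2, the 3x3 neighborhood, and the function names are all assumptions. Texture filtering is omitted because the appropriate filter depends on the originals.

```python
def gamma_correct(gray, gamma=2.2):
    """Apply out = 255 * (in / 255) ** (1 / gamma) to every pixel."""
    inv = 1.0 / gamma
    return [[round(255 * (v / 255) ** inv) for v in row] for row in gray]


def stretch_tonal_range(gray):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        # A perfectly flat image has no range to stretch.
        return [row[:] for row in gray]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in gray]


def sharpen(gray, amount=1.0):
    """Unsharp mask: boost each pixel by its difference from the 3x3 neighborhood mean."""
    rows, cols = len(gray), len(gray[0])
    out = []
    for y in range(rows):
        new_row = []
        for x in range(cols):
            neighbors = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
            ]
            blurred = sum(neighbors) / len(neighbors)
            value = gray[y][x] + amount * (gray[y][x] - blurred)
            new_row.append(max(0, min(255, round(value))))  # clamp to 0-255
        out.append(new_row)
    return out


# Hypothetical low-contrast scan, processed in the order of the list above.
scan = [
    [50, 60, 70],
    [60, 120, 80],
    [70, 80, 200],
]
processed = sharpen(stretch_tonal_range(gamma_correct(scan)))
print(processed)
```

In practice these steps would normally be done with an imaging library or the scanner software. The point of the sketch is that each adjustment changes the pixel values that the color, texture, and shape features are later computed from, so applying them consistently across a collection matters for retrieval quality.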
In addition to the technical considerations for scanning and image processing, carefully selecting materials for digitization is critical, because running CBIR is very time-consuming with today’s technologies.