Research Output
Beyond Hamming Distance: Exploring Spatial Encoding in Perceptual Hashes
  Forensic analysts are often tasked with analysing large volumes of data in modern investigations, and frequently make use of hashing technologies to identify previously encountered images. Perceptual hashes, which seek to model the semantic (visual) content of images, are typically compared by way of Normalised Hamming Distance, counting the ratio of bits which differ between two hashes. However, this global measure of difference may overlook structural information, such as the position and relative clustering of these differences. This paper investigates the relationship between localised/positional changes in an image and the extent to which this information is encoded in various perceptual hashes. Our findings indicate that the relative position of bits in the hash does encode useful information. Consequently, we prototype and evaluate three alternative perceptual hashing distance metrics: Normalised Convolution Distance, Hatched Matrix Distance, and 2-D Ngram Cosine Distance. Results demonstrate that there is room for improvement over Hamming Distance. In particular, the worst-case image mirroring transform for DCT-based hashes can be completely mitigated without needing to change the mechanism for generating the hash. Indeed, perceived hash weaknesses may actually be deficits in the distance metric being used, and large-scale providers could potentially benefit from modifying their approach.
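
As an illustration of the baseline metric described in the abstract, the following is a minimal sketch of Normalised Hamming Distance over hashes held as Python integers. The function name, the 64-bit default length, and the example hash values are assumptions made for illustration and are not taken from the paper; the paper's three alternative metrics are not reproduced here.

```python
def normalised_hamming_distance(hash_a: int, hash_b: int, bits: int = 64) -> float:
    """Return the fraction of bit positions at which two equal-length hashes differ.

    `bits` is the assumed hash length (64 is a common choice, used here
    purely as an illustrative default).
    """
    if bits <= 0:
        raise ValueError("bits must be positive")
    mask = (1 << bits) - 1
    # XOR leaves a 1 wherever the two hashes disagree; count those bits.
    differing = bin((hash_a ^ hash_b) & mask).count("1")
    return differing / bits


# Hypothetical hashes: both differ from the reference in exactly 8 bits,
# but one has the differences clustered and the other has them spread out.
reference = 0x0000000000000000
clustered = 0x00000000000000FF   # 8 adjacent differing bits
scattered = 0x0101010101010101   # 8 differing bits, one per byte

d_clustered = normalised_hamming_distance(reference, clustered)
d_scattered = normalised_hamming_distance(reference, scattered)
print(d_clustered, d_scattered)
```

Both comparisons yield the same distance, because the measure only counts differing bits and ignores where they occur. This is the positional information that the abstract suggests a global measure may overlook.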

Citation

McKeown, S. (2025, April). Beyond Hamming Distance: Exploring Spatial Encoding in Perceptual Hashes. Presented at DFRWS EU 2025, Brno, Czech Republic

Keywords

Perceptual Hashing, Semantic Approximate Matching, Distance Metrics, Hamming Distance, Image Forensics, Content Matching
