DCT HASH MATCHING QUALITY FOR RESIZED IMAGES 2

pHash does it’s mathematical operations for every pixels for original image size. Therefore, when the image is resized, the result is slightly different depending on image size. My assumption is that if every image is resized to certain size when the image is bigger than the size, the general matching quality would be better.

I tested the same set of image samples with previous posting, however, because of the speed, the comparison performed for 3644 images.

To find which size is good for normalization, I resized images to 2000, 1500, and 1000 width. And hamming distance between resized image to from 90% to 10%.

 

Hamming Distance is bigger than 4

normalization size 2000

  5000 < 4000 < 3000 < 2000 < 1000 < < 1000     5000 < 4000 < 3000 < 2000 < 1000 < < 1000
100% 0 0 0 0 0 1   100% 0.00 0.00 0.00 0.00 0.00 0.01
90% 0 0 0 0 6 17   90% 0.00 0.00 0.00 0.00 0.08 0.23
80% 0 0 0 1 12 19   80% 0.00 0.00 0.00 0.01 0.16 0.25
70% 0 0 0 1 18 36   70% 0.00 0.00 0.00 0.01 0.24 0.48
60% 0 0 0 12 48 87   60% 0.00 0.00 0.00 0.16 0.64 1.16
50% 0 0 3 26 77 141   50% 0.00 0.00 0.04 0.35 1.03 1.89
40% 0 0 9 62 172 272   40% 0.00 0.00 0.12 0.83 2.30 3.64
30% 1 12 54 156 333 475   30% 0.01 0.16 0.72 2.09 4.45 6.35
20% 27 99 246 424 693 851   20% 0.36 1.32 3.29 5.67 9.27 11.38
10% 163 360 753 1093 1442 1636   10% 2.18 4.82 10.07 14.62 19.29 21.89

normalization size 1500

  5000 < 4000 < 3000 < 2000 < 1000 < < 1000     5000 < 4000 < 3000 < 2000 < 1000 < < 1000
100% 0 0 0 0 0 1   100% 0.00 0.00 0.00 0.00 0.00 0.01
90% 0 0 0 0 2 13   90% 0.00 0.00 0.00 0.00 0.03 0.17
80% 0 0 0 0 7 14   80% 0.00 0.00 0.00 0.00 0.09 0.19
70% 0 0 0 1 15 33   70% 0.00 0.00 0.00 0.01 0.20 0.44
60% 0 0 0 2 25 64   60% 0.00 0.00 0.00 0.03 0.33 0.86
50% 0 0 0 7 46 110   50% 0.00 0.00 0.00 0.09 0.62 1.47
40% 0 0 4 25 123 223   40% 0.00 0.00 0.05 0.33 1.65 2.98
30% 0 0 18 86 247 389   30% 0.00 0.00 0.24 1.15 3.30 5.20
20% 6 27 116 257 520 678   20% 0.08 0.36 1.55 3.44 6.96 9.07
10% 137 308 654 969 1313 1507   10% 1.83 4.12 8.75 12.96 17.57 20.16

normalization size 1000

  5000 < 4000 < 3000 < 2000 < 1000 < < 1000     5000 < 4000 < 3000 < 2000 < 1000 < < 1000
100% 0 0 0 0 0 1   100% 0.00 0.00 0.00 0.00 0.00 0.01
90% 0 0 0 0 0 11   90% 0.00 0.00 0.00 0.00 0.00 0.15
80% 0 0 0 0 0 7   80% 0.00 0.00 0.00 0.00 0.00 0.09
70% 0 0 0 0 5 23   70% 0.00 0.00 0.00 0.00 0.07 0.31
60% 0 0 0 0 6 45   60% 0.00 0.00 0.00 0.00 0.08 0.60
50% 0 0 0 0 26 90   50% 0.00 0.00 0.00 0.00 0.35 1.20
40% 0 0 0 3 56 156   40% 0.00 0.00 0.00 0.04 0.75 2.09
30% 0 0 2 17 132 274   30% 0.00 0.00 0.03 0.23 1.77 3.67
20% 0 4 39 122 354 512   20% 0.00 0.05 0.52 1.63 4.74 6.85
10% 61 161 406 679 999 1193   10% 0.82 2.15 5.43 9.08 13.36 15.96

Hamming Distance is bigger than 6

normalization size 2000

  5000 < 4000 < 3000 < 2000 < 1000 < < 1000     5000 < 4000 < 3000 < 2000 < 1000 < < 1000
100% 0 0 0 0 0 0   100% 0.00 0.00 0.00 0.00 0.00 0.00
90% 0 0 0 0 0 1   90% 0.00 0.00 0.00 0.00 0.00 0.01
80% 0 0 0 1 2 3   80% 0.00 0.00 0.00 0.01 0.03 0.04
70% 0 0 0 0 4 11   70% 0.00 0.00 0.00 0.00 0.05 0.15
60% 0 0 0 0 8 21   60% 0.00 0.00 0.00 0.00 0.11 0.28
50% 0 0 0 6 20 46   50% 0.00 0.00 0.00 0.08 0.27 0.62
40% 0 0 4 21 46 94   40% 0.00 0.00 0.05 0.28 0.62 1.26
30% 0 0 11 45 106 175   30% 0.00 0.00 0.15 0.60 1.42 2.34
20% 4 14 63 142 286 381   20% 0.05 0.19 0.84 1.90 3.83 5.10
10% 59 153 347 539 752 869   10% 0.79 2.05 4.64 7.21 10.06 11.63

normalization size 1500

  5000 < 4000 < 3000 < 2000 < 1000 < < 1000     5000 < 4000 < 3000 < 2000 < 1000 < < 1000
100% 0 0 0 0 0 0   100% 0.00 0.00 0.00 0.00 0.00 0.00
90% 0 0 0 0 0 1   90% 0.00 0.00 0.00 0.00 0.00 0.01
80% 0 0 0 0 0 1   80% 0.00 0.00 0.00 0.00 0.00 0.01
70% 0 0 0 0 2 9   70% 0.00 0.00 0.00 0.00 0.03 0.12
60% 0 0 0 0 6 19   60% 0.00 0.00 0.00 0.00 0.08 0.25
50% 0 0 0 1 10 36   50% 0.00 0.00 0.00 0.01 0.13 0.48
40% 0 0 0 8 28 76   40% 0.00 0.00 0.00 0.11 0.37 1.02
30% 0 0 3 26 81 150   30% 0.00 0.00 0.04 0.35 1.08 2.01
20% 1 4 30 88 221 316   20% 0.01 0.05 0.40 1.18 2.96 4.23
10% 39 99 257 433 639 756   10% 0.52 1.32 3.44 5.79 8.55 10.11

normalization size 1000

  5000 < 4000 < 3000 < 2000 < 1000 < < 1000     5000 < 4000 < 3000 < 2000 < 1000 < < 1000
100% 0 0 0 0 0 0   100% 0.00 0.00 0.00 0.00 0.00 0.00
90% 0 0 0 0 0 1   90% 0.00 0.00 0.00 0.00 0.00 0.01
80% 0 0 0 0 0 1   80% 0.00 0.00 0.00 0.00 0.00 0.01
70% 0 0 0 0 1 8   70% 0.00 0.00 0.00 0.00 0.01 0.11
60% 0 0 0 0 2 15   60% 0.00 0.00 0.00 0.00 0.03 0.20
50% 0 0 0 0 9 35   50% 0.00 0.00 0.00 0.00 0.12 0.47
40% 0 0 0 0 14 62   40% 0.00 0.00 0.00 0.00 0.19 0.83
30% 0 0 0 4 35 104   30% 0.00 0.00 0.00 0.05 0.47 1.39
20% 0 0 11 39 138 233   20% 0.00 0.00 0.15 0.52 1.85 3.12
10% 14 38 135 270 449 566   10% 0.19 0.51 1.81 3.61 6.01 7.57

 

 

Conclusion

According to the test result, in terms of matching percentage, resizing before hashing gives better results; this can be a solution for better matching. However, false positive matching percentage is important.

Advertisements

2 thoughts on “DCT HASH MATCHING QUALITY FOR RESIZED IMAGES 2

  1. Pingback: Perceptual Hash Image Search – Notes | Dev Blog
  2. Pingback: pHash Image Search – Notes | Anna's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s