DCT HASH MATCHING QUALITY FOR RESIZED IMAGES 2

pHash performs its mathematical operations on every pixel of the image at its original size. Therefore, when an image is resized, the resulting hash differs slightly depending on the image size. My assumption is that if every image larger than a certain size is first resized down to that size, the overall matching quality will be better.
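To see why the original size matters, it helps to look at what the DCT hash computes. The sketch below roughly follows the steps of pHash's `ph_dct_imagehash`, under simplifying assumptions: the input is taken to be already converted to grayscale, smoothed, and reduced to 32x32 (the stages that touch every pixel of the original image), and the median convention and bit order are illustrative rather than copied from pHash. The names `Image32`, `dct2`, and `dctHash` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr int kSize = 32;  // pHash reduces the image to 32x32 before the DCT
using Image32 = std::array<std::array<double, kSize>, kSize>;

// Orthonormal DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / (2 * kSize)).
Image32 dctMatrix() {
    Image32 c{};
    const double pi = std::acos(-1.0);
    for (int u = 0; u < kSize; ++u)
        for (int x = 0; x < kSize; ++x)
            c[u][x] = (u == 0 ? std::sqrt(1.0 / kSize) : std::sqrt(2.0 / kSize)) *
                      std::cos((2 * x + 1) * u * pi / (2.0 * kSize));
    return c;
}

// 2-D DCT computed as C * img * C^T.
Image32 dct2(const Image32& img) {
    const Image32 c = dctMatrix();
    Image32 tmp{}, out{};
    for (int i = 0; i < kSize; ++i)
        for (int j = 0; j < kSize; ++j)
            for (int k = 0; k < kSize; ++k) tmp[i][j] += c[i][k] * img[k][j];
    for (int i = 0; i < kSize; ++i)
        for (int j = 0; j < kSize; ++j)
            for (int k = 0; k < kSize; ++k) out[i][j] += tmp[i][k] * c[j][k];
    return out;
}

// Input: a grayscale, already smoothed and reduced 32x32 image (assumption).
// Takes the 8x8 low-frequency block (rows/columns 1..8, skipping the DC row
// and column) and sets one bit per coefficient that is above the block median.
uint64_t dctHash(const Image32& gray32) {
    const Image32 d = dct2(gray32);
    std::vector<double> block;
    for (int i = 1; i <= 8; ++i)
        for (int j = 1; j <= 8; ++j) block.push_back(d[i][j]);
    std::vector<double> sorted = block;
    std::sort(sorted.begin(), sorted.end());
    const double median = (sorted[31] + sorted[32]) / 2.0;  // median of 64 values
    uint64_t hash = 0;
    for (int b = 0; b < 64; ++b)
        if (block[b] > median) hash |= uint64_t{1} << b;
    return hash;
}
```

Because each bit only records whether a coefficient lies above the median, small resampling differences introduced at the original size can flip the bits whose coefficients sit closest to the median, which is the effect measured below.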

I tested the same set of sample images as in the previous posting; however, to save processing time, the comparison was performed on 3644 images.

To find a good size for normalization, I resized the images to widths of 2000, 1500, and 1000 pixels, and then calculated the Hamming distances for the copies scaled from 90% down to 10%.
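A minimal sketch of the normalization rule, assuming the limit applies to the width only and that the height is scaled to preserve the aspect ratio (the post does not say how the height is handled). `normalizedSize` is a hypothetical helper; the actual resizing would be done by whatever image library produces the hash input.

```cpp
#include <cmath>
#include <utility>

// Hypothetical helper: the (width, height) an image would be hashed at when
// normalizing to `maxWidth`. Wider images are scaled down to maxWidth with the
// aspect ratio preserved; narrower images are left at their original size.
std::pair<int, int> normalizedSize(int width, int height, int maxWidth) {
    if (width <= maxWidth) return {width, height};  // small enough: no resize
    const double scale = static_cast<double>(maxWidth) / width;
    long newHeight = std::lround(height * scale);
    if (newHeight < 1) newHeight = 1;  // keep at least one row of pixels
    return {maxWidth, static_cast<int>(newHeight)};
}
```

For example, with a normalization size of 1500, a 6000x4000 photo would be hashed at 1500x1000, while a 1200x800 image would be hashed unchanged.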

 

Hamming Distance is bigger than 4

Normalization size 2000 (image count on the left, percentage on the right)

Scale  5000<  4000<  3000<  2000<  1000<  <1000 |  5000<  4000<  3000<  2000<  1000<  <1000
100%       0      0      0      0      0      1 |   0.00   0.00   0.00   0.00   0.00   0.01
90%        0      0      0      0      6     17 |   0.00   0.00   0.00   0.00   0.08   0.23
80%        0      0      0      1     12     19 |   0.00   0.00   0.00   0.01   0.16   0.25
70%        0      0      0      1     18     36 |   0.00   0.00   0.00   0.01   0.24   0.48
60%        0      0      0     12     48     87 |   0.00   0.00   0.00   0.16   0.64   1.16
50%        0      0      3     26     77    141 |   0.00   0.00   0.04   0.35   1.03   1.89
40%        0      0      9     62    172    272 |   0.00   0.00   0.12   0.83   2.30   3.64
30%        1     12     54    156    333    475 |   0.01   0.16   0.72   2.09   4.45   6.35
20%       27     99    246    424    693    851 |   0.36   1.32   3.29   5.67   9.27  11.38
10%      163    360    753   1093   1442   1636 |   2.18   4.82  10.07  14.62  19.29  21.89

Normalization size 1500 (image count on the left, percentage on the right)

Scale  5000<  4000<  3000<  2000<  1000<  <1000 |  5000<  4000<  3000<  2000<  1000<  <1000
100%       0      0      0      0      0      1 |   0.00   0.00   0.00   0.00   0.00   0.01
90%        0      0      0      0      2     13 |   0.00   0.00   0.00   0.00   0.03   0.17
80%        0      0      0      0      7     14 |   0.00   0.00   0.00   0.00   0.09   0.19
70%        0      0      0      1     15     33 |   0.00   0.00   0.00   0.01   0.20   0.44
60%        0      0      0      2     25     64 |   0.00   0.00   0.00   0.03   0.33   0.86
50%        0      0      0      7     46    110 |   0.00   0.00   0.00   0.09   0.62   1.47
40%        0      0      4     25    123    223 |   0.00   0.00   0.05   0.33   1.65   2.98
30%        0      0     18     86    247    389 |   0.00   0.00   0.24   1.15   3.30   5.20
20%        6     27    116    257    520    678 |   0.08   0.36   1.55   3.44   6.96   9.07
10%      137    308    654    969   1313   1507 |   1.83   4.12   8.75  12.96  17.57  20.16

Normalization size 1000 (image count on the left, percentage on the right)

Scale  5000<  4000<  3000<  2000<  1000<  <1000 |  5000<  4000<  3000<  2000<  1000<  <1000
100%       0      0      0      0      0      1 |   0.00   0.00   0.00   0.00   0.00   0.01
90%        0      0      0      0      0     11 |   0.00   0.00   0.00   0.00   0.00   0.15
80%        0      0      0      0      0      7 |   0.00   0.00   0.00   0.00   0.00   0.09
70%        0      0      0      0      5     23 |   0.00   0.00   0.00   0.00   0.07   0.31
60%        0      0      0      0      6     45 |   0.00   0.00   0.00   0.00   0.08   0.60
50%        0      0      0      0     26     90 |   0.00   0.00   0.00   0.00   0.35   1.20
40%        0      0      0      3     56    156 |   0.00   0.00   0.00   0.04   0.75   2.09
30%        0      0      2     17    132    274 |   0.00   0.00   0.03   0.23   1.77   3.67
20%        0      4     39    122    354    512 |   0.00   0.05   0.52   1.63   4.74   6.85
10%       61    161    406    679    999   1193 |   0.82   2.15   5.43   9.08  13.36  15.96

Hamming Distance is bigger than 6

Normalization size 2000 (image count on the left, percentage on the right)

Scale  5000<  4000<  3000<  2000<  1000<  <1000 |  5000<  4000<  3000<  2000<  1000<  <1000
100%       0      0      0      0      0      0 |   0.00   0.00   0.00   0.00   0.00   0.00
90%        0      0      0      0      0      1 |   0.00   0.00   0.00   0.00   0.00   0.01
80%        0      0      0      1      2      3 |   0.00   0.00   0.00   0.01   0.03   0.04
70%        0      0      0      0      4     11 |   0.00   0.00   0.00   0.00   0.05   0.15
60%        0      0      0      0      8     21 |   0.00   0.00   0.00   0.00   0.11   0.28
50%        0      0      0      6     20     46 |   0.00   0.00   0.00   0.08   0.27   0.62
40%        0      0      4     21     46     94 |   0.00   0.00   0.05   0.28   0.62   1.26
30%        0      0     11     45    106    175 |   0.00   0.00   0.15   0.60   1.42   2.34
20%        4     14     63    142    286    381 |   0.05   0.19   0.84   1.90   3.83   5.10
10%       59    153    347    539    752    869 |   0.79   2.05   4.64   7.21  10.06  11.63

Normalization size 1500 (image count on the left, percentage on the right)

Scale  5000<  4000<  3000<  2000<  1000<  <1000 |  5000<  4000<  3000<  2000<  1000<  <1000
100%       0      0      0      0      0      0 |   0.00   0.00   0.00   0.00   0.00   0.00
90%        0      0      0      0      0      1 |   0.00   0.00   0.00   0.00   0.00   0.01
80%        0      0      0      0      0      1 |   0.00   0.00   0.00   0.00   0.00   0.01
70%        0      0      0      0      2      9 |   0.00   0.00   0.00   0.00   0.03   0.12
60%        0      0      0      0      6     19 |   0.00   0.00   0.00   0.00   0.08   0.25
50%        0      0      0      1     10     36 |   0.00   0.00   0.00   0.01   0.13   0.48
40%        0      0      0      8     28     76 |   0.00   0.00   0.00   0.11   0.37   1.02
30%        0      0      3     26     81    150 |   0.00   0.00   0.04   0.35   1.08   2.01
20%        1      4     30     88    221    316 |   0.01   0.05   0.40   1.18   2.96   4.23
10%       39     99    257    433    639    756 |   0.52   1.32   3.44   5.79   8.55  10.11

Normalization size 1000 (image count on the left, percentage on the right)

Scale  5000<  4000<  3000<  2000<  1000<  <1000 |  5000<  4000<  3000<  2000<  1000<  <1000
100%       0      0      0      0      0      0 |   0.00   0.00   0.00   0.00   0.00   0.00
90%        0      0      0      0      0      1 |   0.00   0.00   0.00   0.00   0.00   0.01
80%        0      0      0      0      0      1 |   0.00   0.00   0.00   0.00   0.00   0.01
70%        0      0      0      0      1      8 |   0.00   0.00   0.00   0.00   0.01   0.11
60%        0      0      0      0      2     15 |   0.00   0.00   0.00   0.00   0.03   0.20
50%        0      0      0      0      9     35 |   0.00   0.00   0.00   0.00   0.12   0.47
40%        0      0      0      0     14     62 |   0.00   0.00   0.00   0.00   0.19   0.83
30%        0      0      0      4     35    104 |   0.00   0.00   0.00   0.05   0.47   1.39
20%        0      0     11     39    138    233 |   0.00   0.00   0.15   0.52   1.85   3.12
10%       14     38    135    270    449    566 |   0.19   0.51   1.81   3.61   6.01   7.57

 

 

Conclusion

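According to the test results, resizing images before hashing improves the matching percentage, and the smaller the normalization size (1000 rather than 1500 or 2000), the fewer images exceed the Hamming distance threshold. This can be a solution for better matching. However, the false-positive matching rate is also important, so it needs to be checked before a normalization size is chosen.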


DCT Hash matching quality for resized images

The DCT hash in pHash was selected as the image similarity search algorithm for the Creative Commons image license search. Recently, we found that some images are not matched after they are resized, so I tested this with Flickr CC images.

First, I resized each image to 90%, 80%, 70%, 60%, 50%, 40%, 30%, 20%, and 10% of its original size. Each resized image was hashed, and its Hamming distance from the hash of the original (100%) image was calculated. Since image size matters, I categorized the images by original width: wider than 5000 pixels, 4000 to 5000, 3000 to 4000, 2000 to 3000, 1000 to 2000, and narrower than 1000 pixels.

In total, 7475 images were tested.
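The measurement comes down to three small operations: the Hamming distance between two 64-bit hashes, a count of image pairs above a threshold, and a percentage. A sketch with hypothetical function names; `std::bitset::count` stands in for pHash's own `ph_hamming_distance`, and the percentage is taken over the whole sample set of 7475 images, which matches the percentages in the tables below.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hamming distance between two 64-bit DCT hashes: the number of differing bits.
int hammingDistance(uint64_t a, uint64_t b) {
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}

// Counts resized copies whose hash differs from the original's by more than
// `threshold` bits. originals[i] and resized[i] are assumed to be hashes of
// the same image at 100% and at the tested scale.
int countAboveThreshold(const std::vector<uint64_t>& originals,
                        const std::vector<uint64_t>& resized, int threshold) {
    int count = 0;
    for (std::size_t i = 0; i < originals.size() && i < resized.size(); ++i) {
        if (hammingDistance(originals[i], resized[i]) > threshold) ++count;
    }
    return count;
}

// Percentage of the whole sample set.
double percentageOf(int count, int total) {
    return total > 0 ? 100.0 * count / total : 0.0;
}
```

For instance, 505 images above the threshold out of 7475 gives about 6.76%, the first entry of the 10% row.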

Number of images with a Hamming distance greater than 4

Scale  5000<  4000<  3000<  2000<  1000<  <1000
90%        5     21     31     39     53     71
80%        8     28     46     58     80     97
70%       13     32     60     85    128    172
60%       23     71    123    170    244    322
50%       30     97    173    246    359    491
40%       65    182    344    490    712    908
30%      125    339    626    861   1217   1519
20%      236    577   1079   1472   2012   2349
10%      505   1080   1983   2698   3419   3823

Percentage of images with a Hamming distance greater than 4

Scale  5000<  4000<  3000<  2000<  1000<  <1000
90%     0.07   0.28   0.41   0.52   0.71   0.95
80%     0.11   0.37   0.62   0.78   1.07   1.30
70%     0.17   0.43   0.80   1.14   1.71   2.30
60%     0.31   0.95   1.65   2.27   3.26   4.31
50%     0.40   1.30   2.31   3.29   4.80   6.57
40%     0.87   2.43   4.60   6.56   9.53  12.15
30%     1.67   4.54   8.37  11.52  16.28  20.32
20%     3.16   7.72  14.43  19.69  26.92  31.42
10%     6.76  14.45  26.53  36.09  45.74  51.14

Number of images with a Hamming distance greater than 6

Scale  5000<  4000<  3000<  2000<  1000<  <1000
90%        0      1      1      1      2      4
80%        2      3      7     10     12     17
70%        4      6     14     20     27     38
60%        6     13     23     35     50     76
50%       11     22     42     59     83    129
40%       19     58     99    140    207    297
30%       27    102    195    286    425    577
20%       79    227    441    612    896   1091
10%      249    579   1064   1475   1907   2159

Percentage of images with a Hamming distance greater than 6

Scale  5000<  4000<  3000<  2000<  1000<  <1000
90%     0.00   0.01   0.01   0.01   0.03   0.05
80%     0.03   0.04   0.09   0.13   0.16   0.23
70%     0.05   0.08   0.19   0.27   0.36   0.51
60%     0.08   0.17   0.31   0.47   0.67   1.02
50%     0.15   0.29   0.56   0.79   1.11   1.73
40%     0.25   0.78   1.32   1.87   2.77   3.97
30%     0.36   1.36   2.61   3.83   5.69   7.72
20%     1.06   3.04   5.90   8.19  11.99  14.60
10%     3.33   7.75  14.23  19.73  25.51  28.88

 
 

Conclusion

The results show that when an image is resized, some images may no longer be detected, and this happens more often as the scale gets smaller. A possible solution is to resize any image larger than a certain size down to that size before hashing. I tested this with widths of 2000, 1500, and 1000 pixels.

64-bit unsigned long long type transfer between JavaScript and C++ Daemon

Currently, the APIs that add and match image licenses take a pHash value extracted from the image. This hash value is a 64-bit binary value. For fast processing, the database and the C++ daemon handle it as an unsigned long long. Recently, however, while Anna was developing the JavaScript pHash module, we ran into a problem: when the JavaScript code printed the computed hash value, the last 4 or 5 digits were wrong. This is because a JavaScript Number is an IEEE 754 double, which can represent integers exactly only up to 2^53.

  • Largest integer up to which every integer is exact in JavaScript:
    2^53 : 9007199254740992 : 0x20000000000000 (Number.MAX_SAFE_INTEGER is 2^53 - 1)

  • Max value of unsigned long long:
    2^64 - 1 : 18446744073709551615 : 0xFFFFFFFFFFFFFFFF
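The failure can be reproduced without JavaScript: since a JavaScript Number is an IEEE 754 double, passing a hash through a C++ `double` shows the same rounding. A minimal sketch; `roundTripThroughDouble` is a hypothetical helper, not part of the project code.

```cpp
#include <cstdint>

// A JavaScript Number is an IEEE 754 double with a 53-bit significand, so
// integers are exact only up to 2^53. This converts a 64-bit hash to double
// and back, the way a JS Number would store it.
uint64_t roundTripThroughDouble(uint64_t hash) {
    // Rounds to the nearest representable double (default rounding mode).
    const double asJsNumber = static_cast<double>(hash);
    // Hashes close to 2^64 round up to exactly 2^64, which does not fit in
    // uint64_t; clamp so the cast back is well defined.
    if (asJsNumber >= 18446744073709551616.0) return UINT64_MAX;
    return static_cast<uint64_t>(asJsNumber);
}
```

Near 2^63, adjacent doubles are 2048 apart, so a typical 64-bit hash loses its low-order bits and the last few decimal digits come out wrong, which matches the symptom above.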

There are two solutions:

  1. Using a big integer library such as http://silentmatt.com/biginteger/
  2. Using a hexadecimal string for output

The first solution has the benefit that other modules do not have to be changed. The second solution's benefit is that it doesn't need an additional JavaScript library.
We decided to use solution 2 because:

  1. the hash value is only sent to the PHP API page;
  2. no calculation is done on it in JavaScript;
  3. if another hash algorithm is used later, its hash may be much longer;
  4. an additional JavaScript library would make the client implementation slower.

After adopting this solution, the following modules were affected.

  • javascript: added code to convert the binary string to a hexadecimal string

  • phash: hash generator from an image
    I changed the code to print a hexadecimal string instead of an integer string.

//printf("%llu\n", tmphash);
printf("%016llX\n", tmphash);
  • hamming: Hamming distance calculator for two hash values
    I changed it to take hexadecimal strings:
//    ulong64 hash1 = strtoull(argv[1], NULL, 10);
//    ulong64 hash2 = strtoull(argv[2], NULL, 10);
    ulong64 hash1 = strtoull(argv[1], NULL, 16);
    ulong64 hash2 = strtoull(argv[2], NULL, 16);
  • regdaemon: C++ daemon
    I changed the add/match commands so that they take a hexadecimal string.
//uint64_t uiHash = std::stoull(strHash);
uint64_t uiHash = std::stoull(strHash, 0, 16);
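A sketch of the hexadecimal round trip, assuming the same 16-digit, zero-padded, uppercase format as the `%016llX` change. `hashToHex`, `hexToHash`, and `binaryStringToHex` are hypothetical helpers; the last one is only a C++ stand-in for the JavaScript change, whose actual code is not shown in this post.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit hash as a fixed-width, 16-digit uppercase hex string,
// equivalent to printf("%016llX") in the phash tool.
std::string hashToHex(uint64_t hash) {
    char buf[17];
    std::snprintf(buf, sizeof buf, "%016" PRIX64, hash);
    return std::string(buf);
}

// Parses a hex string back into a 64-bit hash with base 16, as the hamming
// tool and regdaemon now do. Throws std::invalid_argument or
// std::out_of_range on bad input.
uint64_t hexToHash(const std::string& hex) {
    return std::stoull(hex, nullptr, 16);
}

// Hypothetical equivalent of the JavaScript change: converts a string of
// '0'/'1' characters (most significant bit first, at most 64 of them) into
// the same hex form. Any character other than '1' is treated as 0.
std::string binaryStringToHex(const std::string& bits) {
    uint64_t value = 0;
    for (char c : bits) {
        value = (value << 1) | (c == '1' ? 1u : 0u);
    }
    return hashToHex(value);
}
```

The fixed width keeps every hash exactly 16 characters long, so leading zero bits are never dropped on the way to the PHP API.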

The PHP API doesn't have to be changed because it simply passes the value through using base64 encoding.

For the MySQL database field, we decided to keep the 64-bit unsigned integer type (BIGINT UNSIGNED) for the DCT hash value, because then the value does not need to be converted from a string to a number when it is loaded into memory for indexing.
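A minimal sketch of what keeping the numeric type buys: the hashes can be loaded straight into memory as `uint64_t` and compared with a single XOR and bit count. The `HashIndex` class, the linear scan, and the `maxDistance` threshold are assumptions for illustration; the post does not describe how the daemon's index is organized.

```cpp
#include <bitset>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory index: image id -> 64-bit DCT hash, filled once from
// the numeric database column, so no string-to-number conversion is needed.
class HashIndex {
public:
    void add(uint32_t imageId, uint64_t hash) { hashes_[imageId] = hash; }
    void remove(uint32_t imageId) { hashes_.erase(imageId); }

    // Linear scan: ids whose Hamming distance to `query` is <= maxDistance.
    std::vector<uint32_t> match(uint64_t query, int maxDistance) const {
        std::vector<uint32_t> result;
        for (const auto& entry : hashes_) {
            const int distance =
                static_cast<int>(std::bitset<64>(entry.second ^ query).count());
            if (distance <= maxDistance) result.push_back(entry.first);
        }
        return result;
    }

private:
    std::unordered_map<uint32_t, uint64_t> hashes_;
};
```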

CC Image License Search Engine API Implementation


UI

Previously, my colleague Anna made a page that searches for similar images, either from an uploaded file or from a link. This UI page can be hosted either on the server or outside it. It uses only the PHP API and never accesses the database directly.
 

PHP API

This is an open API with functions for adding, deleting, and matching images. Anyone who wants these functions can access it. The UI page and client implementations such as a browser extension use this API. Matching results are returned in JSON format.
The API performs add/delete/match operations by sending requests to the “C++ Daemon” instead of changing the database itself.
Only read-only access to the database will be permitted.
 

C++ Daemon

All add and delete operations will be done in this daemon. This removes the problem of keeping the database and the matching index synchronized, because the daemon keeps the content index in memory at all times for fast matching.
Because the daemon runs continuously, it works as a Unix domain socket server to receive requests from the “PHP API” and return results; the PHP API sends its requests over the domain socket.
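A minimal sketch of the daemon side of this design, using POSIX Unix domain sockets. The one-line request format (`add`, `delete`, or `match` followed by a hex hash), the socket path, and the names `Request`, `parseRequest`, and `serveForever` are assumptions for illustration; the post does not specify the daemon's protocol.

```cpp
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#include <cstdio>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical one-line request, e.g. "match 1D50000000000639".
struct Request {
    std::string command;  // assumed commands: add, delete, match
    std::string hashHex;  // DCT hash as a 16-digit hex string
};

// Splits a request line into command and hash; rejects unknown commands.
bool parseRequest(const std::string& line, Request& out) {
    std::istringstream in(line);
    if (!(in >> out.command >> out.hashHex)) return false;
    return out.command == "add" || out.command == "delete" ||
           out.command == "match";
}

// Minimal blocking Unix domain socket server. `handle` maps a Request to a
// reply string (for example, JSON built from the in-memory index). It serves
// one connection at a time, assumes each request arrives in a single read,
// and only returns if the socket cannot be set up.
template <typename Handler>
int serveForever(const std::string& socketPath, Handler handle) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) { std::perror("socket"); return 1; }

    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, socketPath.c_str(), sizeof(addr.sun_path) - 1);
    unlink(socketPath.c_str());  // remove a stale socket file from a previous run

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        std::perror("bind/listen");
        close(fd);
        return 1;
    }
    for (;;) {
        int client = accept(fd, nullptr, nullptr);
        if (client < 0) continue;
        char buf[256];
        ssize_t n = read(client, buf, sizeof buf - 1);
        if (n > 0) {
            buf[n] = '\0';
            Request req;
            std::string reply = parseRequest(buf, req) ? std::string(handle(req))
                                                       : std::string("error\n");
            ssize_t written = write(client, reply.data(), reply.size());
            (void)written;  // a real daemon would retry partial writes
        }
        close(client);
    }
}
```

Usage would look like `serveForever("/tmp/regdaemon.sock", handler)`, where the path is hypothetical and `handler` looks the hash up in the daemon's in-memory index.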
 

MySQL

The database contains all metadata about the CC-licensed images, including the thumbnail paths used to show previews in the matching results.

Pastec analysis

Pastec works in the following order:

  1. Load the visual words: they are stored in the visualWordsORB.dat file, which is 32,000,000 bytes. Loading the file takes around 1 second.
  2. Build the word index from the visual words; this takes around 13 seconds.
  3. Then a previously saved index file can be loaded, or images can be added to the index.
  4. Given an image file, images that contain similar visual words can be searched for.
  5. The in-memory index can be written to a file.

Adding a new image to the index works in the following order:

  1. ORB features are extracted using OpenCV.
  2. The matching visual word for each feature is searched for.
  3. The matching visual words are added to the in-memory index.
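Step 2 assigns each ORB descriptor to its closest visual word. ORB descriptors are 256-bit binary strings compared by Hamming distance, so the idea can be sketched as a brute-force nearest-neighbour search. This is an illustration only: Pastec presumably uses the word index built at startup for faster lookups, and `OrbDescriptor`, `descriptorDistance`, and `nearestVisualWord` are hypothetical names.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using OrbDescriptor = std::array<uint8_t, 32>;  // one 256-bit ORB descriptor

// Hamming distance between two ORB descriptors, byte by byte.
int descriptorDistance(const OrbDescriptor& a, const OrbDescriptor& b) {
    int bits = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        bits += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    return bits;
}

// Returns the id (index) of the visual word closest to `feature`, or -1 if
// the vocabulary is empty. Brute force over every word, for clarity only.
long nearestVisualWord(const std::vector<OrbDescriptor>& words,
                       const OrbDescriptor& feature) {
    long best = -1;
    int bestDistance = std::numeric_limits<int>::max();
    for (std::size_t id = 0; id < words.size(); ++id) {
        const int d = descriptorDistance(words[id], feature);
        if (d < bestDistance) {
            bestDistance = d;
            best = static_cast<long>(id);
        }
    }
    return best;
}
```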

When I added 900 images, the size of the index file was 16,967,440 bytes.

By modifying the source code, I saved the list of matching visual words for each image to a text file. Each word match is stored using this struct:

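struct HitForward
{
    u_int32_t i_wordId;   // id of the matched visual word
    u_int32_t i_imageId;  // id of the image the word was found in
    u_int16_t i_angle;    // keypoint orientation, stored as a 16-bit value
    u_int16_t x;          // keypoint x coordinate
    u_int16_t y;          // keypoint y coordinate
};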

Each word match has a word id, an image id, an angle, and x/y coordinates. The saved file looks like this (fields in the order ImageID, Angle, x, y, WordId):

469,55772,417,111,99042
469,46096,424,453,261282
469,4246,866,265,40072
469,44288,855,295,635378
469,59150,735,268,28827
469,12526,529,112,139341
469,12513,500,39,172187
469,48546,615,59,288827
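Reading these lines back is the first step of any conversion to a more compact format. A minimal parser sketch, assuming exactly the field order shown above; `HitRecord` and `parseHitLine` are hypothetical names, and the field widths follow the `HitForward` struct.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// One line of the dump, in file order: ImageID,Angle,x,y,WordId.
struct HitRecord {
    uint32_t imageId;
    uint16_t angle;
    uint16_t x;
    uint16_t y;
    uint32_t wordId;
};

// Parses "469,55772,417,111,99042" into a HitRecord. Returns false on a
// malformed line or a value that does not fit its field.
bool parseHitLine(const std::string& line, HitRecord& out) {
    std::istringstream in(line);
    unsigned long long v[5];
    for (int i = 0; i < 5; ++i) {
        if (!(in >> v[i])) return false;
        char comma = 0;
        if (i < 4 && !(in >> comma && comma == ',')) return false;
    }
    if (v[0] > UINT32_MAX || v[4] > UINT32_MAX ||
        v[1] > UINT16_MAX || v[2] > UINT16_MAX || v[3] > UINT16_MAX)
        return false;
    out.imageId = static_cast<uint32_t>(v[0]);
    out.angle = static_cast<uint16_t>(v[1]);
    out.x = static_cast<uint16_t>(v[2]);
    out.y = static_cast<uint16_t>(v[3]);
    out.wordId = static_cast<uint32_t>(v[4]);
    return true;
}
```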

The file contains 1593 lines, which means the image has 1593 matching words. Image id 469 is Jánské.jpg, which looks like this:
[Image: Jánské.jpg]
This image is 12.8 MB. Like other HDR images, it contains lots of features, and it has the largest number of matching words among the 900 images. Written to the text file, its data took 39,173 bytes, which should be the worst case. When an image is simple, only a few words are matched. The matching-word text files for all 900 images totaled 20.9 MB.

To reduce the size, I made a simple binary format. Since the image id is the same for every word of an image, it is written once, followed by a 4-byte count. Then each word is written as a 4-byte word id, a 2-byte angle, a 2-byte x, and a 2-byte y. All values are stored in little-endian byte order, as the dump below shows.

4 bytes - id
4 bytes - count
4,2,2,2 (10 bytes) *  count

For image id 469, the file is 15,938 bytes (4 + 4 + 10 × 1593), and it looks like this:

00000000: d501 0000 3906 0000 e282 0100 dcd9 a101  ....9...........
00000010: 6f00 a2fc 0300 10b4 a801 c501 889c 0000  o...............
00000020: 9610 6203 0901 f2b1 0900 00ad 5703 2701  ..b.........W.'.
00000030: 9b70 0000 0ee7 df02 0c01 4d20 0200 ee30  .p........M ...0
00000040: 1102 7000 9ba0 0200 e130 f401 2700 3b68  ..p......0..'.;h
00000050: 0400 a2bd 6702 3b00 b094 0800 c64c 5f02  ....g.;......L_.

The first two fields are 0x1d5 = 469 (the image id) and 0x639 = 1593 (the word count).
In this case, the size was 15,938 bytes (about 15.6 KB), around 41% of the text format (39,173 bytes).
Since this image is the worst case, storing the binary word list in the database for every image record is realistic.
The binary files for all 900 images totaled 8.5 MB (the text files were 20.9 MB).
Interestingly, this is smaller than the index file for the same 900 images (16.2 MB).
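A sketch of an encoder and decoder for this layout, assuming little-endian values as in the dump above. `WordHit`, `encodeWordList`, and `decodeWordList` are hypothetical names, not code from the modified Pastec.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One matched visual word; the image id is stored once per image, not per word.
struct WordHit {
    uint32_t wordId;
    uint16_t angle;
    uint16_t x;
    uint16_t y;
};

// Appends `bytes` bytes of `value`, least significant byte first.
static void putLE(std::vector<uint8_t>& out, uint64_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
}

// Reads `bytes` little-endian bytes starting at `pos`.
static uint64_t getLE(const std::vector<uint8_t>& in, std::size_t pos, int bytes) {
    uint64_t value = 0;
    for (int i = 0; i < bytes; ++i)
        value |= static_cast<uint64_t>(in[pos + i]) << (8 * i);
    return value;
}

// Layout: 4-byte image id, 4-byte count, then count * (4 + 2 + 2 + 2) bytes.
std::vector<uint8_t> encodeWordList(uint32_t imageId, const std::vector<WordHit>& hits) {
    std::vector<uint8_t> out;
    out.reserve(8 + 10 * hits.size());
    putLE(out, imageId, 4);
    putLE(out, hits.size(), 4);
    for (const WordHit& h : hits) {
        putLE(out, h.wordId, 4);
        putLE(out, h.angle, 2);
        putLE(out, h.x, 2);
        putLE(out, h.y, 2);
    }
    return out;
}

// Inverse of encodeWordList; returns false if the size does not match the count.
bool decodeWordList(const std::vector<uint8_t>& in, uint32_t& imageId,
                    std::vector<WordHit>& hits) {
    if (in.size() < 8) return false;
    imageId = static_cast<uint32_t>(getLE(in, 0, 4));
    const uint64_t count = getLE(in, 4, 4);
    if (in.size() != 8 + 10 * count) return false;
    hits.clear();
    for (uint64_t i = 0; i < count; ++i) {
        const std::size_t p = 8 + 10 * i;
        hits.push_back({static_cast<uint32_t>(getLE(in, p, 4)),
                        static_cast<uint16_t>(getLE(in, p + 4, 2)),
                        static_cast<uint16_t>(getLE(in, p + 6, 2)),
                        static_cast<uint16_t>(getLE(in, p + 8, 2))});
    }
    return true;
}
```

Encoding image 469 with its first word (99042, angle 55772, x 417, y 111) produces the bytes `d5 01 00 00 01 00 00 00 e2 82 01 00 dc d9 a1 01 6f 00`, which agree with the dump except for the count field.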

Conclusion

I had been thinking of saving the index file. However, saving the word list for each image is the better solution: in binary format it consumes less storage, and adding it to the index is very fast. Also, when the word list is stored as a database field, synchronization between the index and the database is not a problem.

Making sample images using ‘convert’ utility

In the previous posting, I made sample images using Java. The ‘convert’ utility, part of ImageMagick, provides powerful image processing functionality.
Using ‘convert’ and a bash script, we can make sample images easily.

A-Z, a-z, 0-9 images

for L in {A..Z} {a..z} {0..9} ; do convert -size 80x50 xc:white -font /usr/share/fonts/truetype/msttcorefonts/arial.ttf -pointsize 50 -fill black -gravity center -annotate 0x0+0+0 "$L" "$L.jpg" ; done

[Screenshot from 2015-05-12 13:20:47: the generated A-Z, a-z, 0-9 images]

The same set with italic style

for L in {A..Z} {a..z} {0..9} ; do convert -size 80x50 xc:white -font /usr/share/fonts/truetype/msttcorefonts/arial.ttf -pointsize 50 -fill black -gravity center -annotate 0x30+0+0 "$L" "$L.jpg" ; done

[Screenshot from 2015-05-12 13:20:11: the italic set]

The same set with rotation

for L in {A..Z} {a..z} {0..9} ; do convert -size 80x50 xc:white -font /usr/share/fonts/truetype/msttcorefonts/arial.ttf -pointsize 50 -fill black -gravity center -annotate 45x45+0+0 "$L" "$L.jpg" ; done

[Screenshot from 2015-05-12 13:21:32: the rotated set]

cc_xmp_tag Android/Perl Implementation Compatibility

In the last posting, I was developing and testing a Java implementation of a Creative Commons license tagging and reading library for Android. This posting covers the changes since then and explains the current version.

Source Repository

SCM : https://github.com/creativecommons/seneca

Java

In the git repository, cc-xmp-tag/java contains the Java implementation for Android.
The com directory contains the source code of Adobe XMP Toolkit for Java version 5.1.0.
The pixy directory contains the most recent version of PixyMeta Android. Wen recently updated it to support png and gif, and the current version includes that update.

Perl

Andrew Smith wrote a Perl script that uses ExifTool.
The script is the cc-xmp-tag/perl/cc-xmp-tag.pl file.

Compatibility test

Every test was run on an image without an XMP tag and on an image that already had one. In the latter case, it is important to keep the existing tags and add only the CC license tag. The results were the same for both.

Write in CCXMPTag -> Read in CCXMPTag

  • Works well for all supported image formats (jpg, png, gif, tif).

Write in Android CCXMPTag -> Read in cc-xmp-tag.pl

  • worked for all formats.

Write in cc-xmp-tag.pl -> Read in Android CCXMPTag

  • worked for all formats.

Write in Android CCXMPTag -> Read in XnViewMP

  • worked for jpg, png, tif
  • doesn’t show the information in gif

Write in cc-xmp-tag.pl -> Read in XnViewMP

  • worked for jpg, png, tif
  • doesn’t show the information in gif

Write in Android CCXMPTag -> Read in an online metadata viewer (http://regex.info/exif.cgi)

  • worked for all formats

Write in cc-xmp-tag.pl -> Read in an online metadata viewer (http://regex.info/exif.cgi)

  • worked for all formats

Write in Android CCXMPTag -> Read in an online metadata viewer (http://metapicz.com/#landing)

  • worked for all formats

Write in cc-xmp-tag.pl -> Read in an online metadata viewer (http://metapicz.com/#landing)

  • worked for jpg, png, tif
  • doesn’t work for gif

Conclusion

  • Between CCXMPTag and cc-xmp-tag.pl, writing and reading worked well for all 4 image formats.
  • XnViewMP doesn't show XMP in gif files. This is odd, because XnViewMP uses ExifTool.
  • The online metadata viewer http://regex.info worked well with both the Java and Perl implementations.
  • Another online metadata viewer, http://metapicz.com, worked well with the Java implementation, but it didn't read XMP in gif files written by the Perl version.