MH Hash, MVP-Tree indexer/searcher for MySQL/PHP

Our current development server runs on a LAMP stack. Anna is working on the Creative Commons image crawler and the user interface using PHP/MySQL. For a prototype that works with the PHP UI code and the MySQL database, I wrote an indexer and a searcher.

Database

The database contains many records, each holding an image URL, a license, and a hash value. The records are created by the crawler, which is written in PHP.

Indexer

Source code :
https://github.com/CreativeCommons-Seneca/registry/blob/master/indexer/mhindexer.cpp

Description :

$ ./mhindexer
Usage :
     mhindexer hostName userName password schema table key value treeFilename
     hostName : mysql hostname
     userName : mysql username
     password : mysql password
     schema : db name
     table : table name
     key : image id field name in the table
     value : hash field name in the table
     treeFilename : mvp tree file name
Output :
     treeFilename,datapointCount,elapsedSeconds

The program takes the MySQL connection information (hostname, username, password) and the database information (schema, table, key, value). After connecting, it reads the ‘key’ and ‘value’ fields of every record in ‘table’. ‘key’ is a unique key that points to the db record holding the image information: filename, url, hash value, etc. ‘value’ is the hash value used to calculate the Hamming distance.
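To make the distance idea concrete, here is a minimal standalone sketch of a normalized Hamming distance between two byte-array hashes. It is not the code mhindexer uses (the tools rely on pHash for that); the function names are hypothetical, and normalizing by the total bit count is an assumption that fits the fractional radius values used later in this post.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Count the bits set in one byte (portable, no compiler builtins).
static int countBits(std::uint8_t byte) {
    int bits = 0;
    while (byte != 0) {
        byte &= static_cast<std::uint8_t>(byte - 1);  // clear the lowest set bit
        ++bits;
    }
    return bits;
}

// Hypothetical helper: differing bits divided by total bits, in [0, 1].
// Returns -1.0 when the hashes are empty or have different lengths.
double normalizedHamming(const std::vector<std::uint8_t>& a,
                         const std::vector<std::uint8_t>& b) {
    if (a.empty() || a.size() != b.size()) {
        return -1.0;
    }
    std::size_t differing = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 in every bit position where the hashes disagree.
        differing += countBits(static_cast<std::uint8_t>(a[i] ^ b[i]));
    }
    return static_cast<double>(differing) / (a.size() * 8.0);
}
```

With this definition, identical hashes have distance 0 and hashes that differ in every bit have distance 1, so a small radius means "almost the same image".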

After connecting to the database, the program reads all records that contain hash values and adds them to an MVP-tree. When the tree is built, it is written to the ‘treeFilename’ file.

I made a simple bash script that runs mhindexer with the parameters. The output is:

$ ./mhindexer.sh
tree.mh,784,0.035845

The tree was built from the hashes in the database and written to tree.mh. It contains 784 data points, and it took 0.035845 seconds.

Searcher

Source code :
https://github.com/CreativeCommons-Seneca/registry/blob/master/indexer/mhsearcher.cpp

Description :

Usage :
    mhsearcher treeFilename imageFilename radius
    eg : mhsearcher tree.mh ./test.jpg 0.0005
output : 0-success, 1-failed
    success : 0,count,id,id,id,...
      eg : 0,2,101,9801 
    failed : 1,error string
      eg : 1,MVP Error

For now, the searcher reads the tree file (treeFilename) to rebuild the tree structure, extracts the MH hash from the input file (imageFilename), and then searches the tree for that hash value within ‘radius’.

The output is consumed by the PHP script. When the first comma-separated field is 0, there was no error and the result is meaningful. The second field is the number of matching hashes, and the following fields are their ids. Using the ids, the PHP script can fetch the image information from the database.
When the first field is 1, the following field is the error message.
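The real consumer is the PHP script, but the format is easy to show with a small parser. The following is only a sketch in C++: the `SearchResult` struct and `parseSearcherOutput` function are hypothetical names, ids are kept as strings because the key type is not specified here, and any fields after the counted ids are ignored.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parsed form of one line of searcher output (hypothetical type).
struct SearchResult {
    bool ok = false;               // true when the first field is 0
    std::vector<std::string> ids;  // matching ids (success only)
    std::string error;             // error message (failure only)
};

// Remove trailing spaces and newlines from a copy of the line.
static std::string trimRight(std::string s) {
    while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back()))) {
        s.pop_back();
    }
    return s;
}

SearchResult parseSearcherOutput(const std::string& line) {
    SearchResult result;
    std::istringstream in(trimRight(line));
    std::string field;

    // First field: status code, 0 = success, 1 = failure.
    if (!std::getline(in, field, ',')) {
        result.error = "empty output";
        return result;
    }
    if (field == "1") {
        std::getline(in, result.error);  // the rest of the line is the message
        return result;
    }
    if (field != "0") {
        result.error = "unknown status: " + field;
        return result;
    }

    // Second field: how many ids follow.
    std::size_t count = 0;
    try {
        if (!std::getline(in, field, ',')) throw std::invalid_argument("count");
        count = std::stoul(field);
    } catch (const std::exception&) {
        result.error = "missing or invalid count";
        return result;
    }

    // Read exactly 'count' ids; anything after them is left unread.
    for (std::size_t i = 0; i < count; ++i) {
        if (!std::getline(in, field, ',')) {
            result.ids.clear();
            result.error = "fewer ids than count";
            return result;
        }
        result.ids.push_back(field);
    }
    result.ok = true;
    return result;
}
```

For example, `parseSearcherOutput("0,2,101,9801")` yields two ids, and `parseSearcherOutput("1,MVP Error")` yields the error string "MVP Error".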

To test it, I randomly chose an image that is in the database.
Example output:

$ ./mhsearcher tree.mh WTW_Nov_2013_Tumanako_023.JPG 0.001
0,0,0.001,8,0.000029
$ ./mhsearcher tree.mh WTW_Nov_2013_Tumanako_023.JPG 0.1
0,0,0.1,778,0.000648
$ ./mhsearcher tree.mh WTW_Nov_2013_Tumanako_023.JPG 0.2
0,1,60,0.2,784,0.000657
$ ./mhsearcher tree.mh WTW_Nov_2013_Tumanako_023.JPG 0.3
0,1,60,0.3,784,0.000658
$ ./mhsearcher tree.mh WTW_Nov_2013_Tumanako_023.JPG 0.44
0,5,539,60,380,188,371,0.44,784,0.000672

For performance statistics, I appended the radius, the calculation count, and the extraction time to the end of the result.
For this image, the matching image (id 60) was first found at a radius of 0.2. At a radius of 0.44, there were 5 results.

Conclusion

  • These utilities work well with MySQL and PHP.
  • Because of the characteristics of the tree search algorithm, the searcher could repeat the search internally with radii from 0.001 up to 0.5 to get a fast and reliable result.
  • Later, the indexer and searcher can be turned into a Linux daemon process that keeps the tree in memory for fast searching.
  • When the number of database records is enormous (millions to billions), the tree can be divided into several sections in the database.
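The repeated-radius idea from the second point could look roughly like the sketch below. It is not part of mhsearcher: `searchExpanding`, `RadiusHit`, and `SearchFn` are hypothetical names, the callback stands in for one tree query at a given radius, and doubling the radius on each pass is an assumed schedule (the post only gives the 0.001 to 0.5 range).

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Result of the sweep: the radius that produced matches, and the matching ids.
struct RadiusHit {
    double radius;
    std::vector<std::string> ids;
};

// Stand-in for one tree query at a given radius (hypothetical signature).
using SearchFn = std::function<std::vector<std::string>(double)>;

// Try small radii first and widen until something matches or maxRadius is reached.
RadiusHit searchExpanding(const SearchFn& searchAtRadius,
                          double startRadius = 0.001,
                          double maxRadius = 0.5,
                          double growth = 2.0) {
    // Reject parameters that would make the loop run forever or never start.
    if (startRadius <= 0.0 || growth <= 1.0 || startRadius > maxRadius) {
        return {0.0, {}};
    }
    double radius = startRadius;
    while (true) {
        std::vector<std::string> ids = searchAtRadius(radius);
        if (!ids.empty()) {
            return {radius, ids};  // the smallest tried radius with a match
        }
        if (radius >= maxRadius) {
            break;  // the widest radius was already tried
        }
        radius = std::min(radius * growth, maxRadius);
    }
    return {maxRadius, {}};  // nothing found within the allowed range
}
```

Starting narrow keeps the common case cheap, since an image that is already indexed should match at a small radius, while the cap prevents the sweep from returning nearly the whole tree.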

One thought on “MH Hash, MVP-Tree indexer/searcher for MySQL/PHP”

  1. Hi, can you please explain how to compile mhindexer and mhsearcher? I am a noob in C and C++. I installed all the required libs, like pHash and mysql-connector-c++, but compilation fails when I try to build mhindexer and mhsearcher.

    I tried to compile them like this:
    g++ mhindexer.cpp -o mhindexer.o
    and
    g++ mhsearcher.cpp -o mhsearcher.o

    I got these errors:
    /tmp/ccfuEnDm.o: In function `distance(mvp_datapoint_t*, mvp_datapoint_t*)':
    mhindexer.cpp:(.text+0x4c): undefined reference to `ph_hammingdistance2'
    /tmp/ccfuEnDm.o: In function `makePoint(char const*, unsigned char const*, unsigned int)':
    mhindexer.cpp:(.text+0x85): undefined reference to `dp_alloc'
    /tmp/ccfuEnDm.o: In function `main':
    mhindexer.cpp:(.text+0x3d7): undefined reference to `get_driver_instance'
    mhindexer.cpp:(.text+0x586): undefined reference to `mvptree_alloc'
    mhindexer.cpp:(.text+0x80f): undefined reference to `mvptree_add'
    mhindexer.cpp:(.text+0x83c): undefined reference to `mvp_errstr'
    mhindexer.cpp:(.text+0x929): undefined reference to `mvptree_write'
    mhindexer.cpp:(.text+0x956): undefined reference to `mvp_errstr'
    mhindexer.cpp:(.text+0xb1b): undefined reference to `mvptree_clear'
    collect2: error: ld returned 1 exit status