A technique for using PHP to deliver htdig search engine services

This page, an htdig contribution, shows you a way to set up htdig so that you can use it with PHP. ht://dig (www.htdig.org) is a popular server-side tool for indexing websites so that they can be searched without ads. If you are hosting with us, this is much simpler: please follow the instructions at our htdig with webhosting information page.

Special mention should be made of Colin Viebrock, who wrote an excellent article on how to do this with PHP version 3, listed at devhed.com (and not on his own excellent site ???).

Step 1: get the source code and install it

You can get the htdig source code right from ht://dig, but if you run a popular Linux distribution like Red Hat or Suse (or Debian), you can just get the binary installation file right off the Internet. The file you want is the latest "rpm", for Red Hat or Suse, such as htdig-3.2.0-2.011302.i386.rpm. Please find the latest rpm for your distribution as the version mentioned here may be out of date.

To install this, just download the rpm, log on as root and type "rpm -Uvh htdig-3.2.0-2.011302.i386.rpm". That's it. THIS is the only time you need to be root, so your ISP should be able to do this for you without much pain on their part if they are running U*NIX.

Step 2: set up htdig

You will need three directories to store information:

Of course you can use your own files, but my cool object oriented code inspired by Colin will not work without the exact text versions of the results files above.

Step 3: Make a php object

OK, this is the fun part, hackers (note: not the crackers incorrectly described as hackers by the press)!

The object:

<?php

/*
    Copyright (C) 2005 Computer Engineering Inc. All rights reserved
    Licence: Gnu Public Licence as shown at http://www.gnu.org/licenses/gpl.html
    2005/01/12 create Blair Lowe
    This class abstracts an htdig query and displaying of results.
    See http://www.htdig.org for htdig information.
*/

class htdig {
    
    var $u_result;
    var $u_query;

//	************************************************************

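    function __construct() {

        // Path to the htsearch binary. The constant name must be quoted, and
        // it is defined only once so that a second htdig object does not
        // trigger a "constant already defined" warning.
        if ( ! defined( "HTSEARCH_PROG" ) ) {
            define( "HTSEARCH_PROG", "/usr/bin/htsearch" );
        }
        $this->u_result = array();
        $this->u_query = "";

    } // end of constructor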

    function htsearch(
          $p_query
        , $p_format = ""
        , $p_config = ""
        , $p_debug = FALSE
    ) {
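        // Public function searches for words, and returns the html results.
        //
        // Results are also stored in the class variable u_result so that the
        // display_results function can display the results without any
        // variable passing
        //
        // urlencode() each value so that spaces or "&" in the input cannot
        // break the query string.
        $l_query = "format=" . urlencode( $p_format ) . "&words=" . urlencode( $p_query );
        // escapeshellarg() quotes each argument as a whole, so the shell never
        // interprets metacharacters in the config path or the query.
        $l_command = HTSEARCH_PROG . " -c " . escapeshellarg( $p_config )
                   . " " . escapeshellarg( $l_query );
        // exec() appends to an existing array, so start from an empty one.
        $this->u_result = array();
        exec( $l_command, $this->u_result );
        // Keep an HTML-safe copy of the query as typed, for display.
        $this->u_query = htmlspecialchars( $p_query );
        return( $this->u_result );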
    } // end of function htsearch ()

    function display_results( $p_debug = FALSE )
    {
        //
        // Public function displays the results from the public class variable 
        // $this->u_result
        //
        if ( $p_debug ) {
            foreach ( $this->u_result as $l_value ) {
                echo "<p>DEBUG this->u_result[n] = $l_value </p>\n";
            }
        }
        $l_rowcount = count($this->u_result);
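        if ( $p_debug ) {
            echo "<p>DEBUG l_rowcount = $l_rowcount</p>\n";
            // Braces are needed to interpolate an array element of a property,
            // and the element only exists when there are at least three lines.
            if ( $l_rowcount > 2 ) {
                echo "<p>DEBUG this->u_result[2] = {$this->u_result[2]}</p>\n";
            }
        }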
        if ( $l_rowcount < 3 ) {
            // If the search produced an error, we'd get no results or a one-line error
            echo "There was an error executing this query.  Please try later.<br />\n";
            return -2;
        } elseif ( $this->u_result[2] == "NOMATCH" ) {
            //echo "DEBUG nomatch<br />\n";
            // We know to look for "NOMATCH" because that's the string in results-nomatch.html.
            echo "There were no matches for \"<B>" . $this->u_query . "</B>\" found on the website.<P>\n";
            return 0;
        } elseif ( $this->u_result[2]=="SYNTAXERROR" ) {
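            // Similarly, we can check for a boolean syntax error.
            // The query is read from $this->u_query; $f_search does not exist inside this method.
            echo "There is a syntax error in your search for \"<b>" . $this->u_query . "</b>\":<br>";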
            echo "<PRE>" . $this->u_result[3] . "</PRE>\n";
            return -1;
        } else {
            // we have at least one match
            $l_count = 6;
            /*while( $l_count < $l_rowcount - 3 ) {
                echo $this->u_result[ $l_count ] . "\n";
                $l_count++;
            }
            echo "<img src=\"/htdig/htdig.gif\" border=\"0\" alt=\"\">";*/
            // Each match takes five lines, so stop once a full set no longer fits.
            while( $l_count + 4 < $l_rowcount ) {

                # grab the match information

                $l_title = $this->u_result[ $l_count ];
                $l_url = $this->u_result[ $l_count + 1 ];
                $l_stars = $this->u_result[ $l_count + 2 ];
                $l_percent = $this->u_result[ $l_count + 3 ];
                $l_excerpt = $this->u_result[ $l_count + 4 ];

                # Disguise the private bits !

                $l_excerpt = str_replace( "@", " *at* ", $l_excerpt );

                $l_excerpt = str_replace( ".com", " *dot* com", $l_excerpt );
                $l_excerpt = str_replace( ".ca", " *dot* ca", $l_excerpt );
                $l_excerpt = str_replace( ".net", " *dot* net", $l_excerpt );
                $l_excerpt = str_replace( ".org", " *dot* org", $l_excerpt );
                $l_excerpt = str_replace( ".info", " *dot* info", $l_excerpt );
                $l_excerpt = str_replace( ".biz", " *dot* biz", $l_excerpt );
                $l_excerpt = str_replace( ".us", " *dot* us", $l_excerpt );

                # output the match information

                echo "<h3><a href=\"" . $l_url . "\">" . $l_title .  "</a></h3>";
                echo "(" . $l_percent . "% match) " . $l_stars . "<br />\n";
                echo "<blockquote>" . $l_excerpt . "</blockquote>\n";

                # move to the next match

                $l_count =  $l_count + 5;
            }
        }
        return 0;
        
    } // end of function display_results

} // End class.htdig

?>

Put the code into a file like "class.htdig.php" possibly in your "inc" directory just below the web root.
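
The display_results function reads the htsearch output by position: matches start at line index 6, and each match takes five consecutive lines (title, URL, stars, percent, excerpt). As a minimal sketch of that layout, the helper below turns the flat list of lines into one array per match. The function name parse_htsearch_matches and the sample data are made up for illustration and are not part of the class above; the offsets are the ones the class uses and depend on your results templates.

```php
<?php
// Hypothetical helper (not part of class.htdig.php): group the flat list of
// htsearch output lines into one array per match.
// Assumed layout, taken from display_results(): matches start at line
// index 6 and each match occupies five consecutive lines.
function parse_htsearch_matches( $p_lines, $p_first = 6, $p_size = 5 ) {
    $l_matches = array();
    $l_total = count( $p_lines );
    // Stop as soon as a complete five-line record no longer fits.
    for ( $l_i = $p_first; $l_i + $p_size <= $l_total; $l_i += $p_size ) {
        $l_matches[] = array(
            "title"   => $p_lines[ $l_i ],
            "url"     => $p_lines[ $l_i + 1 ],
            "stars"   => $p_lines[ $l_i + 2 ],
            "percent" => $p_lines[ $l_i + 3 ],
            "excerpt" => $p_lines[ $l_i + 4 ],
        );
    }
    return $l_matches;
}

// Made-up sample: six header lines, one complete match, and one truncated
// match that is skipped because it has fewer than five lines.
$l_sample = array(
    "h0", "h1", "h2", "h3", "h4", "h5",
    "Home page", "http://www.example.com/", "***", "87", "Welcome to ...",
    "Broken", "http://www.example.com/broken",
);
$l_matches = parse_htsearch_matches( $l_sample );
echo count( $l_matches ) . " match: " . $l_matches[0]["title"] . "\n";
```

Working on an array of matches rather than on raw offsets keeps the output loop free of index arithmetic and drops an incomplete trailing record.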

Step 4: run htdig

Basically you need to run

/usr/bin/rundig -v -s -c ~/conf/htdig.conf

... once (you may want to redirect the output and errors to a file).

Every time you add or change a webpage, you should refresh the database with

/usr/bin/rundig -v -s -a -c ~/conf/htdig.conf

This creates separate "work" files that need to be merged with the original database that was created. Colin wrote a good article on this, which you can find through Google.

You can also rerun the rundig command from scratch, but visitors to the website will not be able to search while you are re-indexing :(
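
If the indexing is started from a PHP command-line script, for example one run by cron (an assumption; the steps above run rundig by hand), the command line could be assembled as in the sketch below. The function name build_rundig_command and the log file path are hypothetical. The rundig path and the -v, -s, -a and -c flags are the ones used in this step.

```php
<?php
// Hypothetical helper: build the rundig command line as a string.
// The rundig path and the -v, -s, -a and -c flags come from this article;
// the log file location is an assumption.
function build_rundig_command( $p_config, $p_logfile, $p_update = false ) {
    $l_command = "/usr/bin/rundig -v -s";
    if ( $p_update ) {
        // -a: build the refreshed index in separate "work" files.
        $l_command .= " -a";
    }
    // -c must be followed directly by the configuration file name.
    $l_command .= " -c " . escapeshellarg( $p_config );
    // Send normal output to the log file and errors (2>&1) to the same place.
    $l_command .= " > " . escapeshellarg( $p_logfile ) . " 2>&1";
    return $l_command;
}

// Print the command instead of running it, so it can be checked first.
echo build_rundig_command( "/home/example/conf/htdig.conf", "/tmp/rundig.log", true ) . "\n";
```

Pass an absolute path for the configuration file: the shell does not expand "~" inside the quotes that escapeshellarg adds. The returned string can then be handed to exec().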

Step 5: create your search page

Now for the best part: creating your search page. Here is a sample section of a PHP page that I wrote to do this (remember to have a ".php" suffix on your web page):

<!-- start htdig section -->
<?php
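    // Read only the two form fields this page expects. Copying every $_POST
    // key into a variable of the same name would let a visitor overwrite any
    // variable in the script.
    $f_search = isset( $_POST["f_search"] ) ? trim( $_POST["f_search"] ) : "";
    $f_search_submit = isset( $_POST["f_search_submit"] ) ? trim( $_POST["f_search_submit"] ) : "";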
    if ( isset( $f_search_submit ) && ! empty( $f_search_submit) ) {
        include( "class.htdig.php" );
        $l_htdig = new htdig;
        $l_htdig->htsearch( $f_search,"","/home//conf/htdig.conf" );
    }
?>
<h1>htdig Website Hosting Search</h1>
<form method="post" action="<?php echo htmlspecialchars( $_SERVER["PHP_SELF"] ); ?>">
    Search for: <input type="text" name="f_search" size="30" value="<?php echo htmlspecialchars( isset( $f_search ) ? $f_search : "" ); ?>">
    <input type="submit" name="f_search_submit" value="Search">
</form>
<?php
    if ( isset($f_search_submit) && ! empty ( $f_search_submit) ) {
        $l_result_code = $l_htdig->display_results( );
    }
?>
<!-- end htdig section -->
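
The sample page stores the return value of display_results in $l_result_code but does nothing with it. As a rough sketch, the value could be turned into a short description for a log entry. The function name describe_result_code is hypothetical; the codes are the ones returned by the class in Step 3 (0 when a page was rendered, -1 for a syntax error, -2 when htsearch produced no usable output).

```php
<?php
// Hypothetical helper: describe the value returned by display_results().
// Codes as used in class.htdig.php:
//    0 = results page rendered (matches, or a "no matches" notice)
//   -1 = syntax error in the query
//   -2 = htsearch returned fewer than three lines
function describe_result_code( $p_code ) {
    // Strict comparisons, so that null or a string never passes for 0.
    if ( $p_code === 0 ) {
        return "ok";
    }
    if ( $p_code === -1 ) {
        return "query syntax error";
    }
    if ( $p_code === -2 ) {
        return "htsearch returned no usable output";
    }
    return "unknown result code";
}

echo describe_result_code( -2 ) . "\n";
```

In the search page this could be used as error_log( describe_result_code( $l_result_code ) ); whenever the code is not 0, so that failed searches show up in the server log.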

Step 6: questions?

You can find some great help at the htdig web site: http://www.htdig.org/mailarchive.html . If your question is not in the archives, then just join a mailing list and ask away: someone will very likely help you.


Copyright © 2004-2014 Computer Engineering Inc. All rights reserved.