Dynamic PHP Search Engine (Part 1)

I started working with PHP nine months ago, and the many articles I read on the Internet gave me a better understanding of the language. I then began developing software for “Online Journals” that can search the contents of documents. Many articles on the Internet show how to search by keyword, title, or author; this article gives you a brief introduction to content-based search.

What is Document search?

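In a document-based search, every word in the document is parsed (read) and matched against the search words. Results are displayed based on the matches found.

Reading every word of every article and matching it against the search words across thousands, or even hundreds of thousands, of documents is a very expensive task. In addition, PHP by default limits a script to 30 seconds of execution time (the max_execution_time setting). The approach described here therefore builds a keyword index when a document is uploaded, so that a search only has to consult the index.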

Prerequisites:

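To understand this article, you should have a fair knowledge of PHP. To run the examples on your machine, you need Apache, PHP, and MySQL installed and configured. I used PHP Version 4.3.1 and MYSQL 2.2.3. The scripts read form fields as global variables such as $submit and $keywords, so they assume that register_globals is enabled (it has been off by default since PHP 4.2.0).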

Building Database:

The database consists of three tables, viz. the Content table, the Keyword table, and the Link table. The Content table holds each article's title and abstract. The Keyword table holds the keywords, and its keyword field is indexed. The Link table holds pairs of keyword id and content id.

The SQL statements for creating these three tables are shown below.

Content Table:
CREATE TABLE content (
contid mediumint(9) NOT NULL auto_increment,
title text,
abstract longtext,
PRIMARY KEY (contid) ) TYPE=MyISAM;

Keyword Table:
CREATE TABLE keytable (
keyid mediumint NOT NULL auto_increment,
keyword varchar(100) default NULL,
PRIMARY KEY (keyid),
KEY keyword (keyword) ) TYPE=MyISAM;

Link Table:
CREATE TABLE link (
keyid mediumint NOT NULL,
contid mediumint NOT NULL)
TYPE=MyISAM;
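
To make the relationship between the three tables concrete, here is a minimal in-memory sketch using plain PHP arrays (written for a current PHP version, not PHP 4). The sample rows and the helper name documents_for_keyword() are illustrative assumptions and are not part of the article's code.

```php
<?php
// Illustrative stand-ins for the three tables (sample data is made up).
$content  = [1 => ['title' => 'Paging in PHP', 'abstract' => 'Paging programs split results.'],
             2 => ['title' => 'Sorting', 'abstract' => 'Programs that sort arrays.']];
$keytable = ['paging' => 1, 'programs' => 2, 'sort' => 3];   // keyword => keyid
$link     = [[1, 1], [2, 1], [2, 2], [3, 2]];                // [keyid, contid] pairs

// Hypothetical helper: titles of all documents linked to one keyword.
function documents_for_keyword(string $word, array $keytable, array $link, array $content): array
{
    if (!isset($keytable[$word])) {
        return [];
    }
    $titles = [];
    foreach ($link as [$keyId, $contId]) {
        if ($keyId === $keytable[$word]) {
            $titles[] = $content[$contId]['title'];
        }
    }
    return $titles;
}

print_r(documents_for_keyword('programs', $keytable, $link, $content));
```

The link array plays the role of the Link table: a search never touches the abstracts, only the keyword map and the pairs.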


Preparing Database:

An input interface with an HTML form is created to enter the title and content. After the form is filled in and submitted, the title and the abstract are stored in the content table, and the newly generated content id is held temporarily in a variable. In the next step, an ‘Upload Engine’ parses each word in the abstract and processes the whole text. It removes common words like is, was, and, if, so, else, then, etc., and stores each remaining word in the wordmap array, making sure that every word has only one entry.

For every word in the wordmap array, the keyword table is searched for a match. If there is a match, the existing key id and the content id generated earlier are stored in the link table. Otherwise, the new keyword is inserted into the keyword table, and the newly generated key id and the content id are stored in the link table. This completes the preparation of the database.

The code snippets given below explain each step of the program.

Querying the keyword table once for every word is a slow process and reduces the efficiency of the program. To avoid this, all the keywords in the keyword table are loaded once into an associative array, $allWords. PHP implements associative arrays as hash tables, so looking up a word is very fast. Here is the function.

function LoadCurrentWords(){
    global $allWords;

    // load every keyword once, keyed by the word itself
    $result = mysql_query( "select keyid, keyword from keytable" ) or die( "Error in executing mysql query" );

    while ( $row = mysql_fetch_array($result) ) {
        $allWords[$row['keyword']] = $row['keyid'];
    }
}
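
As a small illustration of why the lookup array helps, the following sketch checks a word against an in-memory map without any database call. It is written for a current PHP version; the sample map and the helper name find_key_id() are assumptions made for this example.

```php
<?php
// Hypothetical keyword => keyid map, as LoadCurrentWords() would build it.
$allWords = ['search' => 1, 'engine' => 2, 'php' => 3];

// Hypothetical helper: return the key id for a word, or null if the word is new.
function find_key_id(array $allWords, string $word): ?int
{
    return $allWords[$word] ?? null;
}

var_dump(find_key_id($allWords, 'engine'));
var_dump(find_key_id($allWords, 'mysql'));
```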

Common Words:

$COMMON_WORDS is an associative array whose keys are words that are commonly used in the English language. These words have to be removed while parsing the text.
$COMMON_WORDS=array("a"=>1, "as"=>1);

You can add as many common words as you like. See the source code for the full list of common words.

ExtractWords() Function:

This function filters words by allowing only alphabetic characters. To implement this, I used a technique called a state machine.

Alphabetic characters put the machine in STATE1, and all other characters (numeric and special characters) put it in STATE0. The machine starts in STATE0. While scanning the text, when it encounters an alphabetic character it switches to STATE1 and starts collecting a word; otherwise it remains in STATE0. When a non-alphabetic character is met in STATE1, the collected word is stored and the machine returns to STATE0. As a result we get words made up of alphabetic characters only.

function ExtractWords($text){
    $STATE0 = 0; //Numeric / Other Characters
    $STATE1 = 1; //Alpha Characters
    $state = $STATE0;

    $wordList = array();
    $curWord = "";

    for ( $i = 0; $i < strlen($text); ++$i ) {
        $ch = $text[$i];
        $isAlpha = ctype_alpha( $ch );

        if ( $state == $STATE0 ) {
            if ( $isAlpha ) {
                // first letter of a new word
                $curWord = $ch;
                $state = $STATE1;
            }
        }
        else if ( $state == $STATE1 ) {
            if ( $isAlpha ) {
                $curWord .= $ch;
            }
            else {
                // a non-letter ends the current word
                $wordList[] = strtolower( $curWord );
                $state = $STATE0;
            }
        }
    }

    // the text may end while a word is still being collected
    if ( $state == $STATE1 ) {
        $wordList[] = strtolower( $curWord );
    }

    return $wordList;
}

The function returns the list of words as an array to the calling function.

Example:

Example 1: “mmuraleedhar@hotmail.com”

The state machine will return an array of three words, viz. mmuraleedhar, hotmail, com.

Example 2: “The ExtractWords($text) function will return a array of pure alphabetic words.”
After ExtractWords() and the common-word filter described next, the remaining words are extractwords, text, function, return, array, pure, alphabetic, words (all converted to lower case). The words the, will, a, and of are removed because they are common words.
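
For comparison, the same word extraction can be sketched with a regular expression instead of a state machine. This swaps in a different technique from the one the article uses; the function name extract_words_regex() is an assumption, and the pattern deliberately matches ASCII letters only.

```php
<?php
// Hypothetical alternative to the state machine: collect runs of ASCII letters.
function extract_words_regex(string $text): array
{
    preg_match_all('/[A-Za-z]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

print_r(extract_words_regex('mmuraleedhar@hotmail.com'));
```

Like ExtractWords(), this returns the words before any common-word filtering is applied.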

FilterCommonAndDuplicateWords() Function:

This function is called after the ExtractWords() function. It goes through the extracted words and removes common words like ‘a’, ‘is’, ‘was’, ‘and’, and so on, as well as words that are a single character long or that reach $MAX_WORD_LENGTH. The remaining words are treated as valid; duplicates among them are removed, and the result is stored in an associative array, $wordMap, which is returned to the calling function.

function FilterCommonAndDuplicateWords( $wordList ) {
    global $COMMON_WORDS;
    global $MAX_WORD_LENGTH;

    $wordMap = array();

    foreach ( $wordList as $word ) {
        $len = strlen( $word );
        if ( ($len > 1) && ($len < $MAX_WORD_LENGTH) ) {
            // skip words already seen and words in the common-word list
            if ( !isset($wordMap[$word]) ) {
                if ( !isset($COMMON_WORDS[$word]) ) {
                    $wordMap[$word] = 1;
                }
            }
        }
    }

    return $wordMap;
}
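
The filter can also be tried in isolation. The sketch below is a self-contained variant for a current PHP version that takes the common-word map and the length limit as parameters instead of globals; the name filter_words() and the sample data are assumptions for illustration.

```php
<?php
// Hypothetical stand-alone variant of the filter; names are illustrative.
function filter_words(array $wordList, array $commonWords, int $maxLen = 50): array
{
    $wordMap = [];
    foreach ($wordList as $word) {
        $len = strlen($word);
        if ($len > 1 && $len < $maxLen && !isset($commonWords[$word])) {
            $wordMap[$word] = 1; // assigning the same key twice removes duplicates
        }
    }
    return $wordMap;
}

$common = ['the' => 1, 'of' => 1, 'will' => 1];
print_r(filter_words(['the', 'array', 'of', 'array', 'words', 'x'], $common));
```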

ProcessForm() Function:

This is the core part of the upload program. It first calls the two functions above to extract the words and to remove common and duplicate words. It then inserts the title and abstract into the content table and stores the newly generated content id in $contentId. Finally it updates the keyword and link tables.

For every word in the $wordMap array, if the word already exists in the keyword table, the function inserts its key id and the content id into the link table. If the word is not found, it inserts the new word into the keyword table, stores the newly generated key id in $keyId, and then inserts the key id and content id into the link table.

function ProcessForm($title, $body){

    global $allWords;

    $tempWordList = ExtractWords( $body );
    $wordList = FilterCommonAndDuplicateWords($tempWordList);

    // insert into content
    mysql_query( sprintf( "INSERT INTO content (title, abstract) VALUES ('%s', '%s')",
        mysql_escape_string($title), mysql_escape_string($body) ) );

    //store the newly generated content id in $contentId
    $contentId = mysql_insert_id();

    // insert all the new words and links
    foreach ( $wordList as $word => $val ) {
        if ( !isset($allWords[$word]) ) {
            // new keyword: insert it and remember its generated key id
            mysql_query( sprintf( "INSERT INTO keytable ( keyword ) VALUES ( '%s' )",
                mysql_escape_string($word) ) );

            $keyId = mysql_insert_id();
            $allWords[$word] = $keyId;
        }
        else {
            $keyId = $allWords[$word];
        }

        // insert the link
        mysql_query( sprintf( "INSERT INTO link (keyid, contid) VALUES ( %d, %d )", $keyId, $contentId ) );
    }
    //End of Processing Form.
}
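
The keyword and link bookkeeping can be followed without a database. The sketch below mirrors that logic in memory for a current PHP version; index_words() is a hypothetical name, and handing out ids with count() + 1 is an assumption that stands in for mysql_insert_id().

```php
<?php
// Hypothetical in-memory version of the keyword/link update in ProcessForm().
// $allWords maps keyword => keyid; $links collects [keyid, contid] pairs.
function index_words(array $wordMap, int $contentId, array &$allWords, array &$links): void
{
    foreach (array_keys($wordMap) as $word) {
        if (!isset($allWords[$word])) {
            // New keyword: give it the next free id (assumes ids are contiguous).
            $allWords[$word] = count($allWords) + 1;
        }
        $links[] = [$allWords[$word], $contentId];
    }
}

$allWords = ['search' => 1];
$links = [];
index_words(['search' => 1, 'engine' => 1], 7, $allWords, $links);
print_r($allWords);
print_r($links);
```

An existing keyword keeps its id and only gains a new link; a new keyword gets both a keyword entry and a link.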

The following code snippet is the starting point of execution and calls all the functions above. It connects to the database server and selects the database. When the page is first requested, the form() function is called, which lets you enter the title and abstract of the document.

if($submit){

global $allWords;

mysql_connect( "localhost", "root", "" ) or die( "Unable to connect to database" );
mysql_select_db( "kpp" ) or die( "Unable to select database" );

LoadCurrentWords();

if ( $title and $body){
ProcessForm($title ,$body);
}

}else{ //end of main
$err="Please fill in the fields to upload";
form($err);
}

function form($errmsg)
{ ?>
<h4 align="center">File Parser & Uploader</h4>
<b><? echo $errmsg; ?></b>
<center>
<form method="POST" action=<? echo $PHP_SELF ?>>
Title: <input type="text" name="title" ><p>
Abstract: <input type="text" name="body" ><p>
<input type="submit" name="submit" value="Start Parsing and Upload Content">
</form>

</center>
<?
}

?>

Search Engine:

A PHP script makes it possible to query the database through an HTML form. It works like any other search engine: the user enters words in a text box, hits enter, and the interface presents a result page with links to the pages that contain the words searched for.

In this example, the order in which the results are displayed is determined by the number of search words that appear in each document.

Declare an associative array $CommonWords that contains common words like ‘is’, ‘in’, ‘was’, etc.

First, convert all the search words to lower case.

$search_keywords=strtolower(trim($keywords));

Next, split the search string with explode() so that each search word is stored in its own array element. The code is shown here.

$arrWords = explode(" ", $search_keywords);

Next, remove duplicate words in $arrWords.

$arrWords = array_unique($arrWords);

In a search operation, we first have to remove common words like ‘is’, ‘in’, ‘was’, and so on, which refines the search criteria. This is what the $CommonWords array declared earlier is for.

The remaining search words are stored in $searchWords and the removed common words in $junkWords. Here is the code.

$searchWords=array();
$junkWords=array();
foreach($arrWords as $word){
    //remove common words
    if(!isset($CommonWords[$word])){
        $searchWords[]=$word;
    }else{
        $junkWords[]=$word;
    }
}
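
The preprocessing steps so far (lower-casing, splitting, removing duplicates, separating common words) can be gathered into one function. The sketch below is written for a current PHP version; split_query() is a hypothetical name, and it uses preg_split() rather than explode(" ") so that repeated spaces do not produce empty words.

```php
<?php
// Hypothetical helper combining the steps above; returns [searchWords, junkWords].
function split_query(string $keywords, array $commonWords): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that repeated spaces would create.
    $words = preg_split('/\s+/', strtolower(trim($keywords)), -1, PREG_SPLIT_NO_EMPTY);
    $searchWords = [];
    $junkWords = [];
    foreach (array_unique($words) as $word) {
        if (isset($commonWords[$word])) {
            $junkWords[] = $word;
        } else {
            $searchWords[] = $word;
        }
    }
    return [$searchWords, $junkWords];
}

[$search, $junk] = split_query('  PHP is  a search PHP engine ', ['is' => 1, 'a' => 1]);
print_r($search);
print_r($junk);
```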

We can display results in two ways.
Type 1: Display the document if all the search words are present in the document.
Type 2: Display the document if any one of the search words is present.

If you want to perform the Type 1 operation, include the following code snippet in your program:
//count no of words in the search words and store in a variable
$noofSearchWords=count($searchWords);

$noofSearchWords stores the number of search words. Later, after looking up the search words in the keyword table, we compare this number with the number of records returned, which acts as a logical AND: if every search word was found in the keyword table, the next part of the program is executed; otherwise “NO SEARCH RESULT FOUND” is displayed.

In the next step, we search the keyword table for the words in the $searchWords array. The following code snippet returns the list of key ids that match the query.

//join the search words into a single SQL condition, escaping each word
$arrWords = implode("' OR keyword='", array_map('mysql_escape_string', $searchWords));

//get the key ids from the key table
$query = "select * from keytable where keyword='$arrWords'";

$kResult = mysql_query($query);
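
With a database extension that supports prepared statements (for example PDO or mysqli, neither of which the article uses), the same lookup can be expressed with placeholders instead of building the condition from the words themselves. The sketch below only builds the SQL text and the parameter list; build_keyword_query() is a hypothetical helper.

```php
<?php
// Hypothetical helper: build a parameterised query for any number of search words.
function build_keyword_query(array $searchWords): array
{
    if (count($searchWords) === 0) {
        return ['', []];
    }
    // one "?" placeholder per search word
    $placeholders = implode(', ', array_fill(0, count($searchWords), '?'));
    $sql = "SELECT keyid, keyword FROM keytable WHERE keyword IN ($placeholders)";
    return [$sql, array_values($searchWords)];
}

[$sql, $params] = build_keyword_query(['php', 'search']);
echo $sql, "\n";
```

The returned parameter list would then be bound when the statement is executed, so the search words never become part of the SQL string.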

As discussed earlier, for the Type 1 operation you have to check whether the number of search words equals the number of records returned by the query. If they are equal, you can proceed to the next step; otherwise display a message that no search result was found. Here is the code.

if(mysql_num_rows($kResult) == $noofSearchWords){

//search for the keyids in the link table and get the content id
//Fetch title, first 200 words of the abstract in to an array
//Display the result
}else{
echo "NO SEARCH RESULT FOUND";
}


The following code searches the link table for occurrences of each key id and builds the array $contArray, which maps each content id to the number of search words found in that document.

while($kRow=mysql_fetch_array($kResult))
{
//get the link ids for each key id
$kid= $kRow['keyid'];
$query = "SELECT * FROM link WHERE keyid=$kid";
$lResult = mysql_query($query);
//echo mysql_num_rows($lResult);
while($lRow=mysql_fetch_array($lResult))
{
$thisContentId=$lRow["contid"];
if(!$contArray[$thisContentId]){
$contArray[$thisContentId]=1;
}else{
$contArray[$thisContentId]++;
}
}
}//end of while
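
Note that comparing the number of search words with the number of keyword rows only confirms that each word exists somewhere in the index. If a strict Type 1 result is wanted, one possible refinement (an assumption, not part of the article's code) is to keep only the documents whose count in $contArray equals the number of search words. A sketch for a current PHP version, with made-up sample counts:

```php
<?php
// $contArray maps contid => number of distinct search words found in that document
// (sample values are made up).
$contArray = [10 => 2, 11 => 1, 12 => 2];
$noofSearchWords = 2;

// Type 1 (AND): keep only documents that matched every search word.
$allWordsMatch = array_filter($contArray, function ($count) use ($noofSearchWords) {
    return $count === $noofSearchWords;
});

// Type 2 (OR): every document with at least one match qualifies.
$anyWordMatches = $contArray;

print_r(array_keys($allWordsMatch));
print_r(array_keys($anyWordMatches));
```

This works because each word is stored at most once per document in the link table, so a document's count can only reach the number of search words if it contains all of them.
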

Sort the array in descending order of its values, i.e. the number of matching search words. This orders the documents from the highest number of matches to the lowest. For example, if the number of search words is four, documents with 4 matches are listed first, then 3, then 2, and finally 1.

//Sort array in descending order of the values, keeping the content ids as keys
arsort($contArray);
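
A quick way to see what arsort() does is to run it on a small array. The content ids and counts below are made-up sample values.

```php
<?php
// Sample contid => match count map (values are made up).
$contArray = [5 => 1, 8 => 3, 2 => 2];

// arsort() sorts by value, highest first, and keeps each contid attached to its count.
arsort($contArray);

print_r($contArray);
```

The optional second argument of arsort() selects how values are compared (for example SORT_NUMERIC); the descending order is already implied by the function itself.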

In the next step we fetch the title and the first 200 characters of the abstract from the content table into an array, $FoundRef.
//declare an array to store the results
$FoundRef=array();

foreach($contArray as $contentId => $occurances){

$aQuery = "select contid,title,left(abstract,200) as summary from content where contid = " . $contentId;
$aResult = mysql_query($aQuery);

if(mysql_num_rows($aResult) > 0){
$aRow = mysql_fetch_array($aResult);
$FoundRef[] = array (
"contid" => $aRow["contid"],
"title" => $aRow["title"],
"summary" => $aRow["summary"],
"occurance"=>$occurances
);
}//end of if
}
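
The SQL function left(abstract,200) cuts the text after 200 characters, which can split a word in the middle. If that matters, the summary could instead be shortened in PHP at the last space before the limit. The helper below is a hypothetical sketch for a current PHP version; it counts bytes, so it assumes single-byte text.

```php
<?php
// Hypothetical helper: cut a summary at the last space before $limit characters.
function make_summary(string $abstract, int $limit = 200): string
{
    if (strlen($abstract) <= $limit) {
        return $abstract;
    }
    $cut = substr($abstract, 0, $limit);
    $lastSpace = strrpos($cut, ' ');
    // fall back to a hard cut when the text contains no space at all
    return $lastSpace === false ? $cut : substr($cut, 0, $lastSpace);
}

echo make_summary('Content based search reads every word', 22), "\n";
```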

Finally we have to display the results in the browser. Here is the code.

if(isset($FoundRef))
{
    echo "<table width=\"100%\"><tr><th class=\"title\">Search Result</th></tr></table>";
    echo "<a href=\"#\" onclick=\"history.back()\">Back</a>";
    echo "<br>";
    echo sizeof($FoundRef);
    echo (sizeof($FoundRef) == 1 ? " reference" : " references");
    echo " found";
    echo "<p>";
    if($junkWords){
        echo "Common words like";
        foreach($junkWords as $jWords){
            echo "&nbsp;'".$jWords."'";
        }
        echo " are removed from the search string";
    }
    //one table for all results, one row per document
    echo "<table>";
    foreach($FoundRef as $a => $value)
    {
        echo "<tr><td valign=\"top\">";
?>
<a href="showref.php?refid=<? echo $FoundRef[$a]["contid"] ?>"><em><b><? echo $FoundRef[$a]["title"] ?></b></em></a><div align="right"> Occurrence(s): <? echo $FoundRef[$a]["occurance"] ?></div>
<br><small><? echo $FoundRef[$a]["summary"] ?>...</small><br><br>
<?
        echo "</td></tr>";
    }
    echo "</table>";
}//end of isset FoundRef
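
Titles and abstracts come from user input, so it is safer to escape them before writing them into the page. The following sketch, written for a current PHP version, shows one way to render a single result with htmlspecialchars() and urlencode(); render_result() is a hypothetical helper and the sample row is made up.

```php
<?php
// Hypothetical helper: render one result row with all dynamic values escaped.
function render_result(array $ref): string
{
    $url = 'showref.php?refid=' . urlencode((string) $ref['contid']);
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '"><b>'
        . htmlspecialchars($ref['title'], ENT_QUOTES) . '</b></a>'
        . '<br><small>' . htmlspecialchars($ref['summary'], ENT_QUOTES) . '...</small>';
}

echo render_result(['contid' => 3, 'title' => 'Tags & <b>markup</b>', 'summary' => 'A short abstract']), "\n";
```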

The HTML page that takes input from the user is given below.

<html>
<head>
<title>Search Engine</title>
<style type="text/css">

body{ font-size:20px; font-weight:bold; font-stretch:semi-expanded; font-family:MSserif; color:#0066CC;
text-align:center; background-color:white }
h4{ background-color:#0066CC; color:#FFFFFF; font-family:verdana; }
h3{ color:#0066CC; }
th{ background-color:#6996ED; color:#FFFFFF; font-family:Arial; }
a{text-decoration:none;}
</style>
</head>
<body>
<?php
if($submit)
{
if(!$keywords){
$errmsg="Sorry, Please fill in search field";
form($errmsg);
}else{
//Start Timer
$start = getmicrotime();

//PERFORM SEARCH OPERATION
if(mysql_num_rows($kResult) == $noofSearchWords){
//BUILD $FoundRef AND DISPLAY RESULT
}else {
//end Timer
$end = getmicrotime();

//TOTAL TIME TAKEN TO DO SEARCH OPERATION
$time_taken=(float)($end-$start);
$time_taken=number_format($time_taken,2,'.','');

echo "<p>Your Query Executed in $time_taken Seconds";

$errmsg="<p>No Search result found for '$keywords'";
echo $errmsg;
echo "<br><a href=\"#\" onclick=\"history.back()\">Back</a>";
}//endof isset ref
}//end of if key word exists
} else{ //display the form
form("");
} //END OF FORM DISPLAY ?>
</body>
</html>
<?
function form($errmsg)
{ ?>
<h4 align="center">Search Engine</h4>
<b><? echo $errmsg; ?></b>
<center>
<form method="POST" action="<? echo $PHP_SELF ?>">
Enter keywords to search on:
<input type="text" name="keywords" maxlength="100">
<input type="submit" name="submit" value="Search">
</form>
</center>
<?
}


function getmicrotime()
{
list($usec,$sec)=explode(" ",microtime());
return ((float)$usec+(float)$sec);
}
?>

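The getmicrotime() function returns the current time in seconds as a floating-point number with microsecond precision. It is called at the start and at the end of the search process, and the difference between the two values is the time taken.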
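
On PHP 5 and later, microtime(true) returns the same float directly, so the helper is no longer needed there. A minimal timing sketch, in which usleep() merely stands in for the search work:

```php
<?php
// microtime(true) returns the current Unix time as a float, in seconds.
$start = microtime(true);
usleep(1000); // stand-in for the search work
$timeTaken = microtime(true) - $start;

echo 'Your Query Executed in ', number_format($timeTaken, 2, '.', ''), " Seconds\n";
```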

Conclusion:

In Part 1, the search engine ranks documents by how many of the search words occur in the content. Part 2 modifies this slightly: when content is uploaded, the number of occurrences of each word is stored in the link table, and the search engine then ranks documents by the number of occurrences of each word. For example, if the word ‘paging’ occurs 11 times and ‘programs’ occurs 21 times, the rank for the document is 11 + 21 = 32.
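
The ranking arithmetic described for Part 2 can be sketched in a few lines. This is only an illustration of the 11 + 21 = 32 example for a current PHP version, not the Part 2 implementation; rank_document() and the generated word list are assumptions.

```php
<?php
// Hypothetical helper: sum how often each search word occurs in a document's word list.
function rank_document(array $documentWords, array $searchWords): int
{
    $counts = array_count_values($documentWords);
    $rank = 0;
    foreach ($searchWords as $word) {
        $rank += $counts[$word] ?? 0;
    }
    return $rank;
}

// Build a word list in which 'paging' occurs 11 times and 'programs' 21 times.
$words = array_merge(array_fill(0, 11, 'paging'), array_fill(0, 21, 'programs'), ['other']);
echo rank_document($words, ['paging', 'programs']), "\n";
```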

Source code:
Upload.php

<?
$MAX_WORD_LENGTH = 50;

//COMMON WORD LIST
$COMMON_WORDS = array("a"=>1,"as"=>1,"any"=>1,"all"=>1,"ate"=>1,"after"=>1,"am"=>1,"an"=>1,"and"=>1,"are"=>1,
"at"=>1,"away"=>1,"about"=>1,"ago"=>1,"almost"=>1,"along"=>1,"answer"=>1,"anybody"=>1,
"anywhere"=>1,"arent"=>1,"around"=>1,"ask"=>1,"also"=>1, "b"=>1,"be"=>1,"better"=>1,"black"=>1,
"brown"=>1,"but"=>1,"both"=>1,"bring"=>1,"because"=>1,"been"=>1,"before"=>1,"big"=>1, "blue"=>1,
"best"=>1,"by"=>1,"beg"=>1,"bad"=>1,"being"=>1,"best"=>1,"between"=>1,"based"=>1, "c"=>1,
"call"=>1,"can"=>1,"cut"=>1,"carry"=>1,"cold"=>1,"could"=>1,"clean"=>1,"cant"=>1,"come"=>1,
"couldnt"=>1, "consider"=>1,"called"=>1, "d"=>1,"did"=>1,"does"=>1,"do"=>1,"down"=>1,"dont"=>1,
"day"=>1,"didnt"=>1, "e"=>1,"eat"=>1,"every"=>1,"eve"=>1,"egg"=>1,"end"=>1,"eve"=>1,"era"=>1,
"eye"=>1,"each"=>1,"either"=>1,"else"=>1,"even"=>1, "ever"=>1,"every"=>1,"everybody"=>1,
"everyone"=>1, "f"=>1,"for"=>1,"from"=>1,"full"=>1,"found"=>1,"far"=>1,"fly"=>1,"fall"=>1,
"first"=>1,"fast"=>1,"five"=>1,"fall"=>1, "find"=>1,"four"=>1,"funny"=>1, "g"=>1,"go"=>1,
"get"=>1,"goes"=>1,"give"=>1,"gun"=>1,"good"=>1,"god"=>1,"give"=>1,"got"=>1,"green"=>1,
"grow"=>1,"good"=>1, "green"=>1,"grow"=>1,"got"=>1,"gave"=>1,"going"=>1,"gone"=>1,"given"=>1,
"h"=>1,"hi"=>1,"hoo"=>1,"he"=>1,"his"=>1,"him"=>1,"her"=>1,"has"=>1,"how"=>1,"hold"=>1,"how"=>1,
"hot"=>1,"had"=>1, "here"=>1,"help"=>1,"hurt"=>1,"have"=>1,"havet"=>1,"having"=>1,"hers"=>1,
"home"=>1,"home"=>1,"href"=>1, "i"=>1,"in"=>1,"is"=>1,"if"=>1,"its"=>1,"i"=>1,"it"=>1,"into"=>1,
"im"=>1,"ill"=>1,"id"=>1, "j"=>1,"just"=>1,"jump"=>1,"jet"=>1,"jaw"=>1,"jar"=>1,"jag"=>1,
"jam"=>1,"job"=>1,"jog"=>1,"joy"=>1,"jot"=>1, "k"=>1,"kind"=>1,"keep"=>1,"kiss"=>1,"kinder"=>1,
"kind"=>1,"kid"=>1,"key"=>1,"kit"=>1,"ken"=>1,"know"=>1, "l"=>1,"like"=>1,"little"=>1,"lust"=>1,
"led"=>1,"lap"=>1,"let"=>1,"live"=>1,"long"=>1,"live"=>1,"let"=>1,"look"=>1, "law"=>1,"leg"=>1,
"lie"=>1,"lid"=>1,"less"=>1,"look"=>1,"looking"=>1, "m"=>1,"my"=>1,"may"=>1,"me"=>1,"many"=>1,
"must"=>1,"much"=>1,"made"=>1,"my"=>1,"make"=>1,"met"=>1,"mix"=>1,"mom"=>1, "mud"=>1,"mug"=>1,
"mum"=>1,"myself"=>1,"more"=>1,"most"=>1,"max"=>1,"maximun"=>1, "n"=>1,"no"=>1,"nose"=>1,
"not"=>1,"new"=>1,"now"=>1,"nor"=>1,"nod"=>1,"now"=>1,"nil"=>1,"nib"=>1,"nut"=>1,"nun"=>1,
"never"=>1,"near"=>1,"news"=>1,"none"=>1,"nothing"=>1,"next"=>1, "o"=>1,"of"=>1,"on"=>1,"or"=>1,
"old"=>1,"open"=>1,"once"=>1,"only"=>1,"off"=>1,"our"=>1,"oops"=>1,"out"=>1,"oil"=>1,
"old"=>1,"oak"=>1,"oak"=>1,"ohm"=>1,"oho"=>1,"ore"=>1,"owl"=>1,"often"=>1,"other"=>1,"ours"=>1,
"out"=>1,"over"=>1,"one"=>1, "p"=>1,"play"=>1,"pull"=>1,"pretty"=>1,"put"=>1,"push"=>1,"pad"=>1,
"pop"=>1,"pan"=>1,"pap"=>1,"pay"=>1,"peg"=>1,"pet"=>1, "phi"=>1,"pie"=>1,"pig"=>1,"pet"=>1,
"pub"=>1,"pin"=>1,"pit"=>1,"ply"=>1,"pod"=>1,"pus"=>1,"page"=>1,"please"=>1, "q"=>1,
"question"=>1,"quick"=>1,"quest"=>1, "r"=>1,"ran"=>1,"red"=>1,"run"=>1,"ride"=>1,"read"=>1,
"rag"=>1,"rat"=>1,"ran"=>1,"ram"=>1,"red"=>1,"ray"=>1,"rev"=>1, "rid"=>1,"rib"=>1,"rig"=>1,
"rim"=>1,"rip"=>1,"rob"=>1,"rod"=>1,"roe"=>1,"row"=>1,"rum"=>1,"rug"=>1,"rut"=>1,"rather"=>1,
"recent"=>1, "s"=>1,"so"=>1,"some"=>1,"stop"=>1,"say"=>1,"sing"=>1,"say"=>1,"she"=>1,"stay"=>1,
"said"=>1,"start"=>1,"soon"=>1,"six"=>1,"seven"=>1,"see"=>1,"sit"=>1,"sitting"=>1,"son"=>1,
"soap"=>1,"spy"=>1,"sum"=>1,"say"=>1,"sea"=>1,"sex"=>1,"shy"=>1,"sib"=>1,"sic"=>1,"sin"=>1,
"sip"=>1,"sir"=>1,"sky"=>1,"ski"=>1,"sly"=>1,"sob"=>1,"sow"=>1,"sod"=>1,"should"=>1,
"something"=>1,"sometime"=>1,"somewhere"=>1,"set"=>1,"simple"=>1,"such"=>1,"side"=>1,
"t"=>1,"to"=>1,"the"=>1,"then"=>1,"that"=>1,"this"=>1,"those"=>1,"than"=>1,"these"=>1,
"those"=>1,"they"=>1,"thank"=>1,"tank"=>1,"tell"=>1,"take"=>1,"together"=>1,"try"=>1,"today"=>1,
"three"=>1,"tie"=>1,"thy"=>1,"tax"=>1,"tea"=>1,"tap"=>1,"taxi"=>1,"ten"=>1,"tin"=>1,"tip"=>1,
"tit"=>1,"toe"=>1,"tog"=>1,"tom"=>1,"ton"=>1,"top"=>1,"tow"=>1,"toy"=>1,"two"=>1,"tub"=>1,
"tug"=>1,"tun"=>1,"tux"=>1,"true"=>1,"thank"=>1,"theirs"=>1,"them"=>1,"there"=>1,"though"=>1,
"through"=>1,"thus"=>1,"time"=>1,"times"=>1,"too"=>1,"type"=>1, "u"=>1,"use"=>1,"us"=>1,
"using"=>1,"usage"=>1,"useful"=>1,"up"=>1,"upon"=>1,"ups"=>1,"under"=>1,"until"=>1,"untrue"=>1,
"users"=>1, "v"=>1,"van"=>1,"vex"=>1,"via"=>1,"vow"=>1,"vat"=>1,"vim"=>1,"version"=>1,"very"=>1,
"w"=>1,"was"=>1,"waste"=>1,"why"=>1,"who"=>1,"whose"=>1,"well"=>1,"walk"=>1,"were"=>1,
"which"=>1,"wish"=>1,"white"=>1,"with"=>1,"would"=>1,"write"=>1,"when"=>1,"what"=>1,"wash"=>1,
"warm"=>1,"want"=>1,"went"=>1,"will"=>1,"won"=>1,"woe"=>1,"wow"=>1,"woo"=>1,"wins"=>1,
"where"=>1,"web"=>1,"way"=>1,"were"=>1,"where"=>1,"whom"=>1,"wide"=>1,"within"=>1,"without"=>1,
"world"=>1,"worse"=>1,"worst"=>1,"www"=>1,"we"=>1,"whether"=>1, "y"=>1,"yes"=>1,"ya"=>1,"you"=>1,"yellow"=>1,"your"=>1,"yet"=>1,"yen"=>1,"year"=>1,"yep"=>1,
"yon"=>1,"yours"=>1,"z"=>1,"zoo"=>1,"zip"=>1,"zed"=>1,"zinc"=>1,"zoom"=>1,"zero"=>1,
"zeal"=>1,"zone"=>1);


$allWords = array();

if($submit){

global $allWords;

mysql_connect( "localhost", "root", "" ) or die( "Unable to connect to database" );
mysql_select_db( "test" ) or die( "Unable to select database" );

LoadCurrentWords();


if ( $title and $body){
ProcessForm($title ,$body);
echo "Successfully Finished Parsing and Uploading Content";
}else{
$err="Please fill in the fields to upload";
form($err);
}
}else{ //end of main
form($err);
}

function form($errmsg)
{ ?>
<h4 align="center">File Parser & Uploader</h4>
<div align="center"><b><? echo $errmsg; ?></b></div>
<center>
<form method="POST" action=<? echo $PHP_SELF ?>>
Title: <input type="text" name="title" size="50" maxlength="100"><p>
Abstract: <textarea rows=20 cols=50 wrap="off" name="body"></textarea><p>
<input type="submit" name="submit" value="Start Parsing and Upload Content">
</form>

</center>
<?
}

function LoadCurrentWords(){
global $allWords;

$result = mysql_query( "select keyid, keyword from keytable" ) or die( "Error in executing mysql query" );

while ( $row = mysql_fetch_array($result) ) {
$allWords[$row['keyword']] = $row['keyid'];
}
}


function ExtractWords($text){
$STATE0 = 0; //Numeric / Other Characters
$STATE1= 1; //Alpha Characters
$state = $STATE0;

$wordList = array();
$curWord = "";

for ( $i = 0; $i < strlen($text); ++$i ) {
$ch = $text[$i];
$isAlpha = ctype_alpha( $ch );

if ( $state == $STATE0) {
if ( $isAlpha ) {
$curWord = $ch;
$state = $STATE1;
}
}
else if ( $state == $STATE1) {
if ( $isAlpha ) {
$curWord .= $ch;
}
else {
$wordList[] = strtolower( $curWord );
$state = $STATE0;
}
}
}

if ( $state == $STATE1) {
$wordList[] = strtolower( $curWord );
}

return $wordList;
}

function FilterCommonAndDuplicateWords( $wordList ) {
global $COMMON_WORDS;
global $MAX_WORD_LENGTH;

$wordMap = array();

foreach ( $wordList as $word ) {
$len = strlen( $word );
if ( ($len > 1) && ($len < $MAX_WORD_LENGTH) ) {
if ( !$wordMap[$word] ) {
if ( !$COMMON_WORDS[$word] ) {
$wordMap[$word] = 1;
}
}
}
}

return $wordMap;
}

function ProcessForm($title ,$body){

global $allWords;

$tempWordList = ExtractWords( $body );
$wordList = FilterCommonAndDuplicateWords($tempWordList);

// insert into content
mysql_query( sprintf( "INSERT INTO content (title, abstract) VALUES ('%s', '%s')",
mysql_escape_string($title), mysql_escape_string($body) ) );

//store the newly generated content id in $contentId
$contentId = mysql_insert_id();

// insert all the new words and links
foreach($wordList as $word => $val) {
$keyId = "";
if ( !$allWords[$word] ) {
mysql_query( sprintf( "INSERT INTO keytable ( keyword ) VALUES ( '%s' )",
mysql_escape_string($word) ) );

$keyId = mysql_insert_id();
$allWords[$word] = $keyId;
}
else {
$keyId = $allWords[$word];
}

// insert the link
mysql_query( sprintf( "INSERT INTO link (keyid, contid) VALUES ( %d, %d )", $keyId, $contentId ) );
}
//End of Processing Form.

}
?>


search.php

<html>
<head>
<title>Search Engine</title>
<style type="text/css">

body{ font-size:20px; font-weight:bold; font-stretch:semi-expanded; font-family:MSserif; color:#0066CC;
text-align:center; background-color:white }
h4{ background-color:#0066CC; color:#FFFFFF; font-family:verdana; }
h3{ color:#0066CC; }
th{ background-color:#6996ED; color:#FFFFFF; font-family:Arial; }
a{text-decoration:none;}
</style>
</head>
<body>
<?php
if($submit)
{
if(!$keywords){
$errmsg="Sorry, Please fill in search field";
form($errmsg);
}else{
// Connect to the database
$dServer = "localhost";
$dDb = "test";
$dUser = "admin";
$dPass = "";

$s = @mysql_connect($dServer, $dUser, $dPass)
or die("Couldn't connect to database server");

@mysql_select_db($dDb, $s)
or die("Couldn't connect to database");

$CommonWords=array("a"=>1,"as"=>1,"any"=>1,"all"=>1,"am"=>1,"an"=>1,"and"=>1,"are"=>1,"at"=>1,
"b"=>1,"be"=>1,"but"=>1,"by"=>1,
"c"=>1,"can"=>1,
"d"=>1,"did"=>1,"does"=>1,"do"=>1,
"e"=>1,"each"=>1,"else"=>1,"even"=>1,"ever"=>1,
"f"=>1,"for"=>1,"from"=>1,
"g"=>1,"go"=>1,"get"=>1,
"h"=>1,"hi"=>1,"he"=>1,"his"=>1,"him"=>1,"her"=>1,"has"=>1,"how"=>1,"had"=>1,"here"=>1,"have"=>1, "i"=>1,"in"=>1,"is"=>1,"if"=>1,"its"=>1,
"j"=>1,"just"=>1,"k"=>1,
"l"=>1,"like"=>1,"led"=>1,"lap"=>1,"let"=>1,
"m"=>1,"my"=>1,"me"=>1,"many"=>1,"must"=>1,"more"=>1,
"n"=>1,"no"=>1,"not"=>1,"new"=>1,"now"=>1,
"o"=>1,"of"=>1,"on"=>1,"or"=>1,"once"=>1,
"p"=>1,"q"=>1,"r"=>1,
"s"=>1,"so"=>1,"some"=>1,"say"=>1,"she"=>1,
"t"=>1,"to"=>1,"the"=>1,"then"=>1,"that"=>1,
"u"=>1,"use"=>1,"us"=>1,"up"=>1,"upon"=>1,
"v"=>1,"via"=>1,"vow"=>1,
"w"=>1,"was"=>1,"why"=>1,"who"=>1,"whose"=>1,"were"=>1,
"y"=>1,"yes"=>1,"ya"=>1,"you"=>1,"your"=>1,
"z"=>1,"zoo"=>1,);


//START TIMER
$start=getmicrotime();

$search_keywords=strtolower(trim($keywords));
$arrWords = explode(" ", $search_keywords);

//remove duplicates
$arrWords=array_unique($arrWords);

$searchWords=array();
$junkWords=array();
foreach($arrWords as $word)
//remove common words
if(!$CommonWords[$word]){
$searchWords[]=$word;
}else{
$junkWords[]=$word;
}

//count no of words in the search words and store in a variable
$noofSearchWords=count($searchWords);


//join the search words into a single SQL condition, escaping each word
$arrWords = implode("' OR keyword='", array_map('mysql_escape_string', $searchWords));

//get the key ids from the key table
$query = "select * from keytable where keyword='$arrWords'";

$kResult = mysql_query($query);

//array to store the content id and occurances
$contArray=array();
$rescount=0;
//search for the link table only if all the given keywords present in the keytable

if(mysql_num_rows($kResult) == $noofSearchWords){
while($kRow=mysql_fetch_array($kResult))
{
//get the link ids for each key id
$kid= $kRow['keyid'];
$query = "SELECT * FROM link WHERE keyid=$kid";
$lResult = mysql_query($query);
//echo mysql_num_rows($lResult);
while($lRow=mysql_fetch_array($lResult))
{
$thisContentId=$lRow["contid"];
if(!$contArray[$thisContentId]){
$contArray[$thisContentId]=1;
}else{
$contArray[$thisContentId]++;
}
}
}//end of while

if(isset($contArray)){
//declare an array to store the results
$FoundRef=array();

//Sort array in descending order of the values, keeping the content ids as keys
arsort($contArray);

foreach($contArray as $contentId => $occurances){

$aQuery = "select contid,title,left(abstract,200) as summary from content where contid = " . $contentId;
$aResult = mysql_query($aQuery);

if(mysql_num_rows($aResult) > 0){
$aRow = mysql_fetch_array($aResult);
$FoundRef[] = array (
"contid" => $aRow["contid"],
"title" => $aRow["title"],
"summary" => $aRow["summary"],
"occurance"=>$occurances
);
}//end of if
}


//end TIMER
$end=getmicrotime();

//TOTAL TIME TAKEN TO DO SEARCH OPERATION
$time_taken=(float)($end-$start);
$time_taken=number_format($time_taken,2,'.','');

}//end of if countwords == mysql_number_of _ records

//end TIMER
$end=getmicrotime();

//TOTAL TIME TAKEN TO DO SEARCH OPERATION
$time_taken=(float)($end-$start);
$time_taken=number_format($time_taken,2,'.','');

if(isset($FoundRef))
{
echo "<table width=\"100%\"><tr><th class=\"title\">Search Result</th></tr></table>";
echo "<a href=\"#\" onclick=\"history.back()\">Back</a>";
echo "<br>";
echo sizeof($FoundRef);
echo (sizeof($FoundRef) == 1 ? " reference" : " references");
echo " found";
echo "<p>";
if($junkWords){
echo "Common words like";
foreach($junkWords as $jWords){
echo "&nbsp;'".$jWords."'";
}
echo " are removed from the search string";
}
echo "<table>";
foreach($FoundRef as $a => $value)
{
echo "<tr><td valign=\"top\">";
// echo $FoundRef[$a]["contid"];
?>

<a href="showref.php?refid=<? echo $FoundRef[$a]["contid"] ?>"><em><b><? echo $FoundRef[$a]["title"] ?></b></em></a><div align="right"> Occurrence(s): <? echo $FoundRef[$a]["occurance"] ?></div>

<br><small><? echo $FoundRef[$a]["summary"] ?>...</small><br><br>
<? echo "</td></tr>";
}?>


<?
echo "</table>";
}//end of isset FoundRef


}else {
//end TIMER
$end=getmicrotime();

//TOTAL TIME TAKEN TO DO SEARCH OPERATION
$time_taken=(float)($end-$start);
$time_taken=number_format($time_taken,2,'.','');

echo "<p>Your Query Executed in $time_taken Seconds";

$errmsg="<p>No Search result found for '$keywords'";
echo $errmsg;
echo "<br><a href=\"#\" onclick=\"history.back()\">Back</a>";
}//endof isset ref
}//end of if key word exists
} else{ //display the form
form("");
} //END OF FORM DISPLAY ?>
</body>
</html>
<?
function form($errmsg)
{ ?>
<h4 align="center">Search Engine</h4>
<b><? echo $errmsg; ?></b>
<center>
<form method="POST" action="<? echo $PHP_SELF ?>">
Enter keywords to search on:
<input type="text" name="keywords" maxlength="100">
<input type="submit" name="submit" value="Search">
</form>
</center>
<?
}


function getmicrotime()
{
list($usec,$sec)=explode(" ",microtime());
return ((float)$usec+(float)$sec);
}
?>

showref.php

<?
//force the id to an integer before it is used in the SQL query
$contid=intval($HTTP_GET_VARS["refid"]);

//Connect to Database
$dServer = "localhost";
$dDb = "test";
$dUser = "admin";
$dPass = "";

$s = @mysql_connect($dServer, $dUser, $dPass)
or die("Couldn't connect to database server");

@mysql_select_db($dDb, $s)
or die("Couldn't connect to database");

//Get the data from the database
$query=mysql_query("SELECT * FROM content WHERE contid={$contid}");
$result=mysql_fetch_array($query);

?>
<html>
<head><title>Search Display</title>
<style type="text/css">
h2{
font-family:verdana;
font-size:15;
color:#123453;
}
th{
font-family:verdana;
font-size:12;
color:#123453;
}
td{
font-family:verdana;
font-size:12;
color:#123453;
}
th.title{
background-color:#10B0B0;
color:white;
}
td.data{
background-color:#E4E4E4;
}
a{
text-decoration:none;
font-family:verdana;
font-size:12;
background-color:#E4E4E4;
}
h3{
background-color:#5494E4;
color:white;
}
</style>
</head>
<body bgcolor=#F8F8F8>
<div align="center"><h3>Display Article</h3></div>


<table><tr><th>|</th>
<th> <a href="#" onclick="history.back()">Back to Results</a></th>
<th>|</th>
</table>
<table bgcolor="#F0F0F0" cellpadding="10" cellspacing="10">
<tr><td>Title</td><td><? echo $result["title"] ?></td></tr>
<tr><td valign="top">Abstract</td><td><? echo $result["abstract"] ?></td></tr>
</table>
</body>
</html>

Publication Date: Thursday 24th July, 2003
Author: Murali Dhar