Fulltext search

11 years 3 months ago - 11 years 3 months ago #1808 by MichielStr
Fulltext search was created by MichielStr
I have changed so much in my own installation that I am not sure this will work for others too. Perhaps Bram could have a look.

I have changed the search mechanism to fulltext search and would like to share this with you, because it seriously increases the speed of searches.

To change to fulltext search, you first have to create a fulltext index on the dataitems table, covering the feed, title and description columns.
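
As a rough sketch, the index could be created like this. The table name `dataitems` comes from the post above; the index name `ft_feed_desc_title` is made up for illustration, and the real table will probably carry your Joomla table prefix, so adjust accordingly. Note that MySQL versions before 5.6 only support FULLTEXT indexes on MyISAM tables.

```sql
-- Sketch only: assumes the table is literally named `dataitems`
-- (add your own table prefix); the index name is hypothetical.
ALTER TABLE `dataitems`
  ADD FULLTEXT INDEX `ft_feed_desc_title` (`feed`, `description`, `title`);

-- Verify that the index exists.
SHOW INDEX FROM `dataitems` WHERE Index_type = 'FULLTEXT';
```

The column list mirrors the MATCH (...) list used in the function, because MySQL generally expects the MATCH columns to correspond to one FULLTEXT index.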

Then, in helpers.php, replace the existing make_qQuery function with the following one:
Code:
function make_qQuery($q, &$db) {
    static $queries = array(); // cache of clauses already built in this request
    $query = array();
    $a = JRequest::getVar('areas', '');
    $exact = JRequest::getInt('exact', 0);
    if ($a == '' && defined('DF_AREAS')) {
        $a = DF_AREAS;
    }
    if ($a == '') {
        $areas = array(0=>'0', 1=>'1', 2=>'2', 3=>'3', 4=>'4', 5=>'5', 6=>'6', 7=>'7', 8=>'8', 9=>'9', 't'=>'t', 'd'=>'d', 'f'=>'f');
    } else {
        $areas = array_flip(explode(',', $a));
    }
    $md5 = md5(join('', $areas) . $q . $exact);
    if (isset($queries[$md5])) {
        return $queries[$md5];
    }
    // In an AND query the first term needs a leading '+' as well.
    if (preg_match('/ AND /', $q) || preg_match('/ and /', $q)) {
        $q = '+' . $q;
    }
    // Translate OR/AND/NOT into boolean-mode operators; single quotes become double quotes.
    $search = array('/ OR /', '/ or /', '/ AND /', '/ and /', '/ NOT /', '/ not /', '/\'/');
    $replace = array(' ', ' ', ' +', ' +', ' -', ' -', '"');
    $q = preg_replace($search, $replace, $q);
    // quote() escapes the term and adds the surrounding quotes, so user input
    // cannot break out of the string (assumes $db is the Joomla database object).
    $query_string = " MATCH (`feed`, `description`, `title`) AGAINST (" . $db->quote($q) . " IN BOOLEAN MODE)";
    $queries[$md5] = $query_string;
    return $query_string;
}

This function does not use MySQL's full fulltext potential, but for me it's sufficient. I have taken out of Bram's original function what I knew I could take out and left in what I wasn't sure of.
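
To make the operator rewriting concrete, here is what the function should produce for one input, traced by hand from the code above. The search phrase is made up, and `dataitems` stands in for the real (prefixed) table name.

```sql
-- User input:      tv AND samsung NOT plasma
-- After rewriting: +tv +samsung -plasma
--   (a leading '+' is added because the input contains ' AND ',
--    then ' AND ' becomes ' +' and ' NOT ' becomes ' -')
SELECT `title`
FROM `dataitems`
WHERE MATCH (`feed`, `description`, `title`)
      AGAINST ('+tv +samsung -plasma' IN BOOLEAN MODE);
```

An ' OR ' is simply replaced by a space, which in boolean mode makes both words optional, so a row matches if it contains either one.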

I haven't bothered to check for title and description search being set or not, since search is a lot faster already. Because of this, I am posting this in the Pro forum, since title and description searches are Pro features.

To see what this search does and what input it accepts, visit www.keuswijzer.nl and hover your mouse over the small question mark next to the 'Zoeken' (Search) button at the top right.

Oh, one more thing: the minimum word length for fulltext indexes is usually 4 (the MySQL default). This means that search words shorter than 4 characters are ignored. If you want to change this and have access to my.cnf on the server, add 'ft_min_word_len = X', where 'X' is the minimum length, then restart MySQL and rebuild the fulltext index.
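
For reference, a sketch of how to check the current setting and rebuild the index after changing it. This assumes a MyISAM table named `dataitems`; InnoDB fulltext indexes use a different setting (`innodb_ft_min_token_size`). The value 3 is only an example.

```sql
-- In my.cnf, under the [mysqld] section (example value, then restart MySQL):
--   ft_min_word_len = 3

-- Check which minimum word length the server is actually using.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Existing fulltext indexes must be rebuilt before the new length applies.
-- For MyISAM tables a quick repair is enough.
REPAIR TABLE `dataitems` QUICK;
```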

If anyone has any improvements/corrections, please share them!
Last edit: 11 years 3 months ago by MichielStr.

11 years 3 months ago - 11 years 3 months ago #1809 by redactie
Replied by redactie on topic Re: Fulltext search
I tested fulltext search in the past, similar to the solution shown. I can add the code to the core, including a define to switch it on.

The 'problem' is that fulltext search does not return the same results as LIKE:

example from one of my sites:

`description` like '%wellness%' versus MATCH ( `description`) AGAINST ('wellness' IN BOOLEAN MODE)

The LIKE returns 5000 results, the MATCH 2500.
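
A likely explanation, not verified against this data: LIKE '%wellness%' matches the string anywhere, including inside longer words, while MATCH only matches whole indexed words. A sketch that lists rows LIKE finds but MATCH misses, assuming a table called `dataitems` with a fulltext index on `description` alone:

```sql
-- Rows matched by the substring search but not by the whole-word search.
-- Table name and LIMIT are illustrative.
SELECT `title`, `description`
FROM `dataitems`
WHERE `description` LIKE '%wellness%'
  AND NOT (MATCH (`description`) AGAINST ('wellness' IN BOOLEAN MODE))
LIMIT 20;
```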

Google makes a fortune with natural language search; it's pretty difficult...
Last edit: 11 years 3 months ago by redactie.

11 years 3 months ago #1810 by MichielStr
Replied by MichielStr on topic Re: Fulltext search
I haven't run a comparison, but I am willing to take that loss given the decrease in query-time.

Hopefully the siblings-grouping partially solves this problem, because if a search word isn't found in one product description, it might be in another...?

11 years 3 months ago #1813 by redactie
Replied by redactie on topic Re: Fulltext search
Fulltext has a wildcard as well: * (LIKE uses %).

Will try it.

11 years 3 months ago #1814 by MichielStr
Replied by MichielStr on topic Re: Fulltext search
Yes, I know. But I am leaving the decision to use it to the user. Using wildcards by default makes the results less strict.

I am assuming that if the user types in 'television', they are looking for 'television' and not for every product containing a word that starts with 'television'. If they are, they can type 'television*' themselves. The disadvantage of * in BOOLEAN MODE is that it can only be appended to the search term; '*television' doesn't work.
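
To illustrate the difference, a sketch using an assumed `dataitems` table and the column list from the function in the first post:

```sql
-- Exact word only: matches 'television', not 'televisions'.
SELECT COUNT(*) FROM `dataitems`
WHERE MATCH (`feed`, `description`, `title`)
      AGAINST ('television' IN BOOLEAN MODE);

-- Trailing wildcard: matches any word starting with 'television'.
SELECT COUNT(*) FROM `dataitems`
WHERE MATCH (`feed`, `description`, `title`)
      AGAINST ('television*' IN BOOLEAN MODE);
```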
