Few weeks ago i had a project where i needed to search a large database and provide results based on their relevance. The answer was MySQL Full-Text Search.
Normaly i have started coding and used the LIKE finction in MySQL to get the results and started testing on a small temporary database that i’ve created. After some part of the code was finished i’ve decided to get the data from the real database fun some tests and try to finetune the results. AND there was the problem search results were displaying after 20-30 seconds. So, it was time to learn something, For larger databases we should use Full-Text Search.
The basics for Multiword MySQL Full-Text Search
Let’s talk about the basics for using MySQL Full-Text Search. It can’t be implemented on MySQL default InnoDB storage engine tables. So if you are using InnoDB you need to convert them to MyISAM Engine.
Note: From MySQL version 5.6 you can use Full-Text Search with InnoDB storage engine.
ALTER TABLE tablename ENGINE = MYISAM;
After we have our tables on MyISAM storage engine we need to add full text index on our selected columns on the desired table.
ALTER TABLE products ADD FULLTEXT(title, sdescription, ldescription)
MySql Full Text Search for Multiword Search String
So now that we have our table ready for Full-Text Search and we have them indexed here is a simple Query for searching “iphone case 4s”
SELECT * FROM products WHERE MATCH (title, sdescription, ldescription) AGAINST ('iphone case 4s');
This will return all the results that contain ANY of the words “iphone case 4s” in Title, Short Description and Long Descriptions. But this is not what we want. We need to use IN BOOLEAN MODE to fine tune our results.
MySQL Full-Text Search Using IN BOOLEAN MODE
With BOOLEAN FULL TEXT SEARCHES we’ll add boolean mode and we will have some useful operators available to us.
+ means AND
– means NOT
[no operator] means OR
As we have just searched for “iphone case 4s” you can see that there is no operator in between which means the query will find all the rows having any one of those words. We could use operators for further refinement our search, only if we use IN BOOLEAN MODE modifier.
Here i have few examples using MySQL Full-Text search with operators:
- +iphone +case +4s: Will find only rows having all three words
- +iphone -case: will find rows that contain iphone but not case
- +iphone 4s: will find rows that must contain iphone, but rank rows higher if they also contain “4s”.
- +iphone ~4s: Fetch rows that must have the word “iphone”, but if the row also have the word “4s”, rate it lower than if row does not.
- ‘+iphone +(>case <4s)’: Get the rows that contain the words “iphone” and “case”, or “4s” and “4s” (in any order), but rank “iphone case” high than “iphone 4s”.
SELECT * FROM products WHERE MATCH (title, sdescription, ldescription) AGAINST ('+iphone +case +4s' IN BOOLEAN MODE);
MySQL Full-Text Search With Relevance
Finally we want to sort the results by relevance. We will multiply the title results by 10, the short description by 3 and long description as it is.
SELECT *, MATCH(`title`) AGAINST ('+iphone +case +4s' IN BOOLEAN MODE) * 10 as rel1, MATCH(`sdescription`) AGAINST ('+iphone +case +4s' IN BOOLEAN MODE) * 3 as rel2, MATCH(`ldescription`) AGAINST ('+iphone +case +4s' IN BOOLEAN MODE) as rel3, FROM products WHERE MATCH (title, sdescription, ldescription) AGAINST ('+iphone +case +4s' IN BOOLEAN MODE) ORDER BY (rel1)+(rel2)+(rel3) DESC;
A couple points and limitations about MySQL Full-Text Searching
- Search keywords are not case sensitive
- Short words will be ignored, the default minimum length is 4 characters. You can change the min and max word length with the variables ft_min_word_len and ft_max_word_len
- Words called stopwords are ignored, you can specify your own stopwords, but default words include the, have, some – see default stopwords list.
- You can disable stopwords by setting the variable ft_stopword_file to an empty string.
- Full Text searching is only supported by the MyISAM storage engine.
- If a word is present in more than 50% of the rows it will have a weight of zero. This has advantages on large datasets, but can make testing difficult on small ones.