Sphinx Search

From Wikibooks, open books for an open world
Jump to: navigation, search

If you are somebody that doesn't know much about indexes, APIs, and configuration files it's a pretty good indication that you should read this book. Unlike the official documentation, this book explains certain things that the documentation thinks you should know already. Note that I will focus mostly on the PHP API throughout this book, since I'm best at that language, but you can use Sphinx with pretty much any language imaginable.

Why Use Sphinx?[edit]

You might be asking yourself, why bother use Sphinx? It seems to be more trouble than it's worth.

  • Speed: The key reason is SPEED. Your regular MySQL searches take a SIGNIFICANTLY longer time to execute than a Sphinx search. You will start noticing the difference once your database has millions of records. If you have a small database (ex: forum of 100 users), will it be of any help? Not really...It's still fun to experiment with though! Plus there are cool features like word morphology (if I am searching for cats it will match cat, if I'm searching for running it will match run, runs, runable (I think...), etc.)
  • Search Features: Another reason is search features. Ever wonder when you search for 2 words in Google, it will look for them in the same paragraph or 2 paragraphs (or sentence) but not all the way across the page? Sphinx allows you to do similar things :)
  • Scalability: If you have big big databases on many many servers Sphinx will take care of it. And it will make it seem to the application that it's looking at one server. Sphinx can take a lot of stress off the PHP servers in terms of handling and searching information.

What Is Different About Sphinx?[edit]

Sphinx is a little bit different from what you are used to with MySQL queries. So don't expect to get it instantly.

Indexing[edit]

What Is Indexing?[edit]

Sphinx retrieves data from a table in a MySQL database and performs a process called indexing on it. Indexing creates a file that is easily searchable by Sphinx. For example, if you try to search a document in Microsoft Word it will search for words one by one in the document text. In a very large document searching can be noticeably slow. On the other hand, Sphinx indexes before any searches take place. This creates an index which can be efficiently searched instead of searching the entire document word by word. A good example is an encyclopedia index. If I want to find some information about cats, I could do what Microsoft Word does and read every page in the encyclopedia looking for the occurrence of the word "cat". Or I can look in the back, under cat, ah pages 104, 195, and 653. Much easier.

Clipboard

To do:
explain what happens during indexing

You Can Only Search What is Indexed[edit]

What you need to remember is that SPHINX CAN ONLY SEARCH WHAT IS IN THE INDEX! That means that every time you want to search the latest results, you must update the index. So if you were writing an article about cats and indexed the article table, then wrote an article about chickens, and did a search it would find your cat article, but it would treat the chicken article as if it never existed.

If you remember one thing from this section, it's that SPHINX CAN ONLY SEARCH WHAT IS IN THE INDEX!

Accessing Data[edit]

If you used PHP with MySQL this will be a lot easier for you. Otherwise you should probably consider learning PHP and MySQL (or anything and MySQL for that matter). There are great tutorials on Tigzag and W3Schools.

Sphinx normally returns MySQL ids, not data!!![edit]

A key thing you need to remember about Sphinx is that it normally does NOT retrieve data. Primarily it retrieves document ids, so you might be wondering What the heck is Sphinx for then? SEARCHING. Sphinx does the intensive part, which is searching for specific records. Then you can do the easy part through MySQL, which is retrieving that document. So for example, if Sphinx pulls out document ids 1, 5, and 7 from the index you will need to do a query in MySQL to retrieve the records (probably with ids of 1,5, and 7). You might think it's stupid, but it takes very few resources for MySQL to search for a document ID compared to searching for a word.

Here's an example. Say Sphinx pulls out documents with ids 1, 5, 7. You would do

SELECT * FROM documents WHERE id IN (1,5,7)

You are telling MySQL to select all the columns from the documents table (or whatever table the result was in) where the id (or whatever you called the id field) is either 1, 5, or 7. And then you can use mysql_fetch_array in PHP to look at the data and do whatever with it.

Once you learn how to use Sphinx for ordering your results, you can preserve the order by:

  • Storing the order of the results in an array (just store the id property for matches)
  • Doing a IMPLODE on the array using $result = implode(", ", $array) where $array was your array of results, and result will store a string of your result ids separated by comma space.
SELECT * FROM documents WHERE id IN ($result) ORDER BY FIELD(id,$result)

Here we are telling MySQL to order the results by the id field in the order we specified in $result.

It might seem like a lot of work, but you'll get used to it, and eventually you'll write functions to handle all of this for you. Don't worry, it's totally worth it!!!

Clipboard

To do:
To be...CONTINUED