Sphinx Search Installation
Sphinx search introduction
If you have read my introduction to full-text search (or an article elsewhere) and decided to use full-text search in your next project, you may still be unsure which engine to pick. One option is Sphinx, and this short tutorial shows how to install it.
Sphinx is a full-text search engine distributed under GPL version 2. It is fast not only at searching but also at indexing your data. Currently, the Sphinx API has bindings for PHP, Python, Perl, Ruby, and Java.
Sphinx features
- high indexing speed (up to 10 MB/sec on modern CPUs);
- high search speed (avg query is under 0.1 sec on 2-4 GB text collections);
- high scalability (up to 100 GB of text, up to 100 M documents on a single CPU);
- provides good relevance ranking through a combination of phrase proximity ranking and statistical (BM25) ranking;
- provides distributed searching capabilities;
- provides document excerpts generation;
- provides searching from within MySQL through pluggable storage engine;
- supports boolean, phrase, and word proximity queries;
- supports multiple full-text fields per document (up to 32 by default);
- supports multiple additional attributes per document (e.g. groups, timestamps, etc.);
- supports stopwords;
- supports both single-byte encodings and UTF-8;
- supports English stemming, Russian stemming, and Soundex for morphology;
- supports MySQL natively (MyISAM and InnoDB tables are both supported);
- supports PostgreSQL natively.
There you go. Fire up your terminal or console, and let’s get things done.
Installing sphinxsearch
- Download Sphinx from sphinxsearch.com. This tutorial uses Sphinx 0.9.8.1.
$wget http://sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
- Open your terminal and extract the archive
$tar -xvf sphinx-0.9.8.1.tar.gz
- Sphinx needs the MySQL client development package in order to build. On Ubuntu Linux, install it with:
$sudo apt-get install libmysqlclient15-dev
- Configure, build, and install Sphinx on your system
$cd sphinx-0.9.8.1/
$./configure
$make
$sudo make install
Note: if you want to use Sphinx with PostgreSQL, pass the --with-pgsql argument to configure:
$./configure --with-pgsql
Test your installation
$search
You should see output like this in your terminal:
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

Usage: search [OPTIONS] <word1 [word2 [word3 [...]]]>

Options are:
-c, --config <file>     use given config file instead of defaults
-i, --index <index>     search given index only (default: all indexes)
-a, --any               match any query word (default: match all words)
-b, --boolean           match in boolean mode
-p, --phrase            match exact phrase
-e, --extended          match in extended mode
-f, --filter <attr> <v> only match if attribute attr value is v
-s, --sortby <CLAUSE>   sort matches by 'CLAUSE' in sort_extended mode
-S, --sortexpr <EXPR>   sort matches by 'EXPR' DESC in sort_expr mode
-o, --offset <offset>   print matches starting from this offset (default: 0)
-l, --limit <count>     print this many matches (default: 20)
-q, --noinfo            dont print document info from SQL database
-g, --group <attr>      group by attribute named attr
-gs,--groupsort <expr>  sort groups by <expr>
--sort=date             sort by date, descending
--rsort=date            sort by date, ascending
--sort=ts               sort by time segments
--stdin                 read query from stdin

This program (CLI search) is for testing and debugging purposes only;
it is NOT intended for production use.
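If the usage text does not appear, it can help to check whether the programs this tutorial relies on (indexer, search and searchd) actually landed on your PATH. The following is only a minimal sketch, assuming a POSIX shell and that make install put the binaries in a directory on your PATH. The function name check_sphinx_programs is made up for this example.

```shell
#!/bin/sh
# Report, for each Sphinx program used in this tutorial, whether it is on PATH.
# check_sphinx_programs is a hypothetical helper name, not part of Sphinx.
check_sphinx_programs() {
    for prog in indexer search searchd; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog: found at $(command -v "$prog")"
        else
            echo "$prog: not found"
        fi
    done
}

check_sphinx_programs
```

A "not found" line usually means the install prefix is not on your PATH, or that the make install step did not complete.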
Well done. You have Sphinx at your service. But before you can play with this full text search engine you have just installed, you have to understand how Sphinx works.
Sphinx installs four programs in your environment, but most of the time we will only use indexer, search and searchd. To begin with, we have to create an index for our source. Let’s create a file named sphinx.conf. Here is a sample of what sphinx.conf looks like.
source book
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = root
    sql_db = library
    sql_port = 3306 # optional, default is 3306
    sql_query = SELECT id, title, summary, author FROM library_book
    sql_query_info = SELECT * FROM library_book WHERE id=$id
}

index book
{
    source = book
    path = data/book
    docinfo = extern
    charset_type = sbcs
}

indexer
{
    mem_limit = 32M
}

searchd
{
    port = 3312
    log = log/searchd.log
    query_log = log/query.log
    read_timeout = 5
    max_children = 30
    pid_file = log/searchd.pid
    max_matches = 1000
}
For more information about Sphinx configuration, see the Sphinx documentation.
Create a folder named log for the searchd log files and another folder named data for the index data, then run indexer to index the database.
$mkdir log
$mkdir data
$indexer --all
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
indexing index 'book'...
collected 12 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 12 docs, 10319 bytes
total 0.018 sec, 571436.48 bytes/sec, 664.53 docs/sec
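If you expect to rebuild the index more than once, the steps above can be wrapped in a small script that is safe to rerun. This is a sketch rather than a required part of the setup. It assumes that sphinx.conf sits in the current directory and that the log and data folders are relative to it, as in the sample configuration.

```shell
#!/bin/sh
# Prepare the folders referenced by sphinx.conf, then index if possible.
set -e

# mkdir -p succeeds even when the folder already exists, so reruns are safe.
mkdir -p log data

if command -v indexer >/dev/null 2>&1 && [ -f sphinx.conf ]; then
    # --config names the config file explicitly; --all rebuilds every index.
    indexer --config sphinx.conf --all
else
    echo "indexer or sphinx.conf not available; skipping indexing" >&2
fi
```

Passing --config explicitly avoids depending on which directory the script is started from, as long as the path you give it is correct.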
You can use the search program to test the index you have just created. Assuming your database contains a book with PHP in its title, running search PHP will return some results.
$search PHP
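The options listed in the usage text can be combined with the query words. As a rough sketch, the wrapper below always restricts the search to the book index and accepts any further options. The function name book_search and the DRY_RUN variable are invented for this example and are not part of Sphinx. With DRY_RUN=1 it only prints the command it would run.

```shell
#!/bin/sh
# Search the 'book' index defined in sphinx.conf.
# book_search and DRY_RUN are hypothetical names used only in this sketch.
book_search() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: show the command line instead of executing it.
        echo "search --index book $*"
    else
        search --index book "$@"
    fi
}

# Show the command for fetching at most 5 matches for the word PHP.
DRY_RUN=1 book_search --limit 5 PHP
# prints: search --index book --limit 5 PHP
```

Drop DRY_RUN=1 to run the query for real. Keep in mind that the search program itself states it is meant for testing and debugging only, not for production use.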