Sphinx Search Installation


Sphinx search introduction

If you have read my introduction to full text search, or an article somewhere else, and decided to go with full text search in your next project, you may still be unsure which full text search engine to use. One such engine is Sphinx, and I’ll give you a short course on how to install it.

Sphinx is a full-text search engine distributed under GPL version 2. It is fast not only at searching but also at indexing your data. Currently, the Sphinx API has bindings for PHP, Python, Perl, Ruby and Java.

Sphinx features

  • high indexing speed (up to 10 MB/sec on modern CPUs);
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections);
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU);
  • provides good relevance ranking through combination of phrase proximity ranking and statistical (BM25) ranking;
  • provides distributed searching capabilities;
  • provides document excerpts generation;
  • provides searching from within MySQL through pluggable storage engine;
  • supports boolean, phrase, and word proximity queries;
  • supports multiple full-text fields per document (up to 32 by default);
  • supports multiple additional attributes per document (e.g. groups, timestamps, etc.);
  • supports stopwords;
  • supports both single-byte encodings and UTF-8;
  • supports English stemming, Russian stemming, and Soundex for morphology;
  • supports MySQL natively (MyISAM and InnoDB tables are both supported);
  • supports PostgreSQL natively.
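
To make the BM25 item in the list a bit more concrete, here is a minimal sketch of the textbook Okapi BM25 formula in Python. This is not Sphinx's actual ranking code (Sphinx also mixes in phrase proximity); the function name `bm25_scores`, the parameters `k1 = 1.2` and `b = 0.75`, and the three book titles are my own assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" keeps the value positive.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency, dampened by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up titles, only to exercise the function.
books = [
    "Learning PHP and MySQL",
    "Python cookbook",
    "PHP in action with PHP examples",
]
scores = bm25_scores("php", books)
```

The second title does not contain the query word, so it scores zero; the third one mentions it twice and ends up ranked above the first.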

There you go. Fire up your terminal or console, and let’s get things done.

Installing sphinxsearch

  1. Download Sphinx from sphinxsearch.com. For this tutorial, I use Sphinx 0.9.8.1
    $wget http://sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
  2. Open your terminal and extract the archive
    $tar -xzvf sphinx-0.9.8.1.tar.gz
  3. Sphinx needs the MySQL development package. If you use Ubuntu Linux, install it with
    $sudo apt-get install libmysqlclient15-dev
  4. Build and install Sphinx on your system
    $cd sphinx-0.9.8.1/
    $./configure
    $make
    $sudo make install

    Note: if you want to use Sphinx with PostgreSQL, run configure with the --with-pgsql argument

    $./configure --with-pgsql
  5. Test your installation

    $search

    This should come up in your terminal

    Sphinx 0.9.8.1-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    
    Usage: search [OPTIONS] <word1 [word2 [word3 [...]]]>
    
    Options are:
    -c, --config <file>     use given config file instead of defaults
    -i, --index <index>     search given index only (default: all indexes)
    -a, --any               match any query word (default: match all words)
    -b, --boolean           match in boolean mode
    -p, --phrase            match exact phrase
    -e, --extended          match in extended mode
    -f, --filter <attr> <v> only match if attribute attr value is v
    -s, --sortby <CLAUSE>   sort matches by 'CLAUSE' in sort_extended mode
    -S, --sortexpr <EXPR>   sort matches by 'EXPR' DESC in sort_expr mode
    -o, --offset <offset>   print matches starting from this offset (default: 0)
    -l, --limit <count>     print this many matches (default: 20)
    -q, --noinfo            dont print document info from SQL database
    -g, --group <attr>      group by attribute named attr
    -gs,--groupsort <expr>  sort groups by <expr>
    --sort=date		sort by date, descending
    --rsort=date		sort by date, ascending
    --sort=ts		sort by time segments
    --stdin			read query from stdin
    
    This program (CLI search) is for testing and debugging purposes only;
    it is NOT intended for production use.
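
If you would rather check the installation from a script than by eye, here is a small sketch in Python. It only looks for the three programs used in the rest of this tutorial (indexer, search and searchd) on your PATH; the helper name `find_programs` is my own, and it assumes `make install` put the binaries somewhere on PATH.

```python
import shutil

# The three programs used in this tutorial.
SPHINX_PROGRAMS = ("indexer", "search", "searchd")


def find_programs(names=SPHINX_PROGRAMS):
    """Map each program name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_programs().items():
        print("%s: %s" % (name, path or "not found on PATH"))
```

If any of them shows up as not found, the install prefix is probably not on your PATH.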

Well done. You have Sphinx at your service. But before you can play with this full text search engine you have just installed, you have to understand how Sphinx works.

Sphinx installs four programs in your environment, but most of the time we will only use indexer, search and searchd. To begin with, we have to create an index for our source. Let’s create a file named sphinx.conf; here is a sample of what sphinx.conf looks like.

source book
{
    type            = mysql
    sql_host        = localhost
    sql_user        = root
    sql_pass        = root
    sql_db          = library
    sql_port        = 3306 # optional, default is 3306
    sql_query       = SELECT id, title, summary, author FROM library_book
    sql_query_info  = SELECT * FROM library_book WHERE id=$id
}

index book
{
    source          = book
    path            = data/book
    docinfo         = extern
    charset_type    = sbcs
}

indexer
{
    mem_limit       = 32M
}

searchd
{
    port            = 3312
    log             = log/searchd.log
    query_log       = log/query.log
    read_timeout    = 5
    max_children    = 30
    pid_file        = log/searchd.pid
    max_matches     = 1000
}
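
The one relationship in this file that is easy to get wrong is that every index block must name a source block that actually exists (here, index book uses source book). Below is a rough sketch of a check for that in Python. It is not a real Sphinx config parser: `parse_conf` and `missing_sources` are names I made up, it only understands simple `type name { key = value }` blocks with `#` comments, and the `magazine` index in the sample is hypothetical, added just to trigger the check.

```python
import re

# Matches "type name { ... }" as well as unnamed blocks like "indexer { ... }".
BLOCK_RE = re.compile(r"(\w+)(?:\s+(\w+))?\s*\{(.*?)\}", re.S)


def parse_conf(text):
    """Return {(block_type, name): {key: value}} for a simple sphinx.conf."""
    text = re.sub(r"#.*", "", text)  # drop comments ('#' to end of line)
    sections = {}
    for kind, name, body in BLOCK_RE.findall(text):
        settings = {}
        for line in body.splitlines():
            if "=" in line:
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip()
        sections[(kind, name)] = settings
    return sections


def missing_sources(sections):
    """List (index_name, source_name) pairs whose source block is undefined."""
    defined = {name for kind, name in sections if kind == "source"}
    return [
        (name, settings.get("source"))
        for (kind, name), settings in sections.items()
        if kind == "index" and settings.get("source") not in defined
    ]


SAMPLE = """
source book
{
    type      = mysql
    sql_port  = 3306 # optional, default is 3306
}

index book
{
    source    = book
    path      = data/book
}

index magazine
{
    source    = magazine
}
"""

sections = parse_conf(SAMPLE)
problems = missing_sources(sections)
```

With the sample above, `problems` contains only the hypothetical magazine index, because no source magazine block is defined.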

For more information about Sphinx configuration, please refer to the Sphinx documentation.

Create a folder named log for the searchd log files and another folder named data for the index data, then run indexer to index the database.

$mkdir log
$mkdir data
$indexer --all
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file './sphinx.conf'...
indexing index 'book'...
collected 12 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 12 docs, 10319 bytes
total 0.018 sec, 571436.48 bytes/sec, 664.53 docs/sec
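
If you run indexer from a cron job or a deploy script, you may want those last numbers in a form you can log or compare. Here is a small sketch in Python that pulls them out of the output. The regular expression is based only on the summary line shown above, so treat the exact wording as an assumption for other Sphinx versions; `parse_indexer_summary` is a name I made up.

```python
import re

# Pattern for the final "total ... sec, ... bytes/sec, ... docs/sec" line.
SUMMARY_RE = re.compile(
    r"total (?P<seconds>[\d.]+) sec, "
    r"(?P<bytes_per_sec>[\d.]+) bytes/sec, "
    r"(?P<docs_per_sec>[\d.]+) docs/sec"
)


def parse_indexer_summary(output):
    """Extract the timing figures from indexer output, or None if absent."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return {key: float(value) for key, value in match.groupdict().items()}


# The last two lines of the run shown above.
sample = (
    "total 12 docs, 10319 bytes\n"
    "total 0.018 sec, 571436.48 bytes/sec, 664.53 docs/sec\n"
)
stats = parse_indexer_summary(sample)
```

In a real script the `output` string would come from running indexer yourself, for example through the `subprocess` module.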

You can use the search program to test the index you have just created. Assuming you have a book whose title contains PHP in your database, running search PHP will give you some results.

$search PHP
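
As the usage text of search itself says, the command line tool is only for testing. From an application you would normally talk to searchd through one of the API bindings. Here is a hedged sketch using the Python binding (sphinxapi.py, found in the api directory of the Sphinx tarball). It assumes searchd is already running with the sphinx.conf above (port 3312), and that the sphinxapi module you have matches your Python version. The helpers `search_books` and `make_client` are my own names, not part of Sphinx.

```python
def search_books(client, query, index="book"):
    """Return (document id, weight) pairs for the query, or raise on failure."""
    result = client.Query(query, index)
    if not result:
        # Query() returns None on failure; the client keeps the reason.
        raise RuntimeError(client.GetLastError())
    return [(match["id"], match["weight"]) for match in result.get("matches", [])]


def make_client(host="localhost", port=3312):
    """Build a SphinxClient pointed at searchd (the port comes from sphinx.conf)."""
    # Imported here so search_books() can be tried without Sphinx installed.
    from sphinxapi import SphinxClient

    client = SphinxClient()
    client.SetServer(host, port)
    return client
```

With searchd up, `search_books(make_client(), "PHP")` should give you the ids of the matching books, which you can then look up in your database.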