Sphinx (SQL Phrase Index) Introduction and Installation
Sphinx is a full-text search engine, distributed under GPL version 2. Commercial licensing (eg. for embedded use) is also available upon request.
Generally, it’s a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages.
The built-in data source drivers currently support fetching data either via a direct connection to MySQL or PostgreSQL, or from a pipe in a custom XML format. Adding new drivers (eg. to natively support some other DBMSes) is designed to be as easy as possible.
The search API is natively ported to PHP, Python, Perl, Ruby, and Java, and is also available as a pluggable MySQL storage engine. The API is very lightweight, so porting it to a new language is known to take a few hours.
As for the name, Sphinx is an acronym which is officially decoded as SQL Phrase Index. Yes, I know about CMU’s Sphinx project.
- high indexing speed (up to 10 MB/sec on modern CPUs);
- high search speed (avg query is under 0.1 sec on 2-4 GB text collections);
- high scalability (up to 100 GB of text, up to 100 M documents on a single CPU);
- provides good relevance ranking through combination of phrase proximity ranking and statistical (BM25) ranking;
- provides distributed searching capabilities;
- provides document excerpts generation;
- provides searching from within MySQL through pluggable storage engine;
- supports boolean, phrase, and word proximity queries;
- supports multiple full-text fields per document (up to 32 by default);
- supports multiple additional attributes per document (ie. groups, timestamps, etc);
- supports stopwords;
- supports both single-byte encodings and UTF-8;
- supports English stemming, Russian stemming, and Soundex for morphology;
- supports MySQL natively (MyISAM and InnoDB tables are both supported);
- supports PostgreSQL natively.
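To make the statistical half of the ranking item above more concrete, here is a minimal sketch of the textbook Okapi BM25 formula. It is not Sphinx's internal ranker, which also combines phrase proximity. The function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the toy documents are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25.

    Illustration only: this is NOT the formula Sphinx uses internally.
    `documents` is a non-empty list of token lists.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf + k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


docs = [
    ["sphinx", "search", "engine"],
    ["mysql", "storage", "engine"],
    ["sphinx", "sphinx", "index"],
]
print(bm25_scores(["sphinx"], docs))
```

In this toy collection, the document that mentions the query word twice scores highest and the document that never mentions it scores zero.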
Where to get Sphinx
Sphinx is available through its official Web site at http://www.sphinxsearch.com/.
Currently, the Sphinx distribution tarball includes the following software:
- indexer: a utility which creates fulltext indexes;
- search: a simple command-line (CLI) test utility which searches through fulltext indexes;
- searchd: a daemon which enables external software (eg. Web applications) to search through fulltext indexes;
- sphinxapi: a set of searchd client API libraries for popular Web scripting languages (PHP, Python, Perl, Ruby);
- spelldump: a simple command-line tool to extract the items from a MySpell (as bundled with OpenOffice) format dictionary to help customize your index, for use with wordforms;
- indextool: a utility to dump miscellaneous debug information about the index, added in version 0.9.9-rc2.
Most modern UNIX systems with a C++ compiler should be able to compile and run Sphinx without any modifications.
Currently known systems Sphinx has been successfully running on are:
- Linux 2.4.x, 2.6.x (various distributions)
- Windows 2000, XP
- FreeBSD 4.x, 5.x, 6.x
- NetBSD 1.6, 3.0
- Solaris 9, 11
- Mac OS X
CPU architectures known to work include X86, X86-64, SPARC64.
I hope Sphinx will work on other Unix platforms as well. If the platform you run Sphinx on is not in this list, please do report it.
At the moment, the Windows version of Sphinx is not intended to be used in production, but rather for testing and debugging only. The two most prominent issues are missing support for concurrent queries (client queries are stacked on the TCP connection level instead) and missing support for index data rotation. There are successful production installations which work around these issues. However, running a high-volume search service under Windows is still not recommended.
On UNIX, you will need the following tools to build and install Sphinx:
- a working C++ compiler. GNU gcc is known to work.
- a good make program. GNU make is known to work.
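Before building, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch of such a check, not part of the Sphinx distribution. The function names are hypothetical, and the tool names `g++` and `make` assume the GNU toolchain mentioned above.

```python
import shutil


def find_build_tools(tools=("g++", "make")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(tools=("g++", "make")):
    """Return the names of the tools that could not be found on PATH."""
    return [name for name, path in find_build_tools(tools).items() if path is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("A C++ compiler and make were both found.")
```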
On Windows, you will need Microsoft Visual C/C++ Studio .NET 2003 or 2005. Other compilers/environments will probably work as well, but for the time being, you will have to build a makefile (or other environment-specific project files) manually.
- Extract everything from the distribution tarball (haven’t you already?) and go to the sphinx subdirectory:
$ tar xzvf sphinx-0.9.8.tar.gz
$ cd sphinx
- Run the configuration program:
$ ./configure
There’s a number of options to configure. The complete listing may be obtained by using the --help switch. The most important ones are:
--prefix, which specifies where to install Sphinx, such as --prefix=/usr/local/sphinx (all of the examples use this prefix);
--with-mysql, which specifies where to look for MySQL include and library files, if auto-detection fails;
--with-pgsql, which specifies where to look for PostgreSQL include and library files.
- Build the binaries:
$ make
- Install the binaries in the directory of your choice (defaults to /usr/local/bin/ on *nix systems, but is overridden with --prefix):
$ make install
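If you script your builds, the configure switches described above can be assembled programmatically. This is a small sketch rather than anything shipped with Sphinx. The helper name `configure_command` is hypothetical, the MySQL directory shown is only a placeholder, and passing a directory as `--with-mysql=DIR` assumes the usual autoconf convention.

```python
import shlex


def configure_command(prefix="/usr/local/sphinx", mysql_dir=None, pgsql_dir=None):
    """Build the ./configure argument list from the switches described above."""
    args = ["./configure", f"--prefix={prefix}"]
    if mysql_dir is not None:
        # Only needed when auto-detection of MySQL files fails.
        args.append(f"--with-mysql={mysql_dir}")
    if pgsql_dir is not None:
        args.append(f"--with-pgsql={pgsql_dir}")
    return args


# "/usr/local/mysql" is a placeholder path, not a recommendation.
print(shlex.join(configure_command(mysql_dir="/usr/local/mysql")))
```

The list form can be handed to `subprocess.run` from inside the unpacked source directory. The joined string is only for display.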
Installing Sphinx on a Windows server is often easier than installing on a Linux environment; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.
- Extract everything from the .zip file you have downloaded (sphinx-0.9.8-win32-pgsql.zip if you need PostgreSQL support as well). You can use Windows Explorer in Windows XP and up to extract the files, or a freeware package like 7Zip to open the archive.
For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use any different location for the folders or configuration file, please change it accordingly.
- Install the searchd system as a Windows service:
C:\Sphinx\bin> C:\Sphinx\bin\searchd --install --config C:\Sphinx\sphinx.conf --servicename SphinxSearch
The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service. A guide to do this can be found under Quick tour.
If configure fails to locate MySQL headers and/or libraries, try checking for and installing the mysql-devel package. On some systems, it is not installed by default.
If make fails with a message which looks like
/bin/sh: g++: command not found
make: *** [libsphinx_a-sphinx.o] Error 127
try checking for and installing the package which provides the g++ compiler.
If you are getting compile-time errors which look like
sphinx.cpp:67: error: invalid application of `sizeof' to incomplete type `Private::SizeError<false>'
this means that some compile-time type size check failed. The most probable reason is that the off_t type is narrower than 64 bits on your system. As a quick hack, you can edit sphinx.h and replace off_t with DWORD in the typedef for SphOffset_t, but note that this will prohibit you from using full-text indexes larger than 2 GB. Even if the hack helps, please report such issues, providing the exact error message and compiler/OS details, so that I can properly fix them in the next releases.
If you keep getting any other error, or the suggestions above do not seem to help you, please don’t hesitate to contact me.
All the example commands below assume that you installed Sphinx in /usr/local/sphinx, so searchd can be found in /usr/local/sphinx/bin/searchd.
To use Sphinx, you will need to:
- Create a configuration file.
The default configuration file name is sphinx.conf. All Sphinx programs look for this file in the current working directory by default.
A sample configuration file, sphinx.conf.dist, which has all the options documented, is created by configure. Copy and edit that sample file to make your own configuration (assuming Sphinx is installed into /usr/local/sphinx/):
$ cd /usr/local/sphinx/etc
$ cp sphinx.conf.dist sphinx.conf
$ vi sphinx.conf
The sample configuration file is set up to index the documents table from the MySQL database test, so there’s an example.sql sample data file to populate that table with a few documents for testing purposes:
$ mysql -u test < /usr/local/sphinx/etc/example.sql
- Run the indexer to create full-text index from your data:
$ cd /usr/local/sphinx/etc
- Query your newly created index!
To query the index from the command line, use the search utility:
$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/search test
To query the index from your PHP scripts, you need to:
- Run the search daemon which your script will talk to:
$ cd /usr/local/sphinx/etc
- Run the attached PHP API test script (to ensure that the daemon was successfully started and is ready to serve the queries):
$ cd sphinx/api
$ php test.php test
- Include the API (it’s located in api/sphinxapi.php) into your own scripts and use it.
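The steps above use the PHP API. As a rough sketch of the same last step with the Python port of the client API, the helper below wraps a single query. It assumes the client object behaves like `SphinxClient` from the Python `sphinxapi` module: `Query()` returns a dictionary with a `matches` list, or nothing on failure, and `GetLastError()` returns the error text. The helper name `run_query` and the `FakeClient` stand-in are hypothetical and exist only so the sketch runs without a live searchd.

```python
def run_query(client, text, index="*"):
    """Run one query and return a list of (document id, weight) pairs.

    `client` is assumed to expose Query() and GetLastError() like the
    SphinxClient class of the Python sphinxapi module.
    """
    result = client.Query(text, index)
    if not result:
        raise RuntimeError("query failed: %s" % client.GetLastError())
    return [(match["id"], match["weight"]) for match in result.get("matches", [])]


class FakeClient:
    """Hypothetical stand-in for a real client, for demonstration only."""

    def __init__(self, result, error=""):
        self._result = result
        self._error = error

    def Query(self, text, index="*"):
        return self._result

    def GetLastError(self):
        return self._error


if __name__ == "__main__":
    demo = FakeClient({"matches": [{"id": 1, "weight": 2, "attrs": {}}]})
    print(run_query(demo, "test"))
```

With a running daemon you would instead pass a real `SphinxClient` instance, pointed at the host and port your searchd listens on via its `SetServer()` method.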