Sphinx (SQL Phrase Index) Introduction and Installation


Sphinx is a full-text search engine, distributed under GPL version 2. Commercial licensing (eg. for embedded use) is also available upon request.

Generally, it’s a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages.

Currently built-in data source drivers support fetching data either via direct connection to MySQL, or PostgreSQL, or from a pipe in a custom XML format. Adding new drivers (eg. to natively support some other DBMSes) is designed to be as easy as possible.

Search API is natively ported to PHP, Python, Perl, Ruby, Java, and also available as a pluggable MySQL storage engine. API is very lightweight so porting it to new language is known to take a few hours.

As for the name, Sphinx is an acronym which is officially decoded as SQL Phrase Index. Yes, I know about CMU’s Sphinx project.

Sphinx features:

  • high indexing speed (upto 10 MB/sec on modern CPUs);
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections);
  • high scalability (upto 100 GB of text, upto 100 M documents on a single CPU);
  • provides good relevance ranking through combination of phrase proximity ranking and statistical (BM25) ranking;
  • provides distributed searching capabilities;
  • provides document exceprts generation;
  • provides searching from within MySQL through pluggable storage engine;
  • supports boolean, phrase, and word proximity queries;
  • supports multiple full-text fields per document (upto 32 by default);
  • supports multiple additional attributes per document (ie. groups, timestamps, etc);
  • supports stopwords;
  • supports both single-byte encodings and UTF-8;
  • supports English stemming, Russian stemming, and Soundex for morphology;
  • supports MySQL natively (MyISAM and InnoDB tables are both supported);
  • supports PostgreSQL natively.

Where to get Sphinx

Sphinx is available through its official Web site at http://www.sphinxsearch.com/.

Currently, Sphinx distribution tarball includes the following software:

  • indexer: an utility which creates fulltext indexes;
  • search: a simple command-line (CLI) test utility which searches through fulltext indexes;
  • searchd: a daemon which enables external software (eg. Web applications) to search through fulltext indexes;
  • sphinxapi: a set of searchd client API libraries for popular Web scripting languages (PHP, Python, Perl, Ruby).
  • spelldump: a simple command-line tool to extract the items from an ispell or MySpell (as bundled with OpenOffice) format dictionary to help customize your index, for use with wordforms.
  • indextool: an utility to dump miscellaneous debug information about the index, added in version 0.9.9-rc2.

Installation

Supported systems

Most modern UNIX systems with a C++ compiler should be able to compile and run Sphinx without any modifications.

Currently known systems Sphinx has been successfully running on are:

  • Linux 2.4.x, 2.6.x (various distributions)
  • Windows 2000, XP
  • FreeBSD 4.x, 5.x, 6.x
  • NetBSD 1.6, 3.0
  • Solaris 9, 11
  • Mac OS X

CPU architectures known to work include X86, X86-64, SPARC64.

I hope Sphinx will work on other Unix platforms as well. If the platform you run Sphinx on is not in this list, please do report it.

At the moment, Windows version of Sphinx is not intended to be used in production, but rather for testing and debugging only. Two most prominent issues are missing concurrent queries support (client queries are stacked on TCP connection level instead), and missing index data rotation support. There are succesful production installations which workaround these issues. However, running high-volume search service under Windows is still not recommended.

Required tools

On UNIX, you will need the following tools to build and install Sphinx:

  • a working C++ compiler. GNU gcc is known to work.
  • a good make program. GNU make is known to work.

On Windows, you will need Microsoft Visual C/C++ Studio .NET 2003 or 2005. Other compilers/environments will probably work as well, but for the time being, you will have to build makefile (or other environment specific project files) manually.

Installing Sphinx on Linux

  1. Extract everything from the distribution tarball (haven’t you already?) and go to the sphinx subdirectory:

    $ tar xzvf sphinx-0.9.8.tar.gz
    $ cd sphinx

  2. Run the configuration program:

    $ ./configure

    There’s a number of options to configure. The complete listing may be obtained by using --help switch. The most important ones are:

    • --prefix, which specifies where to install Sphinx; such as --prefix=/usr/local/sphinx (all of the examples use this prefix)
    • --with-mysql, which specifies where to look for MySQL include and library files, if auto-detection fails;
    • --with-pgsql, which specifies where to look for PostgreSQL include and library files.
  3. Build the binaries:

    $ make

  4. Install the binaries in the directory of your choice: (defaults to /usr/local/bin/ on *nix systems, but is overridden with configure --prefix)

    $ make install

Installing Sphinx on Windows

Installing Sphinx on a Windows server is often easier than installing on a Linux environment; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.

  1. Extract everything from the .zip file you have downloaded – sphinx-0.9.8-win32.zip (or sphinx-0.9.8-win32-pgsql.zip if you need PostgresSQL support as well.) You can use Windows Explorer in Windows XP and up to extract the files, or a freeware package like 7Zip to open the archive.For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use any different location for the folders or configuration file, please change it accordingly.
  2. Install the searchd system as a Windows service:C:\Sphinx> C:\Sphinx\searchd --install --config C:\Sphinx\sphinx.conf --servicename SphinxSearch
  3. The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service. A guide to do this can be found under Quick tour.

Known installation issues

If configure fails to locate MySQL headers and/or libraries, try checking for and installing mysql-devel package. On some systems, it is not installed by default.

If make fails with a message which look like

/bin/sh: g++: command not found
make[1]: *** [libsphinx_a-sphinx.o] Error 127

try checking for and installing gcc-c++ package.

If you are getting compile-time errors which look like

sphinx.cpp:67: error: invalid application of `sizeof' to
    incomplete type `Private::SizeError<false>'

this means that some compile-time type size check failed. The most probable reason is that off_t type is less than 64-bit on your system. As a quick hack, you can edit sphinx.h and replace off_t with DWORD in a typedef for SphOffset_t, but note that this will prohibit you from using full-text indexes larger than 2 GB. Even if the hack helps, please report such issues, providing the exact error message and compiler/OS details, so I could properly fix them in next releases.

If you keep getting any other error, or the suggestions above do not seem to help you, please don’t hesitate to contact me.

Quick Sphinx usage tour

All the example commands below assume that you installed Sphinx in /usr/local/sphinx, so searchd can be found in /usr/local/sphinx/bin/searchd.

To use Sphinx, you will need to:

  1. Create a configuration file.Default configuration file name is sphinx.conf. All Sphinx programs look for this file in current working directory by default.

    Sample configuration file, sphinx.conf.dist, which has all the options documented, is created by configure. Copy and edit that sample file to make your own configuration: (assuming Sphinx is installed into /usr/local/sphinx/)

    $ cd /usr/local/sphinx/etc
    $ cp sphinx.conf.dist sphinx.conf
    $ vi sphinx.conf

    Sample configuration file is setup to index documents table from MySQL database test; so there’s example.sql sample data file to populate that table with a few documents for testing purposes:

    $ mysql -u test < /usr/local/sphinx/etc/example.sql

  2. Run the indexer to create full-text index from your data:

    $ cd /usr/local/sphinx/etc
    $ /usr/local/sphinx/bin/indexer

  3. Query your newly created index!

To query the index from command line, use search utility:

$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/search test

To query the index from your PHP scripts, you need to:

  1. Run the search daemon which your script will talk to:

    $ cd /usr/local/sphinx/etc
    $ /usr/local/sphinx/bin/searchd

  2. Run the attached PHP API test script (to ensure that the daemon was succesfully started and is ready to serve the queries):

    $ cd sphinx/api
    $ php test.php test

  3. Include the API (it’s located in api/sphinxapi.php) into your own scripts and use it.

Happy searching!

Help Links :

komunitasweb.com/2009/03/sphinxsearch-introduction/

http://www.sphinxsearch.com/docs/current.html#intro

http://www.mysqlperformanceblog.com/?s=sphinx+tutorial+mysql –

wordpress.org/extend/plugins/search/other_notes/


Documentation

Sphinx 0.9.9 reference manual

Table of Contents

1. Introduction
1.1. About
1.2. Sphinx features
1.3. Where to get Sphinx
1.4. License
1.5. Author and contributors
1.6. History
2. Installation
2.1. Supported systems
2.2. Required tools
2.3. Installing Sphinx on Linux
2.4. Installing Sphinx on Windows
2.5. Known installation issues
2.6. Quick Sphinx usage tour
3. Indexing
3.1. Data sources
3.2. Attributes
3.3. MVA (multi-valued attributes)
3.4. Indexes
3.5. Restrictions on the source data
3.6. Charsets, case folding, and translation tables
3.7. SQL data sources (MySQL, PostgreSQL)
3.8. xmlpipe data source
3.9. xmlpipe2 data source
3.10. Live index updates
3.11. Index merging
4. Searching
4.1. Matching modes
4.2. Boolean query syntax
4.3. Extended query syntax
4.4. Weighting
4.5. Sorting modes
4.6. Grouping (clustering) search results
4.7. Distributed searching
4.8. searchd query log format
4.9. MySQL protocol support and SphinxQL
5. Command line tools reference
5.1. indexer command reference
5.2. searchd command reference
5.3. search command reference
5.4. spelldump command reference
5.5. indextool command reference
6. API reference
6.1. General API functions
6.1.1. GetLastError
6.1.2. GetLastWarning
6.1.3. SetServer
6.1.4. SetRetries
6.1.5. SetConnectTimeout
6.1.6. SetArrayResult
6.1.7. IsConnectError
6.2. General query settings
6.2.1. SetLimits
6.2.2. SetMaxQueryTime
6.2.3. SetOverride
6.2.4. SetSelect
6.3. Full-text search query settings
6.3.1. SetMatchMode
6.3.2. SetRankingMode
6.3.3. SetSortMode
6.3.4. SetWeights
6.3.5. SetFieldWeights
6.3.6. SetIndexWeights
6.4. Result set filtering settings
6.4.1. SetIDRange
6.4.2. SetFilter
6.4.3. SetFilterRange
6.4.4. SetFilterFloatRange
6.4.5. SetGeoAnchor
6.5. GROUP BY settings
6.5.1. SetGroupBy
6.5.2. SetGroupDistinct
6.6. Querying
6.6.1. Query
6.6.2. AddQuery
6.6.3. RunQueries
6.6.4. ResetFilters
6.6.5. ResetGroupBy
6.7. Additional functionality
6.7.1. BuildExcerpts
6.7.2. UpdateAttributes
6.7.3. BuildKeywords
6.7.4. EscapeString
6.7.5. Status
6.8. Persistent connections
6.8.1. Open
6.8.2. Close
7. MySQL storage engine (SphinxSE)
7.1. SphinxSE overview
7.2. Installing SphinxSE
7.2.1. Compiling MySQL 5.0.x with SphinxSE
7.2.2. Compiling MySQL 5.1.x with SphinxSE
7.2.3. Checking SphinxSE installation
7.3. Using SphinxSE
7.4. Building snippets (excerpts) via MySQL
8. Reporting bugs
9. sphinx.conf options reference
9.1. Data source configuration options
9.1.1. type
9.1.2. sql_host
9.1.3. sql_port
9.1.4. sql_user
9.1.5. sql_pass
9.1.6. sql_db
9.1.7. sql_sock
9.1.8. mysql_connect_flags
9.1.9. mysql_ssl_cert, mysql_ssl_key, mysql_ssl_ca
9.1.10. odbc_dsn
9.1.11. sql_query_pre
9.1.12. sql_query
9.1.13. sql_query_range
9.1.14. sql_range_step
9.1.15. sql_query_killlist
9.1.16. sql_attr_uint
9.1.17. sql_attr_bool
9.1.18. sql_attr_bigint
9.1.19. sql_attr_timestamp
9.1.20. sql_attr_str2ordinal
9.1.21. sql_attr_float
9.1.22. sql_attr_multi
9.1.23. sql_query_post
9.1.24. sql_query_post_index
9.1.25. sql_ranged_throttle
9.1.26. sql_query_info
9.1.27. xmlpipe_command
9.1.28. xmlpipe_field
9.1.29. xmlpipe_attr_uint
9.1.30. xmlpipe_attr_bool
9.1.31. xmlpipe_attr_timestamp
9.1.32. xmlpipe_attr_str2ordinal
9.1.33. xmlpipe_attr_float
9.1.34. xmlpipe_attr_multi
9.1.35. xmlpipe_fixup_utf8
9.1.36. mssql_winauth
9.1.37. mssql_unicode
9.1.38. unpack_zlib
9.1.39. unpack_mysqlcompress
9.1.40. unpack_mysqlcompress_maxsize
9.2. Index configuration options
9.2.1. type
9.2.2. source
9.2.3. path
9.2.4. docinfo
9.2.5. mlock
9.2.6. morphology
9.2.7. min_stemming_len
9.2.8. stopwords
9.2.9. wordforms
9.2.10. exceptions
9.2.11. min_word_len
9.2.12. charset_type
9.2.13. charset_table
9.2.14. ignore_chars
9.2.15. min_prefix_len
9.2.16. min_infix_len
9.2.17. prefix_fields
9.2.18. infix_fields
9.2.19. enable_star
9.2.20. ngram_len
9.2.21. ngram_chars
9.2.22. phrase_boundary
9.2.23. phrase_boundary_step
9.2.24. html_strip
9.2.25. html_index_attrs
9.2.26. html_remove_elements
9.2.27. local
9.2.28. agent
9.2.29. agent_blackhole
9.2.30. agent_connect_timeout
9.2.31. agent_query_timeout
9.2.32. preopen
9.2.33. ondisk_dict
9.2.34. inplace_enable
9.2.35. inplace_hit_gap
9.2.36. inplace_docinfo_gap
9.2.37. inplace_reloc_factor
9.2.38. inplace_write_factor
9.2.39. index_exact_words
9.2.40. overshort_step
9.2.41. stopword_step
9.3. indexer program configuration options
9.3.1. mem_limit
9.3.2. max_iops
9.3.3. max_iosize
9.3.4. max_xmlpipe2_field
9.3.5. write_buffer
9.4. searchd program configuration options
9.4.1. listen
9.4.2. address
9.4.3. port
9.4.4. log
9.4.5. query_log
9.4.6. read_timeout
9.4.7. client_timeout
9.4.8. max_children
9.4.9. pid_file
9.4.10. max_matches
9.4.11. seamless_rotate
9.4.12. preopen_indexes
9.4.13. unlink_old
9.4.14. attr_flush_period
9.4.15. ondisk_dict_default
9.4.16. max_packet_size
9.4.17. mva_updates_pool
9.4.18. crash_log_path
9.4.19. max_filters
9.4.20. max_filter_values
9.4.21. listen_backlog
9.4.22. read_buffer
9.4.23. read_unhinted
A. Sphinx revision history

1. Introduction

1.1. About

Sphinx is a full-text search engine, distributed under GPL version 2. Commercial licensing (eg. for embedded use) is also available upon request.

Generally, it’s a standalone search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages.

Currently built-in data source drivers support fetching data either via direct connection to MySQL, or PostgreSQL, or from a pipe in a custom XML format. Adding new drivers (eg. to natively support some other DBMSes) is designed to be as easy as possible.

Search API is natively ported to PHP, Python, Perl, Ruby, Java, and also available as a pluggable MySQL storage engine. API is very lightweight so porting it to new language is known to take a few hours.

As for the name, Sphinx is an acronym which is officially decoded as SQL Phrase Index. Yes, I know about CMU’s Sphinx project.

1.2. Sphinx features

  • high indexing speed (upto 10 MB/sec on modern CPUs);
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections);
  • high scalability (upto 100 GB of text, upto 100 M documents on a single CPU);
  • provides good relevance ranking through combination of phrase proximity ranking and statistical (BM25) ranking;
  • provides distributed searching capabilities;
  • provides document exceprts generation;
  • provides searching from within MySQL through pluggable storage engine;
  • supports boolean, phrase, and word proximity queries;
  • supports multiple full-text fields per document (upto 32 by default);
  • supports multiple additional attributes per document (ie. groups, timestamps, etc);
  • supports stopwords;
  • supports both single-byte encodings and UTF-8;
  • supports English stemming, Russian stemming, and Soundex for morphology;
  • supports MySQL natively (MyISAM and InnoDB tables are both supported);
  • supports PostgreSQL natively.

1.3. Where to get Sphinx

Sphinx is available through its official Web site at http://www.sphinxsearch.com/.

Currently, Sphinx distribution tarball includes the following software:

  • indexer: an utility which creates fulltext indexes;
  • search: a simple command-line (CLI) test utility which searches through fulltext indexes;
  • searchd: a daemon which enables external software (eg. Web applications) to search through fulltext indexes;
  • sphinxapi: a set of searchd client API libraries for popular Web scripting languages (PHP, Python, Perl, Ruby).
  • spelldump: a simple command-line tool to extract the items from an ispell or MySpell (as bundled with OpenOffice) format dictionary to help customize your index, for use with wordforms.
  • indextool: an utility to dump miscellaneous debug information about the index, added in version 0.9.9-rc2.

Bookmark and Share
    • arasu
    • November 12th, 2009

    how to identify in MySQL

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: