
Create robots.txt file for better SEO

SEO - Search Engine Optimization

It has been half a month since ThemeLib.com was created, not a long time, huh? Today, I decided to test how my site is doing in two popular search engines: Google and Yahoo.

Here is the result:

All of the results in Google are fine: it indexed my posts, my tags, my keywords, and so on. But there are some problems with my site in Yahoo. Yahoo indexed my wp-login page and my download links! How did that happen, huh? If you have some knowledge of SEO (Search Engine Optimization), you know these results are really not good. After examining ThemeLib.com for a few minutes, I realized that I had not created a robots.txt file yet! Oh man, how could I forget it? :shock:

Introduction

The robots.txt file is used to instruct search engine robots about which pages on your website should be crawled and consequently indexed. Most websites have files and folders that are not relevant for search engines (such as images, download links, and admin files), so creating a robots.txt file can actually improve your website's indexing.

How to Create a Robots.txt file

A robots.txt file is just a plain text file that can be created with any text editor, such as Notepad. If you are using WordPress, a sample robots.txt file would be:

User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /download/

“User-agent: *” means that all search engines (Google, Yahoo, MSN, and so on) should follow these instructions when crawling your website.

“Disallow: /wp-” makes sure that search engines will not crawl the WordPress core files: it excludes all files and folders whose names start with “wp-” from indexing, avoiding duplicate content and admin pages. The same goes for Disallow: /feed/, Disallow: /trackback/, and so on.

After you have created the robots.txt file, just upload it to your site's root directory and you are done!
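If you want to double-check what the rules above actually block before uploading, Python's standard library ships a robots.txt parser. This is just a quick sketch; example.com and the sample URLs are placeholders, not real pages:

```python
from urllib.robotparser import RobotFileParser

# The sample robots.txt rules from this post, fed to the parser directly.
rules = """\
User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /download/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# example.com and these paths are placeholders for your own site.
print(rp.can_fetch("*", "http://example.com/wp-login.php"))    # blocked: matches /wp-
print(rp.can_fetch("*", "http://example.com/download/theme"))  # blocked: matches /download/
print(rp.can_fetch("*", "http://example.com/2008/05/post/"))   # allowed: no rule matches
```

Note that Disallow rules are prefix matches, which is why a single “/wp-” line covers wp-login, wp-admin, wp-content, and friends.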

Further reading

Here are some good links that may be useful if you want to know more about SEO and robots.txt:

Conclusion

You should always check your site in search engines like Google and Yahoo to see if there are any problems with indexing. I hope you do not make the same mistake I did. If you need professional help, you can hire an SEO company.



Sphinx Search Engine Performance

The following is a summary of some real-world data collected from the Sphinx query logs on a cluster of 15 servers. Each server runs its own copy of Sphinx, Apache, a busy web application, MySQL and miscellaneous services.

The dataset contains 453 million query log instances from 180 Sphinx indexes, collected over several months, using Sphinx version 0.9.8 on Linux kernel 2.6.18. The servers are all Dell PowerEdge 1950 with Quad Core Intel® Xeon® E5335, 2×4MB Cache, 2.0GHz, 1333MHz FSB, SATA drives, 7200rpm.

Keep in mind, though, that this is real world data and not a controlled test. This is how Sphinx performed in our environment, for the particular way we use Sphinx.

The graph below displays the response time distribution for all servers and all indexes, and shows, for example, that 60% of queries complete within 0.01 secs, 80% within 0.1 secs and 99% within 0.5 secs. Response times tend to occur in 3 bands (corresponding to the peaks in the frequency graph) of <0.001 secs, 0.03 secs and 0.3 secs, which partly relates to the number of disk accesses required to fulfil a request. At 0.001 secs, all data is in memory, while at 0.3 secs, several disk accesses are occurring. Whilst the middle peak is not so obvious in this graph, the per-server and per-index graphs often have different distributions but still tend to have peaks at one or more of these three bands.
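For reference, cumulative percentages like those above can be computed from a list of query response times with a few lines of Python. The times in this sketch are invented for illustration and are not taken from the actual logs:

```python
def fraction_within(times, threshold):
    """Fraction of queries that completed within `threshold` seconds."""
    return sum(1 for t in times if t <= threshold) / len(times)

# Hypothetical response times in seconds, standing in for a real query log.
times = [0.001, 0.002, 0.008, 0.03, 0.05, 0.12, 0.3, 0.45, 0.6, 0.004]

for limit in (0.01, 0.1, 0.5):
    print(f"{fraction_within(times, limit):.0%} of queries within {limit} secs")
```

Running the same calculation per server or per index is what produces the family of distribution graphs discussed below.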
Sphinx Query Response Times Total for all servers, all indexes

The next observation is that query word count affects performance, but not necessarily in proportion to the number of query words, as shown in the graph below. 1-4 word queries consistently offer the best performance. The 6-50 word range is consistently the slowest, most likely because the chance of finding documents with multiple matches is high, so there is extra ranking effort involved. Above 50 words, there is presumably a higher chance of having words with few matches, which speeds up the ranking process.
Sphinx Query Response Time by Query Word Count

Finally, we see that the size of the inverted index (.spd files) also affects performance. The three graphs below show how the response time distribution tends to move to the right as the index size increases. The larger the index, the higher the chance that data will need to be re-read from disk (rather than from Sphinx-internal or system buffers/cache), hence this is not unexpected.
Sphinx Query Response Times for Index Sizes 1MB - 3MB
Sphinx Query Response Times for Index Sizes 3MB - 30MB
Sphinx Query Response Times for Index Sizes >30MB

Here is a PDF summary of Sphinx performance for this dataset, including many additional graphs of the data by server and by index.
