How to automatically ping search engines when your sitemap has changed
I prefer to let cron regenerate the sitemap in the background. At the end of the script I ping the search engines to let them know it has been updated:
    require 'open-uri'

    # Recreate sitemap goes here

    # Let search engines know about the update
    [ "http://www.google.com/webmasters/tools/ping?sitemap=http://xxx/sitemap.xml",
      "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=http://xxx/sitemap.xml",
      "http://submissions.ask.com/ping?sitemap=http://xxx/sitemap.xml",
      "http://webmaster.live.com/ping.aspx?siteMap=http://xxx/sitemap.xml" ].each do |url|
      begin
        # URI.open is the open-uri entry point on current Ruby versions
        URI.open(url) do |f|
          if f.status[0] == "200"
            puts "Sitemap successfully submitted to #{url}"
          else
            puts "Failed to submit sitemap to #{url}"
          end
        end
      rescue StandardError => e
        # open-uri raises on HTTP error responses and network failures
        puts "Failed to submit sitemap to #{url}: #{e.message}"
      end
    end
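The sitemap address in the URLs above is passed as a query parameter, so it is probably safer to URL-encode it before pinging. Here is a minimal sketch of that idea. The helper name `ping_urls` and the host example.com are made up for illustration; the endpoints are the ones from the list above.

```ruby
require 'cgi'

# Ping endpoints taken from the list above; each one expects the
# sitemap address as the value of its last query parameter.
PING_ENDPOINTS = [
  "http://www.google.com/webmasters/tools/ping?sitemap=",
  "http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap=",
  "http://submissions.ask.com/ping?sitemap=",
  "http://webmaster.live.com/ping.aspx?siteMap="
].freeze

# Hypothetical helper: returns one ping URL per search engine, with the
# sitemap address escaped so ':' and '/' are sent as a query value.
def ping_urls(sitemap_url)
  escaped = CGI.escape(sitemap_url)
  PING_ENDPOINTS.map { |endpoint| endpoint + escaped }
end

# example.com is a placeholder host
ping_urls("http://example.com/sitemap.xml").each { |url| puts url }
```

The resulting list can be fed to the same `each` loop as before, which keeps the sitemap address in one place.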
More about sitemaps: http://en.wikipedia.org/wiki/Sitemaps
How to optimize your MephistoBlog powered site's search engine ranking (SEO for MephistoBlog)
At Aktagon we use MephistoBlog as a CMS. I couldn’t find anything on Google about optimizing MephistoBlog for search engines, so I’m sharing my notes here.
This tip shows you how to make your pages more search engine friendly.
First, add the title tag, plus the meta description and keywords tags, to your layout’s Liquid template, as shown here:

    <meta name="description" content="{% if article %} {{ article.excerpt }} {% else %} YOUR DEFAULT SITE DESCRIPTION {% endif %}" />
    <meta name="keywords" content="{% if article %} {% for tag in article.tags %}{{ tag }}, {% endfor %} {% endif %} YOUR DEFAULT KEYWORDS" />
    <title>{% if article %} {{ article.title }} » {{ site.title }} {% else %} {{ site.title }} » {{ site.subtitle }} {% endif %}</title>
Remember to replace the default description and keywords in the meta tags’ content attributes with your own.
Now, whenever you publish an article, simply add an excerpt and some tags to it. The excerpt is used as the meta description and the article’s tags as the meta keywords. Both make Google a bit happier, but the description is by far the more important.
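Because the excerpt ends up inside a content="..." attribute, it helps if it contains no markup, quotes or line breaks. The following is a rough Ruby sketch of that kind of cleanup, not something MephistoBlog does for you. The helper name `meta_description` and the 160-character limit are assumptions made for illustration.

```ruby
require 'cgi'

# Hypothetical helper: turns an article excerpt into text that is safe
# to place inside content="...". The 160-character default is an
# assumed limit, not a documented rule.
def meta_description(excerpt, max_length = 160)
  text = excerpt.gsub(/<[^>]*>/, " ")  # drop HTML tags (naive regex)
  text = text.gsub(/\s+/, " ").strip   # collapse whitespace and newlines
  text = text[0, max_length].rstrip    # cut overly long excerpts
  CGI.escapeHTML(text)                 # escape quotes, <, > and &
end

puts meta_description(%(<p>Ruby "tips" &\n tricks</p>))
```

Truncating before escaping avoids cutting an entity such as `&quot;` in half.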
How to detect traffic from the most common search spiders with Ruby
- Google – Googlebot/2.1 ( http://www.googlebot.com/bot.html)
- Google Image – Googlebot-Image/1.0 ( http://www.googlebot.com/bot.html)
- MSN Live – msnbot-Products/1.0 (+http://search.msn.com/msnbot.htm)
- Yahoo – Mozilla/5.0 (compatible; Yahoo! Slurp;)
The code (via):
    user_agent = request.user_agent.downcase
    @bot = [ 'msnbot', 'yahoo! slurp', 'googlebot' ].detect { |bot| user_agent.include? bot }

When Googlebot visits your site, @bot will contain ‘googlebot’. For visitors that match none of the entries, @bot is nil.
If you need to detect more bots than these, then the user-agents.org site contains a list of various user agents for both bots and browsers.
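The detection above can also be tried outside a Rails controller. This is a minimal sketch, assuming a hypothetical helper `detect_bot` that takes the raw User-Agent string. It also tolerates a missing header, which would otherwise raise on `downcase`.

```ruby
# Substrings to look for, taken from the snippet above.
KNOWN_BOTS = ['msnbot', 'yahoo! slurp', 'googlebot'].freeze

# Hypothetical helper: returns the matching bot name, or nil for
# ordinary visitors and for requests without a User-Agent header.
def detect_bot(user_agent)
  ua = user_agent.to_s.downcase
  KNOWN_BOTS.detect { |bot| ua.include?(bot) }
end

# User-agent strings from the list above
p detect_bot("Googlebot-Image/1.0 ( http://www.googlebot.com/bot.html)")
p detect_bot("Mozilla/5.0 (compatible; Yahoo! Slurp;)")
p detect_bot(nil)
```

Adding another bot is then just a matter of appending a lowercase substring to `KNOWN_BOTS`.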
Sample thinking-sphinx configuration
How to install and use the Sphinx search engine and acts_as_sphinx plugin on Debian Etch
Inspiration for this snippet was taken from this post on the Sphinx forum, plus this blog post.
Compiling Sphinx
First install the prerequisites:
    sudo aptitude install libmysql++-dev libmysqlclient15-dev checkinstall

Next, download Sphinx and libstemmer, and install everything and the fish:

    cd /usr/local/src

    wget http://sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
    tar zxvf sphinx-0.9.8-rc2.tar.gz

    cd sphinx-0.9.8-rc2/

    # Add stemming support for Swedish, Finnish and other fun languages.
    wget http://snowball.tartarus.org/dist/libstemmer_c.tgz
    tar zxvf libstemmer_c.tgz

    ./configure --with-libstemmer
    make

    make install
Configure Sphinx
Create a sphinx.conf file in your Rails config directory, as described here, or use this template.
Install acts_as_sphinx plugin
    ./script/plugin install http://svn.datanoise.com/acts_as_sphinx
Add acts_as_sphinx to your model:
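    # The plugin extends ActiveRecord models, so the class must inherit from ActiveRecord::Base
    class Documents < ActiveRecord::Base
      acts_as_sphinx
    end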
Indexing content
    rake sphinx:index

    (in /var/www/xxx.com/releases/20080429144230)
    Sphinx 0.9.8-rc2 (r1234)
    Copyright (c) 2001-2008, Andrew Aksyonoff

    using config file './sphinx.conf'...
    indexing index 'xxx.com'...
    collected 5077 docs, 0.6 MB
    sorted 0.1 Mhits, 100.0% done
    total 5077 docs, 632096 bytes
    total 0.160 sec, 3950427.25 bytes/sec, 31729.86 docs/sec
Reindexing content
sphinx:index shouldn’t be run while the searchd process is running, so use rake sphinx:rotate instead, which restarts the searchd process after indexing.
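A cron script could choose between the two tasks by checking whether searchd is alive. The sketch below does this with a pid file. The helper names and the path `log/searchd.pid` are assumptions made for illustration; use whatever pid file your sphinx.conf actually points at.

```ruby
# Hypothetical helper: true if the pid file exists and names a live
# process. Signal 0 only checks for existence; nothing is delivered.
def searchd_running?(pid_file)
  return false unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  return false if pid <= 0
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process, the pid file is stale
rescue Errno::EPERM
  true  # process exists but belongs to another user
end

# Picks the rake task described above. The default path is an
# assumption, not something the plugin guarantees.
def index_task(pid_file = "log/searchd.pid")
  searchd_running?(pid_file) ? "sphinx:rotate" : "sphinx:index"
end

puts "rake #{index_task}"
```

A stale pid file left behind by a crashed searchd is treated as "not running", so the script falls back to a plain index run.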
Starting the daemon
    # A directory needs the execute bit to be usable, hence 775 rather than 664
    mkdir -m 775 /var/log/sphinx
    rake sphinx:start

    (in /var/www/xxx.com/releases/20080429144230)
    Sphinx 0.9.8-rc2 (r1234)
    Copyright (c) 2001-2008, Andrew Aksyonoff

    using config file './sphinx.conf'...
    Sphinx searchd server started.
Searching
    Documents.find_with_sphinx 'why did I write this'