September 2004 Archives

Yahoo's search bot and your web site

A year or so ago now I moved the NZDWFC page from http://www.tetrap.com/drwho/nzdwfc/ to nzdwfc.tetrap.com. To ease the transition, I placed a permanent 301 redirection from the old URL to the new one. Anyone going to the old URL gets bounced to the new URL without having to do anything.

Yahoo uses a spider called "Yahoo Slurp" to crawl the web looking for pages to add to its search index. Slurp hits http://www.tetrap.com/drwho/nzdwfc/ and gets redirected to nzdwfc.tetrap.com like everyone else. Unfortunately Slurp has a bug: it adds the page to Yahoo's search index under the old URL rather than the new one.
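
For the curious, here's roughly what Slurp ought to be doing, as a Python sketch. It assumes the redirect maps everything under the old prefix onto the same relative path on the subdomain (which is how Apache's Redirect directive treats a path prefix). OLD_PREFIX, NEW_PREFIX, MAX_HOPS and the function names are made up for illustration and have nothing to do with how Slurp actually works. The key bit is that index_key returns the URL the crawler finally lands on, not the one it first asked for.

```python
from typing import Optional

# Hypothetical values, assumed to mirror the site's 301 redirect.
OLD_PREFIX = "http://www.tetrap.com/drwho/nzdwfc/"
NEW_PREFIX = "http://nzdwfc.tetrap.com/"
MAX_HOPS = 5  # give up after this many redirects (guards against loops)


def redirect_target(url: str) -> Optional[str]:
    """Return where a 301 would send this URL, or None if no redirect applies."""
    if url.startswith(OLD_PREFIX):
        # Keep whatever follows the old prefix, as a prefix redirect does.
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return None


def index_key(url: str) -> str:
    """The URL a well-behaved crawler should file the page under."""
    for _ in range(MAX_HOPS):
        target = redirect_target(url)
        if target is None:
            return url  # no further redirect: this is the page's real address
        url = target
    raise RuntimeError("too many redirects")
```

The buggy behaviour amounts to storing the URL that was originally requested instead of the result of index_key.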

Most of the NZDWFC page is indexed in Yahoo under the old URL, even pages I've added since the move (there are, in fact, only 5 pages in the Yahoo index for the nzdwfc.tetrap.com subdomain). This means that if the NZDWFC page comes up in a search and the user clicks on it, my server has to redirect them to the new page.

Last month the main domain www.tetrap.com got 4633 hits which resulted in redirects, a good number of them from people clicking through from Yahoo search results. Obviously I want to reduce this number so my web server has less work to do; the trouble is telling Yahoo Slurp not to index the old URL without breaking the redirection for users who surf in.
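
A count like that comes from the server's access log. Here's a rough Python sketch of one way to tally it, assuming Apache's "combined" log format (which records the referer and user agent). Treating any referer containing search.yahoo.com as a click from Yahoo's results is an assumption, and count_redirects is a made-up name; the "slurp" tally uses the help.yahoo.com string Slurp puts in its user agent.

```python
import re
from collections import Counter
from typing import Iterable

# Apache "combined" log format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_redirects(lines: Iterable[str]) -> Counter:
    """Tally 301 responses, split by where the request came from."""
    tally: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or match.group("status") != "301":
            continue  # unparseable line, or not a redirect
        tally["total"] += 1
        # Assumption: clicks from Yahoo results carry a search.yahoo.com referer.
        if "search.yahoo.com" in match.group("referer").lower():
            tally["from_yahoo_search"] += 1
        # Slurp identifies itself with help.yahoo.com in its user agent.
        if "help.yahoo.com" in match.group("agent").lower():
            tally["slurp"] += 1
    return tally
```

Feed it the lines of the access log (for example an open file object) and compare the numbers month to month.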

To do this I use Apache's rewrite engine like so:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} help\.yahoo\.com [NC]
RewriteCond %{REQUEST_URI} ^/drwho/nzdwfc/.*$ [NC]
RewriteRule ^.* - [G,L]

This all goes in the .htaccess file which sits in my web root directory. The lines work as follows:

  1. Turns the rewrite engine on. Kinda essential.
  2. Checks the user-agent of the bot for the string "help.yahoo.com". Slurp uses this in its user agent.
  3. Matches any file they request in the /drwho/nzdwfc/ directory or below.
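  4. The pattern ^.* matches any path, so it's the two conditions above that decide whether this rule fires. The - means "don't rewrite the URL", and the G flag tells Apache to send back a 410 Gone response. 410 means "it's gone, matey, and it ain't coming back". Additionally the L flag tells Apache not to process any more rewrite rules because we're finished.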
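
To make the logic concrete, here's a small Python model of the decision those four lines make. It isn't how mod_rewrite works internally; it just mirrors the two conditions (matched as case-insensitive regular expressions because of [NC], and ANDed together, as consecutive RewriteConds are by default) and the [G] outcome. rewrite_status is a made-up name, and None stands for "no rewrite, carry on to the redirect".

```python
import re
from typing import Optional

# The two RewriteCond patterns; [NC] makes them case-insensitive.
AGENT_PATTERN = re.compile(r"help\.yahoo\.com", re.IGNORECASE)
URI_PATTERN = re.compile(r"^/drwho/nzdwfc/.*$", re.IGNORECASE)


def rewrite_status(user_agent: str, request_uri: str) -> Optional[int]:
    """Model of the rule: 410 if it fires, None if the request passes through."""
    # Consecutive RewriteConds must all match (implicit AND) for the rule to apply.
    if AGENT_PATTERN.search(user_agent) and URI_PATTERN.search(request_uri):
        return 410  # RewriteRule ^.* - [G,L]: no substitution, answer "410 Gone"
    return None  # rule skipped; the existing 301 redirect handles the request
```

A browser requesting /drwho/nzdwfc/ gets None here (so the normal 301 applies), while a user agent containing help.yahoo.com gets 410 for anything under that directory, in any letter case.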

So put them together, and the server tells Slurp that the file it is requesting is gone, but lets anyone else through to hit the redirection. There are still a lot of redirections happening, but hopefully Yahoo will gradually drop the old URLs in favour of the new ones, and the redirections will decrease.

That's the theory, anyway. I'll update this weblog with the results in a few months' time, hopefully...

The Apache rewrite engine is a great and powerful thing, but also a dangerous thing.

Messing with the look

I've mucked about with the template a bit to make it look more like the rest of my site. I started on a complete template to make it look like my LiveJournal, but then realised how long it would take. LiveJournal's method of creating journal styles, named S2, may be tricky to learn, but it makes it much easier to customise every page without duplicating a lot of effort.

I will have to fiddle with it a bit more to put in cool stuff like a blogroll and some links, and maybe install some plugins, but that should be fairly easy.

Spam and Movable Type comments

Last year I posted about a possible way Movable Type could combat the spam problem, thusly:

Even Movable Type has problems with spam, and they've been talking about ways to combat it.

Personally I'd have thought the easiest way would be to have a page which lists all the recently posted comments, in chronological order... that way the blog owner would be able to see comments made on months-old entries.

I see Movable Type 3 not only has such a page, but also does email notification like LiveJournal does. In fact, the back-end of MT 3.11 is pretty cool all told. :)
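
The idea in that quote is simple enough to sketch. Here's a Python version, assuming a hypothetical Comment record holding the entry title and the time the comment was posted; it isn't Movable Type's code. Sorting on when each comment was posted, rather than on the entry it belongs to, is what makes a comment on a months-old entry show up at the top.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Comment:
    entry_title: str  # the (possibly months-old) entry the comment is on
    author: str
    posted: datetime  # when the comment itself was posted


def recent_comments(comments: List[Comment], limit: int = 20) -> List[Comment]:
    """Comments across every entry, newest first, regardless of the entry's age."""
    return sorted(comments, key=lambda c: c.posted, reverse=True)[:limit]
```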

Right, this should all work then

This is a new weblog. It's not intended to replace my old one, which is a LiveJournal, but to supplement it. I'm still working out how to split entries between the two, and I may just end up using my LiveJournal for boring personal entries and this one for boring technical entries. You have been warned.

The main reason this is a Movable Type weblog rather than another LiveJournal is that LiveJournal lacks a number of features which I don't expect to see implemented in the near future, including but not limited to:

  • Trackbacks
  • Categories
  • Decent commenting for people who don't have a LiveJournal themselves
  • Etc.

So this weblog here (I'm not too keen on the term 'blog' myself) now exists.

About this Archive

This page is an archive of entries from September 2004 listed from newest to oldest.

October 2004 is the next archive.

Powered by Movable Type 5.01