Simple virtual hosts (vhosts) with lighttpd

Posted 2007-08-02 in WWW by Johann.

lighttpd, the up-and-coming web server, makes virtual hosting simple. Here’s how to set it up.

Create your root directory

This is the directory all your websites will be stored under.

$ pwd
/var
$ mkdir www

Create folders for your virtual hosts/websites

For each virtual host, create a directory under the root directory, named after the domain. Don’t start directory names with www.: that prefix is stripped from the host name, so such a directory would never be used.

$ cd www
$ mkdir rofl.info
$ mkdir blab.la
$ mkdir lm.ao

Upload your websites

Well… upload them.

Set the root directory in lighttpd

Open lighttpd.conf and set the var.basedir variable.

# pwd
/var/www
# cd /etc/lighttpd
# nano lighttpd.conf

Add this line:

var.basedir = "/var/www"

Set up virtual hosts

In lighttpd.conf, make sure mod_evhost is loaded in server.modules, then set the evhost.path-pattern key:

evhost.path-pattern = var.basedir + "/%0/"

Done

Restart lighttpd or reload its configuration and you’re done. lighttpd now maps each request to a directory based on its Host header. Because %0 expands to the domain name plus top-level domain, any prefix (like www. or w.) is stripped, so requests for www.rofl.info are served from /var/www/rofl.info/.
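To see how %0 picks the directory, here’s a rough Python model of the mapping. It assumes %0 is simply the last two labels of the host name (domain plus TLD) and that the port and letter case are normalised away; multi-label TLDs such as co.uk and IPv6 literals aren’t handled. The function name evhost_docroot is made up for illustration.

```python
def evhost_docroot(basedir: str, host_header: str) -> str:
    """Rough model of evhost.path-pattern = var.basedir + "/%0/".

    Assumption: %0 is the last two labels of the host name (domain + TLD).
    Multi-label TLDs such as co.uk and IPv6 literals are not handled.
    """
    # Drop an optional ":port", a trailing dot, and normalise the case
    hostname = host_header.split(":", 1)[0].rstrip(".").lower()
    labels = hostname.split(".")
    # Keep domain + TLD only, so www., w. or any other prefix disappears
    domain = ".".join(labels[-2:])
    return f"{basedir}/{domain}/"


for host in ("rofl.info", "www.rofl.info", "w.blab.la", "lm.ao:8080"):
    print(host, "->", evhost_docroot("/var/www", host))
```

This is also why a directory named www.rofl.info would never be used: the www. label never makes it into the path.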

Where to go from here

If you use an application server like Orion, I suggest reading “lighttpd and Java application servers: integrating JSP and Servlets”.

lighttpd and Java application servers: integrating JSP and Servlets

Posted 2007-09-09 in WWW by Johann.

In this blog entry, I’ll show you how to integrate lighttpd into a JEE environment. Once you’ve made all the changes, lighty will transparently proxy requests to your Java application server.

1. When to use lighttpd

You can use lighttpd to

  • secure access to your application server
  • reduce load on your server by offloading static requests
  • load balance your application servers
  • use lighttpd’s spambot and bad bot blocking capabilities
  • get more request rewriting and redirecting flexibility
  • use the above flexibility to improve your search engine rankings
  • profit.

2. When not to use lighttpd

You might not like lighttpd if you

  • don’t like configuring software
  • rely on URL rewriting (;jsessionid in URLs) for session tracking.

3. lighttpd modules you need

The following lighty modules are needed:

  • mod_access
  • mod_redirect
  • mod_rewrite
  • mod_proxy

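Add them to your server.modules section. My list also loads mod_accesslog, mod_status, mod_evhost (virtual hosts) and mod_expire (cache headers), which aren’t required for proxying: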

server.modules = (
 "mod_accesslog",
 "mod_access",
 "mod_redirect",
 "mod_rewrite",
 "mod_proxy",
 "mod_status",
 "mod_evhost",
 "mod_expire"
)

4. Denying access to JEE directories

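The WEB-INF and META-INF directories shouldn’t be accessible through lighttpd. Files from your development environment, such as Eclipse’s .classpath and .project, also shouldn’t be visible. Keep in mind that url.access-deny compares each entry against the end of the request path, so a plain "WEB-INF" entry would block /WEB-INF but not /WEB-INF/web.xml. A $HTTP["url"] conditional with an empty deny entry blocks everything inside both directories:

url.access-deny = ( ".classpath", ".project" )
$HTTP["url"] =~ "/(WEB-INF|META-INF)(/|$)" { url.access-deny = ( "" ) }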
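Here’s a small Python model for checking which requests such rules catch. It assumes mod_access’s documented behaviour that each url.access-deny entry is compared against the end of the request path, and contrasts that with a regular expression for the two directories. The regex and helper names are illustrative, not lighttpd’s code.

```python
import re

# Entries compared against the *end* of the path (assumed mod_access semantics)
DENY_SUFFIXES = ("WEB-INF", ".classpath", ".project", "META-INF")
# Illustrative regex for a $HTTP["url"] conditional covering both directories
JEE_DIRS = re.compile(r"/(WEB-INF|META-INF)(/|$)")


def denied_by_suffix(path: str) -> bool:
    """True if any deny entry matches the end of the path."""
    return any(path.endswith(s) for s in DENY_SUFFIXES)


def denied_by_regex(path: str) -> bool:
    """True if the path is inside WEB-INF or META-INF."""
    return bool(JEE_DIRS.search(path))


for path in ("/WEB-INF", "/WEB-INF/web.xml", "/.classpath", "/index.jsp"):
    print(f"{path:20} suffix={denied_by_suffix(path)} regex={denied_by_regex(path)}")
```

The suffix check blocks /WEB-INF itself but lets /WEB-INF/web.xml through, which is why the directories are safer to match with a $HTTP["url"] regex.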

5. Binding your application server to localhost

To prevent duplicate content penalties, your application server shouldn’t be visible from the web. Even if you run it on a high port, someone might eventually find it.

Binding a web site to localhost looks like this in Orion’s <name>-web-site.xml:

<web-site host="127.0.0.1" port="12345">
	<frontend host="johannburkard.de" port="80"/>
	<!-- rest of the web site configuration -->
</web-site>

Consult your documentation if you aren’t using Orion.

6. Redirecting www. to non-www. hosts

Even if you don’t strictly need to, I recommend it: serving the same pages under both www.<domain> and <domain> creates duplicate content, and removing it can improve your rankings.

The following snippet redirects all visitors from www.<domain> to <domain> with a 301 permanent redirect.

$HTTP["host"] =~ "^www\.(.*)$" {
 url.redirect = ( "^/(.*)" => "http://%1/$1" )
}
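A quick way to reason about the two captures is to model them in Python. This sketch assumes url.redirect is matched against the request URI including its query string, as in lighttpd 1.4; %1 comes from the host condition and $1 from the URL pattern. The function name www_redirect is hypothetical.

```python
import re
from typing import Optional

HOST_RE = re.compile(r"^www\.(.*)$")  # $HTTP["host"] =~ "^www\.(.*)$"
URL_RE = re.compile(r"^/(.*)")        # url.redirect key "^/(.*)"


def www_redirect(host: str, request_uri: str) -> Optional[str]:
    """Return the Location of the 301, or None if no redirect applies."""
    host_m = HOST_RE.match(host)
    if not host_m:
        return None  # host doesn't start with www.: served normally
    url_m = URL_RE.match(request_uri)
    if not url_m:
        return None
    # %1 = capture from the host condition, $1 = capture from the URL pattern
    return f"http://{host_m.group(1)}/{url_m.group(1)}"


print(www_redirect("www.johannburkard.de", "/blog/?page=2"))
```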

You should also redirect all additional domains (johannburkard.com, johann-burkard.org) to your main domain.

7. Proxying dynamic requests

We will use mod_proxy to proxy some requests to your Java application server.

Depending on your site’s structure, one of the following approaches will work better.

Simple JSP

If all you have is a bunch of JavaServer Pages, the following mod_proxy rule is sufficient:

proxy.server = ( ".jsp" =>
 (
  ( "host" => "127.0.0.1",
    "port" => "12345"
  )
 )
)

Note that the JSPs must be actual files; you cannot use Servlets mapped to these URIs.

Applications

If you use Servlets or more complex applications, you can proxy URIs by prefix:

proxy.server = ( "/blog/" =>
 (
  ( "host" => "127.0.0.1",
    "port" => "12345"
  )
 )
)
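The .jsp and /blog/ keys use two different matching modes. The sketch below models them in Python, assuming lighttpd 1.4’s behaviour for proxy.server keys: a key starting with / is a prefix of the request path, any other key is compared with the end of the path, and the query string is removed first. It also assumes ;jsessionid stays part of the path, which would explain why .jsp matching and URL-rewritten sessions don’t mix. The function name is hypothetical.

```python
def proxy_key_matches(key: str, path: str) -> bool:
    """Model of how a proxy.server key is compared with the request path.

    Assumptions: keys starting with "/" are prefixes, all other keys
    (".jsp", "") are suffixes, and the query string is already removed.
    """
    if key.startswith("/"):
        return path.startswith(key)  # e.g. "/blog/" proxies the whole subtree
    return path.endswith(key)        # "" is a suffix of every path


for key, path in ((".jsp", "/index.jsp"),
                  (".jsp", "/index.jsp;jsessionid=ABC123"),
                  ("/blog/", "/blog/2007/09/lighttpd.html"),
                  ("", "/anything")):
    print(f"{key!r:10} {path:32} {proxy_key_matches(key, path)}")
```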

Proxying with exceptions

If most of your site is dynamic and you keep static content in one directory (such as /assets or /static), you can proxy every request except those for static files:

$HTTP["url"] !~ "^/static" {
 proxy.server = ( "" =>
  (
   ( "host" => "127.0.0.1",
     "port" => "12345"
   )
  )
 )
}
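Modelled in Python, the conditional decides who answers each request; inside it, the empty key matches every path. The helper name handled_by is hypothetical, and the sketch assumes $HTTP["url"] sees the path without the query string.

```python
import re

# Mirrors $HTTP["url"] !~ "^/static" (path assumed to exclude the query string)
STATIC_RE = re.compile(r"^/static")


def handled_by(path: str) -> str:
    """Who answers a request under 'proxy everything except /static'."""
    if STATIC_RE.search(path):
        # Condition is false, so proxy.server is not set: lighttpd serves the file
        return "lighttpd"
    # Condition is true and the empty key "" matches every path
    return "application server"


for path in ("/static/css/site.css", "/blog/", "/staticfiles/x.js"):
    print(path, "->", handled_by(path))
```

Note that ^/static also matches /staticfiles/…; if that matters, anchor the pattern on the slash, as in ^/static/.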

8. Rewriting requests

lighttpd can rewrite requests internally. I mostly use this to serve default.jsp as the dynamic index file instead of index.html. Here’s an example:

url.rewrite-once = ( "^(.*)/$" => "$1/default.jsp",
 "^(.*)/([;?]+.*)$" => "$1/default.jsp$2" )

You can see this in action at gra0.com. Every request for a URL ending in / is internally rewritten to default.jsp in that directory, and the second rule keeps any ;jsessionid and query string.

mod_rewrite can also be used to make URLs shorter. For example, to remove the ?page=comments query string, I use the following:

url.rewrite-once = (
 "^/blog/(.*)\.html$" => "/blog/$1.html?page=comments"
)
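To check which URIs these rules touch before reloading lighttpd, you can model url.rewrite-once in Python. The sketch puts the rules from both examples in this section into one ordered list and assumes lighttpd tries them in order against the request URI (query string included), stopping at the first match; $1 and $2 become \1 and \2 in Python’s syntax. The function name is hypothetical.

```python
import re

# Rules from both examples, tried in order (first match wins)
RULES = [
    (re.compile(r"^(.*)/$"), r"\1/default.jsp"),
    (re.compile(r"^(.*)/([;?]+.*)$"), r"\1/default.jsp\2"),
    (re.compile(r"^/blog/(.*)\.html$"), r"/blog/\1.html?page=comments"),
]


def rewrite_once(uri: str) -> str:
    """Return the internally rewritten URI, or the original if nothing matches."""
    for pattern, target in RULES:
        m = pattern.match(uri)
        if m:
            return m.expand(target)  # rewrite-once: stop after the first hit
    return uri


for uri in ("/", "/shop/;jsessionid=abc?x=1", "/blog/foo.html", "/blog/foo.html?x=1"):
    print(uri, "->", rewrite_once(uri))
```

The last case shows a side effect of the $ anchor: a blog URL that already carries a query string is left alone.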

9. Redirecting requests

You can use mod_redirect to send the user to a different URL. Unlike mod_rewrite, which rewrites the request internally, mod_redirect sends a 301 permanent redirect to the browser.

In this example, I’m redirecting requests to an old domain to a new domain:

$HTTP["host"] == "olddomain.com" {
 url.redirect = (
  "^/(.*)$" => "http://newdomain.com/$1"
 )
}
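With several additional domains (as suggested in section 6), a lookup table is easier to reason about than one block per domain. The Python sketch below models the redirect; the HOST_ALIASES table and function name are hypothetical. Note that == in the condition matches only the exact host name, so www.olddomain.com needs its own entry here, or a =~ regex in lighttpd.

```python
from typing import Optional

# Hypothetical mapping of old or additional host names to the main domain
HOST_ALIASES = {
    "olddomain.com": "newdomain.com",
    "www.olddomain.com": "newdomain.com",
}


def alias_redirect(host: str, request_uri: str) -> Optional[str]:
    """Return the Location of the 301 for an aliased host, else None."""
    target = HOST_ALIASES.get(host.lower())
    if target is None:
        return None
    # "^/(.*)$" => "http://newdomain.com/$1" keeps the path and query string
    return f"http://{target}{request_uri}"


print(alias_redirect("olddomain.com", "/about?x=1"))
```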

10. More things to be aware of

  • The only client IP address in your application server’s log files should be 127.0.0.1. If you need the visitor’s original address, log the X-Forwarded-For header that mod_proxy adds.
  • Don’t analyze both lighttpd and application server logs – lighty’s log files already contain all requests.
  • You might want to set up virtual hosts sooner or later.
  • Use mod_expire to make resources cacheable. Doing so can make your site a lot faster and save you money.
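If you do log the original address on the application server side, here’s a sketch of picking it out, in Python for brevity. It assumes lighttpd is the only proxy and appends the address it saw as the last X-Forwarded-For entry; anything earlier was sent by the client and can be forged. The function and its parameters are hypothetical.

```python
from typing import Optional


def client_ip(remote_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Best guess at the visitor's address behind a local lighttpd proxy.

    Assumptions: the app server only accepts connections from 127.0.0.1,
    and lighttpd appends the address it saw as the last entry.
    """
    if remote_addr != "127.0.0.1" or not x_forwarded_for:
        return remote_addr  # not proxied by the local lighttpd
    entries = [e.strip() for e in x_forwarded_for.split(",") if e.strip()]
    # Earlier entries come from the client and are ignored
    return entries[-1] if entries else remote_addr


print(client_ip("127.0.0.1", "10.0.0.1, 203.0.113.7"))
```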


Yahoo! bots list – all user agents

Posted 2008-01-12 in WWW by Johann.

Yahoo! has a huge number of bots. In this list, I’ll try to include all of them and briefly explain what each one does.

  • Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
  • Mozilla/5.0 (Yahoo-MMCrawler/4.0; mailto:vertical-crawl-support@yahoo-inc.com)

Image or multimedia crawlers.

YahooFeedSeeker/2.0 (compatible; Mozilla 4.0; MSIE 5.5; http://publisher.yahoo.com/rssguide; users …; views …)

News feed crawler.

Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)

Crawler for the Chinese Yahoo.

Mozilla/5.0 (compatible; Yahoo! DE Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Crawler for the German Yahoo.

  • Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)
  • Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

The well-known Slurp crawler, probably the most active legitimate crawler. I rarely see the first one.

  • Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
  • LG-C1500 UP.Browser/6.2.3 (GUI) MMP/1.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
  • MOT-V975/81.33.02I MIB/2.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
  • SGH-Z130 SHP/VPP/R5 SMB3.1 SMM-MMS/1.1.0 profile/MIDP-2.0 configuration/CLDC-1.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
  • Nokia6610/1.0 (3.09) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
  • YahooSeeker/M1A1-R2D2

These user agent strings belong to Yahoo!’s mobile web index crawler.

  • Vodafone/1.0/V705SH (compatible; Y!J-SRD/1.0; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-27.html)
  • DoCoMo/2.0 SH902i (compatible; Y!J-SRD/1.0; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-27.html)
  • KDDI-CA33 UP.Browser/6.2.0.10.4 (compatible; Y!J-SRD/1.0; http://help.yahoo.co.jp/help/jp/search/indexing/indexing-27.html)

Some more mobile web crawlers, probably specific to the Japanese Yahoo.

Y!J-BSC/1.0 (http://help.yahoo.co.jp/help/jp/blog-search/)

A blog spider.

Yahoo! Slurp/Site Explorer

Bot that verifies site authentication through Yahoo! Site Explorer.
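If you want to tag these bots in your own log analysis, the substrings from the list above are enough for a simple classifier. The Python sketch below checks the most specific tokens first; the category labels and function name are illustrative. Keep in mind that user agents are easily forged, so a match only tells you what a client claims to be.

```python
from typing import Optional

# Substrings from the user agents listed above, most specific first
YAHOO_BOTS = [
    ("Yahoo! Slurp/Site Explorer", "Site Explorer verification"),
    ("Yahoo! Slurp China", "Slurp (Chinese Yahoo)"),
    ("Yahoo! DE Slurp", "Slurp (German Yahoo)"),
    ("Yahoo! Slurp", "Slurp web crawler"),
    ("Yahoo-MMCrawler", "image/multimedia crawler"),
    ("YahooFeedSeeker", "news feed crawler"),
    ("YahooSeeker/M1A1-R2D2", "mobile web crawler"),
    ("Y!J-SRD", "mobile web crawler (Japanese Yahoo)"),
    ("Y!J-BSC", "blog spider (Japanese Yahoo)"),
]


def classify_yahoo_bot(user_agent: str) -> Optional[str]:
    """Return a category for a Yahoo! bot user agent, or None."""
    for token, kind in YAHOO_BOTS:
        if token in user_agent:
            return kind
    return None


print(classify_yahoo_bot("Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; "
                         "http://help.yahoo.com/help/us/ysearch/slurp)"))
```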

