MVEL Templating Introduction

Posted 2009-01-29 in Java by Johann.

MVEL is an expression language – similar to OGNL – and a templating engine.

I’d like to give you an example of MVEL Templates in this post so you can find out if MVEL might work for you.

Templating Examples

This is what templating with MVEL looks like.

Basic Object Access

<h1>@{name}</h1>

Simple Iteration

<p>
@foreach{index : alphabetical}
<a href="@{index.uri}">@{index.description}</a>
@end{}
</p>

Accessing Static Methods

<a href="@{ua.pageURI}">
@{org.apache.commons.lang.StringEscapeUtils.escapeHtml(ua.name)}
</a>

Inline Ternary Operator

<li>
@{ua.hitsTotal} total @{ua.hitsTotal == 1 ? "Hit" : "Hits"}.
</li>

MVEL Integration

The following code integrates MVEL into your application. The first part parses a template from a String, the second part applies an object to the template and writes it to a file.

import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// MVEL 2.x template classes; package names differ in MVEL 1.x.
import org.mvel2.templates.CompiledTemplate;
import org.mvel2.templates.TemplateCompiler;
import org.mvel2.templates.TemplateRuntime;

public class MVELTemplateWriter {

    private final CompiledTemplate template;

    /**
     * Constructor for MVELTemplateWriter.
     *
     * @param template the MVEL template
     */
    public MVELTemplateWriter(String template) {
        this.template = TemplateCompiler.compileTemplate(template);
    }

    /**
     * Merge an Object with the template and write the output
     * to f.
     *
     * @param o the Object
     * @param f the output File
     */
    public void write(Object o, File f) throws IOException {
        String output = (String)
            TemplateRuntime.execute(template, o);
        Writer writer = null;
        try {
            if (!f.getParentFile().exists()) {
                boolean created = f.getParentFile().mkdirs();
                assert created;
            }
            writer = new OutputStreamWriter(new
                FileOutputStream(f), "UTF-8");
            writer.write(output);
        }
        finally {
            close(writer);
        }
    }

    /**
     * Close c quietly, ignoring null and IOExceptions.
     *
     * @param c the Closeable, may be null
     */
    private static void close(Closeable c) {
        if (c != null) {
            try {
                c.close();
            }
            catch (IOException ignored) {
            }
        }
    }

}

You use this code like you would use other templating engines/expression languages: You add your objects to a Map and then merge the Map with a template. In the template, you reference the objects in the Map by their key.

Note that the template is pre-compiled for performance reasons. You can use something like FileUtils.readFileToString(File) to read a template file into a String.
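As a quick sketch of that pattern, the following renders a one-line template against a Map. This assumes MVEL 2.x and its org.mvel2.templates package; the key "name" and the TemplateDemo class are just illustration, not part of MVEL.

```java
import java.util.HashMap;
import java.util.Map;

// MVEL 2.x template classes (an assumption; MVEL 1.x uses different packages).
import org.mvel2.templates.CompiledTemplate;
import org.mvel2.templates.TemplateCompiler;
import org.mvel2.templates.TemplateRuntime;

public class TemplateDemo {

    // Compile once, execute as often as needed.
    static String render(String template, Map<String, Object> vars) {
        CompiledTemplate compiled = TemplateCompiler.compileTemplate(template);
        return (String) TemplateRuntime.execute(compiled, vars);
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<String, Object>();
        vars.put("name", "MVEL");
        // The template references the Map entry by its key.
        System.out.println(render("<h1>@{name}</h1>", vars));
    }
}
```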

Summary

Good

I liked:

  • Speed is excellent. However, most of the time when building the User Agent Database is spent writing graphs and parsing log files anyway.
  • Clean syntax. Cleaner than everything Sun has ever produced, but probably not as clean and simple as Velocity.
  • Supports arbitrary methods. Velocity makes it hard to use static methods and does not support operations on arrays at all.

Bad

Not all is nice however. I did not like the following:

  • No streaming output. All output is cached in RAM before it can be written to a file.

Do you use a templating engine/expression language? Maybe you use Velocity, OGNL, FreeMarker, StringTemplate or something else entirely? Please post a comment if you do.


Java Wildcard String Matching

Posted 2008-12-15 in Java by Johann.

This entry contains code examples of Java pattern matching with wildcards.

Note that wildcard matching is not the same as the .* regular expression which matches any number of characters – a wildcard matches only one character. Wildcards are usually encoded as ., but the actual value may vary across libraries.
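To make the one-character semantics concrete, here is a minimal sketch using java.util.regex (it assumes the pattern contains no regex metacharacters other than the . wildcard; WildcardDemo is a hypothetical helper, not part of any of the libraries below):

```java
import java.util.regex.Pattern;

public class WildcardDemo {

    // A single "." matches exactly one character, unlike ".*",
    // which matches any run of characters.
    static boolean matchesWildcard(String pattern, String text) {
        return Pattern.compile(pattern).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWildcard("bla.blorb", "la bla0blorb null")); // one char between: match
        System.out.println(matchesWildcard("bla.blorb", "blablorb"));          // zero chars: no match
        System.out.println(matchesWildcard("bla.blorb", "blaXYblorb"));        // two chars: no match
    }
}
```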

StringSearch

StringSearch 1.2 comes with two wildcard pattern matching algorithms, BNDMWildcards and ShiftOrWildcards. Generally, BNDMWildcards will be faster, which is why I removed ShiftOrWildcards in version 2.

public void testStringSearch() {
    BNDMWildcards bndm = new BNDMWildcards();
    Object compiled = bndm.processString("bla.blorb");
    // "bla?blorb" for StringSearch 1.2
    assertEquals(3,
        bndm.searchString("la bla0blorb null", "bla.blorb", compiled));
}

java.util.regex.Pattern

The java.util.regex.Pattern API isn’t very compact, but of course it does offer more than just wildcards.

public void testJavaUtilRegex() {
    Pattern searchPattern = Pattern.compile("bla.blorb");
    Matcher m = searchPattern.matcher("la bla0blorb null");
    assertTrue(m.find());
    assertEquals(3, m.start());
}

Jakarta Oro

ORO offers many PatternMatcher implementations. In this example, I am using the Perl5Matcher class.

public void testJakartaORO() throws MalformedPatternException {
    Pattern p = new Perl5Compiler().compile("bla.blorb");
    Perl5Matcher matcher = new Perl5Matcher();
    assertTrue(matcher.contains("la bla0blorb null", p));
    MatchResult result = matcher.getMatch();
    assertEquals(3, result.beginOffset(0));
}

Case Insensitive Search

All of the APIs presented here support case-insensitive string matching. The case-insensitive option is simply compiled into the pattern during the compile phase.

StringSearch

This example requires StringSearch version 2 or greater.

    BNDMWildcardsCI bndm = new BNDMWildcardsCI();
    Object compiled = bndm.processString("bla.blorb");
…

java.util.regex.Pattern

    Pattern searchPattern = Pattern.compile("bla.blorb",
        Pattern.CASE_INSENSITIVE);
…

Jakarta ORO

        Pattern p = new Perl5Compiler().compile("bla.blorb",
                Perl5Compiler.CASE_INSENSITIVE_MASK);
…

Apache Lucene - the ghetto search engine

Posted 2008-01-15 in Java by Johann.

The background

Back in 2002, I started using Lucene as the search engine on my site.

I tried the little demo application to index my pages and – to my surprise – it couldn’t even index Java Server Pages. Think about it: A Java application that cannot index Java Server Pages.

Anyway, I wrote a working parser for JSPs and emailed it to Doug Cutting, Lucene’s inventor. I never heard anything from Doug which I found rather rude.

Doug…

…Cutting used to work for Excite. I don’t know about you, but I have problems remembering anything about Excite. Maybe because they weren’t really good.

Why Lucene?

In the new layout of johannburkard.de, I will integrate search results in the pages directly. This is caused by a move away from tree-like site structures and towards “relatedness”-based linking.

In other words, instead of forcing a tree-based navigation structure (Home -> Blog -> Programming -> Java), I will link to pages that are related to the currently viewed page, regardless of their location within the site.

For this to work, search results must obviously be really good.

Unfortunately, Lucene consistently ranked all blog entries about inc above the original entry which caused me to look at the ranking formula of Lucene.

An example of just how much Lucene sucks

To illustrate the obvious ranking problems of Lucene, here are two example documents:

Document 1

Hallo welt hallo, hallo!

Document 2

Ein Hallo-Welt-Programm ist ein kleines Computerprogramm und soll auf möglichst einfache Weise zeigen, welche Anweisungen oder Bestandteile für ein vollständiges Programm in einer Programmiersprache benötigt werden und somit einen ersten Einblick in die Syntax geben. Aufgabe des Programms ist, den Text Hallo Welt! oder auf Englisch Hello, world! auszugeben. Ein solches Programm ist auch geeignet, die erfolgreiche Installation eines Compilers für die entsprechende Programmiersprache zu überprüfen. Aufgrund der einfachen Aufgabenstellung kann ein Hallo-Welt-Programm aber nicht als Einführung in die Sprache selbst dienen, denn es folgt zumeist nur dem Programmierparadigma der imperativen Programmierung und demonstriert somit nur einen Bruchteil der Möglichkeiten der meisten Sprachen.

The question: Which one of these ranks first for “hallo Welt”?

If you guessed document 2, you are wrong. It’s document 1. It ranks better than document 2 by a large margin.

Why? Simply because the frequency of “hallo” over the document is higher in document 1 than it is in document 2.

Sounds stupid? Do you remember keyword stuffing? Search engines in the 90’s were vulnerable to the same problem.
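The effect is easy to reproduce with a naive term-frequency score – a sketch only, since Lucene's real formula also includes idf, length normalization and boosts (doc2 below is a shortened, ASCII-only version of document 2):

```java
public class TermFrequencyDemo {

    // Naive tf: occurrences of the term divided by the number of tokens.
    static double tf(String term, String document) {
        String[] tokens = document.toLowerCase().split("[^\\p{L}]+");
        int count = 0;
        for (String token : tokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return tokens.length == 0 ? 0.0 : (double) count / tokens.length;
    }

    public static void main(String[] args) {
        String doc1 = "Hallo welt hallo, hallo!";
        // Shortened, ASCII-only version of document 2.
        String doc2 = "Ein Hallo-Welt-Programm ist ein kleines Computerprogramm "
            + "und soll auf einfache Weise zeigen, welche Anweisungen fuer ein "
            + "vollstaendiges Programm benoetigt werden. Aufgabe des Programms "
            + "ist, den Text Hallo Welt! auszugeben.";
        System.out.println(tf("hallo", doc1)); // 0.75: three of four tokens
        System.out.println(tf("hallo", doc2)); // far lower
    }
}
```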

Learning from Lucene’s epic fail

Even if you do not use Lucene, you can still learn from the massive mistakes in Lucene’s design:

  • Document numbers are stored as signed ints, so Lucene will never be able to index more than about 2 billion documents. That is just ridiculous for Internet search: Exalead say they index 8 billion documents – which might have been a lot in 2000 or so – and their results still aren't great. Now imagine what their results would look like with one fourth of that index size, which is all Lucene allows. Still you have lots of people trying to become the next Google using Nutch (which uses Lucene).
  • Text is stored in one field. With Lucene, it is impossible to increase or decrease the weight of individual terms in a document. For example, linking to inc with the anchor text “inc” should decrease the weight of inc in the current document.
  • Horribly inconsistent API. If I remember correctly, the first versions of Lucene had no interfaces and nothing but final classes. In recent years, someone must have done some half-assed refactoring, so there is now a Fieldable interface that – uhm – does the same thing as the Field class. W00t. The lesson to learn here is that a bad API can ruin your project for years, so don’t let people who are new to Java design Java APIs.
  • Using tf/idf. I admit I’m not an information retrieval expert, but in all textbooks that I read (Ricardo Baeza-Yates’ books come to mind), tf/idf was always presented as a basic, “beginner’s” ranking formula that only yields inferior results.
  • Locking down APIs. When I experimented with Lucene, I thought to myself “Maybe there is a CrappyRanking that I can change to GoodRanking by calling IndexReader#setRanking(Ranking),” then I got lost writing wrappers for Query, tried to access the Weight instance from the wrapper which didn’t work because it is package private, tried to find the ranking formula, found out that by default, 50 results are fetched (hardcoded)… and gave up.

What Lucene does well

Surprisingly, Lucene does a few things really well.

  1. Indexing and index access performance. I believe indexing performance is something that keeps getting better all the time.
  2. Query analysis. There’s a variety of query parsers available so even complex queries can be parsed.
  3. Resource usage. I have never noticed any excessive RAM or disk usage.

The verdict

If you plan on using Lucene for anything other than simple site search (“enter keywords, return documents that contain keywords”), you should look somewhere else.


The X-FORWARDED-FOR HTTP header

Posted 2008-12-02 in Java by Johann.

X-FORWARDED-FOR is an HTTP header that is inserted by proxies to identify the IP address of the client. It is also added to requests when application servers sit behind proxy servers. In that case, the request IP address is always a local address, and the client IP address must be extracted from the header.

Since proxies can be chained – for example if the client’s request is already made through a proxy – the X-FORWARDED-FOR header can contain more than one IP address, separated by commas. In this case, the first one should be used.

Java Code

The following Java code extracts the originating IP address of an HttpServletRequest object.

import javax.servlet.http.HttpServletRequest;

public final class HTTPUtils {

    private static final String HEADER_X_FORWARDED_FOR =
        "X-FORWARDED-FOR";

    public static String remoteAddr(HttpServletRequest request) {
        String remoteAddr = request.getRemoteAddr();
        String x;
        if ((x = request.getHeader(HEADER_X_FORWARDED_FOR)) != null) {
            remoteAddr = x;
            int idx = remoteAddr.indexOf(',');
            if (idx > -1) {
                remoteAddr = remoteAddr.substring(0, idx);
            }
        }
        return remoteAddr;
    }

}

JSPs

In a JSP, the X-FORWARDED-FOR header can be retrieved as follows:

<%= request.getHeader("X-FORWARDED-FOR") %>

Of course, a Servlet Filter could replace the original HttpServletRequest with a wrapped version that returns the X-FORWARDED-FOR value.

Example Request

Here is a full request that was made from 129.78.138.66 through the proxy at 129.78.64.103:

2008-12-01 16:00:59,878 INFO  AntiScrape - 129.78.138.66, 129.78.64.103:
 USER-AGENT: …
 HOST: johannburkard.de
 PRAGMA: no-cache
 ACCEPT: */*
 ACCEPT-ENCODING: identity
 VIA: 1.1 www-cacheF.usyd.edu.au:8080 (squid/2.6.STABLE5)
 X-FORWARDED-FOR: 129.78.138.66, 129.78.64.103
 CACHE-CONTROL: no-cache, max-age=604800
 X-HOST: johannburkard.de
 X-FORWARDED-PROTO: http

