Mike Schaeffer's Blog

Articles with tag: java
March 31, 2014

Sometimes, it’s easy to focus so much on the architecture of a system that the details of its implementation get lost. While it’s true that inattention to architectural concerns can cause a system to fail, it’s also true that poor attention to the details can undermine even the best overall system design. This post covers a few minor details of code structure that I’ve found to be useful in my work:

It’s a small thing, but one of my favorite utility methods is a short way to throw run-time exceptions.

public static void FAIL(String message)
{
    throw new RuntimeException(message);
}

Defining this method accomplishes a few useful goals. The first is that (with an import static) it makes it possible to throw a RuntimeException with 22 fewer characters of source text per call site. If you’re writing usefully descriptive error messages (which you should be), this can significantly improve the readability of the code. The text FAIL tends to stand out in source code listings, and bringing the error message closer to the left margin of the source text makes it more obvious. The symbol FAIL is also easy to identify with tools like grep, ack, and M-x occur.

To handle re-throw scenarios, it's also useful to have another definition that lets you specify a cause for the failure.

public static void FAIL(String message, Throwable cause)
{
    throw new RuntimeException(message, cause);
}

Related to this is a useful naming convention for loop control variables. Thanks in large part to FORTRAN, and its mathematical heritage, it's very common to use the names i, j, and k for loop control variables. These names aren't very descriptive, but they're short and for small loop bodies, there's usually enough context that a longer name would be superfluous. (If your loop spans pages of text, you should use a more descriptive variable name... but first, you should try to break up your loop into sensible, testable functions.) One technique I've found useful for making loop control variables more obvious (and searchable) without going to fully descriptive variable names is to double up the letters, giving ii, jj, and kk.

These are both small changes, but they both can improve the readability of the code. Try them out and see if you like them. If you disagree that they are improvements, it's easy to switch back.

Tags:javaksm
March 26, 2014

Update 2019-01-17: KSM recently redesigned their website in a way that removes the original blog. Because of this, I've taken some of what I wrote then for KSM and re-hosted it here. Thanks are due both to KSM Technology Partners for allowing me to do this and to the Wayback Machine for retaining the content. All the links below are updated to reflect the articles' new locations.


Sorry for the radio silence, but recently I've been focusing my writing time on the KSM Techology Partners Blog. My writing there is still technical in nature, but it tends to be more heavily focused on the JVM. If you're interested, here are a few of what I consider to be the highlights.

In mid-2013, I started out writing about how to use Runnable to explictly enforce dynamic extent in Java. In a nutshell, this is a way to implement try...with...resources in versions of Java that don't have it built in to the language. I then used the dynamic extent technique to build a ThreadLocal that plays nicely with thread pools. This is useful because thread pools require an understanding of which thread you're running on, which thread pooling techniques can abstract away.

Later in the year, I focused more on Clojure, starting off with a quick bit on the relationship of lexical closures to Java inner classes. I also wrote about a particular kind of stack overflow exception that can happen with lazy sequences. Lazy sequences can nicely remove the need to use recursion while traversing their length, but each time two unrealized lazy sequences are combined, it adds to the recursive depth required to compute the first element. For me, this stack overflow was a difficult error to diagnose, because it seemed so counter-intuitive.

I'm also in the middle of a series of posts that relate the GoF command pattern to functional programming. The posts start off with Java, but will ultimately describe a Clojure implementation that compiles a stack based expression language into optimized Java bytecode. If you'd like to play with the code, it's on github.

January 2, 2014

Up to now, the calculator’s main command loop has been a straightforward implementation of a REPL, or ‘read-eval-print-loop’. If you’re unfamiliar with the term, REPLs are the traditional means that interactive programming languages use to provide their interactivity. REPL’s provide a command prompt that a user can use to explore and manipulate the programming environment. In this way, a REPL makes it possible to work more quickly than traditional environments that require a program to be recompiled and restarted to test code changes.

While REPLs can become very complex in the details, the core idea is quite simple. As the name implies, REPL’s read a command from the user, evaluate that command, print the result of that evaluation, and loop back to start again. In rpncalc, all four of these steps are clearly evident in the code of the REPL. This is useful for explanatory purposes, but it closely couples the REPL to specific implementations of ‘read’, ‘evaluate’ and ‘print’. For this post, we’ll look into another way to model a REPL in code that offers a way to break this coupling.

The main command loop of rpncalc contains explicit code for each of the steps in an REPL:


// Set initial state
State state = new State();
 
// Loop until we no longer have a state.
while(state != null) {
 
    // Print the current state
    System.out.println();
    showStack(state);
 
    // Print a prompt, and read the next command from the user.
    System.out.print("> ");
    String cmdLine = System.console().readLine();
 
    if (cmdLine == null)
        break;
 
    Command cmd = parseCommandString(cmdLine);
 
    // Evaluate the command and produce the next state.
    state = cmd.execute(state);
}

This code is easy to read and explicit in intent, but it totally breaks down if commands can’t be read from the console. In the case of a REPL running on a server, it may be the case that a REPL needs to print and read over a (secured!) network connection. What would be useful is a way to decouple the mechanism for reading command from the loop itself.

In functionally oriented languages, this problem can be addressed by extending the REPL function with function arguments. These function arguments allow different implementations of read and print to be plugged into the same basic loop structure. Default implementations can be provided that connect to the console, with other implementations that might read and print using a network connection, or some other command transport. In Java, a similar effect can be achieved using functional interfaces (aka SAM types) to provide the pluggable alternative implementations. In fact, Java 8’s syntax for anonymous functions will make this approach syntactically convenient. Java also provides ways to achieve this extensiblity via class derivation.

Another way to view this problem can be seen by slightly changing your perspective on the REPL. It may not be completely obvious, but as with many loops, the REPL is iterating over a sequence of values. In the case of the REPL, the sequence is the sequence of commands that the user enters in response to prompts. For each command in the sequence, the REPL updates the current state and advances to the next sequence element. This isn’t as concrete as iterating over an in-memory data structure (and it isn’t necessarily bounded) but the semantics of the iteration are the same. The key to implementing this design is to provide a version of Iterable that implements iteration over a command stream. Given an iterable command stream, the REPL takes on a slightly different character:

// Set initial state
State state = new State();
 
// Loop over all input commands
for(Command cmd : new ConsoleCommandStream()) {
 
    // Evaluate the command and produce the next state.
    state = cmd.execute(state);
 
    if (state == null)
        break;
 
    // Print the current state
    showStack(state);
}

Compared to the initial loop implementation, this version is completely detached from the mechanisms used to prompt the user for input and read incoming commands. The termination criteria is also simpler: there isn’t an explicit check for the end of the command stream. The implicit termination check within the foreach loop captures that requirement.

The other component of this implementation is the implementation of the CommandStream. Unfortunately, this is where Java extracts its tax in lines of code for the additional modularity of this design. Like all iterable objects, the console command stream implements the Iterable interface. The iterator itself is defined as an anonymous inner class:

class ConsoleCommandStream implements Iterable<Command>
{
    public Iterator<Command> iterator()
    {
        return new Iterator<Command> ()
        {

One of the complexities of implementing Java’s Iterator interface is that callers must be able to call hasNext any number of times (zero to n) before each call to next. It’s not possible to assume one and only one call to hasNext for each call to next, despite the fact that the foreach does make that guarantee. Without going into the details, this implies that the actual advance operation can occur within either next or hasNext. While there are several ways to implement this, the approach I like to use is to have a separate method that advances the iterator, but only if it needs to be advanced. (Calls to next put the iterator into ‘requires advance’ state.) The advanceIfNecessary method is where the bulk of the work of the command stream takes place, including prompting the user, and reading and parsing the command.

Command nextCmd = null;
 
private void advanceIfNecessary()
{
    if (nextCmd != null)
        return;
 
    System.out.println();
    System.out.print("> ");
 
    String cmdLine = System.console().readLine();
 
    if (cmdLine == null)
        return;
 
    try {
        nextCmd = parseCommandString(cmdLine);
    } catch (Exception ex) {
        throw new RuntimeException("Error while parsing command: " + cmdLine, ex);
    }
}

In this way, Java’s built in support for iteration can be used to break the REPL apart into sub-compoments for handling the stages of command processing. The REPL is still clearly a REPL, but it no longer has explicit dependencies on the means used to acquire input commands. Unfortunately, the REPL still has explicit coupling for command evaluation and printing the result. As it stands now, we could modify the REPL to read commands from a network port, but we couldn’t redirect the output away from the local console. In the next post in the series, we’ll use the idea of reduce from functional programming to break the REPL into a pipeline of iterators. This will bring the rest of the flexibility we need.

December 17, 2013

Throughout the last four parts of this series, the common theme has been that the state of the calculator program has been managed globally. This requires the main command loop to directly update global data after each command, to prepare the state for the next command. While this works, it would be nice to remove the need for the global update. This post talks about how that’s done in the functional version of rpncalc.

The last installment in the series updated the calculator by adding a Java class to model State. This allows the entire state of the calculator to be managed as a first-class entity within the program. That’s evident from the fact that the calculator now models state through a single global reference:

public class RpnCalc extends Calculator
{
    // ...
 
    private State state = null;

Before I continue, I should clarify by saying that state isn’t truly a global variable. Obviously state is an instance variable within a single instance of RpnCalc. However, because of the fact that the calculator program only runs with a single instance of RpnCalc, any instance variables within that class are effectively global within the program, and suffer many of the same problems associated with true globals. This parallel is also true for instance variables within singletons managed by Spring. They’re instance variables, but they act like globals and should be viewed with the same kind of suspicion.

To remove the global state, we’re going to take advantage of the fact that the last update to rpncalc changed the signature for a command, so that it returns a new copy of the state:

abstract class Command
    {
        State execute(State in)

This implies that the main command loop has direct access to the updated state, and can manage it as a local variable:

public void main()
        throws Exception
    {
        State state = new State();
 
        while(state != null) {
            // ...read and gather command...
            state = cmd.execute(state);
        }
    }

This eliminates the global variable, moving the same data to a single variable contained within a function definition. This hugely reduces the variable’s footprint within the code. The global variable can be read and updated by 200 lines of code, compared to 20 lines of code for the local variable. For a developer, this makes the management of state more evident within the code, and makes it easier to write and maintain correct code.

Unfortunately, there’s a problem. For most of the commands, storing state in a local variable works well: a command like addition only needs access to the current state to produce the next state. Where it breaks down is in the undo command. In contrast to every other command, undo doesn’t need access to the current state, it needs access to the previous state. Now that we’ve moved the state management entirely within the command loop function, there’s no way for the undo command to get access to the history of states. Given that undo is a requirement, we need to find a solution.

One approach would be to extend the command loop to maintain a full history of states, and then broaden the command function signature to accept both the current and previous states. Each command would then be a function on a history of states. This approach has some appeal, but a simpler way to achieve the same result is to acknowledge the fact that each state is a function of the previous state, and store a back reference within each state:

private class State
{
    State prev;
    // ...
 
    State()
    {
        prev = null;
        // ...
    }
 
    State(State original)
    {
        prev = original;
        // ...
    }
}

This makes undo easy:

cmds.put("undo", new Command() {
        public State execute(State in) {
            // skip past the copy made for 'undo', and get to the previous.
            return in.prev.prev;
        }
    });

This design is also closely related to how git stores commits: each commit stores a back link to the previous commit in the history. It does have potentially unbounded memory usage, but our state in rpncalc is small enough in size and our memory is large enough that this shouldn’t present a problem. If it ever did present a memory consumption problem, adding a bound on the length number of retained states is as simple as walking down the previous links and nulling the previous link of the last history element to retain.

Next up, we’ll start taking a bit more advantage of the newly functional nature of rpncalc, and start using functional techniques to decompose the main command loop into component modules.

Older Articles...