Going Round and Round redux

This article revisits the subject raised in this previous post. The aim here is to convert the examples to Java and present a new solution. The question revolves around the difficulty of finding a good loop construct to iterate over data that comes from something that looks like a stream, but without the usual eof() test.

First, let’s imagine a provider:

public interface Provider {
      Item getNext();
}

Imagine we want to loop over all items from this provider and do something with each one. This surely needs a loop. There are various options:

while(true)

while (true) {
    Item item = provider.getNext();
    if (item == null) {
        break;
    }
    process(item);
}

This works, but has a surprising while(true) in it. It’s not inefficient at runtime but it reads badly.

do while

Item item;
do {
    item = provider.getNext();
    if (item!=null) {
        process(item);
    }
} while (item!=null);

This has two null checks for the item, which is a surprise. Reading this, I feel like the loop could have just terminated when the first null check failed, rather than progress on a couple more lines until the while clause repeats the same check.

A for loop

for(Item item = provider.getNext();
      item!=null;
      item=provider.getNext()) {
    process(item);
}

This duplicates the provider.getNext() call and isn’t especially easy to read, as there’s a lot of custom code in the declaration of the for loop. Compared to a for(int i=0; i<something; i++), this is less familiar.

A normal while loop

Item item = provider.getNext();
while(item!=null) {
    process(item);
    item = provider.getNext();
}

This also has duplication of provider.getNext(), but looks like the simplest option as it’s the sort of loop we’re often used to seeing. The nerdiest of readers might point out that all these loops are equivalent in some way and that a for and a while are basically the same loop with slightly different syntax. They’d be right, but we’re talking about readability, not execution.

No duplication

Item item;
while ((item = provider.getNext())!=null) {
    process(item);
}

This has the fewest statements, but has a rather messy looking clause for the while loop. It’s how C++ would have solved the problem when code readability was for wimps.

In short, it turns out that there’s no nice way of expressing this loop. Don’t think that you can somehow extract the contents to a collection and then use that. To do that would still require solving the above looping problem. Of all the options above, the plainest while-loop is the one which most teams settle on in this situation.
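To illustrate the point about collections, here is a minimal sketch of a helper that drains a provider into a list. The Item class, ListProvider and Drain names are hypothetical stand-ins invented for this example. The helper still has to contain one of the loops above (here, the plain while loop), so the looping problem has only been moved, not removed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the article's Item type.
class Item {
    final String name;
    Item(String name) { this.name = name; }
}

interface Provider {
    Item getNext();
}

// Hypothetical in-memory provider: returns each item once, then null.
class ListProvider implements Provider {
    private final Iterator<Item> source;

    ListProvider(List<Item> items) {
        this.source = items.iterator();
    }

    @Override
    public Item getNext() {
        return source.hasNext() ? source.next() : null;
    }
}

class Drain {
    // Copies everything from the provider into a list.
    // Note that the "item or null" loop is still here.
    static List<Item> toList(Provider provider) {
        List<Item> items = new ArrayList<>();
        Item item = provider.getNext();
        while (item != null) {
            items.add(item);
            item = provider.getNext();
        }
        return items;
    }
}
```

Callers of Drain.toList can use an ordinary for-each loop over the result, but only at the cost of reading every item up front.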

Why is this even happening?
It turns out that the problem is that the Provider is both getting next and telling you if there is a next one. The fact that it’s one step makes it hard to write a loop. If you could do this as two steps, it would be easier. There are reasons why a real-life Provider might only be able to provide a single answer – it may not be clear until you try to read an item whether there are any items to read – e.g. reading from a queue, or from a file which may not contain any more complete entries.

While a provider can only say “item or null”, you have a challenge iterating over it.
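As a rough sketch of why a provider might only offer a single step, consider one backed by a queue. The Item class and QueueProvider below are hypothetical names invented for this example. java.util.Queue.poll() removes and returns the head of the queue, or returns null if the queue is empty, so taking an item and finding out whether there was one are naturally the same operation.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the article's Item type.
class Item {
    final String name;
    Item(String name) { this.name = name; }
}

interface Provider {
    Item getNext();
}

// Hypothetical queue-backed provider.
class QueueProvider implements Provider {
    private final Queue<Item> queue = new ArrayDeque<>();

    void add(Item item) {
        queue.add(item);
    }

    @Override
    public Item getNext() {
        // poll() returns null when the queue is empty,
        // which is exactly the "item or null" contract.
        return queue.poll();
    }
}
```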

There is a solution

An iterator in front of the provider would help:

public class ProviderIterator {
    private Item current;
    private Provider provider;
    
    public ProviderIterator(Provider provider) {
        this.provider = provider;
    }  

    // go to the next one and return false if there isn't one
    public boolean gotoNext() {
        current = provider.getNext();
        return current!=null;
    }

    public Item getCurrent() {
        return current;
    }
}

Meaning you can write a nice loop:

ProviderIterator iterator = new ProviderIterator(provider);
while (iterator.gotoNext()) {
    process(iterator.getCurrent());
}
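To try the wrapper end to end, here is a self-contained sketch. The Item class, the in-memory ListProvider and the Demo class are hypothetical additions for this example. ProviderIterator follows the class above, with the fields made final.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the article's Item type.
class Item {
    final String name;
    Item(String name) { this.name = name; }
}

interface Provider {
    Item getNext();
}

// Hypothetical in-memory provider: returns each item once, then null.
class ListProvider implements Provider {
    private final Iterator<Item> source;

    ListProvider(List<Item> items) {
        this.source = items.iterator();
    }

    @Override
    public Item getNext() {
        return source.hasNext() ? source.next() : null;
    }
}

class ProviderIterator {
    private final Provider provider;
    private Item current;

    ProviderIterator(Provider provider) {
        this.provider = provider;
    }

    // Advance to the next item; false means there was none.
    boolean gotoNext() {
        current = provider.getNext();
        return current != null;
    }

    Item getCurrent() {
        return current;
    }
}

class Demo {
    // Stands in for process(item): collects each item's name.
    static List<String> namesOf(Provider provider) {
        List<String> names = new ArrayList<>();
        ProviderIterator iterator = new ProviderIterator(provider);
        while (iterator.gotoNext()) {
            names.add(iterator.getCurrent().name);
        }
        return names;
    }
}
```

One thing to be aware of in this sketch: getCurrent() returns null before the first call to gotoNext() and again after gotoNext() has returned false.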

Searching for a solution that’s broader than just the loop in front of you is the answer. Inserting a little pattern as a wrapper around an interface you don’t want, so you can write the code you do want, is a nice way forward.

With huge thanks to all the teams who helped me explore this problem and to Shanny for finally asking the question which led to the above answer.


3 comments

  1. Hi Ashley,

    the simple “Provider” Interface has one advantage against the “ProviderIterator”….

    In the simple Provider Interface you just add the “synchronized” keyword to the method to use the implementation in a multi-threaded context (why ever ;-))

    In the “ProviderIterator” I have no “idea”, how to synchronize the two methods in a way, so that in theory, it is not possible to jump an “entry” with the gotoNext-Method.

    Greetings JJR

  2. Hi JJ

    As always your observations are thought provoking. Before I try to answer the question about thread safety, the first thought that comes to my mind is why you would need it. This theoretical provider is effectively streaming a series of values until it hits a stop, so when would this stream need multi-threaded “clients”?

    I can think of some examples for this, and maybe those examples will help understand how you might use this.

    Let’s say that there were a queue of data and we were to access that from multiple threads to process the queue. Let’s say that when there are items “Provider” gives me the first one, and when there are no items left, it gives me a null. Now we have a real-world type of an example which needs us to consider concurrency.

    There’s no doubt in my mind that the Provider, in this case, would have to be thread safe behind its interface. With that said, you don’t have to worry about it. So, the question is whether ProviderIterator would need to be threadsafe.

    Before I answer this, let’s look at one more thought. This blog post is on optimising some code for readability. That’s my first priority, UNLESS something else is more pressing. If we had a concurrency pattern that works best when you lose a little readability and is NEEDED, then we should use that concurrency pattern and tolerate it. The optimisation should be chosen based on the biggest need; readability is a good default.

    There are two answers to the ProviderIterator thread-safety issue:

    1. It doesn’t need to be thread-safe – it’s actually an adapter between a loop pattern we like and a streaming provider. As such, each individual thread that uses it can instantiate it because it actually has the same events within it as the while-loop implementation.

    2. We COULD make it thread-safe. If you can imagine a situation where this were needed, we could use a simple Java synchronisation pattern to make that work. You would have a synchronisation object (literally Object) within the ProviderIterator and use it as a lock to ensure that only one of the two methods could be called at once… or even use it so that a Thread could lock the whole ProviderIterator…
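The first answer can be sketched roughly as follows. This is only an illustration under assumptions: the Item class, SharedQueueProvider and Workers names are hypothetical, and the provider is made thread-safe here by backing it with a ConcurrentLinkedQueue. Each thread creates its own ProviderIterator over the shared provider, so the iterator itself needs no locking.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the article's Item type.
class Item {
    final int id;
    Item(int id) { this.id = id; }
}

interface Provider {
    Item getNext();
}

// Hypothetical provider that is thread-safe behind its interface.
class SharedQueueProvider implements Provider {
    private final Queue<Item> queue = new ConcurrentLinkedQueue<>();

    SharedQueueProvider(int count) {
        for (int i = 0; i < count; i++) {
            queue.add(new Item(i));
        }
    }

    @Override
    public Item getNext() {
        return queue.poll(); // null once the queue is empty
    }
}

// Same shape as the article's wrapper; not thread-safe, and not shared.
class ProviderIterator {
    private final Provider provider;
    private Item current;

    ProviderIterator(Provider provider) { this.provider = provider; }

    boolean gotoNext() {
        current = provider.getNext();
        return current != null;
    }

    Item getCurrent() { return current; }
}

class Workers {
    // Runs threadCount workers and returns how many items were processed in total.
    static int processAll(Provider provider, int threadCount) {
        AtomicInteger processed = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                // One iterator per thread: its "current" is never shared.
                ProviderIterator iterator = new ProviderIterator(provider);
                while (iterator.gotoNext()) {
                    iterator.getCurrent(); // a real worker would process the item here
                    processed.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed.get();
    }
}
```

Because each thread only ever sees the item its own gotoNext() fetched, no entry can be skipped or handed to two threads, provided the provider itself hands out each item once.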

    The principle of parsimony says that my first answer is better 🙂

    Cheers

    Ashley

    • Hello Ashley,

      I would use the multi-threaded Client, perhaps in a Scenario, where the series of values is streamed over network … but you are right it is very hard to find a Scenario, where it is necessary and no other solution can be implemented.

      As you mentioned in your article “A lesson on APIs and Documentation”, also the side-effects should / could / must be stated for “optimisations”.

      Greetings JJR
