Tuesday, 8 September 2015

Garbage free ArrayList

Lists are wonderful little data structures, and find themselves in most (all?) projects and applications I've ever been involved in.  Over the last couple of years, I've become more interested in both the performance and garbage production of data structures (I'll cover Strings another day).  As a result, I've tended to favour ArrayList's - although they're not perfect.


The garbage

Let's take a look at some aspects of ArrayList that do create garbage.  When you add new elements to the ArrayList, it will at some point grow the array to hold more elements:


    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

Assuming you're able to use either a state-based object or ThreadLocal to keep an ArrayList around for a long time then the size of the array isn't that important as it will never be collected.

The real hurdle you will face is either removing elements or clearing the list.  Both of these methods set element data within the array to null, which will allow the underlying references to be collected at some point:


    public void clear() {
        modCount++;

        // clear to let GC do its work
        for (int i = 0; i < size; i++)
            elementData[i] = null;

        size = 0;
    }

A really basic example could result in code like:


    @Test
    public void sample() {
        double bid = 0, offer = 0;
        prices.add(new Price().bid(0.9837).offer(0.9845));
        prices.add(new Price().bid(0.9838).offer(0.9844));
        for (int i = 0; i < prices.size(); i++) {
            bid += prices.get(i).bid();
            offer += prices.get(i).offer();
        }
        System.out.printf("avg; bid: %.5f, offer: %.5f", bid / prices.size(), offer / prices.size());
        prices.clear(); // elements set to null => likely garbage
    }


Granted, the Price objects might not be created there, but they need to be created somewhere.  Also, you could probably pool the Price's in an external pool somewhere, but I imagine management would be quite tricky to ensure that Price's were reserved and released in a consistent order.

I think it's highly likely you will wind up with either a complex solution for a simple problem or garbage.  Neither are desirable.


MutablesArray

I think it's fairly safe to say that most use cases of ArrayList don't use all of it's public methods (and I'd shudder to see code that did).  I certainly don't.  I generally:
  1. add items
  2. clear the list
  3. iterate the list
  4. search the list
Using the 4 requirements above, it's not that cumbersome to build a new array list that somewhat doubles as a object pool.  Let's dive straight into the code:

public class MutablesArray<T extends Mutable> implements Iterable<T> {

    private final MutablesIterator mutablesIterator = new MutablesIterator();

    private T[] mutables;

    private int numMutables;

    public MutablesArray(Class<T> cls, int initialSize) {
        //noinspection unchecked
        mutables = expand((T[]) Array.newInstance(cls, 0), initialSize);
        numMutables = 0;
    }

    public T create() {
        ensureCapacity();
        return mutables[numMutables++].reset();
    }

    public MutablesArray<T> clear() {
        numMutables = 0;
        return this;
    }

    public int size() {
        return numMutables;
    }

    public T get(int i) {
        return mutables[i];
    }

    public Iterator<T> iterator() {
        return mutablesIterator.reset();
    }

    public T find(Predicate<T> predicate) {
        for (int i = 0; i < numMutables; i++) {
            final T mutable = mutables[i];
            if (predicate.test(mutable)) {
                return mutable;
            }
        }
        return null;
    }

    private void ensureCapacity() {
        int required = numMutables + 1;
        if (required > mutables.length) {
            mutables = expand(mutables, required);
        }
    }

    private class MutablesIterator implements Iterator<T> {

        private int idx;

        public MutablesIterator reset() {
            idx = 0;
            return this;
        }

        @Override
        public boolean hasNext() {
            return idx < numMutables;
        }

        @Override
        public T next() {
            return mutables[idx++];
        }
    }
}

The two most important methods in MutablesArray are create and clear.  Let's start with create.


MutablesArray.create

    public T create() {
        ensureCapacity();
        return mutables[numMutables++].reset();
    }

The difference compared to ArrayList, is that we no longer pass an instance in that we wish to be held, but request an instance to modify.  The array holding the underlying data is expanded when necessary, as in ArrayList.

This means our data structure is essentially acting as a pool, which is also responsible for element data instance creation.  Elements are only created when the capacity of the underlying array is increased.  So, once we're at a suitably large level we are simply reusing existing instances.


MutablesArray.clear

    public MutablesArray clear() {
        numMutables = 0;
        return this;
    }

In a similar fashion to ArrayList.clear, we also set the number of elements to 0.  However, we do not set the element data to be null, meaning the data is not available to be collected as garbage.

The same example

    @Test
    public void sample() {
        double bid = 0, offer = 0;
        prices.clear();
        prices.create().bid(0.9837).offer(0.9845);
        prices.create().bid(0.9838).offer(0.9844);
        for (int i = 0; i < prices.size(); i++) {
            bid += prices.get(i).bid();
            offer += prices.get(i).offer();
        }
        System.out.printf("avg; bid: %.5f, offer: %.5f", bid / prices.size(), offer / prices.size());
    }

The code barely changes, with the exception of Price creation.

Conclusion

What I've shown above is a relatively simple way to provide some key list-style functionality in a garbage free manner.  It does require your domain/model objects to be mutable and whilst it's not for everyone, it certainly helps when you're trying to reduce garbage collection frequencies.

Hopefully this helps you be more garbage aware next time you're coding.

As always, the full source is available on github.

No comments:

Post a Comment