Tuesday, 8 September 2015

Garbage free ArrayList

Lists are wonderful little data structures, and find themselves in most (all?) projects and applications I've ever been involved in.  Over the last couple of years, I've become more interested in both the performance and garbage production of data structures (I'll cover Strings another day).  As a result, I've tended to favour ArrayLists, although they're not perfect.


The garbage

Let's take a look at some aspects of ArrayList that do create garbage.  When you add new elements to the ArrayList, it will at some point grow the array to hold more elements:


    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
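
To put some numbers on that growth, here is a small sketch that replays the same 1.5x arithmetic outside of ArrayList.  The class and method names are my own for illustration, and the overflow and MAX_ARRAY_SIZE handling from the real method is deliberately left out:

```java
public class GrowthSequence {

    /** Same arithmetic as grow() above, without the overflow/MAX_ARRAY_SIZE guards. */
    public static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    /** Counts how many times the backing array is replaced before it can hold target elements. */
    public static int resizesToReach(int initialCapacity, int target) {
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity < target) {
            // each resize allocates a new array; the old one becomes garbage
            capacity = nextCapacity(capacity, capacity + 1);
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        // 10 is the capacity ArrayList uses once the first element is added
        System.out.println(resizesToReach(10, 1000)); // prints 12
    }
}
```

Starting from 10, the capacity goes 15, 22, 33, 49 and so on up to 1234, so a list that grows to 1000 elements discards twelve backing arrays along the way.  Passing an initial capacity to the ArrayList constructor avoids most of that.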

Assuming you're able to use either a state-based object or a ThreadLocal to keep an ArrayList around for a long time, the size of the array isn't that important, as it will never be collected.

The real hurdle you will face is either removing elements or clearing the list.  Both of these methods set element data within the array to null, which will allow the underlying references to be collected at some point:


    public void clear() {
        modCount++;

        // clear to let GC do its work
        for (int i = 0; i < size; i++)
            elementData[i] = null;

        size = 0;
    }

A really basic example could result in code like:


    @Test
    public void sample() {
        double bid = 0, offer = 0;
        prices.add(new Price().bid(0.9837).offer(0.9845));
        prices.add(new Price().bid(0.9838).offer(0.9844));
        for (int i = 0; i < prices.size(); i++) {
            bid += prices.get(i).bid();
            offer += prices.get(i).offer();
        }
        System.out.printf("avg; bid: %.5f, offer: %.5f", bid / prices.size(), offer / prices.size());
        prices.clear(); // elements set to null => likely garbage
    }


Granted, the Price objects might not be created there, but they need to be created somewhere.  Also, you could probably pool the Price objects in an external pool somewhere, but I imagine management would be quite tricky to ensure that they were reserved and released in a consistent order.

I think it's highly likely you will wind up with either a complex solution for a simple problem or garbage.  Neither is desirable.


MutablesArray

I think it's fairly safe to say that most use cases of ArrayList don't use all of its public methods (and I'd shudder to see code that did).  I certainly don't.  I generally:
  1. add items
  2. clear the list
  3. iterate the list
  4. search the list
Using the 4 requirements above, it's not that cumbersome to build a new array list that somewhat doubles as an object pool.  Let's dive straight into the code:

public class MutablesArray<T extends Mutable> implements Iterable<T> {

    private final MutablesIterator mutablesIterator = new MutablesIterator();

    private T[] mutables;

    private int numMutables;

    public MutablesArray(Class<T> cls, int initialSize) {
        //noinspection unchecked
        mutables = expand((T[]) Array.newInstance(cls, 0), initialSize);
        numMutables = 0;
    }

    public T create() {
        ensureCapacity();
        return mutables[numMutables++].reset();
    }

    public MutablesArray<T> clear() {
        numMutables = 0;
        return this;
    }

    public int size() {
        return numMutables;
    }

    public T get(int i) {
        return mutables[i];
    }

    public Iterator<T> iterator() {
        return mutablesIterator.reset();
    }

    public T find(Predicate<T> predicate) {
        for (int i = 0; i < numMutables; i++) {
            final T mutable = mutables[i];
            if (predicate.test(mutable)) {
                return mutable;
            }
        }
        return null;
    }

    private void ensureCapacity() {
        int required = numMutables + 1;
        if (required > mutables.length) {
            mutables = expand(mutables, required);
        }
    }

    private class MutablesIterator implements Iterator<T> {

        private int idx;

        public MutablesIterator reset() {
            idx = 0;
            return this;
        }

        @Override
        public boolean hasNext() {
            return idx < numMutables;
        }

        @Override
        public T next() {
            return mutables[idx++];
        }
    }
}

The two most important methods in MutablesArray are create and clear.  Let's start with create.


MutablesArray.create

    public T create() {
        ensureCapacity();
        return mutables[numMutables++].reset();
    }

The difference compared to ArrayList is that we no longer pass in an instance that we wish to be held, but request an instance to modify.  The array holding the underlying data is expanded when necessary, as in ArrayList.

This means our data structure is essentially acting as a pool, which is also responsible for element data instance creation.  Elements are only created when the capacity of the underlying array is increased.  So, once we're at a suitably large level we are simply reusing existing instances.
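
The listing above doesn't show the Mutable interface or the expand helper, so here is a minimal sketch of how they might look.  Everything in it is an assumption on my part rather than the code from the repository: the self-typed Mutable<T> contract, creating elements reflectively through a public no-arg constructor, and the Price stand-in.

```java
import java.util.Arrays;

public class ExpandSketch {

    /** Assumed contract: reset() clears the state and returns the same instance. */
    public interface Mutable<T> {
        T reset();
    }

    /** Hypothetical element type, for this sketch only. */
    public static class Price implements Mutable<Price> {
        double bid, offer;

        @Override
        public Price reset() {
            bid = 0;
            offer = 0;
            return this;
        }
    }

    /** Copies the existing instances across and fills only the new slots with fresh ones. */
    @SuppressWarnings("unchecked")
    public static <T extends Mutable<T>> T[] expand(T[] current, int newSize) {
        final T[] expanded = Arrays.copyOf(current, newSize);
        final Class<?> type = current.getClass().getComponentType();
        try {
            for (int i = current.length; i < newSize; i++) {
                // assumes the element type has an accessible no-arg constructor
                expanded[i] = (T) type.getDeclaredConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type, e);
        }
        return expanded;
    }

    public static void main(String[] args) {
        Price[] prices = expand(new Price[0], 2);
        final Price first = prices[0];
        prices = expand(prices, 4);
        System.out.println(prices.length);      // prints 4
        System.out.println(prices[0] == first); // prints true: old instances survive
    }
}
```

The point of the sketch is that instance creation happens only for the slots added by an expansion; everything already in the array is carried over untouched.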


MutablesArray.clear

    public MutablesArray<T> clear() {
        numMutables = 0;
        return this;
    }

In a similar fashion to ArrayList.clear, we also set the number of elements to 0.  However, we do not set the element data to be null, meaning the data is not available to be collected as garbage.
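
One consequence worth keeping in mind is that a reference handed out before clear() is the same object that gets handed out again afterwards.  The sketch below shows that with a cut-down, fixed-capacity stand-in for MutablesArray; PricePool and this trimmed Price are hypothetical names used only for the illustration.

```java
public class ReuseAfterClear {

    /** Hypothetical mutable element, for this sketch only. */
    public static class Price {
        private double bid;

        public Price bid(double bid) {
            this.bid = bid;
            return this;
        }

        public double bid() {
            return bid;
        }

        public Price reset() {
            bid = 0;
            return this;
        }
    }

    /** Cut-down, fixed-capacity stand-in for MutablesArray. */
    public static class PricePool {
        private final Price[] prices;
        private int numPrices;

        public PricePool(int capacity) {
            prices = new Price[capacity];
            for (int i = 0; i < capacity; i++) {
                prices[i] = new Price(); // all instances are created up front
            }
        }

        public Price create() {
            return prices[numPrices++].reset();
        }

        public PricePool clear() {
            numPrices = 0; // instances stay in the array, ready for reuse
            return this;
        }
    }

    public static void main(String[] args) {
        final PricePool pool = new PricePool(2);
        final Price held = pool.create().bid(0.9837);
        pool.clear();
        final Price reused = pool.create().bid(0.9838);
        System.out.println(held == reused); // prints true
        System.out.println(held.bid());     // prints 0.9838
    }
}
```

So anything that needs to outlive a clear() has to copy the values out rather than hold on to the instance.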

The same example

    @Test
    public void sample() {
        double bid = 0, offer = 0;
        prices.clear();
        prices.create().bid(0.9837).offer(0.9845);
        prices.create().bid(0.9838).offer(0.9844);
        for (int i = 0; i < prices.size(); i++) {
            bid += prices.get(i).bid();
            offer += prices.get(i).offer();
        }
        System.out.printf("avg; bid: %.5f, offer: %.5f", bid / prices.size(), offer / prices.size());
    }

The code barely changes, with the exception of Price creation.

Conclusion

What I've shown above is a relatively simple way to provide some key list-style functionality in a garbage free manner.  It does require your domain/model objects to be mutable and whilst it's not for everyone, it certainly helps when you're trying to reduce garbage collection frequencies.

Hopefully this helps you be more garbage aware next time you're coding.

As always, the full source is available on GitHub.

Wednesday, 26 August 2015

Simpler releases with Git (Stash) and TeamCity

Well, this is a couple of days I'd rather have back.  Granted, I learnt a lot and think that we've now got a faster and simpler process than before, but it certainly wasn't one of the goals when starting the process.

The initial goal was to release some new software into a test environment, which I'm still working on!  First things first though.


What does a release involve?


The manual steps that would be involved in actually performing a release:
  1. Checkout a clean source tree from trunk/master/branch
  2. Update the version from a SNAPSHOT to GA release number
  3. Build and deploy the release into a maven repository
  4. Commit the changes to a tag in a central repository
This process allows us to keep a permanent (enough) record of the source when the code was released and also access the binaries from maven in a meaningful way.

There might be other steps that you also perform along the way, but they're the main ones in my opinion.

Releasing with maven


My build tool of choice is maven; it works for me, and I like the fact that it's descriptive and even a little verbose.  We've always had our builds using the maven-release-plugin, which despite its rough edges has always done the job.

I also create release build configurations in TeamCity, which allow us with a single click to produce a versioned software release into dev integration, test, uat or production.  It makes life easy.

Part of the release process requires an scm to be provided to maven:
<scm>
  <connection>scm:git:https://hostname/path/to/repo.git</connection>
  <developerConnection>scm:git:ssh://git@hostname:7999/path/to/repo.git</developerConnection>
</scm>

And here's where it started to get a little tricky.

As you can see, the connection and developerConnection are different.  The developerConnection is used during the release process to commit/push changes to a maven pom to a central repository.

I've chosen to use ssh here as we build using a specific build user and I didn't want to provide a username/password in a maven settings file, instead choosing passwordless ssh and registering the key with Stash.

Git, SSH and Windows


And now it started to get really "fun", but I'm going to cut a long story very short.

I tried a myriad of ways to pass an identity key to git during the maven build process.  Ultimately I was able to discard all of that and simply set an environment variable called HOME, pointing it to a directory that contains a .ssh directory holding my identity file, id_rsa.

e.g. HOME=C:\Users\BuildAgent

where C:\Users\BuildAgent contains .ssh\id_rsa.

This environment variable was able to be easily set in TeamCity as a Build Parameter.

Simplifying the release build


During my attempts to get git working with ssh and windows I stumbled across a great post by Axel Fontaine, which made me view the process I described in "What does a release involve?" very literally, rather than the more obscure process that the maven-release-plugin takes.

Essentially it involves:
  1. versions:set versions:commit scm:checkin (but don't push)
  2. deploy
  3. scm:tag (do push)
  4. scm:checkout (the tag that was created above as a verify step)
You could argue that a couple of the steps above are unnecessary but I like checks and balances.

This process allowed me to ditch the maven release plugin (and associated properties) in favour of a simpler maven-scm-plugin and versions-maven-plugin.  

The configuration for these plugins sits within a parent pom, in such a way that descendants don't need to apply (or know about) any additional configuration.

As part of the setup for the scm plugin I included the following configuration:
<configuration>
  <tag>${project.version}</tag>
  <connectionType>developerConnection</connectionType>
</configuration>

Tying it together with TeamCity templates


Build Configuration Templates have been around for an eternity, but I've never found a reason to use them.  Now I have, and they're great.

I started out by specifying a build number format, then the build steps, and finally added an environment Build Parameter for HOME.

This makes it very quick and easy to create release builds with a minimum of effort and certainly less fuss.

Worth it?


We now have a build that:
  • is twice as quick to perform as the maven-release-plugin
  • requires no configuration within a project's pom file (aside from referencing a parent)
  • can be created in less than a minute in TeamCity
You decide.

Tuesday, 18 August 2015

Testing beans using Hamcrest's hasProperty

Like most devs, I write tests.  They generally start small but quickly become complex.  So, as a general rule over the last 5 or so years, I break tests into a single assertion per test - where possible.

I find this makes it neater to read and more concise to write.

However, it can also create large numbers of tests - particularly when testing transformers or bean properties.  I recently came across a neat way of testing for property values using Hamcrest's hasProperty.

Introducing hasProperty()

    @Test
    public void hasProperties() {
        final Price price = new Price()
                .symbol("AUDUSD")
                .bid(0.9865)
                .ask(0.9875);

        assertThat(price, allOf(
                notNullValue(),
                hasProperty("bid",    closeTo(0.9865, 0.00001)),
                hasProperty("ask",    closeTo(0.9875, 0.00001)),
                hasProperty("spread", closeTo(0.001, 0.0001)),
                hasProperty("symbol", equalTo("AUDUSD"))
        ));
    }

It fits all of my requirements:

 - concise
 - single test (even the null check)
 - easy to read

Fluent style Price

The only catch is that it would require Price to be a bean with getters and setters, which I've tended to lean away from lately, instead preferring a more fluent style of methods:
public class Price {

    private double bid = Double.NaN;

    private double ask = Double.NaN;

    private double spread;

    private String symbol;

    public Price bid(double bid) {
        this.bid = bid;
        calcSpread();
        return this;
    }

    public double bid() {
        return bid;
    }

    public Price ask(double ask) {
        this.ask = ask;
        calcSpread();
        return this;
    }

    public double ask() {
        return ask;
    }

    public double spread() {
        return spread;
    }

    public Price symbol(String symbol) {
        this.symbol = symbol;
        return this;
    }

    public String symbol() {
        return symbol;
    }

    private void calcSpread() {
        if (!Double.isNaN(bid) && !Double.isNaN(ask)) {
            spread = Math.abs(bid - ask);
        }
    }
}


The Price class above invalidates the usage of hasProperty() as it no longer has bean getters and setters.


Tying together with BeanInfo

What we're able to do though, is provide an implementation of a BeanInfo class in our test package:


public class PriceBeanInfo extends SimpleBeanInfo {

    private static final Logger log = LoggerFactory.getLogger(PriceBeanInfo.class);

    private final Class<?> beanClass = Price.class;

    private PropertyDescriptor[] propertyDescriptors;

    public PropertyDescriptor[] getPropertyDescriptors() {
        if (propertyDescriptors == null) {
            collectPropertyDescriptors();
        }
        return propertyDescriptors;
    }

    private void collectPropertyDescriptors() {
        // collect the declared fields of the bean class and of each of its superclasses
        java.util.List<Field> fields = new ArrayList<>();
        fields.addAll(asList(beanClass.getDeclaredFields()));
        Class<?> parent = beanClass.getSuperclass();
        while (parent != null) {
            fields.addAll(asList(parent.getDeclaredFields()));
            parent = parent.getSuperclass();
        }

        final java.util.List<PropertyDescriptor> propertyDescriptors =
                fields.stream().filter(field -> !Modifier.isStatic(field.getModifiers()))
                        .map(p -> {
                            final String propertyName = p.getName();
                            final Method readMethod = findReadMethod(p);
                            final Method writeMethod = findWriteMethod(p);

                            try {
                                return new PropertyDescriptor(
                                        propertyName,
                                        readMethod,
                                        writeMethod
                                );

                            } catch (IntrospectionException e) {
                                log.warn("Failed to create property descriptor for: " + propertyName, e);
                                return null;
                            }
                        })
                        // drop fields whose descriptor could not be created
                        .filter(descriptor -> descriptor != null)
                        .collect(Collectors.toList());
        this.propertyDescriptors = propertyDescriptors.toArray(new PropertyDescriptor[propertyDescriptors.size()]);
    }

}


The general premise of the BeanInfo implementation is to provide a set of PropertyDescriptors that match the fluent style methods we created in the Price class.
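
The findReadMethod and findWriteMethod helpers aren't shown in the listing.  A minimal sketch of how they might match a field to its fluent accessors follows; the matching rules (same name as the field, zero arguments for the reader, one argument of the field's type for the writer) and the cut-down Quote class are my assumptions for illustration, not necessarily what the full source does.

```java
import java.beans.IntrospectionException;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class FluentAccessors {

    /** Cut-down fluent class standing in for Price. */
    public static class Quote {
        private double bid;

        public Quote bid(double bid) {
            this.bid = bid;
            return this;
        }

        public double bid() {
            return bid;
        }
    }

    /** Reader: a no-arg method named after the field that returns the field's type. */
    public static Method findReadMethod(Field field) {
        for (Method m : field.getDeclaringClass().getDeclaredMethods()) {
            if (m.getName().equals(field.getName())
                    && m.getParameterCount() == 0
                    && m.getReturnType() == field.getType()) {
                return m;
            }
        }
        return null; // read-only or write-only properties are allowed
    }

    /** Writer: a method named after the field that takes one argument of the field's type. */
    public static Method findWriteMethod(Field field) {
        for (Method m : field.getDeclaringClass().getDeclaredMethods()) {
            if (m.getName().equals(field.getName())
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == field.getType()) {
                return m;
            }
        }
        return null;
    }

    /** Builds a descriptor for one field, wrapping the checked exceptions. */
    public static PropertyDescriptor describe(Class<?> type, String fieldName) {
        try {
            final Field field = type.getDeclaredField(fieldName);
            return new PropertyDescriptor(fieldName, findReadMethod(field), findWriteMethod(field));
        } catch (NoSuchFieldException | IntrospectionException e) {
            throw new IllegalStateException("Cannot describe " + fieldName, e);
        }
    }

    public static void main(String[] args) throws Exception {
        final PropertyDescriptor bid = describe(Quote.class, "bid");
        final Quote quote = new Quote().bid(0.9865);
        System.out.println(bid.getReadMethod().invoke(quote)); // prints 0.9865
    }
}
```

A field with no matching writer, such as spread in Price, simply ends up with a null write method, which PropertyDescriptor accepts.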

The full source is available on my GitHub.
  
So, there you have it - my find of the week that I thought I'd share.

Further thoughts

This code could easily be abstracted into a base class for easy reuse.  I would think you could also easily dynamically add getters and setters for classes that didn't have them.  This might allow for reuse of hasProperty() rather than using Spring's getField().