My Coding Life

Tuesday, 8 September 2015

Garbage free ArrayList

Lists are wonderful little data structures, and they find their way into most (all?) projects and applications I've ever been involved in. Over the last couple of years, I've become more interested in both the performance and garbage production of data structures (I'll cover Strings another day). As a result, I've tended to favour ArrayLists, although they're not perfect.

The garbage

Let's take a look at some aspects of ArrayList that do create garbage. When you add new elements to the ArrayList, it will at some point grow the array to hold more elements:

    private void grow(int minCapacity) {
        // overflow-conscious code
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
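
To make the growth rule concrete, here is a small sketch of the capacities the backing array passes through. It only mirrors the `oldCapacity + (oldCapacity >> 1)` arithmetic above and ignores the MAX_ARRAY_SIZE handling; the class and method names are my own, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class GrowthSketch {

    /** Capacities the backing array would pass through, from `initial` until `target` elements fit. */
    static List<Integer> capacities(int initial, int target) {
        List<Integer> steps = new ArrayList<>();
        int capacity = initial;
        steps.add(capacity);
        while (capacity < target) {
            // Grow by half, but always by at least one slot (covers capacities 0 and 1).
            capacity = Math.max(capacity + (capacity >> 1), capacity + 1);
            steps.add(capacity);
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(capacities(10, 100)); // [10, 15, 22, 33, 49, 73, 109]
    }
}
```

Each step copies into a new array and leaves the old one behind as garbage, so a list that grows from 10 to 100 elements discards six arrays along the way.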

Assuming you're able to use either a state-based object or a ThreadLocal to keep an ArrayList around for a long time, the size of the array isn't that important, as it will never be collected.

The real hurdle you will face is either removing elements or clearing the list. Both of these methods set element data within the array to null, which will allow the underlying references to be collected at some point:

    public void clear() {
        modCount++;

        // clear to let GC do its work
        for (int i = 0; i < size; i++)
            elementData[i] = null;

        size = 0;
    }

A really basic example could result in code like:

    @Test
    public void sample() {
        double bid = 0, offer = 0;
        prices.add(new Price().bid(0.9837).offer(0.9845));
        prices.add(new Price().bid(0.9838).offer(0.9844));
        for (int i = 0; i < prices.size(); i++) {
            bid += prices.get(i).bid();
            offer += prices.get(i).offer();
        }
        System.out.printf("avg; bid: %.5f, offer: %.5f", bid / prices.size(), offer / prices.size());
        prices.clear(); // elements set to null => likely garbage
    }

Granted, the Price objects might not be created there, but they need to be created somewhere. Also, you could probably pool the Prices in an external pool somewhere, but I imagine management would be quite tricky, as you'd need to ensure that Prices were reserved and released in a consistent order.

I think it's highly likely you will wind up with either a complex solution for a simple problem or garbage. Neither is desirable.

MutablesArray

I think it's fairly safe to say that most use cases of ArrayList don't use all of its public methods (and I'd shudder to see code that did). I certainly don't. I generally:

add items
clear the list
iterate the list
search the list

Using the 4 requirements above, it's not that cumbersome to build a new array list that somewhat doubles as an object pool. Let's dive straight into the code:

public class MutablesArray<T extends Mutable> implements Iterable<T> {

    private final MutablesIterator mutablesIterator = new MutablesIterator();

    private T[] mutables;

    private int numMutables;

    public MutablesArray(Class<T> cls, int initialSize) {
        //noinspection unchecked
        mutables = expand((T[]) Array.newInstance(cls, 0), initialSize);
        numMutables = 0;
    }

    public T create() {
        ensureCapacity();
        return mutables[numMutables++].reset();
    }

    public MutablesArray<T> clear() {
        numMutables = 0;
        return this;
    }

    public int size() {
        return numMutables;
    }

    public T get(int i) {
        return mutables[i];
    }

    public Iterator<T> iterator() {
        return mutablesIterator.reset();
    }

    public T find(Predicate<T> predicate) {
        for (int i = 0; i < numMutables; i++) {
            final T mutable = mutables[i];
            if (predicate.test(mutable)) {
                return mutable;
            }
        }
        return null;
    }

    private void ensureCapacity() {
        int required = numMutables + 1;
        if (required > mutables.length) {
            mutables = expand(mutables, required);
        }
    }

    private class MutablesIterator implements Iterator<T> {

        private int idx;

        public MutablesIterator reset() {
            idx = 0;
            return this;
        }

        @Override
        public boolean hasNext() {
            return idx < numMutables;
        }

        @Override
        public T next() {
            return mutables[idx++];
        }
    }
}

The two most important methods in MutablesArray are create and clear. Let's start with create.

MutablesArray.create

    public T create() {
        ensureCapacity();
        return mutables[numMutables++].reset();
    }

The difference compared to ArrayList is that we no longer pass in an instance that we wish to be held, but request an instance to modify. The array holding the underlying data is expanded when necessary, as in ArrayList.

This means our data structure is essentially acting as a pool, which is also responsible for element data instance creation. Elements are only created when the capacity of the underlying array is increased. So, once we're at a suitably large level we are simply reusing existing instances.
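
The listing above doesn't show how the array is expanded, so here is a minimal, self-contained sketch of the idea. It is not the code from the repository: `Resettable`, `Quote` and `PooledList` are hypothetical names, and I've assumed a `Supplier` is used to build instances rather than the `Class` parameter used above.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical contract: an element that can restore itself to a blank state. */
interface Resettable<T> {
    T reset();
}

/** Hypothetical element type, used only for this sketch. */
final class Quote implements Resettable<Quote> {
    double bid;
    double offer;

    @Override
    public Quote reset() {
        bid = 0;
        offer = 0;
        return this;
    }
}

/** Minimal pooled list: instances are allocated only when the backing array grows. */
final class PooledList<T extends Resettable<T>> {
    private final Supplier<T> factory;
    private T[] items;
    private int size;

    @SuppressWarnings({"unchecked", "rawtypes"})
    PooledList(Supplier<T> factory, int initialCapacity) {
        this.factory = factory;
        this.items = (T[]) new Resettable[0];
        expand(Math.max(1, initialCapacity));
    }

    /** Hands out the next pooled instance, reset and ready to be populated. */
    T create() {
        if (size == items.length) {
            expand(items.length + (items.length >> 1) + 1);
        }
        return items[size++].reset();
    }

    /** Forgets the elements without nulling them, so nothing becomes garbage. */
    void clear() {
        size = 0;
    }

    int size() {
        return size;
    }

    T get(int i) {
        return items[i];
    }

    /** Grows the array and fills only the newly added slots with fresh instances. */
    private void expand(int newCapacity) {
        int oldCapacity = items.length;
        items = Arrays.copyOf(items, newCapacity);
        for (int i = oldCapacity; i < newCapacity; i++) {
            items[i] = factory.get();
        }
    }
}
```

After a `clear()`, the next `create()` returns the very same instance as before, just reset. That reuse is the whole point: once the array has reached its working size, steady-state use allocates nothing.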

MutablesArray.clear

    public MutablesArray<T> clear() {
        numMutables = 0;
        return this;
    }

In a similar fashion to ArrayList.clear, we also set the number of elements to 0. However, we do not set the element data to be null, meaning the data is not available to be collected as garbage.

The same example

    @Test
    public void sample() {
        double bid = 0, offer = 0;
        prices.clear();
        prices.create().bid(0.9837).offer(0.9845);
        prices.create().bid(0.9838).offer(0.9844);
        for (int i = 0; i < prices.size(); i++) {
            bid += prices.get(i).bid();
            offer += prices.get(i).offer();
        }
        System.out.printf("avg; bid: %.5f, offer: %.5f", bid / prices.size(), offer / prices.size());
    }

The code barely changes, with the exception of Price creation.

Conclusion

What I've shown above is a relatively simple way to provide some key list-style functionality in a garbage free manner. It does require your domain/model objects to be mutable and whilst it's not for everyone, it certainly helps when you're trying to reduce garbage collection frequencies.

Hopefully this helps you be more garbage aware next time you're coding.

As always, the full source is available on github.

Wednesday, 26 August 2015

Simpler releases with Git (Stash) and TeamCity

Well, this is a couple of days I'd rather have back. Granted, I learnt a lot and think that we've now got a faster and simpler process than before, but it certainly wasn't one of the goals when starting the process.

The initial goal was to release some new software into a test environment, which I'm still working on! First things first though.

What does a release involve?

The manual steps that would be involved in actually performing a release:

Checkout a clean source tree from trunk/master/branch
Update the version from a SNAPSHOT to GA release number
Build and deploy the release into a maven repository
Commit the changes to a tag in a central repository

This process allows us to keep a permanent (enough) record of the source when the code was released and also access the binaries from maven in a meaningful way.

There might be other steps that you also perform along the way, but they're the main ones in my opinion.

Releasing with maven

My build tool of choice is maven. It works for me, and I like the fact that it's descriptive and even a little verbose. We've always run our builds with the maven-release-plugin, which despite its rough edges has always done the job.

I also create release build configurations in TeamCity, which allow us with a single click to produce a versioned software release into dev integration, test, uat or production. It makes life easy.

Part of the release process requires an scm to be provided to maven:

<scm>
  <connection>scm:git:https://hostname/path/to/repo.git</connection>
  <developerConnection>scm:git:ssh://git@hostname:7999/path/to/repo.git</developerConnection>
</scm>

And here's where it started to get a little tricky.

As you can see, the connection and developerConnection are different. The developerConnection is used during the release process to commit/push changes to a maven pom to a central repository.

I've chosen to use ssh here as we build using a specific build user and I didn't want to provide a username/password in a maven settings file, instead choosing passwordless ssh and registering the key with Stash.

Git, SSH and Windows

And now it started to get really "fun", but I'm going to cut a long story very short.

I tried a myriad of ways to pass an identity key to git during the maven build process. Ultimately I was able to discard all of that and simply set an environment variable called HOME, pointing it to a directory containing a .ssh directory with my identity file id_rsa.

e.g. HOME=C:\Users\BuildAgent

where C:\Users\BuildAgent contains .ssh\id_rsa.

This environment variable was able to be easily set in TeamCity as a Build Parameter.

Simplifying the release build

During my attempts to get git working with ssh and windows I stumbled across a great post by Axel Fontaine, which made me view the process I described in "What does a release involve?" very literally, rather than the more obscure process that the maven-release-plugin takes.

Essentially it involves:

versions:set versions:commit scm:checkin (but don't push)
deploy
scm:tag (do push)
scm:checkout (the tag that was created above as a verify step)

You could argue that a couple of the steps above are unnecessary but I like checks and balances.

This process allowed me to ditch the maven release plugin (and associated properties) in favour of a simpler maven-scm-plugin and versions-maven-plugin.

The configuration for these plugins sits within a parent pom, in such a way that descendants don't need to apply (or know about) any additional configuration.

As part of the setup for the scm plugin I included the following configuration:

<configuration>
  <tag>${project.version}</tag>
  <connectionType>developerConnection</connectionType>
</configuration>

Tying it together with TeamCity templates

Build Configuration Templates have been around for an eternity, but I've never found a reason to use them. Now I have, and they're great.

I started out specifying a build number format, then the following build steps, and finally added an environment Build Parameter for HOME.

This makes it very quick, simple and easy for release builds to be created with a minimum of effort and certainly less fuss.

Worth it?

We now have a build that:

is twice as quick to perform as the maven-release-plugin
requires no configuration within a project's pom file (aside from referencing a parent)
can be created in less than a minute in TeamCity

You decide.

Tuesday, 18 August 2015

Testing beans using Hamcrest's hasProperty

Like most devs, I write tests. They generally start small but quickly become complex. So, as a general rule over the last 5 or so years, I break tests into a single assertion per test - where possible.

I find this makes it neater to read and more concise to write.

However, it can also create large numbers of tests - particularly when testing transformers or bean properties. I recently came across a neat way of testing for property values using Hamcrest's hasProperty.

Introducing hasProperty()

    @Test
    public void hasProperties() {
        final Price price = new Price()
                .symbol("AUDUSD")
                .bid(0.9865)
                .ask(0.9875);

        assertThat(price, allOf(
                notNullValue(),
                hasProperty("bid",    closeTo(0.9865, 0.00001)),
                hasProperty("ask",    closeTo(0.9875, 0.00001)),
                hasProperty("spread", closeTo(0.001, 0.0001)),
                hasProperty("symbol", equalTo("AUDUSD"))
        ));
    }

It fits all of my requirements:
- concise
- single test (even the null check)
- easy to read

Fluent style Price

The only catch is that it would require Price to be a bean with getters and setters, which I've tended to lean away from lately, instead preferring a more fluent style of methods:

public class Price {

    private double bid = Double.NaN;

    private double ask = Double.NaN;

    private double spread;

    private String symbol;

    public Price bid(double bid) {
        this.bid = bid;
        calcSpread();
        return this;
    }

    public double bid() {
        return bid;
    }

    public Price ask(double ask) {
        this.ask = ask;
        calcSpread();
        return this;
    }

    public double ask() {
        return ask;
    }

    public double spread() {
        return spread;
    }

    public Price symbol(String symbol) {
        this.symbol = symbol;
        return this;
    }

    public String symbol() {
        return symbol;
    }

    private void calcSpread() {
        if (!Double.isNaN(bid) && !Double.isNaN(ask)) {
            spread = Math.abs(bid - ask);
        }
    }
}

The Price class above invalidates the usage of hasProperty() as it no longer has bean getters and setters.

Tying together with BeanInfo

What we're able to do though, is provide an implementation of a BeanInfo class in our test package:

public class PriceBeanInfo extends SimpleBeanInfo {

    private static final Logger log = LoggerFactory.getLogger(PriceBeanInfo.class);

    private final Class<?> beanClass = Price.class;

    private PropertyDescriptor[] propertyDescriptors;

    public PropertyDescriptor[] getPropertyDescriptors() {
        if (propertyDescriptors == null) {
            collectPropertyDescriptors();
        }
        return propertyDescriptors;
    }

    private void collectPropertyDescriptors() {
        // Gather the fields declared on the bean class and all of its superclasses.
        final java.util.List<Field> fields = new ArrayList<>();
        Class<?> type = beanClass;
        while (type != null) {
            fields.addAll(asList(type.getDeclaredFields()));
            type = type.getSuperclass();
        }

        // findReadMethod and findWriteMethod are not shown in this excerpt.
        final java.util.List<PropertyDescriptor> descriptors = fields.stream()
                .filter(field -> !Modifier.isStatic(field.getModifiers()))
                .map(field -> {
                    final String propertyName = field.getName();
                    try {
                        return new PropertyDescriptor(
                                propertyName,
                                findReadMethod(field),
                                findWriteMethod(field)
                        );
                    } catch (IntrospectionException e) {
                        log.warn("Failed to create property descriptor for: " + propertyName, e);
                        return null;
                    }
                })
                .filter(descriptor -> descriptor != null) // skip properties that could not be described
                .collect(Collectors.toList());
        this.propertyDescriptors = descriptors.toArray(new PropertyDescriptor[0]);
    }

}

The general premise of the BeanInfo implementation is to provide a set of PropertyDescriptors that match the fluent style methods we created in the Price class.
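
Since the read and write method lookups aren't shown above, here is a rough, self-contained sketch of how fluent accessors can be mapped to PropertyDescriptors. It is not the code from the repository: `FluentDescriptors` and `Tick` are hypothetical names, and I've assumed that the reader is a public no-arg method named after the field and the writer is a public one-arg method of the same name.

```java
import java.beans.IntrospectionException;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fluent-style class, used only for this sketch. */
final class Tick {
    private double bid;
    private String symbol;

    public Tick bid(double bid) { this.bid = bid; return this; }
    public double bid() { return bid; }
    public Tick symbol(String symbol) { this.symbol = symbol; return this; }
    public String symbol() { return symbol; }
}

final class FluentDescriptors {

    /** Builds one descriptor per non-static field that has a fluent reader. */
    static PropertyDescriptor[] describe(Class<?> beanClass) {
        List<PropertyDescriptor> descriptors = new ArrayList<>();
        for (Class<?> c = beanClass; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                Method read = find(beanClass, field.getName());
                Method write = find(beanClass, field.getName(), field.getType());
                if (read == null) {
                    continue; // nothing to read, so not a usable property
                }
                try {
                    descriptors.add(new PropertyDescriptor(field.getName(), read, write));
                } catch (IntrospectionException e) {
                    throw new IllegalStateException("Cannot describe " + field.getName(), e);
                }
            }
        }
        return descriptors.toArray(new PropertyDescriptor[0]);
    }

    /** Reads a property through its descriptor, the way a bean-based matcher would. */
    static Object read(Object bean, String property) {
        for (PropertyDescriptor descriptor : describe(bean.getClass())) {
            if (descriptor.getName().equals(property)) {
                try {
                    return descriptor.getReadMethod().invoke(bean);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("No property: " + property);
    }

    /** Looks up a public method by name and parameter types, or returns null. */
    private static Method find(Class<?> type, String name, Class<?>... parameterTypes) {
        try {
            return type.getMethod(name, parameterTypes);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }
}
```

The three-argument PropertyDescriptor constructor only needs the reader to take no arguments and return a value, and the writer to take a single argument, so a setter that returns `this` is accepted.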

The full source is available on my github.

So, there you have it - my find of the week that I thought I'd share.

Further thoughts

This code could easily be abstracted into a base class for easy reuse. I would think you could also easily dynamically add getters and setters for classes that didn't have them. This might allow for reuse of hasProperty() rather than using Spring's getField().

Monday, 18 April 2011

Amazon SES - getting rid of the SMTP server

This is the first of a number of Amazon services that I will be integrating with Spring.

Get the Source
The source for these projects can be found at my GitHub spring-amazon-services project.

Background
I've been using Amazon Web Services (AWS) for the last 8 months now. The offering was initially attractive with EC2, EBS and S3. However, over that time they have continued to add more and more functionality.

As a developer, one of the key notification systems we use is email. Sending email with JavaMail is fairly simple; sending it with Spring Mail is even easier. Normally I've been in environments that have SMTP servers set up, so it's simply a case of configuring Spring to connect to them and away you go.

However, should you not have an SMTP server handily available, they're fairly daunting to set up.

Simple Email Service (SES) to the rescue!
Amazon SES is a great service, simple to use and extraordinarily cheap! If you're a developer of any ilk, I'd recommend setting up an Amazon account at the least and having a play with their services.

The only issue I have so far is that their sample code isn't production ready (IMO). Given Spring have simple integration, I thought I'd see if I could extend their JavaMailSenderImpl.

Include the AWS jar
I'm a maven fan (love me or hate me), and it's simple enough to include the AWS jar:

   <dependencies>  
     <dependency>  
       <groupId>com.amazonaws</groupId>  
       <artifactId>aws-java-sdk</artifactId>  
       <version>1.1.8</version>  
     </dependency>  
     <dependency>  
       <groupId>org.springframework</groupId>  
       <artifactId>spring-context-support</artifactId>  
       <version>3.0.5.RELEASE</version>  
     </dependency>  
   </dependencies>

I've also included spring-context-support, which is required for the next section.

Extending JavaMailSenderImpl
JavaMailSenderImpl is at the core of sending simple mail with Spring. I'll cover a very simple use-case a little further down. Let's get stuck into getting the class ready for SES though.

Firstly, we need to extend the class:

 public class AmazonMailSender extends JavaMailSenderImpl { }

Second, we need to ensure that AWSJavaMailTransport is used as the Transport:


   @Override  
   protected Transport getTransport(Session session) throws NoSuchProviderException {  
     return new AWSJavaMailTransport(session, null);  
   }

Then, we expose the AWS Access ID and the Secret Key. These are both unique to your AWS account and can be found in the security settings of your AWS account. It's also important to note that you should never publish your secret key anywhere unsafe:


   private String awsAccessKeyId;  
   private String awsSecretKey;  
   public void setAwsAccessKeyId(String awsAccessKeyId) {  
     this.awsAccessKeyId = awsAccessKeyId;  
   }  
   public void setAwsSecretKey(String awsSecretKey) {  
     this.awsSecretKey = awsSecretKey;  
   }

Finally, we configure the java mail properties after the bean has been constructed:

   @PostConstruct  
   public void init() {  
     Properties props = getJavaMailProperties();  
     props.setProperty(MAIL_TRANSPORT_PROTOCOL_KEY, "aws");  
     props.setProperty(AWSJavaMailTransport.AWS_ACCESS_KEY_PROPERTY, awsAccessKeyId);  
     props.setProperty(AWSJavaMailTransport.AWS_SECRET_KEY_PROPERTY, awsSecretKey);  
     // set port to -1 to ensure that spring calls the equivalent of transport.connect().  
     setPort(-1);  
   }
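
One thing worth knowing about init(): `Properties.setProperty` throws a NullPointerException when handed a null value, so a missing key fails at startup with a fairly unhelpful stack trace. Below is a small sketch of failing fast with a clearer message. The `MailCredentials` class and the two property keys are placeholders of my own; the real keys come from the AWSJavaMailTransport constants used above.

```java
import java.util.Objects;
import java.util.Properties;

final class MailCredentials {

    // Hypothetical keys for this sketch only, not the values of the AWS constants.
    static final String ACCESS_KEY = "example.accessKey";
    static final String SECRET_KEY = "example.secretKey";

    /** Copies both credentials into props, rejecting missing values with a readable message. */
    static Properties apply(Properties props, String accessKeyId, String secretKey) {
        Objects.requireNonNull(accessKeyId, "awsAccessKeyId must be set before init()");
        Objects.requireNonNull(secretKey, "awsSecretKey must be set before init()");
        props.setProperty(ACCESS_KEY, accessKeyId);
        props.setProperty(SECRET_KEY, secretKey);
        return props;
    }
}
```

The same two `requireNonNull` checks could sit at the top of init() itself.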

Putting it all together and we have:

 public class AmazonMailSender extends JavaMailSenderImpl {  
   public static final String MAIL_TRANSPORT_PROTOCOL_KEY = "mail.transport.protocol";  
   private String awsAccessKeyId;  
   private String awsSecretKey;  
   public void setAwsAccessKeyId(String awsAccessKeyId) {  
     this.awsAccessKeyId = awsAccessKeyId;  
   }  
   public void setAwsSecretKey(String awsSecretKey) {  
     this.awsSecretKey = awsSecretKey;  
   }  
   @PostConstruct  
   public void init() {  
     Properties props = getJavaMailProperties();  
     props.setProperty(MAIL_TRANSPORT_PROTOCOL_KEY, "aws");  
     props.setProperty(AWSJavaMailTransport.AWS_ACCESS_KEY_PROPERTY, awsAccessKeyId);  
     props.setProperty(AWSJavaMailTransport.AWS_SECRET_KEY_PROPERTY, awsSecretKey);  
     // set port to -1 to ensure that spring calls the equivalent of transport.connect().  
     setPort(-1);  
   }  
   @Override  
   protected Transport getTransport(Session session) throws NoSuchProviderException {  
     return new AWSJavaMailTransport(session, null);  
   }  
 }

Spring Configuration
I generally like to configure framework beans as XML, just my preference. This bean can be configured as follows:

 <bean id="mailSender"  
      class="au.com.dt.amazon.AmazonMailSender"  
      p:awsAccessKeyId="#{applicationProperties['amazon.accessKey']}"  
      p:awsSecretKey="#{applicationProperties['amazon.secretKey']}"/>

where applicationProperties is a reference to a Spring Properties bean.

Sending Mail
A brief example of how to send mail with the mailSender we've just configured:

     @Resource(name = "mailSender")
     private MailSender mailSender;

     public void sendMail() {
         SimpleMailMessage message = new SimpleMailMessage();  
         message.setTo("me@home.com");  
         message.setFrom("me@home.com");  
         message.setText("My Amazon SES email!");  
         message.setSubject("SES to the rescue!");  
         try {  
           mailSender.send(message);  
         } catch (MailException me) {  
           log.error("Fail to send activation mail.", me);  
         }  
     }

And there you have it, a nice simple integration of Amazon SES using Spring.

Don't forget that you're able to download this source at my GitHub spring-amazon-services repository.

Wednesday, 13 April 2011

My First Post - GitHub & Blogger

Right, so I've finally decided to set up a blog, more as a way of documenting my foray into making some of my code available online. Once it's available then at least I'll be able to reuse it in a number of projects and ensure that I can improve it as time goes on.

Blogger
I use Google for most other things, so I added this on as it was one less password to remember. Looks ok so far, although the lack of code formatting may come back to haunt me; I'm sure someone has something out there to help out!

GitHub
A choice based purely on hype more than anything else. Free open source repository control is becoming very widespread but I wanted to choose something that would be around for a few years to come. Setup was simple and instructions were great!

My first github project: https://github.com/ryanlea/database-integration-seeder. It's currently empty but will get fleshed out over the coming days/weeks.

Yay! Guess I'm one of the cool kids now ... if only I could find my scrums ...