Stephen Nimmo

Energy Trading, Risk Management and Software Development

Month: November 2012

Demo Code: Create Persistence Jar using JPA

I love keeping my repository code in a single jar, isolated from all other code. Persistence code should be portable and reusable as a library specific to a database or even a schema. This wasn’t always the easiest thing to do, especially in an ecosystem where the library may run in a Spring-based webapp, a Swing GUI and a Java EE EJB application. Here’s the template code for how to get there.

First, let’s look at the basic EntityManager usage pattern. There are much more sophisticated ways of doing this but I’ll keep it simple for my own sake.

[java]
//Get the correct persistence unit and EntityManagerFactory
EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory("demoManager");
EntityManager entityManager = entityManagerFactory.createEntityManager();
entityManager.getTransaction().begin();
//Create an object and save it
entityManager.persist(new ApplicationUser());
//We are just testing so roll that back
entityManager.getTransaction().rollback();
//Close it down.
entityManager.close();
entityManagerFactory.close();
[/java]
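
One caveat on the pattern above: in anything beyond demo code, you’d want the cleanup in a finally block so a failed persist doesn’t leak the EntityManager or the factory. A minimal sketch of the same flow:

[java]
EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory("demoManager");
EntityManager entityManager = entityManagerFactory.createEntityManager();
try {
    entityManager.getTransaction().begin();
    entityManager.persist(new ApplicationUser());
    entityManager.getTransaction().rollback();
} finally {
    //Close in reverse order of creation, even if persist blows up
    entityManager.close();
    entityManagerFactory.close();
}
[/java]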

JPA persistence is driven from a file in your classpath, located at META-INF/persistence.xml. Essentially, when creating the EntityManagerFactory, the Persistence class goes looking for the persistence.xml file at that location. No file? You get an "INFO: HHH000318: Could not find any META-INF/persistence.xml file in the classpath" error. Eclipse users: sometimes you have to run a clean build to get the file to show up for the JUnit test. Here’s a simple persistence.xml that shows how to use JPA outside of a container.

[xml]
<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
             version="2.0">

    <persistence-unit name="demoManager" transaction-type="RESOURCE_LOCAL">
        <class>com.stephennimmo.demo.jpa.ApplicationUser</class>
        <properties>
            <property name="javax.persistence.jdbc.driver" value="org.hsqldb.jdbcDriver" />
            <property name="javax.persistence.jdbc.user" value="sa" />
            <property name="javax.persistence.jdbc.password" value="" />
            <property name="javax.persistence.jdbc.url" value="jdbc:hsqldb:." />
            <property name="hibernate.dialect" value="org.hibernate.dialect.HSQLDialect" />
            <property name="hibernate.hbm2ddl.auto" value="create-drop" />
        </properties>
    </persistence-unit>

</persistence>
[/xml]

Notice that when you create the EntityManagerFactory, you give it the name of the persistence-unit. The rest is pretty vanilla: the javax.persistence.jdbc.* properties are the standard JPA 2.0 connection settings, and the hibernate.* properties are provider-specific tuning.
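
As an aside, if you’re building the persistence jar with Maven, persistence.xml goes in src/main/resources/META-INF and the jar needs roughly the following dependencies. The coordinates are real Hibernate/HSQLDB artifacts, but the versions here are illustrative, not necessarily the demo’s exact ones:

[xml]
<!-- Illustrative dependency block; versions are examples -->
<dependencies>
    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-entitymanager</artifactId>
        <version>4.1.7.Final</version>
    </dependency>
    <dependency>
        <groupId>org.hsqldb</groupId>
        <artifactId>hsqldb</artifactId>
        <version>2.2.9</version>
    </dependency>
</dependencies>
[/xml]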

Next, let’s look at the basic JPA object.

[java]
@Entity
@Table(name="APPLICATION_USER") //@Entity's name attribute sets the JPQL entity name; @Table is what maps the table
public class ApplicationUser implements Serializable {

    private static final long serialVersionUID = -4505032763946912352L;

    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    @Column(name="APPLICATION_USER_UID")
    private Long uid;

    @Column(name="LOGIN")
    private String login;

    //Getters and setters omitted for brevity's sake

}
[/java]

And if you want to use JPA in a container, here’s a simple example of how the persistence.xml would change.

[xml]
<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
             version="2.0">

    <persistence-unit name="demoManager" transaction-type="JTA">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <jta-data-source>java:/DefaultDS</jta-data-source>
        <class>com.stephennimmo.demo.jpa.ApplicationUser</class>
        <properties>
            <property name="hibernate.dialect" value="org.hibernate.dialect.HSQLDialect" />
            <property name="hibernate.hbm2ddl.auto" value="create-drop" />
        </properties>
    </persistence-unit>

</persistence>
[/xml]

That’s basically it. The most painful point comes when you try to use a persistence jar inside an EJB jar: it requires you to list out the classes in the persistence.xml.
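
One way to ease that pain, at least per the JPA spec, is the <jar-file> element: the persistence.xml inside the EJB jar can point at the persistence jar and have it scanned for annotated classes instead of naming each one. A sketch, with a hypothetical jar name:

[xml]
<persistence-unit name="demoManager" transaction-type="JTA">
    <jta-data-source>java:/DefaultDS</jta-data-source>
    <!-- Hypothetical jar name; the path is resolved relative to the persistence unit root -->
    <jar-file>lib/demo-persistence.jar</jar-file>
</persistence-unit>
[/xml]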

As always, demo code available at my public repository.

http://code.google.com/p/stephennimmo

DevOps culture can create a strategic advantage for your company

I’ve recently stumbled across a great podcast called DevOps Cafe. These guys push out about a show a week, interviewing different people from the DevOps community or simply discussing DevOps culture and why it’s great. DevOps is a software development culture that stresses communication, collaboration and integration between software developers and operations. It blurs the lines between development, deployment and system administration, creating a team-based approach rather than throwing things over the proverbial wall. DevOps has existed in practice for many years, but like so many other things, a nomenclature and language are finally being built around it thanks to its adoption in the enterprise arena.

What are some of the things this solves? If any of these ring true in your organization, you might want to take a look at DevOps.
  • Does it take weeks to deploy a new build to production?
  • Does the code deployment require manual intervention such as ops updating a properties file?
  • Is your QA team manually testing some production support patch when they should be busy working on regression tests for the next build?
  • After any release, do you need a period of “all hands on deck” because you aren’t sure if something is going to go wrong?

What does DevOps culture bring to your organization that can create an advantage?

  • Agile development – small sprints create small deliverables that provide small, manageable change. Lots of small changes rather than huge changes can lower the risks of deployments and help get truly needed bug fixes and enhancements out the door. Hey trading organizations – this should be something you are very interested in as you typically have lots of small changes!
  • Continuous Delivery – your dev teams should be able to push the latest code to production at all times. There are no more huge branch-and-merge operations, because development work is broken down into many small chunks, pushed continuously and tested continuously. I can’t tell you how many shops I have worked in that take weeks to deliver already-completed code to production, only to have that code become obsolete or need changes before going live. Moving to production could be a weekly thing, and something you do every week should be easy, right?
  • Automation – if you do something more than twice, you need to automate it. Developers should be writing code, not creating builds. QA teams should be creating repeatable regression tests, not manually testing the UI again. Operations should be writing new optimized deployment scripts, not manually patching servers. And they should all be working together to push the product to production – a failure on any part is a failure on the whole. We should all be in this together.

How do you get started?

  1. DevOps is a culture, not a tool set. A new paradigm needs to be established and continuously reinforced: we are all in this together.
  2. From the development side, stop branching and start checking in on the trunk. Create a new build on every check-in and have that build fully regression tested. EVERY TIME you check in, the code should be production ready.
  3. From the ops side, start automating the environment by using virtualization. Ops should be able to add a new node to a cluster with the push of a button. Ops should be able to deliver builds to a host with the push of a button.
  4. Your QA team needs to refocus their efforts on producing repeatable regression tests on the builds. Again, this should be repeatable.

Here are some of my favorite tools for the job (remember, I am a Java guy).

  • Build with Maven.
  • Manage your artifacts with Sonatype Nexus. I also like Apache Archiva.
  • Automate your testing with TestNG, JUnit. Create regression testing for webapps using Selenium.
  • Deploy using Puppet. I also like Jenkins. Depends on how fancy your infrastructure is. Puppet does a lot with infrastructure management as well.

Performance Killers for Low Latency Java Applications

There are fundamental differences in writing low latency programs which require much more planning and detailed execution than your normal web application or basic workflow application. When optimizing for low latency, here are some of the touch points to be addressed.

  • The first and most deadly is IO: network, disk or anything else outside the CPU and memory (and sometimes anything outside the CPU). This is the toughest issue to manage because low latency applications also tend to have additional non-functional requirements around failover and guaranteed delivery of messages. We want things to be fast, but losing information along the way is a non-starter. What are the typical ways to reduce the chances of lost data? That’s right, almost all of them involve IO. The answer usually comes down to expectations around performance: how fast is fast enough, and does that requirement make redundancy or guaranteed messaging mutually exclusive?
  • Garbage collection is one of the greatest gifts and largest curses to Java developers. The first time I realized I could instantiate an object without worrying about its size or having to explicitly release it, it was a great weight off my shoulders. That was until the first time a 3 second garbage collection hit in the middle of processing a thousand messages a second. When building low latency applications, how and when you create objects becomes important again, and object reuse strategies (see the sketch below) can make a huge difference in your overall application performance. If you are a Java guy, you need to be able to discuss GC at a very low level. http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
  • Non-essential functionality needs to be removed. AOP pointcuts around every service call for logging or whatever add method calls and object creation to the stack. Unless your application needs it, get rid of it. Especially the logging, and if you do need the logging, do it right! http://www.javalobby.org/java/forums/t18758.html
  • Hardware makes a difference. You have a low latency application and you’re running in the cloud? Virtualization adds network hops. Using commodity hardware? Those absolutely necessary IO calls would perform much better over fibre. Spinning disks instead of SSDs? Get out of here.

Once these other issues are tackled, then you can actually start coding for low latency. See, you could spend thousands of hours poring over every detail of your code, optimizing which collection implementation you use and reducing three lines of code to two, but if you do all of that and then run the app on EC2 and use a file-based SAN for your messaging redundancy persistence, you are toast. Tackle the big questions before you tackle the little ones.
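
To make the GC point above concrete, here’s a minimal sketch of one object reuse strategy: a simple recycling pool. The Message type and pool sizing are made up for illustration; a real implementation needs to think hard about thread ownership and pool exhaustion.

[java]
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessagePool {

    //Hypothetical reusable message holder
    public static class Message {
        byte[] payload = new byte[1024];
        int length;
    }

    private final BlockingQueue<Message> pool;

    public MessagePool(int size) {
        pool = new ArrayBlockingQueue<Message>(size);
        for (int i = 0; i < size; i++) {
            pool.add(new Message()); //Pre-allocate up front, outside the hot path
        }
    }

    public Message acquire() {
        Message message = pool.poll();
        return message != null ? message : new Message(); //Fall back to allocating if the pool runs dry
    }

    public void release(Message message) {
        message.length = 0; //Reset state before reuse
        pool.offer(message); //If the pool is already full, the extra object just becomes garbage
    }
}
[/java]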

Here’s some tools to help:

Metrics – http://metrics.codahale.com

JAMon – http://jamonapi.sourceforge.net

JProfiler – FREE! – http://java.dzone.com/articles/jprofiler-your-java-code-could

VisualVM – http://visualvm.java.net/index.html

Yourkit Profiler – http://www.yourkit.com/overview/index.jsp

Demo Code: Java Concurrency Framework, Multithreading and Race Conditions

This demo template was built to demonstrate the capabilities around multithreading using the Java Concurrency Framework, specifically the executors. Instead of doing the “Hello World” type of tutorial, which has already been done very well by others, I chose to use the time to demonstrate how to extend the framework to solve an issue.

The use case in question is related to FIX message processing, or really dependent transaction processing in general. For example, say someone sends an equities NewOrderSingle to buy 100 shares of $GOOG, then immediately realizes they don’t want the order and sends an OrderCancelRequest. These two messages are related and need to be processed in order. Processing the cancel prior to the new order would result in two errors: a DK (don’t know) for the cancel, and an order left open, which is bad. These messages share an identifier (ClOrdID) that can be used to relate them.

If you have a multithreaded system taking messages in and processing them, you could have a race condition where one thread picks up the order and another thread (possibly on a different machine) picks up the cancel. This creates the possibility of the cancel thread finishing first. We can’t have that.

To simplify the use case, we simplify the logic: we want the same thread to process both messages. We accomplish this by mapping the identifier (ClOrdID) to a thread using a local concurrent map, and routing any subsequent message to the same thread by looking up the key in the map. No key? Find the least used queue. Key is there? Send it into the same queue. But wait: these chains can grow without end because of a FIX message called OrderCancelReplaceRequest, which lets you simultaneously cancel an existing order and replace it with a new one, usually with a different quantity or price. With these messages, you can end up with a one-to-many relationship between order ids and a thread. Here’s an example.

New     -> 1000
CanRepl -> 1000, 1001   (dependent key 1000, new key 1001)
CanRepl -> 1001, 1002   (dependent key 1001, new key 1002)
Cancel  -> 1002

Yes, that means that any messages related to these 3 different order ids need to roll into the same queue if they are all still in process. So if you have 10 threads running and the new order 1000 goes to thread 5, then all of the order ids (1000, 1001, 1002) need to go to thread 5 while their related messages are still processing. Fun, eh? Try building this logic with 10 FIX engines routing messages into an ESB with a cluster of message-driven beans listening to the queues. It gets complicated. To the code!

First I needed to extend the Runnable interface to create the ability to see what my current and dependent keys are. For messages like NewOrderSingle, there is no dependent key, but the ClOrdID is the key. For OrderCancelReplaceRequest, the ClOrdID is the key, and the OrigClOrdID will be the dependent key.

[java]
public interface KeyedRunnable<T> extends Runnable {

    //The identifier for this unit of work (e.g. ClOrdID)
    T getKey();

    //The identifier of the work this depends on (e.g. OrigClOrdID), or null if none
    T getDependentKey();

}
[/java]
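
The test at the end of this post uses a StringPrinter task from the demo repo, which isn’t shown here. As a hypothetical reconstruction (not the repo’s actual code), a minimal implementation of the interface might look like this:

[java]
public class StringPrinter implements KeyedRunnable<String> {

    private final String key;
    private final String dependentKey;
    private final String payload;

    public StringPrinter(String key, String dependentKey, String payload) {
        this.key = key;
        this.dependentKey = dependentKey;
        this.payload = payload;
    }

    public String getKey() {
        return key;
    }

    public String getDependentKey() {
        return dependentKey;
    }

    public void run() {
        //Stand-in for real message processing
        System.out.println(Thread.currentThread().getName() + " processed " + key + ": " + payload);
    }
}
[/java]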

Then there are two phases to the handling of these messages. The first is handling and routing inbound messages to the right thread. This service creates a set of ThreadPoolExecutor objects, each with a single thread. Yes, a pool of single-threaded thread pools is still a thread pool. Why did I do it this way? Because the executors do a bunch of bookkeeping that you would otherwise have to hand-code when building your own Thread objects.

[java]
public class KeyedExecutorService<T> {

    private int numberOfExecutors;
    private Random random = new Random();
    private ExecutorService unorderedExecutorService;
    private ConcurrentMap<T, UUID> uuidMap = new ConcurrentHashMap<T, UUID>();
    private ConcurrentMap<UUID, KeyedThreadPoolExecutor<T>> threadPoolExecutorMap = new ConcurrentHashMap<UUID, KeyedThreadPoolExecutor<T>>();

    public KeyedExecutorService(int numberOfExecutors) {
        unorderedExecutorService = Executors.newCachedThreadPool();
        this.numberOfExecutors = numberOfExecutors;
        for (int i = 0; i < numberOfExecutors; i++) {
            UUID uuid = UUID.randomUUID();
            KeyedThreadPoolExecutor<T> threadPoolExecutor = new KeyedThreadPoolExecutor<T>(uuidMap, 1, 1, Long.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
            threadPoolExecutorMap.put(uuid, threadPoolExecutor);
        }
    }

    public void execute(KeyedRunnable<T> command) {
        if (command.getKey() == null && command.getDependentKey() == null) {
            //No ordering requirement at all, so any thread will do
            unorderedExecutorService.execute(command);
            return; //Bug fix: without this return, the command would fall through and execute twice
        }
        synchronized (uuidMap) {
            if (command.getDependentKey() == null) {
                System.out.println("No Dependent Key, sending to least used");
                executeInLeastActivePool(command);
            } else {
                if (uuidMap.containsKey(command.getDependentKey())) {
                    //The message this depends on is still in flight; route to the same executor
                    UUID uuid = uuidMap.get(command.getDependentKey());
                    uuidMap.put(command.getKey(), uuid);
                    System.out.println("Dependent Key Found, sending to " + uuid);
                    threadPoolExecutorMap.get(uuid).execute(command);
                } else {
                    System.out.println("No Dependent Key Found, sending to least used");
                    executeInLeastActivePool(command);
                }
            }
        }
    }

    private void executeInLeastActivePool(KeyedRunnable<T> command) {
        //This could be round-robined if you have a fast processing system.
        Set<Entry<UUID, KeyedThreadPoolExecutor<T>>> entrySet = threadPoolExecutorMap.entrySet();
        for (Entry<UUID, KeyedThreadPoolExecutor<T>> entry : entrySet) {
            if (entry.getValue().getQueue().size() == 0) {
                uuidMap.put(command.getKey(), entry.getKey());
                entry.getValue().execute(command);
                return;
            }
        }
        //No idle executor found; pick one at random
        List<UUID> uuidList = new ArrayList<UUID>(threadPoolExecutorMap.keySet());
        UUID uuid = uuidList.get(random.nextInt(numberOfExecutors));
        uuidMap.put(command.getKey(), uuid);
        threadPoolExecutorMap.get(uuid).execute(command);
    }

    public void shutdown() {
        unorderedExecutorService.shutdown();
        for (Iterator<UUID> iterator = threadPoolExecutorMap.keySet().iterator(); iterator.hasNext();) {
            threadPoolExecutorMap.get(iterator.next()).shutdown();
        }
    }

}
[/java]

So once the message gets to the right thread and executes, we need to remove that key from the mapping to make sure it opens back up. If a Cancel comes in for ClOrdID 1000 but the NewOrderSingle has already been processed, then the cancel can be processed anywhere. Notice that both classes hold a reference to the same ConcurrentMap that stores the UUIDs for thread mapping/routing.

[java]
public class KeyedThreadPoolExecutor<T> extends ThreadPoolExecutor {

    private ConcurrentMap<T, UUID> uuidMap;

    public KeyedThreadPoolExecutor(ConcurrentMap<T, UUID> uuidMap, int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> workQueue) {
        super(corePoolSize, maximumPoolSize, keepAliveTime, unit, workQueue);
        this.uuidMap = uuidMap;
    }

    @SuppressWarnings("unchecked")
    @Override
    protected void afterExecute(Runnable command, Throwable t) {
        super.afterExecute(command, t);
        if (command instanceof KeyedRunnable<?>) {
            KeyedRunnable<T> keyedRunnable = (KeyedRunnable<T>) command;
            if (keyedRunnable.getKey() == null && keyedRunnable.getDependentKey() == null) {
                return;
            }
            synchronized (uuidMap) {
                //Processing is finished, so release the key for routing elsewhere
                uuidMap.remove(keyedRunnable.getKey());
            }
        }
    }

}
[/java]

Here’s how you would use the API.

[java]
@Test
public void runKeyedExecutionTest() {
    KeyedExecutorService<String> threadMultipoolExecutor = new KeyedExecutorService<String>(10);
    int total = 0;
    for (int i = 1; i < 100000; i++) {
        for (int j = 1; j < 100; j++) {
            String aString = UUID.randomUUID().toString();
            StringPrinter sp = new StringPrinter(Integer.toString(j), Integer.toString(j - 1), aString);
            threadMultipoolExecutor.execute(sp);
            total++;
        }
    }
    System.out.println("Done " + total);
    threadMultipoolExecutor.shutdown();
}
[/java]

This algorithm could be extended to a distributed system; however, any consumption still needs to be single threaded on the backend! Realistically, just process it in the local threads and scale your FIX farm out horizontally.

Demo Code Repository – http://code.google.com/p/stephennimmo/
Quickfix/J – http://www.quickfixj.org

Scrum without metrics is useless

You can read all day about the benefits and weaknesses involved in agile project delivery. In my opinion, the most important aspect is fairly simple: I want my team to deliver more code with fewer defects in an accurate time frame.

You can spend all day deciding the prioritization of backlogs, arguing about whether scope changes should result in sprint cancellations, or whether the ideal workday has 6 or 7 hours of capacity, but if your team isn’t getting better at doing its job, you’re missing out on the real power of scrum. Here are a few steps you can take to help reap the benefits of scrum.

  • It all starts with planning. Have a sprint planning session prior to kickoff. Have your entire team sit down with the selected sprint backlog items and provide estimates for development. But don’t just throw numbers out there. Make sure everyone is giving independent estimations and then compare everyone’s estimates. If you have 3 team members give a 2 day estimation and the 4th member says it’s ten days, then something is wrong. Stop. Discuss and get everyone to buy in to the estimate. Write down these planning numbers and have those numbers available at the end of the sprint – regardless of the mid-sprint changes.
  • At the end of the sprint, take the actual hours and compare them against the estimations. Create a tolerance level and for any estimations outside the tolerance, have the team answer for and discuss the reasons for divergence. While these are considered estimations, your developers should be held accountable for them, not because they need to be disciplined but because they should WANT to get better at providing estimations.
  • Demo your code. Do it for anyone in the company. If no one shows up, still keep doing it. It’s just as much for your development team as it is for the product customers. Show off!
  • Don’t be afraid to change your sprint’s underlying attributes to better suit the product backlog. If your demo creation is a big ordeal, then moving to 3 week sprints from 2 week sprints would give you more bandwidth.

Agile isn’t the silver bullet. It won’t solve your problems but it will make your problems very painful, very fast.

Energy Trading and Risk Management: It’s Time for STP

Originally published on Derivsource, an online community and information source for professionals active in derivatives processing, technology and related services: http://www.derivsource.com/content/energy-trading-and-risk-management-it’s-time-stp#

 

The technology involved with energy trading and risk management is undergoing rapid and sometimes volatile changes, creating opportunities for companies to develop a distinct competitive advantage. Many forces, such as reduced profit margin on trading activities and increasingly complex regulatory requirements, are at work pushing companies to automate the transaction lifecycle to help increase profitability and reduce costs. Straight-through processing (STP) is the ability to have transaction data flow through a company’s different systems with little to no direct human intervention.  STP infrastructure has been adopted throughout the financial services industry with great success, but has yet to be widely utilized in the energy industry.

New regulatory reporting requirements such as those in the Dodd-Frank Act (DFA) are providing a foundation to push the energy industry toward STP. In particular, for swap data recordkeeping and reporting compliance, a transaction’s data may need to be sent to a swap data repository (SDR) within 15 minutes by year two. These timeframes can create data entry scenarios where there is simply not enough time for additional manual intervention or error correction. In addition, some SDRs are choosing to bundle their confirmation services along with DFA reporting, causing confirmation processes to begin immediately, something many energy traders may not be used to, since common practice is end-of-day transaction reconciliation.

How could STP benefit an organization engaged in trading activities? Automating the flow of trade lifecycle data shortens processing time by reducing manual intervention. It also reduces operational risk and cost by cutting down on manual data entry errors, and it can improve decision making by integrating real-time data for stronger decision support analysis. Consequently, STP is a very powerful tool for increasing the profitability of energy organizations’ trading activities.

For most energy trading organizations, there are many different touch points for automating the flow of trade data, including:

  • Connectivity to designated contract markets (DCMs) and swap execution facilities (SEFs) for the purpose of trade capture
  • Internal connectivity between different systems handling areas such as credit, scheduling, transportation and risk
  • Connectivity to external systems, including SDRs, market operators and even directly with counterparties to facilitate regulatory reporting, trade confirmation and other activities such as physical and financial settlements
  • Connectivity with transmission providers such as pipelines and independent system operators (ISO) for automation of scheduling

Ultimately, the concept is simple: when a trade lifecycle event occurs, energy companies should create a representation of that event using a common communication specification and broadcast the event to allow other systems to process it independently. The complexity lies in formulating the rules, structure and definitions of the data and events being broadcast. Adapting these procedures and systems to STP will reduce dependence on manual entry processes, ensure more reliable data, and improve the speed of critical decision, risk and performance information and analysis.

With the tightening profit margins on trading activities and increased transaction collateral requirements, it is up to energy trading organizations to find new and inventive ways to reduce operational costs and increase efficiency. With the right foundation of a common communication protocol and long-term strategic vision for a company’s enterprise, the possibilities seem endless.

 
