Thursday, February 11, 2010

Why I hate everything you love about Java

A couple of days ago, Nick Kallen, the author of magicscalingsprinkles.wordpress.com, wrote an interesting blog entry titled "Why I love everything you hate about Java" (link). He starts by addressing the "hipster programmers who loves Clojure, Ruby, Scala, Erlang, or whatever" and explaining why he thinks that their sympathy for convention-over-configuration is a fallacy. He then discusses why his favorite design patterns (dependency injection, factories and decorators) are key features for modular applications.

The blog entry is a very good read and every Java/OO programmer should be familiar with these concepts. However, I disagree with most of his statements about convention-over-configuration and functional programming in particular. Overall, the blog reads a bit like "Why OO design patterns are better than functional programming and why those hippie trends like convention-over-configuration are overrated". It's always hard to cover everything in detail in a blog entry and often you simply have to set a focus (that's why it's called a blog and not an academic paper, right?) and here the focus clearly was set on object-orientation and design patterns. Within the OSGi enterprise expert group we had a short off-topic discussion about this blog entry and someone pointed out that it would be nice to see some counter examples.

Challenge accepted! I will use Clojure to convert the examples to a more functional style.

Convention-over-configuration

In the first part of the blog entry Nick presented an API usage where he always parametrizes the use of a ThreadPoolExecutor instead of using a more simplistic approach driven by convention-over-configuration. The used configuration "depends entirely on the nature of the problem you're solving and how callers of this code behave". Nick states that this boilerplate code "is really important when you work at massive scale" and that it provides "a way to configure the behavior of the system". He further describes that this style of API usage leads to modular software.

I think that this has nothing to do with the language or even the paradigm you choose since it's simply a matter of good API design. Provide meaningful defaults but do not tie the user to unchangeable behavior. However, I think that we can already improve his code and that his requirements even show some weaknesses of object-oriented programming! First, let's define the requirements:

1. Create a parametrized ThreadPoolExecutor
2. Execute a future task
3. The configuration depends on the problem and the caller

To keep the code simple, I will limit the configuration to PoolSize (just to demonstrate it, I won't change the size) and ThreadFactory. I will use the ThreadFactory to assign different names to the created threads. This provides us with an easy way to validate our configuration during task execution.

First, we create variables to define the default values for the ThreadPoolExecutor and ThreadFactory. We will later override the values when we want to parametrize:

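(import '(java.util.concurrent FutureTask LinkedBlockingQueue ThreadFactory ThreadPoolExecutor TimeUnit))
;; Clojure 1.3 and later only allow binding on vars marked ^:dynamic
(def ^:dynamic *pool-size* 10)
(def ^:dynamic *thread-name* "default name")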

We need a ThreadFactory that uses the *thread-name* variable every time it creates new threads:

(def thread-factory
  (proxy [ThreadFactory] []
    (newThread [runnable] (Thread. runnable *thread-name*))))

For testing purposes we will need several ThreadPoolExecutors, so let's create a factory function that creates and returns a new executor. The created executor will use the *pool-size* and the thread-factory, which in turn uses the *thread-name*:

(defn create-executor
  []
  (ThreadPoolExecutor. *pool-size*
                       10
                       (long 500)
                       TimeUnit/SECONDS
                       (LinkedBlockingQueue.)
                       thread-factory))

We also need a function that will get called by the executor. We create another factory function that returns FutureTasks. Since every Clojure function implements both the java.util.concurrent.Callable and the java.lang.Runnable interface, we can easily pass a Clojure function to the FutureTask constructor. The factory function takes a task-id as parameter. We will use this ID to identify the created task later. When the created tasks get called by the executor, they simply print their ID and the name of the current thread:

(defn create-demo-task
  [task-id]
  (FutureTask. (fn []
                 (println "Task ID:" task-id
                          "Thread Name:" (.getName (Thread/currentThread))))))

Let's try it (note: the print to stdout will happen in a different thread, so you might not see the output in your normal console):

(.execute (create-executor) (create-demo-task 1))

=> Task ID: 1 Thread Name: default name

Nothing special so far. The task gets executed and prints its ID and the name of the thread. So far we followed the convention-over-configuration way since we simply used our default value of "default name" for the thread. Our executor usage is very simple and we only implemented requirements 1 and 2. Nick argued that this simplicity is only an illusion and that you therefore always have to write the boilerplate code (for requirement 3). However, Clojure has a simple yet powerful mechanism to get a simple API and a flexible configuration at the same time: Bindings! Let's assume our problem demands a different thread name and that therefore the caller wants to influence the configuration:

(binding [*thread-name* "my new name"]
  (.execute (create-executor) (create-demo-task 2)))

=> Task ID: 2 Thread Name: my new name

Bindings can be nested and each binding has its own scope. In the next example, the inner binding overrides the name with "my second name":

(binding [*thread-name* "my first name"]
  (.execute (create-executor) (create-demo-task 3))
  (.execute (create-executor) (create-demo-task 4))

  (binding [*thread-name* "my second name"]
    (.execute (create-executor) (create-demo-task 5)))

  (.execute (create-executor) (create-demo-task 6)))

=> Task ID: 3 Thread Name: my first name
=> Task ID: 4 Thread Name: my first name
=> Task ID: 5 Thread Name: my second name
=> Task ID: 6 Thread Name: my first name
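
The scoping rules can also be checked without any threads or executors involved. The following is a minimal standalone sketch (it redefines *thread-name* for itself, and current-name is a hypothetical helper that is not part of the code above). It assumes Clojure 1.3 or later, where rebindable vars must be marked ^:dynamic:

```clojure
;; Standalone sketch; ^:dynamic is required on Clojure 1.3 and later.
(def ^:dynamic *thread-name* "default name")

;; Hypothetical helper: reads whatever value is bound at call time.
(defn current-name [] *thread-name*)

;; Collect the value seen before, inside and after a nested binding.
(def observed
  (binding [*thread-name* "my first name"]
    [(current-name)
     (binding [*thread-name* "my second name"] (current-name))
     (current-name)]))

;; Outside of any binding form the root value is visible again.
(def after (current-name))
```

Here observed holds the first name, the second name and then the first name again, and after is back to the default. The function current-name never receives the name as a parameter; it simply sees whatever its caller has bound.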

As you can see, the use of binding is completely optional and we only need to use it when we want to provide a different configuration. Requirement 3 is a very important one: "The configuration depends on the problem and the caller" and this is where object-oriented programming shows some weaknesses:
  • The caller (who knows what configuration makes sense) is often not the one who creates the object (where we need to apply the configuration)! Sometimes the objects get created later in the call chain, sometimes the objects were already created long before the caller knew something about it.
  • Once our tangled web of objects is instantiated, it's quite hard to change it. We either have a global dependency injection configuration or we pass factories around.
  • Whenever we create a new object (to apply a new configuration) we in fact create a new tree of objects since the object creation might imply the creation of child objects. This means that we often have a redundant set of trees with only slight differences. These trees often need to share some other objects so we again pass factories and references around etc. The beloved OO design patterns are often just workarounds for problems introduced by object-oriented programming!

So far my examples mixed the concepts of executor creation, defining new thread names and running tasks. To add the concept of layers, we now assume that we have an application-wide configuration that creates the executor. For whatever app-specific reason, it's common in our application that new threads should get a random name in some contexts. We only need to capture this behavior once in a new function:

;; Create application executor
(def app-executor (create-executor))

;; Capture "random name" behavior
(def with-random-name
  #(binding [*thread-name* (str (rand))]
     (%)))

The application code now only needs to use the with-random-name function and the *thread-name* variable is now an implementation detail of our application-wide configuration:

(with-random-name
  #(.execute app-executor (create-demo-task 7)))

=> Task ID: 7 Thread Name: 0.5414362847600062

Easy, isn't it? Obviously, you can create as many functions as you like, compose them, pass them to other functions, etc. This is a very powerful programming concept and I fail to see how "function composition is a degenerate case of the Decorator pattern", as Nick called it.
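
As a rough sketch of what composing such wrappers might look like: the helpers with-thread-name, with-small-pool, with-batch-settings and current-config below are hypothetical names invented for this illustration, and the two vars are redefined so that the snippet stands on its own (again assuming Clojure 1.3 or later for ^:dynamic):

```clojure
(def ^:dynamic *thread-name* "default name")
(def ^:dynamic *pool-size* 10)

;; Each wrapper takes a zero-argument function and calls it
;; inside its own binding.
(defn with-thread-name [thread-name f]
  (binding [*thread-name* thread-name] (f)))

(defn with-small-pool [f]
  (binding [*pool-size* 2] (f)))

;; Hypothetical helper that reports the configuration it currently sees.
(defn current-config []
  {:thread-name *thread-name* :pool-size *pool-size*})

;; A new wrapper built purely by combining the two existing ones.
(defn with-batch-settings [f]
  (with-thread-name "batch" #(with-small-pool f)))

(def batch-config (with-batch-settings current-config))
(def default-config (current-config))
```

With these definitions, batch-config sees both overrides while default-config still sees the root values. The combined wrapper needs no knowledge of which vars the individual wrappers touch.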

This style of programming is called context-oriented programming and I hope that we will see more use of it in the future.

The design patterns of modularity

The blog continues to describe the use of dependency injection, decorators and factories and shows why those patterns are important for modular applications. Again, I argue that those patterns can be easily implemented in functional languages and that the result is often more flexible and probably easier to use. The examples heavily depend on OO constructs (e.g. class hierarchies) and therefore do not translate 1:1 to a functional style. However, this shouldn't be a big problem since the purpose is to demonstrate the design patterns, not the API of a query framework. We assume that we have a query function and simulate a long running operation with Thread/sleep:

(defn query [q]
  ;; Thread/sleep expects a long, so coerce the random double
  (Thread/sleep (long (rand 1000)))
  (println "Run query:" q))

In practice, the function would contact the database, return the query data structure etc. The first task is to add a TimingOutQuery that cancels the query after a certain amount of time. To keep our Clojure code simple we instead create a timing decorator that prints how long the operation took (a timing macro, time, is already implemented in the Clojure core library):

(defn timing-query [q]
  (time (query q)))

This function simply delegates to the original query function. With this design we would need to create a wrapper function for each database function (query/select/cancel) which is clearly not an option. Fortunately it's quite easy to generalize this abstraction:

(defn timing-fn [f & args]
  (time (apply f args)))

The function timing-fn takes a function and an arbitrary number of arguments as parameters, calls the function with the arguments and wraps everything in Clojure's time macro. We have thus created a generic timing decorator that we can use to wrap any function:

(timing-fn query "query string")
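
For completeness, the timeout behavior of the original TimingOutQuery could probably be approximated in the same generic style. The following is only a sketch under a few assumptions: timeout-fn is a hypothetical name, the keyword :timed-out is an arbitrary marker value (so it would be ambiguous for functions that legitimately return that keyword), and it relies on the three-argument form of deref that is available for futures since Clojure 1.3:

```clojure
;; Hypothetical helper, not part of the original post: runs f on another
;; thread and gives up after timeout-ms milliseconds.
(defn timeout-fn [timeout-ms f & args]
  (let [task   (future (apply f args))
        ;; deref returns :timed-out if no result arrives in time
        result (deref task timeout-ms :timed-out)]
    (when (= result :timed-out)
      (future-cancel task))   ; asks the worker thread to stop
    result))
```

It would be used just like timing-fn, e.g. (timeout-fn 500 query "query string"). Note that cancellation only interrupts the worker thread; whether the underlying operation really stops depends on how it reacts to interruption.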

Nick stated that "function composition is a degenerate case of the Decorator pattern", but it's the object-oriented implementation which is limited because e.g. only Query objects can be decorated. The TimingOutQuery and Query are even tied together via class inheritance, one of the strictest forms of coupling you can use! Making the OO version generic would involve another abstraction, maybe new interfaces, etc.

One of the advantages of decorators is that you can test the code in isolation, no matter if you use the OO or the functional way. As described above, the timing-fn is even easier to test since it does not depend on a specific type that it will wrap. For example, we could easily wrap the println function:

(timing-fn println "Hello World!")
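
A slightly different variant, sketched here under the hypothetical name timed, returns a wrapped function instead of calling it right away. This is closer to a classic decorator, and the result can be combined with other functions via comp from the Clojure core library:

```clojure
;; Hypothetical variant of timing-fn: takes a function and returns a new
;; function that behaves the same but also prints the elapsed time.
(defn timed [f]
  (fn [& args]
    (time (apply f args))))

;; Decorated versions are ordinary functions ...
(def timed-inc (timed inc))

;; ... so they compose like any other function.
(def timed-inc-then-str (comp (timed str) (timed inc)))
```

Calling (timed-inc 41) returns 42 and (timed-inc-then-str 41) returns "42"; each decorated step additionally prints its own "Elapsed time" line.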

After we decomposed everything into small functions, we need factories to put everything together again. The idea is that user code should never know what type of query (normal query, timing query, ...) it is using. We therefore pass factories to the user code. These factories return the actual query function that the user code should use. First we need to create two factory functions, one for normal queries and one for timing queries:

(defn query-factory []
  (fn [q] (query q)))

(defn timing-fn-factory [factory]
  (fn [q] (timing-fn (factory) q)))

It is important to note that while the query-factory has no parameters, the timing-fn-factory takes another factory as parameter. Our timing factory knows nothing about the query function and is completely generic. Next we need a user code function that takes a factory as parameter and invokes it to get and use the actual query function:

(defn load-users [factory]
  ((factory) "load users"))

Let's use it with the normal query function:

(load-users query-factory)
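
Because load-users only depends on the factory it is given, it can also be exercised without any real query function. As a small sketch (recording-factory and the atom called log are hypothetical names for this illustration, and load-users is repeated so that the snippet is self-contained):

```clojure
;; Same definition as above, repeated to keep the sketch self-contained.
(defn load-users [factory]
  ((factory) "load users"))

;; Hypothetical test double: a factory whose query function only records
;; the query string instead of contacting anything.
(defn recording-factory [log]
  (fn []
    (fn [q]
      (swap! log conj q)
      :ok)))

(def log (atom []))
(def result (load-users (recording-factory log)))
```

Afterwards result is :ok and @log contains the single entry "load users", which shows that the user code really went through the factory it was handed.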

Using the timing-fn-factory is a bit more tricky, since a) we want the user code to call the factory and b) the user code does not know that the factory takes another factory as a parameter. Hence we need a feature along the lines of "here is your factory, and I have already filled in the parameters for you". This technique is called partial application in functional languages and is probably one of the most basic operations:

(load-users (partial timing-fn-factory query-factory))

We simply call partial with the function and parameters.
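
For readers who have not seen partial before, here is a tiny standalone sketch; greet and the names derived from it are made up for this illustration:

```clojure
;; Hypothetical two-argument function.
(defn greet [greeting who]
  (str greeting ", " who "!"))

;; Fix the first argument; the result still expects the second one.
(def say-hello (partial greet "Hello"))

;; Fix all arguments; the result is a zero-argument function, which is
;; the shape that load-users expects from a factory.
(def hello-world (partial greet "Hello" "World"))

(def one-missing (say-hello "World"))
(def none-missing (hello-world))
```

Both one-missing and none-missing evaluate to "Hello, World!". The second form corresponds to (partial timing-fn-factory query-factory): all parameters are already filled in and the caller only has to invoke the result.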

As I wrote in the beginning, I think that every Java/OO programmer should be familiar with the concepts Nick presented and they indeed are very important. However, I hope that I was able to show that the functional programming world is powerful as well and that there are more reasons than just convention-over-configuration why "hipster programmers who loves Clojure, Ruby, Scala, Erlang, or whatever" are adopting functional programming languages.

4 comments:

nk said...

This was a well balanced response to my post. I think you make a few (trivial) mistakes about whether Query and TimingOutQuery are part of the same class hierarchy (they share an Interface only), but perhaps I misunderstood your point.

WRT/ The use of bindings I'm unfamiliar with them but from your blog post it seems like Dynamic Scoping. Dynamic scoping has advantages and disadvantages... It allows run-time configuration (pro) but is something of a shotgun approach that everyone down the callstack inherits from. Sure, you can override it in a nested context, but two siblings inherit the same bindings. It's a whole lengthy debate about dynamic scoping, subject for another post. And we could easily well want it in an Object Oriented language. (In fact, one Scala idiom is "thread-local-inheritable" values that is used to simulate dynamic scoping even when new threads are forked off. But it doesn't work with thread-pools, alas).

A better approach for my use-case (and aesthetics) would be (and here I'm using pseudo-haskell to get terse currying):

threadFactory :: (() -> String) -> Thread
threadFactory namingFunction = makeThread(namingFunction())

then,

threadFactoryWithRandomNaming = threadFactory(() -> toString(rand))

threadFactoryWithConstantNaming = threadFactory(() -> "ConstantName")

This is an imaginary programming language, I apologize for my imprecision here. But with this we can have two sibling thread factories that generate names differently. I don't think this is easy to do with bindings...

But if your point is that many of the techniques I argue for map directly to functional languages--well, I wholeheartedly agree.

Dependency Injection is just a name for abstracting over a parameter. Factories are functions to create objects that satisfy an interface. Decorators are function composition but over a richer domain than just "apply".

In something like Haskell, "polymorphic" code is achieved with type classes. But type classes dispatch on type. So you have the same issue arise that your code is anti-modular if you manufacture objects directly rather than abstract over manufacturing functions. In Haskell this might mean creating an algebraic type directly rather than accepting a function as a parameter that creates objects that satisfy a type class.

In a functional language that doesn't use something like type classes, you achieve modularity by passing around functions to manipulate your object. Since these functions satisfy an interface, you can add decorators to them as function composition so long as the type doesn't change. You end up doing Dependency Injection on a per-function/per-method basis rather than a per-manufacture basis. Pros and cons but the essential idea is the same.

Foudres said...

Your binding of variables is fairly simple to implement in any language.

In Java one could implement it with a static class featuring a map. The key would be the variable name. The value would be a stack of values.

And you'll end up writing:

Conf.put("myVar", myValue);
do something ...
Conf.pop("myVar");

You could even make it per thread and per method just by using introspection. So you would just have to write:

Conf.put("myVar", myValue);
do something ...

It can be very useful, but I'd use it with caution because of the side effects this type of code can introduce.

I had to maintain code where maps were used to pass parameters instead of explicit method parameters, and in the long run it was really confusing and the code was difficult to maintain.

For complex configuration I would prefer a file, why not XML for the automatic validation, but the format is not important as long as you can "compile" the file so you know it is correct. It would be far better than including the magic numbers directly in the code.

I would say in the end, it is nice when an API can be configured to meet most use cases. But how you'll configure it will heavily depend on the real use.

Maybe your variable binding by code, maybe files, maybe explicit parameters of the function/method. Maybe another one? It will depend on what you want to do, what you need... And I guess all languages allow you to do that.

But if you make an API and not a wrapper for your app, there should be a way of configuring the API, offering a simple, declarative and explicit way to do it programmatically. (And without side effects.)

Roman Roelofsen said...

@nk @Foudres

I agree :-) Bindings are a powerful concept but of course they have their pitfalls as well and yes, you can simulate them in Java with e.g. Maps (even though being forced to manually mark the scope boundaries with push/pop can be error prone).

My goal was to demonstrate that functional languages are powerful as well when it comes to modular design and that e.g. the need for boilerplate code might be an illusion driven by your language and the constructs it provides. Bindings are simply a powerful construct in Clojure.

As you hopefully noticed, I tried to avoid any "X is better than Y" statements and I do not want to generalize "FP is better than OO" or whatever. Both worlds have their strengths and weaknesses and I simply analyzed the scenario from the FP point of view.

Unknown said...

> "hipster programmers who loves Clojure, Ruby, Scala, Erlang, or whatever"

So Scala programmers are not hipster programmers? ;-)