Monday, February 20, 2012

Lion, brew and gcc

For me, the transition to Lion was relatively painless. Painless here basically means that I patched up the couple of things that gave me problems. In this specific case, the issue is that Apple's new developer tools do not include the regular gcc anymore. Instead there is a version that works with the llvm backend, which is great but has issues with some packages that have not been updated yet.

Another, closely related problem is that I had a Python 2.7 installed from the package on the main Python website. I did this because older versions of OS X did not ship Python 2.7. That Python was built with the older gcc-4.0 that Apple shipped with SL. Thus, new libraries I install with pip still want that compiler, which Apple moved to /Developers-3.2.x/... So far, I have lived with adding that directory to the PATH and happily compiling.

What I should have done is get rid of my beloved Python and either use EPD, use Apple's built-in 2.7 (shipping with Lion), use a brew-ed Python, or see whether Python.org distributes a Python compiled with the new compiler (which, as far as I know, is perfectly capable of building Python). The question is: do any of the Python extensions I need require the older gcc? But this is not something I'm going to discuss here: I am not yet ready to make the transition.

I just want to point out that I installed the old gcc from homebrew-alt, and now I can just brew install --use-gcc for packages that need the old gcc compiler. This is easy to use and nice. I also removed the old Developers and hope everything's fine.

% git clone https://github.com/adamv/homebrew-alt.git /usr/local/LibraryAlt
% brew install /usr/local/LibraryAlt/duplicates/apple-gcc42.rb

An older version of gfortran I installed conflicted, but the commands above worked like a charm after removing it.

Tuesday, February 7, 2012

Clojure: writing tail recursive functions without using recur

Once upon a time, in the land of the Clojure, there was a brilliant student who enquired into the nature of things, and for that he was greatly loved and appreciated by his teachers, for he was brilliant and asked about the nature of things and they could explain the world to him. He learned about macros and first order functions and actors and software transactional memory and he was happy. But then he began to focus on the nature of recur and he felt that something was amiss, and the way trampolines and functions interact made him wonder.

During a short holiday he was in Schemeland, and he saw that they did not use recur. They named the functions and called them by name in tail position, and that was their way. And he felt it was good. So he asked his Master:

"Why can the schemers call the functions in tail position and their stacks never end?"

And the Master told him that it was because their soil is fertile, while the land of the Clojure is just an oasis in the wastes of Javaland.

"We are lucky," he added, "for our land still gives us food spontaneously and the air does not drive us mad. And our spring is natural and not a framework. And if you do not like recur, you can always map, for or loop. Higher order functions and macros can help you to hide what you do not like, for you are the master of your own language."

For a while the student was content, still he had a recurring thought: functions can do everything and in Church he did not find recur but only functions. And so he learned Y. But Y is a demanding beast, for it requires functions to be defined awkwardly. And the student wanted to simply write:

(defn factorial [n acc]
  (if (= n 0)
    acc
    (factorial (- n 1) (* n acc))))

His Master saw him troubled and in pain, and one day a big application they were creating exploded with a StackOverflowError and he knew that unless the student was cured he could not write code with the others. So one evening, he told his student that if he really wanted to learn the secrets of making tail-recursive functions run with constant stack usage, he should go to the Land of Snakes, where they eat only Spam and Eggs, to look for some wise men and have them teach him the secrets of creating tail recursion at language level. And he warned the student that the journey was dangerous.

The young student was frightened, because the Land of Snakes is far away and he was barely aware of the perils that lurked in the shadows. However, his resolve was strong and he packed his things, an editor and a few jars to survive the wastes of Javaland, and ventured forth. He travelled through the dull wastes where everything is private or protected and he had to ask things to do things for him. But like most people who grew up in the land of Clojure, he knew that objects are but closures in disguise and he knew how to bind them to his will with the power of the dot form.

After three days and three nights he eventually reached the Mines of Ruby, where massive gates blocked his way. He spoke the words as his master instructed (password: mellon), but the gate remained closed, for someone had monkey-patched it and the door now spoke English, but the student did not know it and with failure in his heart he left the place, because he was trained in the ways of Lambda and could not cope with such a stateful abomination.

After months of wandering, he eventually reached Schemeland, where he at least could tail recur without recur. He spent another month drowning his pain in first class continuations and losing hope of ever coming back to the land of Clojure, until one day he overheard the men at the tavern telling tales about a pythonista adventuring in Schemeland. The student sought the pythonista and eventually he found him and he asked him about the secrets of recursion and the pythonista showed him code and the student was happy because he knew classes are another word for closures and he was a master of closures. But he also understood that state was the missing element and he was saddened, because he also knew that state is treacherous.

Nonetheless, he felt he had come too far to be stopped by the formal purity of stateless programming and he eventually wrote the code. Little is known of what happened afterwards. Some claim he finally found his way to the Land of the Snake, after having extensively monkey-wrenched the monkey patcher; others say that he went back to the Land of the Clojure.

I, your humble narrator, do not know the truth. I tried nonetheless to follow the student's steps and to implement myself the function that makes tail recursive calls without using recur. Ugly it is, and it is indeed but an illusion, for recur lies inside the implementation. Still, it does the trick:

(defrec factorial [n acc] 
  (if (= n 0) 
    acc 
    (factorial (- n 1) (* n acc))))

And here is the code:

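(defn uuid [] (str (java.util.UUID/randomUUID)))

;; Wraps f so that a call to the wrapped function from inside f does not
;; grow the stack. Not thread-safe and not re-entrant: the ref is shared.
(defn tail-recursive [f]
  (let [state (ref {:func f :first-call true :continue (uuid)})]
    (fn [& args]
      (let [cont (:continue @state)]
        (if (:first-call @state)
          ;; Outermost call: drive the computation with loop/recur.
          (let [fnc (:func @state)]
            (dosync
              (alter state assoc :first-call false))
            (try
              (loop [targs args]
                (let [result (apply fnc targs)]
                  ;; The marker means "call me again with the stored args".
                  (if (= result cont)
                    (recur (:args @state))
                    result)))
              (finally
                (dosync (alter state assoc :first-call true)))))
          ;; Inner call: store the arguments and return the marker.
          (dosync
            (alter state assoc :args args)
            cont))))))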
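The pythonista's code is not shown in the tale. What follows is only a minimal Python sketch of the same trick, assuming a hypothetical decorator called tail_recursive: the outermost call loops, while every inner call just records its arguments and returns a unique marker. Like the Clojure version, it keeps shared state, so it is neither thread-safe nor re-entrant.

```python
import functools


def tail_recursive(func):
    """Run self-calls in tail position with constant stack depth.

    Hypothetical helper for illustration only; the names are assumptions.
    """
    marker = object()                      # unique "please continue" value
    state = {"first_call": True, "args": None}

    @functools.wraps(func)
    def wrapper(*args):
        if not state["first_call"]:
            # Inner (recursive) call: store the arguments and unwind.
            state["args"] = args
            return marker
        state["first_call"] = False
        try:
            while True:
                result = func(*args)
                if result is not marker:
                    return result
                args = state["args"]       # loop instead of recursing
        finally:
            state["first_call"] = True     # reset for the next outer call

    return wrapper


@tail_recursive
def factorial(n, acc=1):
    if n == 0:
        return acc
    return factorial(n - 1, n * acc)
```

Calling factorial(5000) goes well past Python's default recursion limit for a plain recursive definition, but the stack never gets more than a few frames deep here.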

Thursday, February 2, 2012

Geek automation: Maven, Ant and other things

About automation, I'm definitely a geek. And I totally agree with the famous graph:

I was working on a Java project with an unusual deployment. Part of the problem is that the actual application for which I was writing the plugin wasn't available from a maven repository. Moreover, running the whole thing needed some fiddling with long command line options, plus a non-trivial class-path.

I started appreciating maven with leiningen, and then by itself. Moreover, it solves one of the worst problems I have with IDEs. I hate to version IDE files (after all, people may use a different environment, something many eclipse users do not even consider). And I hate to recreate them every time (especially when they are not trivial). Finally, I hate to depend on a GUI program to actually build a project (where a command line should suffice).

Then there is another problem related to libraries, i.e., I find all the usual solutions for handling jars annoying:

  1. place the jars in a system wide directory akin to /usr/local/lib
  2. download the jars every time and manually add them to the project
  3. create a script that downloads the jars (and curse the first time you develop on windows)
  4. do not use external dependencies

Maven solves all these problems together. I just name the jars in the pom and everything is taken care of. I loved the thing so much with Clojure and Leiningen that even with the despicable xml syntax things work smoothly.

But I had this very important jar that is not versioned anywhere. So maven was out of the question, I thought. Moreover, the idea of manually downloading the jar and then adding it to maven's repository (and doing it on the 2-3-4 computers I use) really looked unacceptable.

So I dug into ant's most advanced stuff and built an ant script that actually downloaded the jar for me and put it in a lib directory inside the project (packaging jars with the project is not acceptable). But I still wanted to use maven… so I learned the terribly documented ant plugin to get the dependencies from maven… and since that was not installed by default, I also wrote the code to go and get its jar.

So I basically had this ant script that did everything for me… and was some hundred lines long. Finally, I also wrote some targets to generate scripts to execute the project (I mentioned the complex command lines).

Of course, this also meant that maven was under-used. But then I realized… I have maven. I have a plugin that creates the scripts for me (not part of maven, but maven can easily get it). I only need ant to get the ant-maven bridge (which I wouldn't need anymore) and to get the missing jar. So I'm maintaining a very complex system for basically nothing: it is so much simpler to wget the jar, install it in the local maven repository and go on.
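For the record, the "install it in the local maven repository" step boils down to one invocation of the install:install-file goal. Here is a small Python sketch that only builds that command line; the jar path and the coordinates are placeholders, since the real jar is not named here.

```python
def install_file_command(jar_path, group_id, artifact_id, version):
    """Build the `mvn install:install-file` command for a local jar."""
    return [
        "mvn", "install:install-file",
        "-Dfile=" + jar_path,
        "-DgroupId=" + group_id,
        "-DartifactId=" + artifact_id,
        "-Dversion=" + version,
        "-Dpackaging=jar",
    ]


# Placeholder values, purely illustrative.
command = install_file_command("lib/some-app.jar", "com.example", "some-app", "1.0")
print(" ".join(command))
```

Passing the list to subprocess.check_call would actually run it, provided mvn is on the PATH.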

Funny that, in order not to type two lines a couple of times, I wrote a hundred-line script.

Thursday, January 26, 2012

Handle this! (views, const, state in Clojure, Java, C++ and Python)

Introduction

The original vision Alan Kay had of object oriented programming was about separate entities communicating through message passing. A logical consequence is that the global program state is the sum of the individual states of these entities (called objects). The state of such objects is naturally hidden from the outside and state modifications occur only as a consequence of the exchanged messages.

I would like to mention that in this model the "privacy" of internal variables is not simply a matter of a keyword, but a consequence of a programming philosophy. This is not the kind of limitation you get in Java or C++ classes, where the field is there, you just cannot access it. It is somewhat more similar to calling a black-box with a state that is its own business; there are no fields, and if there are, they are just an implementation detail. Or even more so, private variables are not accessed in the same sense that the physical address of an object in Java is not part of the programming model.

Such objects do not necessarily have their own thread of execution (in the sense that they are concurrently in control). However, if they had, the logical model would not be overly different. But back to the objects…

I somewhat believe that objects are an overloaded metaphor. In fact, there are at least two types of objects. And while the object oriented, message-only metaphor applies well to domain specific objects, I somewhat feel that it is not appropriate for some data structures. Sometimes, it is a nice property that "similar structures" have a common interface so that, for example, switching from an array to a linked-list is a painless transition, because it eases experimentation with different trade-offs regarding computational efficiency (although such problems are better solved with pen and paper).

However, in other situations, accessing the internals of some complex structure is plainly the "right thing to do". It is a walking-horror from the object oriented point of view, but it plainly makes sense for computational reasons. I often have to deal with graphs with billions of nodes, and more often than not I feel that usual OO laws are too restrictive.

Graph example

A clear example here is the design of networkx.Graph. I have nothing against the design, by the way: I believe they do the right thing. The idea is that they have implemented their Graph internals in some way (it does not matter how, right now). However, you may want to get a list of all the nodes in the graph. Now, how to do this? The first issue is that the nodes may not be stored in a form that is easy to return. This is actually the case: nodes are dictionary keys under the hood. So essentially there is no easy way to return them without calling some dict member that returns a new structure holding the nodes.

State and "Static" OOP

OOP is all about state change. Perhaps just local state change, if done correctly. And hopefully the effects of the state do not propagate too far from where the state is hidden. As for C++, I have found no other truly mainstream OO language that makes it as clear what you may change and what you may not.

C++ practice is really precise about consting whatever you can const. And for the cases where a logically const object has to mutate something inside, you can use the mutable modifier to express the idea that the object's real state did not change while some irrelevant parts of it did. Examples are forms of caching, counting stuff, and logging to a logger we hold a reference to.

Another important aspect of C++ is that it clearly distinguishes between a const pointer (a pointer that cannot change) and a pointer to a const object (the pointed object cannot change). As always, all this leads to additional complexity. However, declaring stuff const is good: first, it is a rather strong safety guarantee; second, it really leads to optimizations otherwise impossible. Still, it is tragically inadequate wrt. plainly immutable objects.

Moreover, although many other languages do not have pointer arithmetic, they do have references. In Java it is possible, for example, to mark such a reference final, which essentially means it will always refer to a given object. However, there is no way to state that the actual object cannot be mutated by accesses through that specific reference.

In Java, the only way to achieve that goal is not providing methods that mutate the state. In fact this approach makes sense. In a way, you make the language simpler without really losing much. And C++ newbies really do not get the whole constness thing very well.


[Figure 1: mutable-immutable.png]

Essentially, in Java you do not have the possibility to have a mutable object that some clients cannot mutate. There are options, however. For example, in Figure 1 we have two interfaces, one mutable and one immutable. We have the mutable interface extend the immutable one, plus the appropriate base classes.

Immutability at class level can be obtained both (a) with a true immutable implementation implementing the immutable interface and a mutable implementation implementing the mutable one; or (b) with just a mutable implementation: clients that should not mutate the object will use the immutable interface. This is quite similar to the const in C++ in the sense that a const_cast is usually possible (and in this case we could just cast to the mutable interface). Such things somewhat break the whole immutability thing, but sometimes have their uses.
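To make option (b) a bit more concrete, here is a rough sketch transposed to Python, with hypothetical class names (ReadOnlyCounter, MutableCounter, Counter) that are not from any library. Python itself enforces nothing here: only the type hint, checked by a static checker, tells clients of report that they should stick to the read-only interface, which is roughly the same "cast and you can cheat" situation described above.

```python
from abc import ABC, abstractmethod


class ReadOnlyCounter(ABC):
    """The 'immutable' interface: queries only."""

    @abstractmethod
    def value(self):
        ...


class MutableCounter(ReadOnlyCounter):
    """The 'mutable' interface extends the read-only one."""

    @abstractmethod
    def increment(self):
        ...


class Counter(MutableCounter):
    """Single mutable implementation, as in option (b)."""

    def __init__(self):
        self._value = 0

    def value(self):
        return self._value

    def increment(self):
        self._value += 1


def report(counter: ReadOnlyCounter) -> str:
    # This client only relies on the read-only interface.
    return "count=%d" % counter.value()


counter = Counter()
counter.increment()
print(report(counter))
```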

And what is the big deal with immutability? Basically, in this context immutable stuff can be shared with no fear. And copying huge datasets is too inefficient to be considered.

Dynamic OOP

The essential problem here is that the OO languages we have discussed so far are built around the idea that your co-workers will screw up the project if they are able to do stuff. So the objective is not letting them do it. Constness shall be enforced by the language (you had the impression that I was happy about C++ const, didn't you?) because otherwise someone will foobar the project.

On the other hand in languages such as Python you may well do everything to every object and consequently the const-enforcement does not fare very well. A bit more could be done (formally) in Ruby. Still, even then you could always hack the objects to let you do whatever you want. And believe me, you could do that also in C++ and Java, provided you have sufficient control of the environment where the program is going to be run. It is just way harder.

In fact, I believe that good policies about code isolation can also be (easily) implemented in Python. Good API design is of paramount importance. A wise piece of C++ advice (from Meyers) was "Avoid returning Handles to object internals" (Item 28, Effective C++, 3rd ed, Scott Meyers, Addison-Wesley).

Essentially the idea is never to leave your object's guts exposed and never ever let someone mess with them. This is not about trust. It is that such handles are just a sure way to break your object's constraints (why am I talking like a static programmer anyway?). The point is that such handles change state independently of the core object and this is probably going to be bad, because the corruption of the state will be revealed in a place and time extremely distant from when and where it actually happened.

So, we have to carefully design our APIs, even (shall I say, especially?) if we are dynamic programmers. For example, we can return views on our object internals. Since our languages are very dynamic, such views can be easily constructed: they just have to quack like the original objects. When it makes sense, it is probably just better to place the functionality in the "large" object and to delegate to the attribute (delegation is so trivial to implement in dynamic languages!). Notice that strongly interface based languages such as Java could make this approach even more natural, provided that formally specified interfaces make sense for the specific case.

Sometimes it also makes sense to return objects which can mutate and whose mutation influences the state of the object they come from. However, in these situations such objects shall be built in a way that they do not break the behavior of the object from which they were obtained. Essentially here we are just obeying principles like SRP (single responsibility principle) and designing things to work together. In fact, they are not handles to the object internals at all. We are not exposing the implementation of the object: we are just exposing an interface to a part of the object state (perhaps even state that cannot be changed through the main object interface).

What are the problems with this approach? As long as things are not modified, copying is fine. A view is a good thing, because it may be as efficient as possible for reading, while being completely safe. The problem essentially arises when we want to mutate the object's state: internal handles are bad, so we have to:
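A minimal sketch of both ideas, a quacking read-only view and delegation, could look as follows. The NodeView and Graph classes and the adjacency dictionary are assumptions made up for the example; this is not how networkx is laid out.

```python
class NodeView:
    """Read-only, live view over the keys of a dict (hypothetical helper)."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    def __contains__(self, node):
        return node in self._mapping


class Graph:
    """Toy undirected graph storing nodes as dictionary keys."""

    def __init__(self):
        self._adjacency = {}

    def add_edge(self, u, v):
        self._adjacency.setdefault(u, set()).add(v)
        self._adjacency.setdefault(v, set()).add(u)

    def nodes(self):
        # A view, not the dict itself: no copy, no mutating methods.
        return NodeView(self._adjacency)

    def degree(self, node):
        # Delegation: the "large" object answers, the dict stays hidden.
        return len(self._adjacency[node])


graph = Graph()
view = graph.nodes()
graph.add_edge("a", "b")
print(sorted(view), graph.degree("a"))
```

The view was created before the edge was added and still sees both nodes, yet it offers nothing that could change the graph behind its back (short of reaching into underscore-prefixed attributes, which in Python is always possible).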

1. carefully craft the object interface to allow modifications efficiently and that make sense to the problem at hand, without making it excessively general (because it clashes with efficiency) or excessively big (because it clashes with almost every good property OOP tries to give to programs)

2. Perhaps create special objects that are able to perform controlled modifications on the original object. This may give lot of generality, in a sense, but also complicates the class hierarchy significantly.

Graph example

Back to our example… we may have many solutions. Suppose that this "get the list of nodes" operation is frequent enough. It may make sense to store such a list separately from the dictionary. If node removal and addition are not too frequent, the additional memory may well be worth it (well, perhaps not, if we really have lots of nodes). If such operations are frequent, we double the cost of addition and make deletion O(N)… but if instead of a list we use a set, we have both operations in O(1), simply with an increased multiplicative constant. Of course a language could offer a dict implementation which essentially offers an efficient view over the set of keys, so that separate storage is not needed.

We could use a mutable datatype to hold the list, but then we should make a copy before returning it (this is what actually happens with networkx). Not making a copy has the same problems as returning an internal handle. If we make a copy, then we could return something immutable or mutable. Essentially, returning something immutable does not make a lot of sense, as modifications would not affect the graph anyway and modifications to the graph are not reflected in the node-list. The simplest thing to do is plainly to return a list of nodes.

The true solution would be for the dictionary to support a "true" view object which stays tied to the original dictionary. And actually Python 2.7 and Python 3 have something like it (dictionary views follow every change to the dictionary, although they are read-only themselves). At this point we could just return such a thing and have both efficiency and functionality… were it not for a simple issue: a networkx graph has more than one internal structure holding the nodes. Thus a higher level view would need to be created which could work across the different points where the same information is stored inside a Graph. And we are back to the "complicating the class hierarchy" thing.
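As a quick illustration of how dictionary views behave in Python 3 (in Python 2.7 the corresponding method is viewkeys()), consider this small sketch; the adjacency dictionary is just a made-up stand-in for a graph's internals. The view follows the dictionary, but it has no mutating methods of its own.

```python
adjacency = {"a": {"b"}, "b": {"a"}}
nodes = adjacency.keys()          # a view, not a copy

adjacency["c"] = set()            # mutate the dictionary afterwards
print(sorted(nodes))              # the view already sees "c"
print(nodes & {"a", "z"})         # key views support set operations
```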

Immutable by default

The thing is that actually having to specify things to be const is a bit of a pain. And perhaps it is just me... but consider the Java solutions (this applies to things which roughly work like Java): we are talking about having two classes and two interfaces (or just one class and two interfaces) for lots of objects. In my opinion, this is not practical. And if we want to create "well-behaved handles" things become even more complex.

In fact, this is probably why it is not done (most of the time). Probably it should be sufficient to limit such strategies to things where it really matters. Think about the collections framework.

On the other hand... think about a world where most things are just immutable. I think it is just a safer mental model of programming. It is not about restricting your colleagues (or yourself) from doing things that are legal in the model but that we want to forbid.

If we think immutable, things are just easier. But then we are definitely moving towards the functional side of things. I'm not claiming that functional languages have only immutable stuff, even though many functional languages (Clojure, Haskell) have mostly immutable stuff. However, reasoning in terms of flows of functions and immutable objects is just easier than thinking about mutable objects. At least, it should be, if we were trained to think functionally from the beginning.

Here we are used to dealing with const objects. Sometimes we need to change the state. Two typical scenarios spring to mind. In the first, we aren't doing "Object Oriented Programming": we are just writing an algorithm, and the algorithm was conceived for imperative languages. Sometimes there is no clear conversion into the functional world. Not an efficient one, at least. In this case we may want to use some special mutable object (arrays?) to perform our computation efficiently. And this may even generally work.

In the second case, it is simply not practical to structure the state of the world as some function parameters. In fact, most of the time the global state is too big to be wisely represented as a huge set of parameters. In this case we probably want to express the computation as a set of transformations (functions, basically) that shall be executed one after the other on the world. Here I am mostly thinking about Haskell's monads. Though different from a syntactic and semantic point of view, this is not far from the realm of refs/agents.
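A tiny Python sketch of this first scenario, using the sieve of Eratosthenes as an example of an algorithm conceived for imperative languages (the function name is mine): the mutable array never leaves the function, and callers only see an immutable tuple.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: mutable inside, immutable outside."""
    if limit < 2:
        return ()
    is_prime = bytearray([1]) * (limit + 1)    # local mutable scratch array
    is_prime[0] = is_prime[1] = 0
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross out every multiple of n, starting from n * n.
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = 0
    return tuple(n for n, flag in enumerate(is_prime) if flag)


print(primes_up_to(30))
```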

The issue of efficiency, however, remains. We should still keep in mind that well buried under layers of object orientations there may be lots of hidden costs. Interfaces often get in the way of really efficient implementations, because costs are not part of the interface. The collections framework is beautiful… but sort is still implemented copying everything to an array and sorting the array.

Welcome under the sign of the Lambda

It is better to have const objects by default; that is to say, object mutability should be opt-in rather than opt-out. Moreover, apart from the famous koan about objects and closures, we have to avoid returning handles to our objects' guts… but I do not often see closures that open up the enclosed state to the world.

The point is that avoiding all the copy costs may simply be the thing to do when we have to deal with huge datasets. Restrict mutability to where it is needed (e.g., implementing the algorithms) but mostly use immutable inputs and outputs for functions. Moreover, functional code is generally flatter, which can also, in the long run, improve efficiency.

Finally, with languages such as Clojure, even the perceived drawbacks of lists can be avoided by using vectors, which efficiently support a different set of primitives. Laziness is also extremely helpful: actions that are not performed do not cost anything.
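Python is not lazy the way Clojure sequences are, but generators give a rough feeling for the last point. In this sketch (the expensive function and the calls list are made up for the example) the sequence is conceptually infinite, yet only the three elements actually requested are ever computed.

```python
from itertools import count, islice

calls = []


def expensive(n):
    calls.append(n)              # record that the work was really done
    return n * n


squares = (expensive(n) for n in count())    # infinite, nothing computed yet
first_three = list(islice(squares, 3))       # forces only three elements
print(first_three, calls)
```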

Wednesday, January 25, 2012

Social Network Analysis

This is a presentation I gave some days ago.
It is rather maths heavy and the point of view is more maths than computer science.



Thursday, January 5, 2012

Java and Clojure - How data structures influence language usage

One of the essential points is that we are using objects for at least two different things: i) data-types and ii) domain objects. Data-types are usually general entities that can make sense in every program and are not particularly related to a specific application. Examples of these objects are numbers and strings, but also vectors, lists, maps. On the other hand, domain objects are domain specific and depend on the specific context. Both a payroll system and a web-server probably have some use for a vector class; however, a web-server probably does not have an employee class.

The two kinds of objects live at different logical levels, and code that mixes both should be avoided. Essentially, the idea is that low-level code should be encapsulated in methods or functions that work at a higher level, thus making the whole source clearer because low-level stuff does not clutter the flow of information. Also notice that most standard library code uses low-level abstractions, because it shall be rather domain independent. Thus better data structures mean better code.

So what are these data-types? Languages differ greatly in their choice of datatypes. C has only integers and arrays, for example (ok, it is not OO, but that does not matter right now). C++ inherits this. Strings are a library feature, and so are better-behaved collections. The interesting thing is that the different parts of the STL do not depend much on each other. So for example streams do not play really well with strings (e.g., file names cannot be strings, but in fact streams do not play well with about everything else) and std::string does not have a method such as

vector<string> string::split(string sep);

or, better:

some_range<string> string::split(string sep);

In Java, we have Strings, but collections were retro-fitted. So for example String#split returns an array of strings. In my opinion, one of the essential problems of many "static" programming languages such as C++ and Java is right here: they have the wrong primitives. Writing low-level code is very cumbersome and unnecessarily imperative (the C++ STL is somewhat of an exception, as it has a somewhat functional flavor that the rest of the language lacks).

I think that having entities that fit a similar role but have very different usage places a heavy conceptual burden. In Java arrays are really different from collections: they have a nice ugly syntax (but at least they have one) and cannot be used in the same ways collections can. They have different methods, names, facilities. On the other hand, collections lack a literal syntax, which is something that would make them feel "first-class". C++ has basically the very same problem; however, the STL does a great job in normalizing access patterns between STL collections and regular arrays.

Back to the main subject, I feel that Java actually lacks algorithmic code that eases manipulation of collections. In fact, the only "user-friendly" feature is the "for-each" statement. Other than that, code is very imperative and comparable with C code manipulating the same structures. Plus, Maps are really ugly because a first class tuple data-type is missing.

Clojure, on the other hand, has plenty of functions to manipulate low-level stuff. This is rather relevant as there is a lot of code that works at this level. Being able to write it quickly is very important. Moreover, using the correct abstractions avoids bugs. Consider, for example, the map function. In a call like

(map f coll)

there can be only one single point of failure: what f does on the single elements. There is no explicit looping, no offset problems, nothing. Consider for example the number of things that can go wrong imperatively writing a loop that calls a function on pairs of consecutive elements in a collection. Compare it with its functional equivalent:

(let [coll (range 10)] (map + coll (rest coll)))

Here nothing can really go wrong. It is easier to write and easier to understand (provided you know a bit of Clojure). It is worth noting that real high level object oriented languages have almost the same kind of abstraction. But hey, they also have FOF and LOL (first order functions and let over lambda -- actually, they do not have let, but the name doesn't matter, does it?). Besides, that kind of code may also be easier to parallelize (in the sense that the compiler could do it).

Then comes class specific code. Objects are great, fine. However, sometimes they are just used to group data together. In a sense, it makes sense. In Java there are no tuples, maps are ugly as hell and have the stupid requirement of type uniformity. Which means that either we abuse Object or we can't use them at all.

As a consequence, many specific classes are created. They have no true purpose other than carrying around pieces of data together. And that is just the task for maps, tuples, vectors. Maps, tuples and vectors have a uniform interface and "do the right thing" regardless of the types (in a decently typed language -- new definition… both static, e.g., Haskell, and dynamic, e.g., Clojure, Ruby, Python, languages can qualify!). And there are plenty of functions to manipulate such stuff.

I think that one of the clearest examples is the Python tuple. A tuple is not magic. It is not clever. It is simple. Still, while in Python functions return only single values, tuples in fact make it possible to simulate multiple return values. And that is something which is extremely useful in many different situations. In Clojure vectors can be used with a similar meaning (in fact, Clojure vectors are quite similar to Python tuples). And the destructuring syntax is amazing (once again, both in Python and in Clojure).
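The same comparison can be sketched in Python, which sits somewhere in between: the explicit loop has index arithmetic to get wrong, while the zip version just pairs every element with its successor.

```python
coll = list(range(10))

# Imperative version: the index arithmetic is where off-by-one bugs hide.
sums_loop = []
for i in range(len(coll) - 1):
    sums_loop.append(coll[i] + coll[i + 1])

# Functional version: pair every element with the one that follows it.
sums_zip = [a + b for a, b in zip(coll, coll[1:])]

print(sums_zip)
```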

for k, v in some_map.items():
    do_something_with(k, v)

or

(for [[k v] some-map] (do-something-with k v))
(doseq [[k v] some-map] (do-something-with k v))
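On the Python side, a slightly fuller sketch of tuples acting as multiple return values and of destructuring (min_max and the ages dictionary are made-up names for the example):

```python
def min_max(values):
    """Return two results at once by packing them in a tuple."""
    return min(values), max(values)


lowest, highest = min_max([3, 1, 4, 1, 5])    # unpacked on the way out

ages = {"ann": 31, "bob": 27}
pairs = []
for name, age in sorted(ages.items()):        # each item is a (key, value) tuple
    pairs.append((name, age))

print(lowest, highest, pairs)
```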

Amazingly enough, the set of orthogonal language features used in both languages is roughly the same. Consider that with using a Set[Map.Entry[K,V]] Map#getEntrySet method... here we have this new Entry class. And perhaps somewhere else we have a Pair class which basically does the same thing, but it is a separate type, with different methods and different usage patterns. In dynamic languages at least, as long as the signature of the methods is compatible, something could be saved... but in statically typed object oriented languages, good luck with that.

I think these are some of the reasons I find writing algorithmic code in Python or Clojure especially pleasing and rather despise doing the very same thing in Java.

Friday, December 23, 2011

Does Clojure fix "Fundamental problems with the Common Lisp language (citation)"?

I was looking for some Common Lisp libraries to implement an idea of mine. For lots of reasons I was considering not using Python or Clojure and going directly with Common Lisp (in fact, I think it is going to be Scheme... but it is hard to tell).

As often happens when following semi-random links on Google, I stumbled on something quite interesting:
Fundamental problems with the Common Lisp language

Nothing extremely new, indeed. The only point that really surprised me was the "hard to compile efficiently" thing. I would have said that in general SBCL is a pretty fast environment. Not as fast as C++, perhaps not even as fast as Java (but I believe that this depends on the specific benchmarks used), but still fast.

However I was mostly interested in the other claimed problems:

  1. Too many concepts; irregular
  2. Hard to compile efficiently
  3. Too much memory
  4. Unimportant features that are hard to implement
  5. Not portable
  6. Archaic naming
  7. Not uniformly object-oriented

It may seem like a lame argument... but I think that Clojure actually addresses all the problems but numbers 2 and 3. Please notice that I'm not claiming that Clojure is a memory hog or that it is slow. Simply put, right now my impression is that Common Lisp is still faster than Clojure. I believe that this is due to Clojure being an additional layer over Java. Java is itself probably marginally faster than SBCL (though your mileage may vary). By marginally faster I mean that it really depends on what you are doing, and either one may turn out to be faster.

Alioth benchmarks are, like all benchmarks, not extremely relevant. Still, this is my general impression. SBCL does a wonderful job, considering that CL is much higher level than Java and that there are many more engineers optimizing the JVM. However, Clojure takes its toll: according to my tests (and to Alioth as well), there is quite a lot of work to do to make it catch up with Java.

Probably for high level stuff or specific problems (where in Java you would essentially have to re-implement some part of the Clojure runtime/libraries) they are on par.

About the memory, my feeling is that the JVM is a rather memory intensive business, and Clojure can't do much about it. E.g., Python programs usually run using much less memory when dealing with similar data sets.

I'm not saying that we should drop CL and switch to Clojure. However, I believe that Clojure addressed some of the problems that many (some?) in the CL community feel CL has.

Thursday, December 15, 2011

Better toys for us programmers

Today PyCharm 2.0 is out. A couple of days ago, the latest version of IntelliJ (11) was released as well.

I'm somewhat reluctant to discuss the matter here; I'm not a free software zealot by any means (for example, I'm posting from a beautiful MacBook Air running OS X). Still, when discussing commercial software I feel a vague sense of guilt, because I feel like I am advertising. This is especially true as most of the time it does not involve only reporting facts (which would be acceptable, as the truth tends to be true, even if in favor of a commercial entity) but impressions, which could make me seem biased.

These are the most interesting improvements I found in PyCharm/IntelliJ (it should be clear when stuff applies only to one of the two -- and amazingly enough, the what's new page on PyCharm is more detailed):

  • Support for pypy (this is going to be immensely important for me, as I'm planning to move part of my development environment to pypy)
  • Support for ipython (more on this later)
  • Cython support (which may be something I'll be using soon enough)
  • Git graphs (been a bit of a PITA lately to remember the proper log options to have them in the console, my memory ain't what it used to be)
  • Gist support (I love that)

There is more. I did not even realize it was missing before... but, for example, PyCharm now completes keyword arguments in Python code. Which is extremely nice, in my opinion. And the refactorings also seem to work more accurately.

Finally, I want to point out a feature I was sorely missing in all the environments I tried, i.e., choosing the method to step into when debugging. First, I'm not the kind of guy who spends lots of time in the debugger (that would be a clear smell about the quality of my unit tests). But when I do, I often find it rather boring to walk step by step through irrelevant code.

Consider this:

my_object.that_is_some_method(
    ILuvDIP(...), self.that_is_interesting(), self.may_be_a_property)

and suppose that I feel the bug is in that_is_interesting. Now, it may well be that my Python debugging skills are not excellent. After all, only recently did I stop instrumenting my code with prints and start always using a proper debugger. Before that I relied almost only on unit tests and prints.
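Going back to the call above, a small runnable sketch shows why a plain "step into" is annoying here. The class bodies are invented for the illustration (only the names come from the snippet, and the elided constructor arguments are simply left out). Python evaluates the arguments left to right before making the call, so the first thing you step into is the ILuvDIP constructor.

```python
calls = []  # records the order in which code is actually entered

class ILuvDIP:
    def __init__(self):
        calls.append("ILuvDIP.__init__")

class MyObject:
    def that_is_some_method(self, dependency, value, prop):
        calls.append("that_is_some_method")
        return value

class Client:
    def __init__(self):
        self.my_object = MyObject()

    @property
    def may_be_a_property(self):
        calls.append("may_be_a_property")
        return "prop"

    def that_is_interesting(self):
        calls.append("that_is_interesting")
        return 42

    def run(self):
        # Same shape as the call in the post.
        return self.my_object.that_is_some_method(
            ILuvDIP(), self.that_is_interesting(), self.may_be_a_property)

result = Client().run()
print(calls)
```

The recorded order has ILuvDIP first and that_is_interesting only second, which is exactly the detour one would like to skip.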

Before PyCharm 2, it was hard to step into that_is_interesting and not into ILuvDIP. I believe that the same logic also applies to Java and IntelliJ and hopefully to Clojure. Now, back to the point.

I really don't have a lot of problems with IntelliJ. I feel that an IDE is a very valuable asset when developing Java. In fact, I would say it is a PITA to do without. Perhaps it could be done with Emacs and some modules like JDEE. Still, I don't know... I try to avoid Emacs these days (as I'm finding vi more and more natural to me).

The question is more interesting regarding Clojure and Python (and Ruby...). If I used Django a lot, PyCharm would probably be a clear winner. Support is awesome and you have to work in a frameworkish way in any case. There is nothing wrong with that, of course.

These days I'm mostly writing library/algorithmic code in Python. And I feel like ipython+vim is a great tool here. It's got an almost Mathematica/Matlab vibe that is nice for what I'm doing. I also try using that approach with Clojure more often than not. Tests as documentation and specification, REPL as an integrated development environment. It is possible that with ipython built into PyCharm I could just move that workflow to PyCharm itself.

There is, however, the issue of code completion. Emacs fares pretty well at completing Clojure but, as far as I remember, it is not so good at completing Java (perhaps I should have installed JDEE). As a consequence, I used IntelliJ a lot even with Clojure.

Recently, I started exploring Clojure+vim too. And it is a wonderful world. I have most of what I need and it is extremely lightweight. I have to investigate the issues further. However, IntelliJ remains a solid environment for Clojure development.

Now the essential question is... I really need IntelliJ for Java. And having it working with Clojure is a big plus (even if maybe not a strict necessity). But should I buy a separate PyCharm or just rely on the IntelliJ plugin?

Monday, December 12, 2011

Erlang and OTP in Action (review)

The first time I got into Erlang, it was with "Programming Erlang: Software for a Concurrent World" (Joe Armstrong). It is a very nice book, in my opinion, and I immensely enjoyed reading it. That was like 4 years ago or something. Back then, the functional revolution was just at the beginning: no widespread Scala, no Clojure at all, essentially no F#. Back then it looked like OO was going to rule the industry for years and years, with no contender of any sort. Rails was fresh, Django was fresher (back then the APress book was just being released).

I bought the book because I wanted to see this "brand new" technology (20 years old, but only then breaking through in the circles I frequented). And really, the language looked 20 years old. Full of Prolog legacy, "Unicode, who's that guy?" and so on. However, it did things no other piece of software I knew could do as easily. Massive concurrency, hot swapping code. Wow.

The language, I did not especially like. The runtime… WOW! As a language, I love Python or Clojure because of the way apparently distant features work together and create something even more beautiful. Erlang does not have that at the language level. It has it at the framework level.

Think about hot swapping code in object oriented software. First, I somewhat believe that object oriented modules are more tangled than their functional equivalents, partly because of the object reuse OO promotes (which can really work against you if you want to swap code). Then, there is the whole problem of references vs. addresses. Addresses have an additional level of indirection that makes it far easier to swap a process than it is to swap an object.

But the very idea that state lives in the function parameters, together with process linking and easy restarting, is at the very root of how easy it is to swap code in Erlang. But then… back to the books.
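As a rough analogy only (this is Python, not Erlang, and it says nothing about how the Erlang runtime really does it), the "state in the parameters" idea can be sketched like this. The names handlers, counter_v1, counter_v2 and run are all invented for the illustration. The loop holds no hidden state and looks the handler up by name for every message, so the code can be replaced between two messages without disturbing the state.

```python
# Registry of the "current" code, looked up by name on every message.
handlers = {}

def counter_v1(state, message):
    """Old version: every tick adds 1. The new state is the return value."""
    return state + 1

def counter_v2(state, message):
    """New version: every tick adds 10."""
    return state + 10

def run(messages, state=0):
    """Process messages one by one, threading the state through explicitly."""
    for message in messages:
        if message == "upgrade":
            # Swap the code; the state itself is untouched.
            handlers["counter"] = counter_v2
            continue
        state = handlers["counter"](state, message)
    return state

handlers["counter"] = counter_v1
final = run(["tick", "tick", "upgrade", "tick"])
print(final)
```

With objects, the equivalent swap would have to chase every reference to the old instance; here the only thing that refers to the old code is one entry in a table.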

The essential problem was that after seeing a bit of Erlang, I thought OTP was not a big deal. Yes, it is easier to use. But plain Erlang is easy too. And I convinced myself that, should I need Erlang, I could just use plain Erlang. Then things changed: I read some code that used OTP and did not understand much of it. That was this year. What I did understand was that it could be helpful. I had to write a concurrent prototype and I welcomed the idea of writing as little code as possible.

In the meantime, I forgot many things about Erlang. In these situations, instead of reading the same book twice, I buy another book to gain perspective. One of the candidates was "ERLANG Programming" (Francesco Cesarini, Simon Thompson). It also had excellent reviews. Essentially, I believe it contains more material than Armstrong's book and is also a bit more recent. However, as far as I understand, it is still a bit terse on OTP.

As a consequence, I bought "Erlang and OTP in Action" (Martin Logan, Eric Merritt, Richard Carlsson) instead. And I'm very happy with this choice. It complements Armstrong's book well and extensively covers OTP. In fact, I also believe that the approach is very interesting. Introduce OTP first and learn to use it; then, when you know what it can do, you are just going to use that. Then, learn plain Erlang in order to extend OTP when your use case is not covered. And a nice plus was a detailed description of JInterface, which I might need as well.

In fact, I do think that, just as it may make sense to introduce objects as early as possible in a book on an object oriented language, starting with OTP is a very big plus from a learning perspective. Then again, perhaps the point is that I did not need to get into a functional mindset (which I think Armstrong's book does with more attention).

If, however, the goal is just to "learn to think functionally", I believe that "The Joy of Clojure: Thinking the Clojure Way" (Michael Fogus, Chris Houser), LYHFGG or Land of Lisp are probably better alternatives. Another interesting one is "Functional Programming for Java Developers: Tools for Better Concurrency, Abstraction, and Agility" (Dean Wampler), even if it has a whole different perspective.

Wednesday, December 7, 2011

Good and not so good reasons to learn Java (or any other language) [part 2.5]

So this is the third part after the not so good reasons to learn Java and the other good reasons to do it.


Java could be fun


Really. There is some stuff that is really nicely done. I like Akka, for example. And I find the idea of hacking with assertions really fun. It is a totally different approach to meta-programming that I really liked. I also like Antlr... after that, every other parser-building library for imperative/OO languages seemed primitive.

There is some nice stuff in the Python world as well (and yeah, in Lisp you have Lisp, and Haskell has wonderful stuff too). But I can tell you: try implementing a language in Java and in C++. In Java you will be finished so much earlier... Of course, other languages are faster to develop in too. But your team may not include other Haskell hackers.

Libraries, libraries, libraries

In Java there is a library for everything. Quite often, they are very well done, even if somewhat over-engineered. Probably my idea of over-engineering is a bit extreme (it comes from having seen lots of lean languages). But really, they are robust and well tested.

Moreover, Java is usually efficient enough to be a decent contender for more demanding tasks. Maybe C/C++ can be avoided for your application (and Java libraries are usually easier to use than C++ ones).

Java as an intermediate level platform

Say you are interested in language design. As far as I can see, you have few choices.

  • Write your own runtime, vm, etc. That was the "old" approach (Python, Ruby)
  • Implement your language on the top of a Lisp[0] or Prolog[1] interpreter
  • Use LLVM
  • Use JVM
  • Use CLR/Mono

I would rule out the last one, because I don't do Windows. Between LLVM and JVM, as far as I understand, it may depend on the language. The JVM has a few quirks (who said lack of TCO?), but it has plenty of available documentation, real world examples (Scala & Clojure), a large number of developers working for the well-being of the platform, and a huge number of libraries, libraries, libraries you may want to use.

LLVM is probably going to be faster, though. And it has lots of optimizations and stuff for static languages. In any case, to make an informed choice, you need to know Java.

Tools (IDEs, Maven, Ant)

IDEs are the blessing and the curse of Java. After having used IntelliJ for some time, I really feel that the amount of functionality that IDEs for other languages offer is puny. The possible exception is Emacs for Lisps.

The funny thing is that I'm not an IDE guy. However, really, when you do refactoring (even simple stuff like moving functions and files around) it is an invaluable time-saver. I miss vim as a text manipulating programming language, but still... for Java, IDEs are almost necessary. And they not only bridge part of the gap with other languages, they also have lots of useful stuff.

Regarding Maven... well, I just like it. I also like the fact that I'm able to build IntelliJ projects from Maven scripts. And Ant is also a very good tool (although rather over-engineered). My humble opinion is that Ant is far easier to use than the whole autotools family. This may also have something to do with the fact that the whole Java deployment process is easier.

In any case, the point here is not how cool Maven is. Developers from other platforms may want to know what is brewing in Java-land. Sometimes we have sub-par tools and we do not even know it. Of course, quite often Java tools fix "Java problems" that are different from the ones we have in other languages. Sometimes not.

E.g., the design of Maven could inspire similar tools for other platforms. Both for its strengths and for its weaknesses (to avoid them, of course).

Learn "classical" threads

Ah, this is weak. However, other languages have a "threading" module that is heavily inspired by the way Java does threading. I don't particularly like it, but being familiar with it may be a very good idea. As far as I can tell, it is also what most people have in mind when thinking about threading. Who am I to say that they are wrong? I can just tell them there is better stuff.

And yes, pthreads are even more of a PITA.
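Python's threading module is one such case (if I remember correctly, its own documentation describes the design as loosely based on Java's threading model). A minimal sketch follows; Worker, results and lock are made-up names. Anyone who has written a Java Thread subclass will recognize the shape: subclass, override run, then start and join.

```python
import threading

class Worker(threading.Thread):
    """Java-style worker: subclass Thread and override run()."""

    def __init__(self, n, results, lock):
        super().__init__()
        self.n = n
        self.results = results
        self.lock = lock

    def run(self):
        square = self.n * self.n
        # Guard the shared list, much like a synchronized block would.
        with self.lock:
            self.results.append(square)

results = []
lock = threading.Lock()
workers = [Worker(n, results, lock) for n in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Completion order is not guaranteed, so sort before looking at the values.
print(sorted(results))
```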

Find Java in other languages. Sometimes.

This is basically the extended version of the previous argument. Java is hugely popular. Many "new" things are created in Javaland and then ported to other platforms. Or perhaps they were not created there, but first became popular inside the Java community.

For example, xUnit libraries are available everywhere, but I think that most people first started working with them in Java. Some of the authors of the original Smalltalk library actively work on JUnit, etc etc etc.
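Python's unittest module is a good example of that family resemblance. A minimal sketch, where swap and SwapTest are made-up names: a TestCase subclass, a setUp fixture and assertEqual calls, which is pretty much what a JUnit user would expect to find.

```python
import unittest

def swap(pair):
    """Return the pair with its two elements exchanged."""
    first, second = pair
    return second, first

class SwapTest(unittest.TestCase):
    def setUp(self):
        # Fixture created before each test method, xUnit style.
        self.pair = (1, 2)

    def test_swap_reverses_pair(self):
        self.assertEqual(swap(self.pair), (2, 1))

    def test_swap_twice_is_identity(self):
        self.assertEqual(swap(swap(self.pair)), self.pair)

# Run the suite programmatically instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SwapTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```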


Conclusion


I do not think that Java is particularly good. Not particularly bad, either. It is not the kind of enlightening language Scheme is. It is not easy to use and predictable like Python. However, its popularity may make it unwise not to learn it. Especially for communication reasons:

  • Books
  • Other developers (talking about OOP, libraries)
  • New ideas brewed in Javaland

wow... that's it. ;)

---
[0] Scheme, Clojure, Common Lisp...
[1] Erlang was created this way, even if now it has its own (wonderful) runtime