Thursday, January 26, 2012

Handle this! (views, const, state in Clojure, Java, C++ and Python)

Introduction

The original vision Alan Key had on object oriented programming was about separate entities communicating through message passing. A logical consequence is that the global programming state is the sum of the individual states of these entities (called objects). State of such objects is naturally hidden from the outside and state modifications occur only as a consequence of the exchanged messages.

I would like to mention that in this model the "privacy" of internal variables is not exactly simply a matter of a keyword, but a consequence of a programming philosophy. This is not the kind of limitation you get in Java classes or C++, where the field is there, you just cannot access it. It somewhat more similar to calling a black-box with a state that is its own business; there are no fieldsand if there are, they are just an implementation detail. Or even more so, private variables are not accessed in the same sense that the physical address of an object in Java is not part of the programming model.

Such objects do not necessarily have their own thread of execution (in the sense that they are concurrently in control). However, if they had, the logical model would not be overly different. But back to the objects…

I somewhat believe that objects are an overloaded metaphor. In fact, there are at least two types of objects. And while the object oriented message only metaphor well applies for domain specific objects, I somewhat feel that it is not appropriate for some data structures. Sometimes, it is a nice property that "similar structures" have a common interface so that, for example, switching from an array to a linked-list is a painless transition, because it eases experimentation with different trade-offs regarding computational efficiency (although such problems are better solved with pen and paper).

However, in other situations, accessing the internals of some complex structure is plainly the "right thing to do". It is a walking-horror from the object oriented point of view, but it plainly makes sense for computational reasons. I often have to deal with graphs with billions of nodes, and more often than not I feel that usual OO laws are too restrictive.

Graph example

A clear example here is the design of networkx.Graph: I have nothing against the design, by the way. I believe they do the right thing. Here the idea is that they have implemented their Graph internals in some way (does not matter how, right now). However, you may want to get a list of all the nodes in the graph. Now, how to do this? The first issue, is that the nodes may not be memorized in a way which is easier to return. This is actually the case: nodes are a dictionary keys, under the hood. So essentially there is no easy way to return them without calling some dict member which returns a newstructure holding the nodes.

State and "Static" OOP

OOP is all about state change. Perhaps just local state change, if done correctly. And hopefully the state's effect do not propagate too far from where the state is hidden. About C++, I found no other very mainstream OO language that makes it clear what you shall change and what you shall not.

The C++ pragmatics is really precise on consting whatever you can const. And to solve issues where it is not practical to have a logically const object which actually mutates something inside, you can use the mutable modifier to support the idea that the object realstate did not change while some irrelevant parts of it indeed changed. Examples are forms of caching, counting stuff, logging to a logger we hold a reference to.

Another important aspect of C++ is that it quite distinguishes between a const pointer (a pointer that cannot change) and a pointer to a const object (the pointed object cannot change). As always all this leads to additional complexity. However, declaring stuff const is good: first it is a rather strong safety guarantee, second it really leads to optimizations otherwise impossible. Still, it is tragically inadequate wrt. plainly immutable objects.

Moreover, although many other languages do not have pointer arithmetics, they do have references. In Java it is possible, for example, to mark such a reference final, which essentially means it will always refer to a given object. However, there is no way to state that the actual object could not be mutated by accesses through that specific reference.

In Java, the only way to achieve that goal is not providing methods that mutate the state. In fact this approach makes sense. Somewhat you make the language simpler without really losing much. And C++ newbies really do not get the whole constness thing very well.


mutable-immutable.png

Essentially, in Java you do not have the possibility to have a mutable object that some clients cannot mutate. There are options, however. For example, in Figure 1) we have two interfaces, one mutable and one immutable. We have the mutable interface extend the immutable one and the appropriate base classes.

Immutability at class level can be obtained both (a) with a true immutable implementation implementing the immutable interface and a mutable implementation implementing the mutable one; or (b) with just a mutable implementation: clients that should not mutate the object will use the immutable interface. This is quite similar to the const in C++ in the sense that a const_cast is usually possible (and in this case we could just cast to the mutable interface). Such things somewhat break the whole immutability thing, but sometimes have their uses.

And what is the big deal with immutability? Basically, in this context immutable stuff can be shared with no fear. And copying huge datasets is too inefficient to be considered.

Dynamic OOP

The essential problem here is that the OO language we have discussed so far are built around the idea that your co-workers will screw the project if they can do stuff. So the objective is not letting them do it. Constness shall be enforcedby the language (you had the opinion that I was happy about C++ const, did you?) because otherwise someone will foobar the project.

On the other hand in languages such as Python you may well do everything to every object and consequently the const-enforcement does not fare very well. A bit more could be done (formally) in Ruby. Still, even then you could always hack the objects to let you do whatever you want. And believe me, you could do that also in C++ and Java, provided you have sufficient control of the environment where the program is going to be run. It is just way harder.

In fact, I believe approaches where good policies about code isolation can be also (easily) implemented in Python. Good API design is of paramount importance. A C++ wise advise (from Meyers) was "Avoid returning Handles to object internals" (Item 28, Effective C++, 3rd ed, Scott Meyers, Addison-Wesley).

Essentially the idea is never to let your object guts exposed and never ever let someone mess with it. This is not about trust. This is about such handles are just a sure way to break your object constraints (why I'm talking like a static programmer anyway?). The point is that such handles change state independently by the core object and this is probably going to be bad, because the corruption of the state will be revealed in a place and time extremely distant from when and where it actually happen.

So, we have to carefully design our APIs, even (shall I say, especially?) if we are dynamic programmers. For example, we can return views on our object internals. Since our languages are very dynamic, such views can be easily constructed: they just have to quack like the original objects. When it makes sense, it is probably just better place the functionality in the "large" object and to delegate to the attribute (delegation is so trivial to implement in dynamic languages!). Notice that strongly interface based languages such as Java could make this approach even more natural, provided that formally specified interfaces make sense for the specific case.

Sometimes it makes also sense to return object which can mutate and where their mutation influences the state of the object from which they come from. However, in this situations such objects shall be built in a way that they do not break the behavior of the object from which they were gotten. Essentially here we are just obeying principle like SRP (single responsibility principle) and design things to work together. In fact, they are not handles to the object internals at all. We are not exposing the implementation of the object: we are just exposing an interface to a part of the object state (perhaps even state that cannot be changed through the main object interface).

What are the problems with this approach? As long as things are not modified, copying is fine. A view is a good thing, because it may be as efficient as possible for reading, while being completely safe. The problem essentially arises when we want to mutate the objects state: internals handles are bad, so we have to:

1. carefully craft the object interface to allow modifications efficiently and that make sense to the problem at hand, without making it excessively general (because it clashes with efficiency) or excessively big (because it clashes with almost every good property OOP tries to give to programs)

2. Perhaps create special objects that are able to perform controlled modifications on the original object. This may give lot of generality, in a sense, but also complicates the class hierarchy significantly.

Graph example

Back to our example… we may have many solutions. Suppose that this "get the list of nodes" operation is frequent enough. It may make sense to memorize such list separately from the dictionary. If node removal and addition is not too frequent, the additional memory may well be worth it (well, perhaps not, if we really have lots of nodes). Even if such operations are frequent, we double the cost of the addition and make deletion O(N)… but if instead of a list we use a set, we have both operation O(1) simply with an increased multiplicative constant. Of course a language could offer a dict implementation which essentially offers an efficient view over the set of keys, so that separate memorization is not needed.

We could use a mutable datatype to hold the list, but then we should make a copy before returning it (this what actually happens with networkx). Not making a copy has the same problems of returning an internal handle. If we make a copy, then we could return something immutable or mutable. Essentially returning something immutable has not a lot of sense, as modifications would not affect the graph andmodifications to the graph are not reflected in the node-list. The simplest thing to do is plainly return a list of nodes.

The true solution would be that dictionary supported a "true" view object which is able to modify the original dictionary. And actually Python 2.7 and Python 3 have it. At this point we could just return such thing and have both efficiency and functionality… were it not for a simple issue: a networkx graph has more than one internal structure holding the nodes. Thus a higher level view would need to be created which could work across the different point were the same information is memorized inside a Graph. And we are back to the "complicating the class hierarchy thing".

Immutable by default

The thing is that actually having to specify things to be const, is a bit a pain. And perhaps it is just me... but consider the Java solutions (this apply to things which roughly work like Java): we are talking about having two class and two interfaces (or just one class and two interfaces) for lots of objects. In my opinion, this is not practical. And if we want to create "well-behaved handles" things become even more complex.

In fact, this is probably why it is not done (most of the times). Probably it should be sufficient to limit such strategies for things where it really matters. Think about the collections framework.

On the other hand... think about a world where most things are just immutable. I think it is just a safer mind model of programming. It is not about limiting your colleagues (or yourself) on not doing things which are licit in the model and that we want to restrict.

If we thinkimmutable, things are just easier. But then we are definitely moving towards the functional side of things. I'm not claiming that functional languages have onlyimmutable stuff. Even though many functional languages (Clojure, Haskell) have mostly immutable stuff. However, reasoning in terms of flows of functions and immutable objects is just easier than thinking about immutable immutable objects. At least, it should be, if we were trained to think functionally from the beginning.

Here we are used to deal with const objects. Sometimes we needto change the state. Two typical scenarios spring to mind. We aren't doing "Object Oriented Programming": we are just writing an algorithm and the algorithm was conceived for imperative languages. Sometimes there is no clear conversion into the functional world. Not an efficient one, at least. In this case we may want to use some special mutable object (arrays?) to perform our computation efficiently. And this may even generally work.

In the second case, it is simply not practical to structure the state of the world as some function parameters. In fact most of the times the global state is to big to be wisely represented as a huge set of parameters. In this case we probably want to express the computation as a set of transformations (functions, basically) that shall be executed one after the other on the world. Here I am mostly thinking about Haskell's monads. Though, even different from a syntactical and semantical point of view, we are not far from the realm of refs/agents.

The issue of efficiency, however, remains. We should still keep in mind that well buried under layers of object orientations there may be lots of hidden costs. Interfaces often get in the way of really efficient implementations, because costs are not part of the interface. The collections framework is beautiful… but sort is still implemented copying everything to an array and sorting the array.

Welcome under the sign of the Lambda

Not only it is better to have const object by default, that is to say object mutability shall be an opt-in rather than an opt-out. In fact, a part from the famous koanabout objects and closures, we have to avoid returning handles to our objects guts… but I do not see often closures that open up the enclosed state to the world.

The point is that avoiding all the copy costs may be simply thething to do when we have to deal with huge datasets. Restrict mutability where needed (e.g., implementing the algorithms) but mostly use mutable input and outputs from functions. Moreover, functional code is generally flatter, which can also, in the long run, improve efficiency.

Eventually, with languages such as clojure, even the perceived drawbacks of lists can be avoided using vectors, which support efficiently a different sets of primitives. Lazyness is also extremely helpful: actions that are not performed do not cost.

Wednesday, January 25, 2012

Social Network Analysis

This is a presentation I gave some days ago.
It is rather maths heavy and the point of view is more maths than computer science.



Thursday, January 5, 2012

Java and Clojure - How data structures influence language usage

One of the essential points is that we are using object for at least two different things: i) data-types and ii) domain objects. Data-types are usually general entities that can make sense in every program and are not particularly related to a specific application. Examples of these objects are numbers and strings, but also vectors, lists, maps. On the other hand domain objects are domain specific and depend on the specific context. Both a payroll system and a web-server have probably some use for a vector class, however web-server probably does not have an employee class.

The logic level of the objects of the two different kinds is different and code using both should not be written. Essentially, the idea is that low-level code should be encapsulated in methods or functions that work at an higher level, thus making the whole source more details because low level stuff does not clutter the flow of information. Also notice that most standard library code uses low-level abstractions, because it shall be rather domain independent. Thus better data structures mean better code.

So what are these data-types? Languages differ greatly in their choice of datatypes. C has only integers and arrays, for example (ok, it is not OO, but does not matter right now). C++ inherit this. Strings are a library function, so are more well behaved collections. The interesting thing is that the different parts of the STL do not depend much on each other. So for example streams do not play really well with strings (e.g., file names cannot be strings, but in fact streams do not play well with about everything else) and std::string does not have a method such as

vector<string> string::split(string sep);

or, better:

some_range<String> string::split(string sep);

In Java, we have Strings, but collections were retro-fitted. So for example String#split returns an array of strings. In my opinion, one of the essential problems of many "static" programming languages such as C++ and Java is right here: they have the wrong primitives. Writing low level code, is very cumbersome and unnecessarily imperative (C++ STL somewhat makes an exception as it has a somewhat functional flavor the rest of the language lack).

I think that having entities that fit a similar role but have very different usage places a heavy conceptual burden. In Java arrays are really different from collections: they have a nice ugly syntax (but at least they have one) and cannot be used in the same ways collections can. Have different methods, names, facilities. On the other hand, collections lack a literal syntax, which is something that makes them feel "first-class". C++ has basically the very same problem; however, the STL does a great job in normalizing access patterns between STL collections and regular arrays.

Back to the main subject, I feel that Java actually lacks algorithmic code that eases manipulation of collections. In fact, the only "user-friendly" feature is the "for-each" statement. Other than that, code is very imperative and comparable with C code manipulating the same structures. Plus, Maps are really ugly because a first class tuple data-type is missing.

Clojure, on the other hand, has plenty of functions to manipulate low level stuff. This is rather relevant as there is a lot of code that works at this level. Being able to write it quicky is very important. Moreover, using the correct abstractions, avoid bugs. Consider for example the map functions? In a col like

(map f coll)

there can be only one single point of failure: what f does on the single elements. There is no explicit looping, no offset problems, nothing. Consider for example the number of things that can go wrong imperatively writing a loop that calls a function on pairs of consecutive elements in a collection. Compare it with its functional equivalent:

(let [coll (range 10)] (map + coll (rest coll)))

Here nothing can go really wrong. It is easier to write and easier to understand (provided you know a bit of clojure). It is worth noting that real high level object oriented languages have almost this kind of abstraction. But hey, they also have FOF and LOL (first order functions and let ovel lambda -- actually, they do not have let, but the name doesn't matter, does it?). Besides, that kind of code may also be easier to parallelize (in the sense that the compiler could do it).

Then comes class specific code. Object are great, fine. However, sometimes they are just used to group data together. In a sense, it makes sense. In Java there are no tuples, maps are ugly as hell and have the stupid requirement of type uniformity. Which means that either we abuse Object or we can't use them at all.

As a consequence, many specific classes are created. They have no true purpose, but carrying around pieces of data together. And that is just the task for maps, tuples, vectors. Maps, tuples and vectors have a uniform interface, "do the right thing" regardless of the types (in a decently typed language -- new definition… both static, e.g., Haskell, and dynamic, e.g., Clojure, Ruby, Python, languages can qualify! ). And there are plenty of functions to manipulate such stuff.

I think that one of the clearest examples is the Python tuple. A tuple is not magic. Is not clever. It is simple. Still, while in Python functions return only single values, tuples in fact allow to simulate multiple return values. And that is something which is overly useful in many different situations. In Clojure vectors can be used with a similar meaning (in fact, Clojure vectors are quite similar to Python tuples). And the destructuring syntax is amazing (once again, both in Python and in Clojure).

for k, v in some_map:
    do_something_with(k, v)

or

(for [[k v] some-map] (do-something-with k v))
(doseq [[k v] some-map] (do-something-with k v))

Amazingly enough, the set of orthogonal language features used in both languages is roughly the same. Consider that with using a Set[Map.Entry[K,V]] Map#getEntrySet method... here we have this new Entry class. And perhaps somewhere else we have a Pair class which basically does the same thing, but it is a separate type, with different methods and different usage patterns. In dynamic languages at least, as long as the signature of the methods is compatible, something could be saved... but in statically typed object oriented languages, good luck with that.

I think these are some of the reasons I find especially pleasing writing algorithmic code in Python or Clojure and rather despise the very same thing in Java.

Friday, December 23, 2011

Does Clojure fix "Fundamental problems with the Common Lisp language (citation)"?

I was looking for some Common Lisp libraries to implement an idea of mine. For lots of reasons I was considering not using Python or Clojure and going directly with Common Lisp (in fact, I think it is going to be Scheme... but it is hard to tell).

As it often happens when following semi-random links on Google, I stumbled on something quite interesting:
Fundamental problems with the Common Lisp language

Nothing extremely new, indeed. The only point that really surprised me was the "hard to compile efficiently" thing. I would have said that in general SBCL is a pretty fast environment. Not as fast as C++, perhaps not even as Java (but I believe that this depends from the specific benchmarks used), but still fast.

However I was mostly interested in the other claimed problems:

  1. Too many concepts; irregular
  2. Hard to compile efficiently
  3. Too much memory
  4. Unimportant features that are hard to implement
  5. Not portable
  6. Archaic naming
  7. Not uniformly object-oriented
May seem like a lame argument... but I think that clojure actually addresses all the problems but number 2 and 3. Please notice that I'm not claiming that clojure is a memory hog or that it is slow. Simply put, right now my impression is that common lisp is still faster than clojure. I believe that this is due to Clojure being an additional layer over Java. Java is itself probably marginally faster than SBCL (though your mileage may vary). With marginally faster I mean that really depends on what you are doing and one or the other may result faster.

Alioth benchmarks are, like all benchmarks, not extremely relevant. Still, this is my general impression. SBCL does a wonderful job, in that CL is much higher level than Java and there are many more engineers optimizing the JVM. However, Clojure takes its toll, in that, according to my tests (and to alioth as well) there is quite a lot of work to do to make it catch up with Java.

Probably for high level stuff or specific problems (where in Java there would be essentially some part of clojure runtime/libraries to be re-implemented) they are on par.

About the memory, my feeling is that JVM is a rather memory intensive business, and Clojure can't do much about it. E.g., Python programs usually run using much less memory when confronted with similar data-sets.

I'm not saying that we should drop CL and switch to clojure. However, I believe that clojure addressed some of the problem that many (some?) in the CL community feel CL has.

Thursday, December 15, 2011

Better toys for us programmers

Today PyCharm 2.0 is out. A couple of days ago, the last version of IntelliJ (11) was released as well.

I'm somewhat reluctant to discuss the matter here; I'm not a free software integralist by any means (like for example posting from a beautiful MacBook Air, with OS X), still when discussing commercial software I feel a vague sense of guilt because I feel like I am doing advertisement. This is especially true as most of the times there it does not involve only reporting facts (which would be acceptable, as the truth tends to be true, even if in favor of a commercial entity) but impressions, which could make me seem biased.

These are the most interesting improvements I found in PyCharm/IntelliJ (it should be clear when stuff applies only to one of the two -- and amazingly enough, the what's new page on PyCharm is more detailed):

  • Support for pypy (this is going to be immensely important for me, as I'm planning to move part of my development environment on pypy)
  • Support for ipython (more on this later)
  • Cython support (which may be something I'll be using soon enough)
  • Git graphs (been a bit of a PITA lately to remember the proper log options to have them in the console, my memory ain't what it used to be)
  • Gist support (I love that)
There is more. I did not even realize that before it was not there... but for example now PyCharm completes the keyword arguments in Python stuff. Which is extremely nice, in my opinion. And also the refactorings seem to work more accurately.

Eventually, I want to point out a feature I was sorely missing in all environments I tried, i.e., choosing the method to step into when debugging. First, I'm not the kind of guy that spends lots of time in the debugger (see., that would be a clear smell on the quality of my unit tests). But when I do, I often feel rather boring having to walk step by step irrelevant code.

Consider this:

my_object.that_is_some_method(
    ILuvDIP(...), self.that_is_interesting(), self.may_be_a_property)

and suppose that I feel the bug is in that_is_interesting. Now, it may well be that my Python debugging skills are not excellent. Afterall only recently I stopped instrumenting my code with prints and always use a proper debugger. Before that I relied almost only on unittests and prints.

Before PyCharm 2, it was hard to step into that_is_interesting and not in ILuvDIP. I believe that the same logic also applies to Java and IntelliJ and hopefully to Clojure. Then, back to us.

I really ain't lots of problems with IntelliJ. I feel that an IDE is a very valuable asset when developing Java. In fact, I would say it is a PITA to do without. Perhaps with Emacs and some modules like JDEE. Still, I don't know... I try to avoid Emacs these days (as I'm finding vi more and more natural to me).

The question is more interesting regarding Clojure and Python (and Ruby...). Probably if I would use Django a lot, PyCharm would be a clear winner. Support is awesome and you have to work in a frameworkish way in any case. There is nothing wrong with that, of course.

These days I'm mostly writing library/algorithmic code in Python. And I feel like ipython+vim is a great tool here. It's got an almost Mathematica/Matlab vibe that is nice for what I'm doing. I also try using that approach with Clojure more often than not. Tests as documentation and specification, REPL as an integrated development environment. It is possible that with ipython builtin in PyCharm I could just move that workflow to PyCharm itself.

There is however, the issue of code complete. Emacs fares pretty well to complete Clojure, but as far as I remember is not so good regarding completing Java (perhaps I should have installed JDEE). As a consequence, I used IntelliJ a lot even with Clojure.

Recently, I started exploring Clojure+vim too. And it is a wonderful world. I have most of what I need and it is extremely lightweight. I have to investigate the issues further. However, IntelliJ remains a solid environment for Clojure development.

Now the essential question is... I quite need IntelliJ for Java. And having it working with Clojure is a big plus (even if maybe not a strict necessity). But should I buy a separate PyCharm or just rely on IntelliJ plugin?

Monday, December 12, 2011

Erlang and OTP in Action (review)

First time I got into Erlang, it was with "Programming Erlang: Software for a Concurrent World" (Joe Armstrong). It is a very nice book, in my opinion, and I enjoyed immensely reading it. That was like 4 years ago or something. Back then, the functional revolution was just at the beginning: no widespread Scala, no Clojure at all, essentially no F#. Back then it looked like OO was going to rule the industry for years and years, with no contender of sort. Rails was fresh, Django was fresher (back then the APress book was just being released).

I bought the book because I wanted to see this "brand new" technology (20 year old, but just going to make it through in the circles I did frequent). And really, the language looked like 20 years old. Full of Prolog legacy, Unicode who's that guy? and so on. However, it no other piece of software I knew could as easily. Massive concurrency, hot swapping code. Wow.

The language, I did not especially like. The runtime… WOW! As a language, I love Python or Clojure because the way apparently distant functionalities work together and create something even more beautiful. Erlang does not have that at language level. It has at a framework level.

Think about hot swapping code in an object oriented software. First, I somewhat believe that object oriented modules are somewhat more tangled that functional equivalents. Partly because of the object reuse OO promotes (that can really be against you if you want to swap code). Then, there is the whole problem of references vs. addresses. Addresses have an additional level of indirection that makes it far easier to swap a process than it is to swap an object.

But the very idea that state is in the function parameters and that process linking and easy restarting thing are at the very root of how easy it is to swap code in Erlang. But then… back to the books.

The essential problem was that after seeing a bit of Erlang, I thought OTP was not a big deal. Yes, it is easier to use. But also plain Erlang is. And I convinced myself that should I need Erlang, I could just use plain Erlang. Than things changed: I read some OTP using code and I understood not so much of it. That was this year. What I understood, was that it could be helpful. I had to write a concurrent prototype and I welcomed the idea not to write as much code as possible.

In the meantime, I forgot many things about Erlang. In this situations, instead of reading the same book twice, I buy another book to gain perspective. So I decided to buy another book. One of the candidates was "ERLANG Programming" (Francesco Cesarini, Simon Thompson). It also had excellent reviews. Essentially I believe it contains more material than Armstrong's book and is also a bit more recent. However, as far as I understand, is still a bit terse on OTP.

As a consequence, I bought "Erlang and OTP in Action" (Martin Logan, Eric Merritt, Richard Carlsson) instead. And I'm very happy of this choice. It complements Armstrong's book well and extensively covers OTP. In fact, I also believe that the approach is very interesting. Introduce OTP first and learn to use it, then when you know what it can do, you are going just to use that. Then, learn plain Erlang in order to extend OTP when your use case is not covered. And a nice plus was a detailed description of JInterface, which I could need as well.

In fact, I do think that as it may make sense to introduce objects as early as possible in a book on an object oriented language, starting with OTP is a very big plus from a learning perspective. Then perhaps the point is that I did not need to get into a functional mindset (which I think Armstrong book does with more attention.

If the question is however just "learn to think functionally" I believe that "The Joy of Clojure: Thinking the Clojure Way" (Michael Fogus, Chris Houser), LYHFGG or Land of Lisp are probably better alternatives. Another interesting one is "Functional Programming for Java Developers: Tools for Better Concurrency, Abstraction, and Agility" (Dean Wampler), even if it has a whole different perspective.

Wednesday, December 7, 2011

Good and not so good reasons to learn Java (or any other language) [part 2.5]

So this is the third part after the not so good reasons to learn Java and the other good reasons to do it.


Java could be fun


Really. There is some stuff that is really nicely done. I like Akka, for example. And I find the idea of hacking with assertion really funny. It is a totally different approach to meta-programming that I really liked. I also like Antlr... after that every other library for imperative/oo-languages to build parsers seemed primitive.

There is some nice stuff in the Python world as well (and yeah, in Lisp you have Lisp and Haskell has wonderful stuff to). But I can tell: implement a language in Java and in C++. In Java you will be finished so much earlier... of course, other languages are faster too. But your team may not include other Haskell hackers.

Libraries, libraries, libraries

In Java there is a library for everything. Quite often, they are very well done, even if somewhat over-engineered. Probably my idea of over-engineering is a bit extreme (it comes from having seen lots of lean languages). But really, they are robust and well tested.

Moreover, Java is usually efficient enough to be a decent contendant for more demanding tasks.. Maybe C/C++ can be avoided for your application (and Java libraries are usually easier to use than C++ ones).

Java as an intermediate level platform

Say you are interested in language design. As far as I can see, you have few choices.

  • Write your own runtime, vm, etc. That was the "old" approach (Python, Ruby)
  • Implement your language on the top of a Lisp[0] or Prolog[1] interpreter
  • Use LLVM
  • Use JVM
  • Use CLR/Mono
I would rule out the latter, because I don't do any windows. Between LLVM and JVM as far as I understand it may depend on the language. JVM has a few quirks (who said lack of TCO?), but has plenty of available documentation, real world examples (Scala & Clojure), a large number of developers working for the well-being of the plarform, and a huge amount of libraries, libraries, libraries you may want to use.

LLVM is probably going to be faster, though. And has lots of optimizations and stuff for static languages. In any case, to make an informed choice, you need to know Java

Tools (IDEs, Maven, Ant)

IDEs are the bless and the curse of Java. After having used for sometime IntelliJ I really feel that the amount of functionality that IDEs for other languages offer is puny. The possible exception is Emacs for Lisps.

The funny thing is that I'm not an IDE gui. However, really, when you do refactoring (even simple stuff like moving functions and files around) it is an invaluable time-saver. I miss vim as a text manipulating programming language, but still... for Java IDEs are almost necessary. And not only bridge part of the gap with other languages, they have lots of useful stuff.

Regarding Maven... well, I just like it. I also like the fact that I'm able to build IntelliJ projects from Maven scripts is wonderful. And also Ant is a very good tool (although rather over-engineered). My humble opinion is that Ant is far easier to use than the whole autotools company. This may also have something to do with the fact that the whole Java deploy process is easier.

In any case, the point here is not how cool is Maven. Developers from other platform may want to know what is boiling up in Java-land. Sometimes we have sub-par tools and we do not even know it. Of course quite often Java tools fix "Java problems" that are different from the ones we have in other languages. Sometimes not.

E.g., the design of Maven could inspire similar tools for the other platform. Both for its strengths and for its weaknesses (to avoid them, of course).

Learn "classical" threads

Ah, this is weak. However, other languages have a "threading" module that is heavily inspired by the way Java does threading. I don't particularly like it, but being familiar with it may be a very good idea. As far as I can tell, is also what most people have in mind when thinking about threading. Who am I to say that they are wrong? I can just tell them there is better stuff.

And yes, pthreads are even more a PITA.

Find Java in other languages. Sometimes.

This is basically the extended version of the older argument. Java is hugely popular. Many "new" things are created in Javaland, and then are ported to other platforms. Or perhaps are not created but became first popular inside the Java community.

For example, xUnit libraries are available everywhere, but I think that most people first started working with it in Java. Some of the authors of the original Smalltalk library actively work on JUnit, etc etc etc.


Conclusion


I do not think that Java is particularly good. Not particularly bad, either. It is not the kind of enlightening language Scheme is. It is not easy to use and predictable like Python. However, its popularity may make it unwise not to learn it. Especially for communication reasons:

  • Books
  • Other developers (talking about OOP, libraries)
  • New ideas brewed in Javaland

wow... that's it. ;)

---
[0] Scheme, Clojure, Common Lisp...
[1] Erlang was created this way, even if now it has its own (wonderful) runtime

Saturday, December 3, 2011

Good and not so good reasons to learn Java (or any other language)[part 2]

So this is basically the second part of "Good and not so good reasons to learn Java".

And here I will discuss the good reasons to learn Java. Some of them...
I decided to split this in two posts. The next one will be published in a few days.

If you know me a bit, you probably know that I do not like Java particularly. In fact I found out that the essential problem is with Java being presented as something modern or "superior". It is not. However, it did something very good in the software environment (a part from electing over-engineering as a form of art): many technology that were outside the mainstream and was dubbed slow (garbage collector) are now widely accepted and that is partly because of Java. This is no reason to learn Java, though.

Moreover, my opinion is that Java is not a good first language. It is not a good second language either. However, after learning a high level language like Python or Ruby, a couple of functional languages (maybe Clojure and Haskell or Scheme and Scala) and something lower level, such as C, well... learning Java it could be a very good idea.

Lots of interesting stuff runs on the JVM

Yes... I'm talking about Clojure and Scala. And perhaps JRuby too. Sometimes understanding the underlying platform helps understanding some design choices. Moreover, there is stuff (e.g., in Clojure) that has not yet a fully clojurized variant and we have to use Java stuff.

Knowing Java is a plus, of course.

Lots of interesting books deal with Java

I'm not talking here about the wonderful Effective Java or Java Puzzles. They are great Java books, of course. And some suggestions may also apply to other languages. Still, if you are not interested into Java, you may miss them.

No, I'm talking about other great books...


Some of them are not entirely in Java and some have non-Java alternatives. Still reading Java is a huge plus in reading those books. I believe part of the reason is that:

Java is like a common-denominator object oriented language


Forget the Java platform. Consider just the language. Essentially Java is really a common-denominator of other object oriented language. Other languages typically offer more. Are more dynamic, offer more powerful type-systems, more expressive constructs.

Just thing about the OOP building blocks. Here we are talking about the mis-interpreted OOP as a matter of types vs. a matter of messages. That is what most people think about when referring to OOP. Unfortunately, as I said.

Java has those. But does not have much more. If you want to design OO stuff, Java gives you the building block, but does not stand above offering higher level construct. Think about Python or Ruby or Lisp... their OO facilities are so superior (plus they typically offer stuff outside the OOP model -- well, Lisp is so much more than an OO language...) that you are probably going to design stuff differently. Probably you are not going to over-engineer stuff.

Modifications are cheap. Abstractions are easy. And of course you can go outside the OO model when it makes sense.

On the other hand in Java you cannot. You can just get back at a procedural level, which is clearly inferior. The best thing you can do in Java is try to be as OO as possible: other choices usually do not pay.

As a consequence:

  1. Java is a great language to expose those "ill-conceived OOP" concepts that are used in every other language where ill-conceived OOP techniques are used.
  2. Java is a great language to truly learn to program like an object-oriented zealot

Please notice that I consider the latter a very good thing. It is a very good exercise that lets you understand the merits and the drawbacks of the paradigm. Probably when you think that something really sucks in Java, you hit a limit of OOP in a static language. Other languages may make that easier, but the wall is still there.

Java is good to learn "good OOP" too

Yes! As I already said there is a lot of smart people working with Java. They have already discovered and taught how to do "good" OOP in Java. You should learn it too.

Then, when using a different language things could only be easier. Some of the techniques you learned may be useless, because the language offers better abstractions. Still you will have a pretty clear understanding of what such abstractions are doing and perhaps you may develop a feeling for when not to use them.

As a matter of fact, too much magic is bad, even if it may seem cool. While sticking too much to simple things may actually complicate the design in the long run (meta-programming leads to code that you do not write and that is the only kind of code that needs no debugging or testing or maintenance, as it does not exist), too much magic has the same effect at the opposite position of the spectrum.

Avoid writing Java in other languages

If you know Java, at a certain point you may discover you would be writing much the same code if you were using Java. If the language you are using is C++, you probably have done something wise: chosen a restricted subset of C++ and used that one (than we may argue if you actually left out good stuff).

However, if you are using Python or Ruby, or, worse, Clojure, then you are writing awful code. $x code is not meant to be structured like Java. If it does, you are not using the language well. This is basically a side effect of "Java is like a common-denominator object oriented language".








Saturday, November 26, 2011

Good and not so good reasons to learn Java (or any other language)

The first thing to consider is there is really not such a thing like a reason not to learn a programming language. Or better, there is only one reason: lack of time. Learning a language is a tricky business[0]. Learning to use a whole platform is way trickier.

Please notice that some of the reasons presented here are general enough to be applied to any programming language (especially the bad reasons to learn a language). Others are specific of Java (especially the reasons to learn it).

Also keep in mind that the good reasons to learn Java will be presented in another post because this one is becoming far to long to be readable in the few minutes I feel entitled to ask you, dear reader. So believe me... you probably should learn Java. Still, not for the reasons I list here.

Not so good reasons to learn Java

It is widespread

You may be lead to think that since Java is a very widespread language, it is easier to find jobs if you know it. In the case of Java there is the question that I am convinced that it is not a bad thing to know Java and that can have pleasant effect on finding jobs (more on that later), still I would not consider it a good reason to learn Java.

Learning a widespread language means more jobs and more applicants. What really matters is the ratio between jobs demand and offer. And usually for very widespread languages it is not always very high. Skilled, experienced and "famous" developers do not care much. The others should.


It is object oriented

This and the variant "it is more object oriented" are clearly wrong reasons. There is plenty of good object oriented languages. And it is rather hard to decide which is more object oriented (lacking function looks more like a wrong design decision than a clue of being more object oriented).

Besides, I'm not even sure that being more object oriented is good (or bad for what matters). Well... not sure that being object oriented is inherently good either. Maybe in ten years the dominant paradigm will be another. Maybe in two. Three years ago "functional programming" was murmured in closed circles and outsiders trembled in fear at its very sound.

No, joking. Most people thought that "functional" meant having functions (and possibly no objects) and thought about C or Pascal. They did not know it was about not having crappy functions. Yeah, more than that, of course, but that's a start.

It is a good functional language

Just checking if you were reading carefully...
That is a bad reason because it is not true!

It is fast/slow

Oh, come on! Java implementations are decently fast, faster than most other language implementations out there and usually slower than implementations of languages such as C, C++, Fortran. Still, for some specific things it may be nearly as fast or faster. Sometimes it is not only CPU efficiency that matters.

It is good for concurrency

The "classic" Java threading model sucks. It's old. Nowadays other ways of doing concurrency are better (more efficient, easier to use).

Such more efficient easier to use methods are built-in in languages such as Clojure (or Scala, or Erlang). Still, Java is probably the most supported platform in the world. Such concurrency models are not inside the language, but you may have them using libraries.

Sometimes this is a real PITA (language support is a great thing, because the syntax is closer to the semantics).

Moreover having the "older" concurrency model visible may be a problem with the newer stuff. And some other libraries and frameworks may just assume you want to do things the old way.

It is good for the web

Ok... most people do not even know how widespread is Java on the web. In fact, it is. But is it really good? I do not really think so. There is lots of good stuff in Java world for the web, of course. The point is that there is also for other platforms. Here Rails and Django spring to mind.

Moreover, there is loads of extremely cool stuff coming from the functional world. Erlang and Haskell have some terrific platforms. Scala and Clojure also have some extremely interesting stuff and can leverage mature Java stuff but make it really easier to use.

Grails (web framwork) may seem on the same wavelength, still I think there is a very important difference. First, I don't like Groovy at all. In fact Groovy very author says that he would not have created it if he knew Scala. And of course since I do not like Groovy, I do not see why I should be using Grails, which, as far as I know, does not offer killer features over Rails or Django.

Scala and Clojure are different. They are not just "easier" than Java. They teach a different way of thinking and approaching development. And of couse, they are great for the web also.

I already know it a bit

This is quite an interesting point. Why, if you know a language a little, shouldn't you learn it well? This essentially makes it clear the difference between a not so good reason to learn something and a reason not to learn it.

The point is simple: if you are interested in learning Java (for the good reasons), do it. But from knowing a language "a bit" and knowing it "well" there is quite the same distance that from not knowing it and knowing it well. So don't do it because you think your prior experience may be extremely relevant.

There are languages which are just easier to master (where easier means that to learn them from scratch it takes less time that to become as proficient in Java as you would learning those languages -- yeah, Python, Ruby, [1]...)

Besides, I feel that the second step in Java is rather hard. I think that Java is very easy to learn for experienced developers (it is not a joke). The very first step in Java is relatively easy (a part from lots of useless crap like having to declare a class to put there a static main method and overly complicated streams). The step just after that, the one that takes you from writing elementary useless programs to writing elementary useful programs is quite harder, since lots of useful stuff in Java requires to understand OO principles quite well to be used without too many surprises.

So, you may have learnt Java at the college. Still, if you to hack something and "grow" there are languages that let you do it faster (Python or Ruby).

I have to

This is the worse possible reason because I don't see the spark in it. You have to. You probably are an experienced developers that got bored to death reading this post, you did not know Java and you have to learn it. Maybe your boss wants you to do some Java.

I'm sorry for you. Java is not easy to learn (especially to learn it at the point you can work with it). Mostly because everybody has been doing Java in the last fifteen years. Smart people and dumb people.

As a consequence there are truly spectacular pieces of software it is a pleasure to work with and worthless pieces of over-engineered crap. I truly hope that you are going to work with the great stuff.

Still I consider a "bad reason" to learn a language, because you are probably not going to enjoy the learning process. If you were, you would have used a different sentece... like "I want to".
And perhaps the thing would have been "My boss gives me the opportunity to increase my professional assets learning this new technology, Java".

It was easier to download/find a book/my cousin knows it.

One of the most popular reasons people pick up a language is because they find it installed on their system. That is not usually the case for Java, in the sense that usually the JDK is not installed, even if the VM is. Variants of this argument are a friend proficient in the language or, more frequently, availability of books at the local bookstores. This kind of arguments usually apply for complete programming beginners or people who had prior experience that has not been refreshed for years (e.g., old Basic programmers).

It is true that it is easy to find manuals for Java. The point is that not every manual is a good starting point. Specifically, this kind of user really needs a manual targeted at a true beginner. Unfortunately, that is the category of books where more craps piles. First beginners usually do not really understand if the manual they are using is crap.

If their learning process is too slow (supposed that they see it) they usually blame: (i) programming in general, (ii) the language or, worse (iii) themselves. Well, the language may have its faults (especially in the case of Java, still C++ is worse for a beginner), but it is important to understand that the culprit is usually the manual.

The funny thing is that the people who know how to tell apart a good manual from a bad one are usually the ones that do not really need an introductory book, are probably not going to buy it and more often than not are not enquired about a good manual. So unless their cousin is actually a skilled programmer, beginners are really risking buying a bad manual. And please, notice that even skilled programmers are susceptible to a similar argument: if you are interested in a new language and you read a terrible manual you may build a wrong opinion on the language (and perhaps you are not interested in spending another 40 gold coins to convince you that the language is really bad).

So please: read reviews on Amazon (or similar site -- notice that I'm not going to suggest to buy from Amazon, even if I often do and I am an associate -- here I'm just saying that the amount of reviews on books on Amazon -- com -- is usually large enough to make an idea). Find people that are experts and have teaching experience: their reviews are probably the one to base the judgment upon. Then buy the book from whatever source you want.

So do not buy a Java book because you can find books on Java and not on Clojure[2]/Python/Ruby/Whatever[3]. Choose the language (this is hard), choose the book (this is easier), buy, read it, study it, code code code.


---
  1. I kind of found out that for people really convinced that learning a language is easy one or more of the following apply:
    1. have an unmotivatedly high opinion of their knowledge of the languages they claim to have learned
    2. have a very inclusive definition of "learning" a language (e.g., writing hello world in it)
    3. only know and learn very similar languages (really, if you know C++ it's gonna be relatively easy to pick up Java, if you know Scheme probably Clojure is not going to be a big deal, etc.)
    4. and tend to write "blurb" in every language they know (so for example they write Python as if it was C++ -- and usually are very surprised when things go badly)
    Of course there is also a lucky minority for which learning languages is very easy.
  2. It is funny that I am suggesting to learn only "object oriented dynamic" languages such as Python or Ruby. I understand that many advocates of functional programming may disagree. But I somewhat think that while some kind of people greatly benefit from learning a functional language as the first language they truly master, many others are just not fit for functional programming because it is too abstract and working imperatively may be easier for them. As a consequence, languages such as Python or Ruby may naturally lead you towards functional programming if it is the kind of stuff you like, but are still usable if you are not that kind of person. I have seen things...
    And yes... if you are the kind of person that likes functional programming, you will get into functional programming sooner or later. This is just a bit later. 
  3. Notice Clojure here...
  4. Whatever is not a language. Still, it would be a great name for a language. Maybe not.

Friday, November 25, 2011

Brief morning delusion…

So basically it's a couple of days I try to use a given piece of software. And for some unfathomable reason the thing does not do a specific thing which I need and it shall do. I'm not going to be more specific, because I do not want to point fingers.

The point is that it does not give any clues on why it is failing or what is exactly trying to do (thus "how" it is failing). Since the project is open source I decided to take the sources and hack my way to the problem. I have a rather good understanding on the domain that I'm going to solve (it's related to unix processes -- though in OS X environment --). I'm familiar with both.

It's been a while since I last wrote some Objective-C, but this time I should just look at the sources, perhaps set a couple of breakpoints and find out what is happening. My plan is that after that I could fix the source so that:

  1. It logs what is doing
  2. It logs any errors that occur
  3. Perhaps I fix the specific problem in the code

The first bad piece of new is that I spot some obvious software engineering mistakes in the software. Unfortunately it is stuff that needs more than a casual hacking to be fixed (specifically, I'm talking about configuration stuff hardcoded in the sources). Still, it is not probably a big issue. It may even make sense in some contexts… well, not really. But anyway. Some wrong data-structures… but everything is basically fine: the code, a part from that, is well written and rather clear.

So I localize the piece of code that fails. It does a bloody lot of magic, in my opinion. My guts tell me that some of it is just unnecessary, some better design could lead to much simpler and less magic code. The only problem is that sometimes such kludges are just a consequence of unorthogonal design of the underlying systems (in this case OS X). But that is something I would do later on, after simply adding the code that logs possible errors (I think that is of paramount importance for the semi-technical users of the software, in the sense that they could better understand what is going awry when they customize it).

Then a thought strikes me: compile the whole thing just before making modifications. The svn repo should have been left in compilable state… but no. It is not. So I should have to find the last point where it can be compiled. Which I could do… well, another time.