Friday, May 29, 2015

Go, Rust, Python... some random impressions

During a discussion on the Italian Python mailing list, there were some comments about "C-family" languages and how they compare to Python. As a consequence, this post is partly a translation and partly a reworking of the opinions I expressed there. Since it is deprived of much of its context, some parts may not make much sense; plus I suffer from a severe form of... well, I tend to digress. So it might suck rat's ass, but there might be something interesting in it, so bear with me. This post is neither meant to imply any inherent merit in being "similar" to Python nor to criticize any language. As a full disclosure, I love Python and quite like Go. Rust gave me a very good impression, too.

So, Rust syntax shows C influences (even though I feel like the main influence from the C family is actually C++ rather than plain C), but overall the influence of ML is much stronger, especially if we also consider the semantics. Why we ended up comparing Rust with Python is a pretty long story, which started from this post (and evolved here -- in Italian --; thanks Carlos). In the end, other languages were drawn into the comparison, including Go, C, C++, Java and a C# that is still in shock for having been freed but is alive and kicking nonetheless (my avoiding making explicit what it is kicking is intentional). Actually, I revisited the thread and we also mentioned OCaml, Haskell, SML (that might have been my fault), Erlang, Elixir, F#, D, Nim (also my fault), Perl, Luc Besson (which is not a programming language, but does make awesome movies), Ruby (the programming language), Clojure, Scala, even PHP (oh, well, this blog was supposed to be PG13) and Ceylon (which I did not even know existed: either I am getting old or people are creating too many languages). Sadly nobody mentioned Lisp or Scheme, and surprisingly nobody mentioned JavaScript. If your favorite language is not listed here, add a comment. At the end of the day much of the discussion involved Python, Rust, Go, Java and C++, which was to be expected.

Rust and Python

All considered, I feel like Rust is philosophically one of the farthest languages from Python. Again, this is not a bad or a good thing; it is the way things are.

Static vs. Dynamic typing

Python is entirely on the dynamic side of typing. It does not support static typing at all, and not even the new Python type hints go in that direction. See especially this paragraph for more context: the hints are never enforced at runtime and are meant to be entirely optional (even by convention), although linters and type checkers can make use of them. Even then, hints are meant to support duck typing (i.e., to hint that a method takes a Number rather than specifically requiring an int). Most languages in the C family are based on static typing, and features in the direction of dynamic typing are incomplete, missing, or frowned upon (think about designing Java interfaces working with Objects instead of interfaces).
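To make the point concrete, here is a minimal sketch (the function and its names are made up for illustration) showing both that hints aim at duck typing and that CPython does not enforce them:

```python
from numbers import Number

# Hypothetical function: the hints say Number, not int, so any numeric
# type (int, float, Fraction, ...) is welcome -- that is duck typing.
def scale(value: Number, factor: Number) -> Number:
    return value * factor

print(scale(2, 3))     # -> 6
# CPython does not check the annotations at all: this "wrongly" typed
# call runs without complaint (an offline checker would flag it).
print(scale("na", 4))  # -> nananana
```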

Early vs. late binding and static vs. dynamic dispatch

Again, Python is completely dynamic and late bound. Everything is bound at the latest possible moment. If it were possible to bind calls after the process terminated, Python would do that.
The other languages we are considering here take significantly different approaches among themselves: Java uses dynamic dispatch/late binding; Go uses dynamic dispatch when methods are called on an interface, and static dispatch when they are called on a struct; C++ by default uses early binding and static dispatch, and only when virtual is explicitly used is dynamic dispatch employed. Rust provides both static and dynamic dispatch: to get dynamic dispatch, a trait object has to be used. General consensus (and standard library design) favors static dispatch whenever it is possible and makes sense.
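Python's late binding, mentioned above, can be seen in a couple of lines (a toy class, purely for illustration): the method is looked up when the call happens, not before.

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()

# Rebind the method after the instance was created: the next call
# finds the new version, because lookup happens at call time.
Greeter.greet = lambda self: "goodbye"
print(g.greet())  # -> goodbye
```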

Object model

The object model of Rust (which, technically speaking, does not claim to be an OO language, so here we are overloading the term a bit) is remarkably different from that of classic imperative object-oriented languages (e.g., Python, Java, C++). Python, Java and C++ also have massively different object models, but in a sense they are closer to one another than Rust is. On a superficial analysis traits seem to behave somewhat like interfaces, but I feel the differences are really deep. They are much closer to Haskell type classes or ML modules; I do not have much experience with Rust, but designing stuff with Haskell type classes is much different from designing stuff in Java (or Python). Really. As a side note, when trait objects are used, the behavior is rather similar to Go interfaces: the difference is that interfaces in Go are implemented implicitly, while in Rust you have to implement the traits explicitly. Go does not provide anything similar to Rust traits used statically (there are huge discussions on Go's lack of templates and related stuff; I won't repeat them here).


Python's philosophy is to keep things as simple as possible and to abstract details away (especially low level details). On the other hand, Rust was created with the idea of giving full control over low level details. If you do not want (or do not have) to exercise that control, Rust might feel like overkill. There might still be tons of reasons to use it, though.


Python has a very rich runtime. In fact, it is so rich that entirely reasonable optimizations are completely impossible by design (unless you are extremely smart about it: consider for example the fact that the function frame is an object, and technically you might end up accessing it in some call downstream, so a lot of stack-related optimizations are not really feasible -- note to self: check what the PyPy guys do about it). Rust, on the other hand, almost does not have a runtime by design. The runtime is really minimal by deliberate choice (which makes a lot of sense considering the target).
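A quick illustration of the frame problem (the functions are made up): any callee can legally grab its caller's frame, so the interpreter cannot simply optimize frames away.

```python
import sys

def callee():
    # Reach into the caller's frame object and read its locals:
    # perfectly legal Python, which is why stack frames (and their
    # local variables) cannot simply be optimized away.
    caller_frame = sys._getframe(1)
    return caller_frame.f_locals["secret"]

def caller():
    secret = 42  # never passed explicitly, yet visible downstream
    return callee()

print(caller())  # -> 42
```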
Just to wrap it up, if I have to pick a single language in the C family that is closest to Python aesthetics, that would be Go (with an honorable mention for C itself). Both want to abstract away the irrelevant details and let the system (in one case the VM, in the other the compiler + runtime) do "the right thing". Both have similar issues when you need to go outside the box and need more control (FFI + Cython on one side, unsafe + low level syscalls + FFI on the other).
On the other hand, Rust is much closer to C++ or OCaml: it is a rather complex language that enables fine-grained control over everything; however, it is often hard to relinquish such control. Truth be told, it seems that Rust is much better in this area: the model is simpler to reason about and saner, so the net result is that even if you can exercise a lot of low level control, it is not as painful as in C++, where sometimes the abstractions are all wrong.

Go and Python

Another important similarity between Python and Go is that after a few weeks of study (days? it depends on how quickly one learns a programming language) it is feasible to read and understand all the code out there. If something is unclear, it is because of lack of familiarity with the domain (e.g., low level networking details), not because of complex or unusual language features or magic.
If possible, Go is even more extreme in this than Python itself. There are rather basic concepts in Python that lots of coders do not understand and do not know how to use. Although I disagree, lots of people consider hard stuff like metaclasses, the descriptor protocol, coroutines implemented with generators and even the relatively simple decorators.
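For the record, the "relatively simple decorators" amount to very little machinery; here is a toy one (all names invented for the example):

```python
import functools

# A decorator is just a function that takes a function and returns
# another one: @shout below is sugar for greet = shout(greet).
def shout(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return f"hello, {name}"

print(greet("world"))  # -> HELLO, WORLD
```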
In Go there are no features with a comparable level of cognitive overhead: the language is extremely simple. You get a few basic blocks with very simple semantics that can be composed to obtain arbitrarily complex effects. The net result is that even if a newbie might not know how to compose the blocks to get the intended result, he would still be able to understand a solution created by a more experienced programmer, and understand what does what and why. This does not mean that writing Go is necessarily simpler than writing Python, just that it might be even simpler to read.

Other languages (such as C++, OCaml, Haskell and perhaps Rust) require much more understanding to cover the same amount of code. To be entirely sincere, I have dabbled with Haskell for about ten years and to this day there is a lot of code that I need to unwrap, disassemble and study line by line to actually understand how it is working (what it is doing is somewhat simpler, but appreciating the nuances of the language takes time). With Go I was reading the standard library implementations after 2 days.

In fact, when I fell for Python I was mainly using other stuff (including stuff like Perl and C++). I used to mock lots of Python tenets. Then a guy I knew from some usenet group asked me to review a couple of his scripts. That was before GitHub, before git was even imagined. It was a world of pain and CVS, SVN was brand new, and people often sent code via email and similarly primitive means. But I am digressing... I knew something about networking back then. I did not know Python at all, though. I was dubious that I could actually help him out; but then I realized I was perfectly capable of reading his code, without ever having written a line of Python, without even having read a tutorial, I might say, without even having read the Wikipedia page on Python. That would be surprising now, but really, back then Wikipedia was probably hosted on something less powerful than my mobile phone (and loading pages took tens of seconds, sometimes minutes). Nonetheless I could read and immediately understand the scripts. I could also easily modify them and make them work. Go is the second language after Python that gave me the same impression, which was the decisive factor in my decision to give it a deeper look.

Rust and Python (again): what about the juniors?

Back to the comparison between Rust and Python... one of the points raised in the discussion was whether some freshly trained junior Python programmer could easily take on Rust. To put a disclaimer here, the archetype of said programmer is not the smart college guy who will end up working for some IT colossus or über-cool startup, nor the garage hacker without formal education but with talent and brains in spades. It is more like the average IT guy, mildly skilled, mildly capable, who can, with some supervision, work on a relatively trivial project with other people. To my surprise, there seem to be a lot of these guys. So... the question is, if one of these guys is trained in Python, which language from the C family can he more easily pick up?

The idea I have of Rust is that it is not the language for him. I believe said novice would probably not even understand where to start. Lots of design decisions in Rust stem from the idea that some problems need to be solved in the most efficient way, so the language should give programmers the control to specify everything needed to get there. The side effect is that somebody without experience in systems, low level programming and computer architecture would not even realize that the choices Rust offers are solutions to real problems. Essentially, I am afraid, a lot of it would be taken just as magic, not as conscious trade-offs between control and ease of programming. This is not a critique of Rust: I very much appreciate its coherence with its goals. Among those goals there is no "being a language for unskilled novices".

I am also not suggesting that Java would be the best candidate here (really, I have no clue), but by way of comparison, to write decent (not excellent, plainly decent) Java it is sufficient to have an adequate understanding of object-oriented design. Essentially, it is quite impossible to write sensible Java without a solid understanding of object-oriented programming (at least, of the flavor of OOP that Java promotes). And, on the other hand, with a solid understanding of OOP, the resulting Java tends to be fine (again, not perfect, not super smart or super efficient, but fine). The standard library is rich and well documented and there are widely used libraries that cut the few corners that were left (e.g., Guava, Apache Commons, Netty, etc.). So probably somebody with experience in Python and a decent understanding of the OOP part would be able to design, in Java, software that does not make me want to puke. Too much. And yes, a lot more needs to be taken into account to write great Java code (and the JVM is a tiny operating system of its own that needs to be understood, albeit that might be easier than getting the same level of understanding of the whole POSIX model).

I would be very interested in seeing what would happen with Go. Probably the language is too young to have spread in those contexts, but I would still love to see whether the transition from Python to Go is really as easy as it seems.

At the end of the day, the impression Rust left on me (after a couple of days of study... so I might still be partial) is that of a language close to a sensible version of C++, one that does not drive people into Lovecraftian abysses of insanity and pain. The comparison is a huge stretch: I am talking about feelings here, not objective features. Rust is effectively much more functional than OOP and it does not really work "like C++". It does not at all. Again: the keyword here is "feeling". Also, Rust is immensely agile for a language that offers such an astonishing level of control over low level semantics.

On the other hand, Go sits perfectly in between C and Python: it can be seen as a modernized C, or as a super-optimized Python. Go get it!

Thursday, September 6, 2012

Unix lover: snippet 1.

I love Unix so much... I had this LaTeX file in which I had written \input lines for the chapters to be included in the main file. Then I realized that I had to create all those chapter files, either typing every filename again or doing some crappy copy and paste.


# main.tex is whatever your master file is called
sed -n -e 's/\\input{\(.*\)}/\1.tex/p' main.tex | xargs touch

Wednesday, September 5, 2012

Models of human skills: gaussian, bimodal or power-law?

The first thing I have to say is that this post contains nothing but my own reflections. I have not found clear scientific evidence that actually proves these ideas right or wrong. On the other hand, if you have evidence (in either direction), I'd be glad to see it.

One of my first contacts with probability distributions was probably around 7th-8th grade. My maths teacher was trying to explain to us the concept of the gaussian distribution, and as an example he took the grade distribution. He claimed that most of the students get average results and that comparatively fewer students get better or worse grades (with the most "extreme" grades reserved for a few).

I heard similar claims from many teachers of different subjects and I was not surprised. It looks reasonable that most people are "average", with fewer exceptionally good or exceptionally bad individuals. The same applies to lots of different traits (height, for example).


Many years later, I read a very interesting paper claiming that the distribution of programming results instead usually has two humps. For me, the paper was sort of an epiphany. In fact, my own experience points in that direction (though here I may be biased, because I have not collected data). The paper itself comes with quite a lot of data and examples, though.

So apparently there are two models to explain the distribution of human skills: the bell and the camel. My intuition is that for very simple abilities, the distribution is expected to be a gaussian. Few people have motivation to become exceptionally good, but the thing is simple enough that few remain exceptionally bad either. On the other hand, sometimes there is simply some kind of learning barrier. It's like a big step that not everybody is able to make. This is common for non-entirely trivial matters.

In this case, the camel is easily explained. There are people who climbed the step and people who did not. Both groups are internally gaussian distributed. Another example could be an introductory calculus exam. If the concepts of limit and continuity are not clear, everything is just black magic. However, some people may still get decent marks because of luck or a lower-level form of intuition. Among those that have the fundamental concepts clear, there is the usual gaussian distribution.

However, both models essentially assume that the maximum amount of skill is fixed. In fact, they are used to fit grades, which have, by definition, a maximum and a minimum value. Not every human skill has that property. Not every human skill is "graded" or examined in a school-like environment. And even for those that actually are, grades are usually coarse grained. There is probably a big difference between a programming genius (like some FOSS superstars) and a regular excellent student. Or between, say, Feynman and an excellent student getting full marks. The same thing may be even more visible for things usually perceived as artistic. Of course, it is hard to devise a metric that can actually rank such skills.

My idea is that for very hard skills, the distribution is more like a power law (at least in the tail). Very few people are good, most people are not even comparable to those, and the probability of finding someone twice as good as a very good candidate is just a fraction.
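These three shapes are easy to simulate; the populations and parameters below are invented purely for illustration:

```python
import random
import statistics

random.seed(0)
N = 10_000

# The bell: one homogeneous population, one gaussian.
bell = [random.gauss(100, 15) for _ in range(N)]

# The camel: a mixture of two gaussians, e.g. those who climbed the
# learning step and those who did not (proportions made up).
camel = [random.gauss(40, 10) if random.random() < 0.5
         else random.gauss(80, 10) for _ in range(N)]

# The power law: a heavy tail where a few individuals dwarf the rest.
tail = [random.paretovariate(1.5) for _ in range(N)]

for name, data in (("bell", bell), ("camel", camel), ("power law", tail)):
    mean = statistics.mean(data)
    print(f"{name:10s} mean={mean:8.2f} max/mean={max(data) / mean:8.2f}")
```

The max/mean ratio tells the story: for the bell it stays small, while the power-law tail produces individuals many times "better" than the average.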

Just to conclude, I believe that for very simple skills most of the time we have a gaussian. If we have "learning steps" or some related distinctive non-continuous feature, then we have multi-modal distributions (like the camel for programming: those that are able to build mental models of the programs are successful, those that only use trial and error until the programs "compile" are unsuccessful). Finally, for skills related to exceptionally hard (and challenging and gratifying) tasks we have power-law distributions. Might we even use such distributions to classify the skills and the tasks? And in those cases, is a gaussian distribution a good thing or not?

So, given someone who is able to make mental models of programs, does Haskell programming remain multi-modal, because there are some conceptual steps that go beyond basic programming, while Basic is truly gaussian? Is OOP gaussian? And FP?

And in OOP languages, what about static typing? Maybe dynamic typing is a camel (no, not because of Perl) because those that write tests and usually know what they are doing fall in the higher hump, while those that just mess around fall in the lower hump. Ideas?

Sunday, August 26, 2012

Does Object-Oriented Programming really suck?

I recently read Armstrong's essay on how much OOP sucks. While such opinions would probably have been considered pure blasphemy a few years ago, nowadays they are becoming more popular. So, what's the matter with OOP?

The first thing that comes to mind is that OOP promised too much. As has already happened with other paradigms and ideas (e.g., Logic Programming and Artificial Intelligence), researchers sometimes promise too much, and then they (or the industry) cannot stay true to those promises.

OOP promised better modularity, better code reuse [0], better "whatever". Problem is, bad developers write crap with every single programming paradigm out there. And by crap I really mean "utter crap". Take a good programmer using structured programming and a bad programmer using OOP: chances are the good programmer's structured program will look "more object oriented" than the bad programmer's OOP one.

Another problem occurred in the OOP community: OOP "done right" was about message passing. Information hiding, encapsulation and the other buzzwords are a consequence of the approach (because objects communicate with messages, their internal state is opaque), not a goal of OOP. Objects are naturally "dynamic", and the way we reason about code is by separating concerns into objects holding related concerns and having the objects communicate in order to achieve the task at hand. And I'm a bit afraid of talking about "OOP done right": I really just have my own vision. OOP is very under-specified, so it becomes very hard to criticize (or defend) the approach.

However, OOP became "data + functions". To me data is simply *not* part of OOP. It's an implementation detail of how objects maintain state. As a consequence, I do not really see data-driven applications as good candidates for OOP. Once again, OOP was sold as universal. It is not. Consider the successes that functional programming is achieving (performance- and concept-wise) in this area.

This "data + functions" view comes from C++ being the first commercially successful OOP platform. The great advantage was that the transition from structured programming to OOP was much easier, and that programs could be far more efficient (read "less runtime") than message-passing dynamic OOP systems, at least back in the day. However, there was so much missing from the original idea!

Since back then classes were considered good (and C++ had -- and has -- no separate concept of interface), and the language was statically typed, relatively verbose and without introspection, it became rather natural to focus on inheritance. Which was later proved to be a bad strategy. Consequently, there is less experience in building OO software than one would think, considering that for a large part of its history OOP was dominated by sub-optimal practices.

So, what is really bad with OOP? Following Armstrong:

Data structure and functions should not be bound together

Agreed! But I believe that Data Structures are not *truly* part of OOP. Let me explain.

Data is laid out in some way. There is a "physical layout", for example. You can express that very precisely in C with extensions, to the point of exactly specifying the offsets of the various data. A physical layout also exists in high level languages such as Python. However, it is not part of the Python model. Of course, with the struct module or ctypes you can fiddle with it (but those are libraries); for the most part, it is just outside the conceptual language used to describe the problem.

Data is also laid out in logical ways. Computer Science defined many data structures and algorithms over them. You can use OOP to express such structures and such algorithms. Still they are not “part” of OOP, they are just expressed in an object oriented way (or maybe not, and remain at structured programming level).

One very good practice in OOP is not mixing, in the same body of code, objects reasoning at different levels. So, for example, you have your low level business logic code expressed in terms of data structures, high level business logic expressed in terms of low level business code... and that's it. Functions are not really bound together with data. You don't have "data". You have some layers of objects.

The point is that OOP is about creating languages at the semantic level (you usually do not get to change the syntax). That's it. If the language is good, well... good. If the language is bad, ok, we've got a problem.

Is this suboptimal? In a way, yes. All the indirection may be very expensive. And since abstractions, well... abstract, you may find yourself with a language that is not expressive enough (at least not without additional performance costs). Still, functions and data structures are not bound together. You just have objects and messages. No "functions" and no "data". Please notice: this may not be the right thing to do. Still, the problem is not data + functions; it is just a matter of applying OOP to a specific domain that is ill suited to the approach. OOP is not for every task in the world. But the same objection applies to every system that is built as a stack of layered abstractions.

Please also consider the Qc Na story! Objects are not necessarily opposed to functional programming (you can see them as closures that make different actions available). Objects are just about state + behavior + identity.
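The Qc Na moral is easy to demonstrate in Python: a closure captures state and hands out behavior, which is most of what an object is (the counter below is just a toy example):

```python
def make_counter(start=0):
    # State lives in the closure; the returned callables are the
    # "messages" this poor man's object understands.
    count = [start]

    def increment():
        count[0] += 1
        return count[0]

    def current():
        return count[0]

    return increment, current

increment, current = make_counter()
increment()
increment()
print(current())  # -> 2
```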

Everything has to be an object

And this is something that I don’t think is a problem. It is like saying that Scheme is bad because “everything is a list” (which, strictly speaking, is not true, there are atoms and lambdas etc). If you do not want to program OO, then don’t. If you want, you probably want that everything is an object.

The only issue with "everything is an object" is, sometimes, performance. From a variety of points of view, such a strategy may kill performance. Objects usually imply indirection (read: pointers), and a 64-bit pointer for every integer is bad. So we have hybrid stuff like Java, and then we have to deal with boxing (manually in the past, automatically right now). Performance issues can be somewhat "fixed" using proper JIT systems or other optimization techniques. Or by providing libraries that do the right thing (see NumPy, for example).
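The boxing cost is easy to observe with the standard library alone (exact sizes vary by platform; array is the stdlib's unboxed container):

```python
import sys
from array import array

n = 1000

# A list of ints: every element is a full Python object, reached
# through a pointer stored in the list.
boxed = list(range(n))
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(i) for i in boxed)

# An array of signed 64-bit machine integers: no per-element objects.
unboxed = array("q", range(n))
unboxed_bytes = sys.getsizeof(unboxed)

print(boxed_bytes, unboxed_bytes)  # the boxed version is several times larger
```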

Other than that, I prefer objections along the lines of "this thing that should be an object is something else" to those demanding that something that is an object should not be one. So, I can favorably consider an objection saying that, in Python, "if" is not an object (true). Even though I'm convinced that having if as a statement is not bad either.

And the whole objection about "time"... well, time (and dates) are a bitch to handle. But I find that Python's datetime module gets as close to perfection as humanly possible. I really can't see how describing time with a bunch of enums + some structures would be an improvement. On the other hand, it looks to me like one of the cases where it is *easy* to see that the OO approach works better.

Consider the related problem of representing a date in a locale. If you introduce a new representation of time, you need either to modify the original function or to create a new function (maybe one that works with both things). Creating an interface, on the other hand, makes it very easy to introduce new representations of time.
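A sketch of that argument in Python (all class and method names here are invented): since the "interface" is just duck typing, a new representation of time needs no changes to existing code.

```python
# Hypothetical time representations: anything with a to_locale()
# method qualifies; no common base class is required.
class UnixTime:
    def __init__(self, seconds):
        self.seconds = seconds

    def to_locale(self, locale):
        return f"[{locale}] {self.seconds}s since the epoch"

class CalendarTime:
    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    def to_locale(self, locale):
        if locale == "it_IT":
            return f"[{locale}] {self.day:02d}/{self.month:02d}/{self.year}"
        return f"[{locale}] {self.year}-{self.month:02d}-{self.day:02d}"

def show(when, locale):
    # This function never changes when a new representation appears.
    return when.to_locale(locale)

print(show(UnixTime(0), "en_US"))
print(show(CalendarTime(2012, 8, 26), "it_IT"))
```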

Objection 3 - In an OOPL data type definitions are spread out all over the place

And once again, in OOP languages data type definitions are not spread out all over the place. They are in C++ and perhaps in Java. And even in Java, if things are done properly, you reason in terms of interfaces, i.e., in terms of typed messages, not in terms of data structures.
"In an OOPL I have to choose some base object in which I will define the ubiquitous data structure; all other objects that want to use this data structure must inherit this object. Suppose now I want to create some 'time' object, where does this belong and in which object..."
Here it is pretty clear what he has in mind. I think that it is clear that, conceptually, time is an interface and that there is really no need for it to define a "data structure". And if your language is duck typed, the interface is implicit and you have finished before you started.

Objection 4 - Objects have private state

And here I have to agree: state is the root of all evil. Problem is that entirely stateless systems are simply impractical. We reason in terms of state. We can describe lots of things as stateless, but somewhere we have state. We have files. We have documents. We want them saved and retrieved.
So, we have to deal with state. Agreed. But: 1. imperative programming is about state; 2. object oriented programming, strictly speaking, is not.
You can use a "functional" style in OOP, for example. Most of the time you don't do it because the code becomes harder to write. But in many situations, on the contrary, you do it because it makes things easier. I often write immutable objects that cannot be modified: they are no more stateful than a parameter in a function.
Sometimes this is not practical. Ok. And surely using a very stateful style of programming is bad. State in OOP is expressed as behavior. That's it. Sometimes it is done properly, sometimes it is not. Some strategies to deal with state are extremely nice (STM, as an example). Others are not. Also consider how "modern" functional languages merge concepts from OOP (without being OOP) and how "modern" OOP languages merge concepts from FP (without being functional).
Examples: Python from the OOP side, using generators, lazy evaluation, list comprehensions, closures, etc. Clojure from the FP side, with all the features it has to express what resembles interfaces, STM and so on...
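In today's Python, the immutable-object style mentioned above can be sketched with a frozen dataclass (a feature from well after this post, used here just to illustrate the idea):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def moved(self, dx, dy):
        # "Mutation" returns a fresh value; the original never changes,
        # so a Point is no more stateful than a function parameter.
        return Point(self.x + dx, self.y + dy)

p = Point(1.0, 2.0)
q = p.moved(3.0, 4.0)
print(p, q)  # p is untouched; assigning to p.x would raise
```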


[0] someone once told me they feel lucky when they can actually use the code effectively once, let alone reusing it.

Sunday, August 19, 2012

Java 7 for Mac OS X

And it was about time...

Download Page

Luckily, now Java for OS X should not lag too much behind.

Saturday, July 28, 2012

Orwell Dev-C++: Dev-C++ Released

Not a big fan of Windows C++ IDEs... well, I'm not a big fan of Windows, I'm not a big fan of IDEs and I don't particularly like C++ either. As far as I'm concerned, if you really need a C++ IDE for Windows you should have started using Code::Blocks or Eclipse. However, if you really want to stick with Dev-C++, at least grab this new version.

The importance of using a modern C++ compiler should not be overlooked.

Orwell Dev-C++: Dev-C++ Released: Time for another pile of bug fixes. I've also added a few features, like an updated set of built in compiler options and full file path hint...

Monday, July 9, 2012

On language level

When historians started to name periods of time, they roughly divided human history into three main periods: Ancient History, the Middle Ages and Modern History. Although there is no general consensus on the actual subdivision (I remember something about Burdach and Burckhardt), that is the general idea.

Modern history essentially starts with the 15th century (again, this is debated) and goes on. Then there is the idea of contemporary history, which is a moving target and covers roughly the last 80 years. I somewhat believe that we may need a better name for what happened in the 20th century, one that will last after it is not contemporary anymore. Still, it is not my job.

Something similar occurred with Art History, where Modern Art goes from 1860 to 1970. Then we have Contemporary or PostModern. So in a few years, a new name will be needed... like Post^2Modern. Don't care.

And we basically have the same trouble. In 1960 Fortran was a high level language. So every non-machine-level language is a high level language. And we call some languages very high level.
In a few years will we have "overwhelmingly high level languages"?