sep.join(string_framents_sequence)which frequently just turns into:
''.join(string_framents_sequence)Of course, sometimes the advice goes as far as "never" ever concatenate two strings". In some sense, the advise is good, but it is often given for the wrong reasons. In fact, most of the times the '%' (or format) idiom is simply clearer. Compare:
'[%d:%d] %d %s>' % (min, hour, ret_code, path)with:
'[' + min + ':' + hour + '] ' + ret_code + ' ' + path + '>'Anyway, recently it has been disputed whether the str.join alternative is so much faster than repeated concatenation, given the fact that the += operator has been improved. My goal is not to investigate this matter.
BaselineAt first, I decided to benchmark this simple function:
def baseline(loop_count): for num in xrange(loop_count): str(num)Since my first testing code is creating strings from a bunch of integer, here I am just benchmarking performance of the bare operation, without actually building the string. The second step is to accumulate all the fragments. The idea is to discover the point where thrashing starts, that is to say the range where the operation behavior remains linear. Outside these zone, we must figure out other factors and in fact the benchmark is not relevant. A third baseline tests uses list comprehensions.
def baseline2(loop_count): lst =  for num in xrange(loop_count): lst.append(str(num)) return lst def baseline3(loop_count): return [str(int) for int in xrange(loop_count)]If we examine the plot of the execution times, we can see that when we keep in memory all the string fragments, thrashing starts after 107 iterations. The graph is logarithmic in both axis. As a side note, using list comprehensions is twice as fast than manually adding stuff to the list one by one.
Tested CodeThese are the functions I actually benchmarked. Full code for the benchmark can be found in my git repository.
def str_sum(loop_count): out_str = '' for num in xrange(loop_count): out_str += str(num) return out_str def str_join(loop_count): str_list =  for num in xrange(loop_count): str_list.append(str(num)) out_str = ''.join(str_list) return out_str def str_join_lc(loop_count): return ''.join(str(num) for num in xrange(loop_count)) def string_io(loop_count): file_str = StringIO() for num in xrange(loop_count): file_str.write(str(num)) out_str = file_str.getvalue() return out_strstr_sum is essentially string concatenation using operator +=. str_join and str_join_lc are string concatenations performed with str.join. The str_join_lc variant uses list comprehensions. string_io is a StringIO based method (here it is the StringIO from cStringIO). Code has been adapted from the code benchmarked here. However, I decided not to use the `` operator, even though it is faster. Since it is going away from Python, I believe it is not relevant anymore. Moreover, using str in place of `` uniformly slows down all the methods.
ResultsIn this graph, I present results relative to Python2.6. Essentially, nowadays the faster method to create the strings has been proven to be the once advised str.join. However, I believe that difference is not great enough to change habits.
def str_sum_b(loop_count): out_str = '' for num in xrange(loop_count): out_str = out_str + str(num) return out_strand I found no real difference with the variant with +=. Wow! The benchmark was executed on my MacPro. Python2.5 and Python2.6 are the Python variants distributed with the OS (Snow Leopard). Python 2.7 has been dowloaded from Python.org. In this other post I investigated string concatenation using pre-built list of string fragments. The results are quite different, indeed! Technorati Tags: Python, Optimization, Efficiency, String Concatenation, String Append