Python's mutable default data types
When I first started learning programming (using Python), I loved how explicit things were. a = 5 assigns the value 5 to the variable a, and b = [1, 2, 3] creates a list containing 1, 2, 3 and assigns it to the variable b. Pretty quickly, you want to create a new list based on an old list. So you might naturally search for "adding an item to a list in python", come across the .append method, and try this:
original_list = ["I was here first"]
new_list = original_list.append("I was there next")
print(new_list)
# None
What?
print(original_list)
# ['I was here first', 'I was there next']
What???? At that point I didn't have any exposure to immutable data structures and had not yet been inducted into the cult of functional programming. I was just a new programmer who found this behavior confusing and unintuitive. I hadn't done anything to my original variable, so why was it changed? Why did my assignment fail?
The way to 'fix' this was to use the overloaded + operator to 'concatenate' two lists:
original_list = ["I was here first"]
new_list = original_list + ["I was there next"]
print(new_list)
# ['I was here first', 'I was there next']
print(original_list)
# ['I was here first']
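One way to see the difference between the two snippets above is to check object identity with the is operator. This is a small sketch of my own (the variable names are just for illustration): append changes the existing list object and returns None, while + builds a brand-new list.

```python
original_list = ["I was here first"]

# append mutates the existing list object and returns None
result = original_list.append("I was there next")
print(result is None)
# True

# + builds a new list object and leaves its operands alone
new_list = original_list + ["one more"]
print(new_list is original_list)
# False
print(len(original_list), len(new_list))
# 2 3
```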
Of course, the documentation explains why this happens: append (like the other methods that act on lists in place) modifies the existing list and returns None, which is why new_list ended up as None. This may not seem like a big deal to someone with object-oriented programming experience (.append() is a method of the mutable list class), but it was not intuitive to me as a new programmer, and I filed it away as something to watch out for. Is this really confusing? How could it hurt? Well, let's look at the following example posted on Twitter by Jake VanderPlas (@jakevdp):
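The tweet itself is not reproduced here. As a stand-in, this is a sketch of the kind of snippet that gives the result discussed next, assuming the example relied on a mutable default argument (the function name f and its body are my guess, not a quote from the tweet):

```python
def f(x=[]):
    # Assumption: the default list is the point of the puzzle. It is created
    # once, when the function is defined, and the same object is reused on
    # every call that omits x.
    x.append(1)
    return x

print(f(), f())
```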
What do you think the answer is? And what do you think it should be?
When I looked at this, more than 50% of respondents said it should print [1] [1], yet if you try it you get [1, 1] [1, 1]. People following Jake are likely not that new to Python, so I think it is safe to say this behavior (especially in a function) is not intuitive. In-place mutation is almost acceptable outside of functions, but inside one it seems totally crazy. Here is another example:
def append_unsafe(l):
    # list.append mutates l in place and returns None, so return l itself
    l.append("from the function")
    return l

original_list = ["I was here first"]
function_return = append_unsafe(original_list)
print(function_return)
# ['I was here first', 'from the function']
print(original_list)
# ['I was here first', 'from the function']
Functions are supposed to be islands of sanity in a mutable, stateful world. They should take arguments, operate on them in isolation, then return the result:
def add_is_safe(n):
    return n + 2

n_add = 2
n_result = add_is_safe(n_add)
print(n_result)
# 4
print(n_add)
# 2
That is what you expect, but you can see in the case with the list we have this horrible mutation of an argument (the original list) outside of the function scope. I'm sure you can imagine the bugs and confusion that can result from this even in simple scripts.
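The difference between the two cases comes down to what the function does with the object it receives. A rough sketch, using hypothetical function names of my own: Python hands the function a reference to the caller's object in both cases, but an expression like n + 2 or l + [...] builds a new object, while l.append changes the list the caller still holds. Integers are immutable, so there is no in-place equivalent for them.

```python
def mutate(l):
    l.append("from the function")  # changes the caller's object in place
    return l

def rebind(l):
    l = l + ["from the function"]  # builds a new list; the local name l now points at it
    return l

caller_list = ["I was here first"]
print(mutate(caller_list) is caller_list)
# True

caller_list = ["I was here first"]
print(rebind(caller_list) is caller_list)
# False
print(caller_list)
# ['I was here first']
```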
Immutable data hack
To get around this, and work as if I was in a functional language with immutable data structures, I started doing the following:
def append_immutable(l):
    l = l.copy()  # shallow copy, so the caller's list is left untouched
    l.append("from the function")  # append returns None, so return l instead
    return l

original_list = ["I was here first"]
function_return = append_immutable(original_list)
print(function_return)
# ['I was here first', 'from the function']
print(original_list)
# ['I was here first']
This creates a copy of the argument in the function scope assigned to the same name (in the function scope) so that any operations are done (as they should be) on the local variable in the function. This local variable is then returned.
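One caveat worth knowing: list.copy() makes a shallow copy, so nested mutable objects are still shared with the caller. A sketch, assuming a hypothetical nested list and hypothetical function names, with copy.deepcopy from the standard library for the case where full isolation is needed:

```python
import copy

def append_to_first_shallow(rows):
    rows = rows.copy()          # new outer list, same inner lists
    rows[0].append("changed")   # mutates an inner list the caller still holds
    return rows

def append_to_first_deep(rows):
    rows = copy.deepcopy(rows)  # inner lists are copied too
    rows[0].append("changed")
    return rows

original = [["a"], ["b"]]
append_to_first_shallow(original)
print(original)
# [['a', 'changed'], ['b']]

original = [["a"], ["b"]]
append_to_first_deep(original)
print(original)
# [['a'], ['b']]
```

For flat lists of immutable items (strings, numbers), the shallow copy is enough.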
I imagine this is rough on memory (I'll test it at some point), but to me the safety is worth the memory abuse. I do this whenever I operate on mutable types in Python. It has saved me from a number of mutable-data-induced headaches since being spoiled by Clojure. Did I mention how awesome Clojure and functional programming are? If you haven't tried it, you should really try Clojure.
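On the memory question, a quick way to get a feel for the cost is to measure one copy of a large list. This is only a rough sketch: the list size is arbitrary, the numbers depend on the machine and Python build, and sys.getsizeof counts only the list object itself (its array of references), since a shallow copy does not duplicate the elements.

```python
import sys
import timeit

big_list = list(range(1_000_000))  # arbitrary size, chosen for illustration

# Size of the copied list object: roughly one reference per element.
copy_bytes = sys.getsizeof(big_list.copy())

# Average wall-clock time of one shallow copy over 10 runs.
copy_seconds = timeit.timeit(big_list.copy, number=10) / 10

print(f"{copy_bytes / 1e6:.1f} MB per copy, {copy_seconds * 1e3:.2f} ms per copy")
```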