The Perl guy in Python land: Getting past the whitespace issue
From AJS.COM
- Part of the "The Perl guy in Python land" series of articles.
Whitespace. Many adherents of C-like languages just can't get past the idea that languages like Python (and there are quite a few now) have done away with explicit block delimiters. It's tricky, and has its downsides, but frankly, it's just a stylistic choice. But let's start from the beginning. What is the whitespace issue, and why does it (or doesn't it) matter?
Blocking in C and Perl
In C, and almost all C-derived languages like Perl, C++, Java, etc., braces are used to denote blocks of code that share a scope. It looks like this:
for(i=0;i<x;i++) {
    do_stuff(i);
    printf("%d\n",i);
}
In this case, the example is C, but Perl looks about the same:
for($i=0;$i<$x;$i++) {
    do_stuff($i);
    print "$i\n";
}
Blocking in Python
However, in Python, the indentation serves to tell the interpreter what level of scope any given line of code is in:
for i in range(0,x):
    do_stuff(i)
    print i
Notice that there is an indication that a new scope starts (the :), but there is no indication where it ends. It is only the indentation that tells the parser that print i is part of the inner block which the for loop will execute.
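One way to see this for yourself is to ask Python's own tokenizer what it makes of the loop. The sketch below is written for Python 3 (so print is a function, unlike the example above) and uses the standard tokenize module. The names SOURCE and block_tokens are mine, and the extra "done = True" line is added only so that the block has something to end on. The source is only tokenized, never run, so x and do_stuff don't need to exist.

```python
import io
import tokenize

# Python 3 spelling of the loop above, plus one unindented line after it.
SOURCE = (
    "for i in range(0, x):\n"
    "    do_stuff(i)\n"
    "    print(i)\n"
    "done = True\n"
)


def block_tokens(source):
    """Return (token name, line number) for every INDENT and DEDENT token."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.start[0])
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.INDENT, tokenize.DEDENT)
    ]


if __name__ == "__main__":
    for name, line in block_tokens(SOURCE):
        print(f"{name} at line {line}")
```

This reports an INDENT at line 2 and a DEDENT at line 4. In other words, the tokenizer turns the change in indentation into the open and close markers that braces provide in C and Perl.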
The good
This sort of blocking can substantially reduce the number of otherwise empty lines in a program which really do nothing but close (or open, depending on how you format your braces) blocks. This, for example, is a very common sight in Perl programs:
            }
        }
    }
    return $x;
}
This can stretch code out and make it harder to read and comprehend at a glance. While this hasn't been that serious of an issue, it's one Python never has to contend with. No matter how indented code is, you can always start your next line of code in the outermost scope by simply not indenting it.
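For contrast, here is a small Python 3 sketch of the same shape: three nested blocks followed by a return. The function name count_even_cells and its data are made up for illustration.

```python
def count_even_cells(grid):
    """Count the even numbers in a list of rows."""
    total = 0
    for row in grid:
        for cell in row:
            if cell % 2 == 0:
                total += 1
    # One dedent closes the if and both for loops; no closing lines needed.
    return total


if __name__ == "__main__":
    print(count_even_cells([[1, 2], [3, 4, 6]]))
```

The return sits directly under the innermost statement, where the Perl version would need three lines of closing braces first.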
There is also the matter of enforcing correct indentation. Indentation in Perl is not at all a requirement, and in fact the language doesn't know anything about your indentation or care. You can indent code in such a way that it seems to imply one set of scoping and actually has another. This can be difficult to debug.
In practice, I've not yet seen any benefit from this latter theoretical advantage of Python's blocking, but it's a fair point to make in the abstract.
The bad
The primary problem with whitespace as syntax is that it imposes rules on how whitespace can be used and represented. Chief among these is the historical use of the tab character. A tab is a single ASCII character that represents a horizontal shift to the next "tab stop". Sadly, since tab stops have no universally defined layout, many programmers set them differently, and Python can't tolerate that ambiguity because changing indentation changes the code. Python's answer is to refuse indentation that mixes tabs and spaces inconsistently (Python 3 raises a TabError), and the standard style guide, PEP 8, recommends using spaces only. Most programmers who write Python use an editor (like vi, vim, emacs, eclipse, etc.) configured to insert spaces when tab is typed, but this isn't universally the default, so many beginning Python programmers suffer through a few errors until they get their editor configured not to insert tabs.
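A minimal sketch of what the tab problem looks like in practice, assuming Python 3, where the compiler raises TabError when tabs and spaces are mixed inconsistently within a block. The constants and the helper indentation_problem are my own names, not part of any library.

```python
# Three spellings of the same two-line block.
MIXED = "if True:\n\tx = 1\n        y = 2\n"   # a tab, then eight spaces
SPACES = "if True:\n    x = 1\n    y = 2\n"    # four spaces on both lines
TABS = "if True:\n\tx = 1\n\ty = 2\n"          # one tab on both lines


def indentation_problem(source):
    """Compile source; return the indentation error's class name, or None."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:  # TabError is a subclass of IndentationError
        return type(err).__name__
    return None


if __name__ == "__main__":
    for label, source in (("mixed", MIXED), ("spaces", SPACES), ("tabs", TABS)):
        print(label, indentation_problem(source))
```

Only the mixed version is rejected. The two consistent versions compile, which is why the usual advice is to pick one style (conventionally spaces) and have the editor enforce it.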
Beyond the tab issue, the other major problem with whitespace as syntax is that editors do want to manipulate it, and horrible, awful things can happen if your editor decides to re-indent a section of code to its taste!
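To make the danger concrete, here is a sketch in Python 3 of two programs that differ only in the indentation of their last line, as might happen after an over-eager auto-indent or a bad paste. ORIGINAL, REINDENTED and run are illustrative names of my own.

```python
ORIGINAL = (
    "total = 0\n"
    "for i in range(3):\n"
    "    total += i\n"
    "    total += 10\n"   # inside the loop: runs three times
)

# Same statements, but the last line has slipped out of the loop.
REINDENTED = (
    "total = 0\n"
    "for i in range(3):\n"
    "    total += i\n"
    "total += 10\n"       # outside the loop: runs once
)


def run(source):
    """Execute source in a fresh namespace and return its final total."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


if __name__ == "__main__":
    print(run(ORIGINAL), run(REINDENTED))
```

Both versions are valid Python, so nothing complains. The first yields 33 and the second 13, and the only evidence of the change is the column the last line starts in.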
Verdict
So far, I have yet to be significantly burned or benefited by Python's blocking style. In essence, it's been a non-issue except for the mental hurdle that it represents to a programmer used to Perl and C-derived languages. There's nothing more or less attractive or readable about it to someone who knows the language. People who are new to Python will often tell you that it makes code harder to read, and Python programmers who are not as well versed in C-derived languages will often tell you that it makes Python easier to read. This, as with many language comparisons, is simply a matter of familiarity and a few edge cases.
Side note: The parser
I should mention, here, that Python's default parser under Linux systems and many other platforms is truly a step backwards from the modern parsers available for C, C++, Perl, and most other languages. Error messages are often cryptic and lack sufficient context to understand what exactly went wrong. For example:
for x in (5): pass
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
This isn't really helpful to the newbie, and the lack of inline annotation makes it hard to be sure where the error actually happened unless you know the error and what specific types of mistakes it is likely to indicate.
This sort of lax reporting in parser errors is probably the most significant problem I've had with the language thus far. Just to compare, here's a similar Perl error:
for $x 1 {}
Number found where operator expected at -e line 1, near "$x 1"
(Missing operator before 1?)
As you can see, the Perl error points out exactly where in the line the error happened and suggests what might be wrong in a more general sense. This is what modern programmers have come to expect, and it's rather shocking that a modern programming language so often (but not always) produces errors that don't help the programmer to resolve the issue.
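For comparison, when Python does reject a line at parse time, the SyntaxError it raises carries a line number, a column offset, and the offending source text as attributes. How much of that ends up in the printed message varies between Python versions. The sketch below, assuming Python 3, pulls those fields out for a Python analogue of the broken Perl loop; describe_syntax_error is a hypothetical helper of my own.

```python
def describe_syntax_error(source):
    """Compile source; return (line, column, text) if it fails to parse."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        # lineno, offset and text are standard SyntaxError attributes;
        # the exact offset reported depends on the Python version.
        return err.lineno, err.offset, (err.text or "").strip()
    return None


if __name__ == "__main__":
    print(describe_syntax_error("for x 1: pass\n"))
    print(describe_syntax_error("for x in (5,): pass\n"))
```

The first call returns the line number, a column, and the text of the bad line. The second returns None because the code parses. Note that this only covers errors caught by the parser, not runtime errors like the TypeError shown earlier.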