Meditation on the Zen of Python

Read it

If you have ever programmed anything in Python, you have probably used the “import” statement: the modules of the Python standard library can be imported into your code or into the interpreter. Take a look at the standard library folders and you’ll find the “this.py” module… what is that? Not much of a self-explanatory name for a Python module, huh? And you – Java lovers – forget about the Java “this” keyword: you’re way off track.

This module is the mystic “Zen of Python”:

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Woohaaaa!!! What?!?! A sort of mantra???

The Pythonic view of the software universe

Joking aside, the Zen states the high-level development guidelines that were followed in the design of the Python language itself; it was formally stated in PEP 20 by Tim Peters, one of the fathers of the language along with Guido van Rossum (BDFL). Ok, I’m curious about it: I open the this.py code in my favourite text editor and I notice that…

The Zen of Python does not obey the Zen of Python

What??? Here is the source code:

s = """Gur Mra bs Clguba, ol Gvz Crgref

Ornhgvshy vf orggre guna htyl.
Rkcyvpvg vf orggre guna vzcyvpvg.
Fvzcyr vf orggre guna pbzcyrk.
Pbzcyrk vf orggre guna pbzcyvpngrq.
Syng vf orggre guna arfgrq.
Fcnefr vf orggre guna qrafr.
Ernqnovyvgl pbhagf.
Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf.
Nygubhtu cenpgvpnyvgl orngf chevgl.
Reebef fubhyq arire cnff fvyragyl.
Hayrff rkcyvpvgyl fvyraprq.
Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff.
Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg.
Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu.
Abj vf orggre guna arire.
Nygubhtu arire vf bsgra orggre guna *evtug* abj.
Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn.
Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn.
Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr!"""

d = {}
for c in (65, 97):
    for i in range(26):
        d[chr(i+c)] = chr((i+13) % 26 + c)

print "".join([d.get(c, c) for c in s])

The first approach to this code might be bewildering… but it’s not so hard to understand, in the end: basically, you have a huge string containing the whole encrypted Zen, which is then decoded into readable English characters and printed out. A few hints:

  • 65 is the ASCII for ‘A’
  • 97 is the ASCII for ‘a’
  • there are 26 letters in the English alphabet
  • the “d” dictionary ends up with uppercase/lowercase chars as keys and their corresponding transliterated chars as values. The “crypting magic” is given by: (i+13) % 26 + c
  • since the shift is 13, exactly half of the 26-letter alphabet, encrypting twice gives back the original: “A” = decrypt[crypt[“A”]] = crypt[crypt[“A”]]

Oddity: the Zen does not follow many of its own aphorisms! In fact, its code is far from being explicit, and if it’s true that readability counts, well, the Zen doesn’t shine at it. Ok, practicality beats purity, but this is complex (not complicated) to read; the implementation could be simpler to explain, which suggests that this could be done in a better way.
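The hints above can be checked with a few lines of Python. This is only a sketch written for illustration (the function names build_rot13_table and rot13 are mine, they are not part of this.py): it rebuilds the same dictionary and shows that applying the substitution twice gives back the original text.

```python
def build_rot13_table():
    """Rebuild the same letter-to-letter mapping that this.py stores in d."""
    table = {}
    for base in (ord("A"), ord("a")):  # 65 and 97
        for i in range(26):
            table[chr(base + i)] = chr((i + 13) % 26 + base)
    return table


def rot13(text, table=None):
    """Substitute every letter; anything else (spaces, dots) is kept as-is."""
    table = table or build_rot13_table()
    return "".join(table.get(ch, ch) for ch in text)


encoded = "Ornhgvshy vf orggre guna htyl."
decoded = rot13(encoded)
print(decoded)                   # Beautiful is better than ugly.
print(rot13(decoded) == encoded) # True: the mapping is its own inverse
```

So there is no separate “decrypt” step: the very same dictionary does both jobs.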

A metaphor

My intention is not to criticize Tim Peters’s work (far from it! I am just a silly rookie!!!) but to show what I think about the Zen: I think that it is a metaphor. It basically poses a problem to its readers, who need to “decipher” it in order to understand how it really works: this is a strong metaphor of life – if you dig deep into problems/difficulties you end up wiser about them. And the same goes for the Python design guidelines.

… and considering that “now is better than never”…

… while writing this post, I scribbled (it was fun!) a revised version of the Zen of Python. It adds a few features (get random aphorisms, search for specified keywords) that can help developers read and look up the original Zen of Python. Features that – hopefully – comply with what the Zen itself says 😉

How to use Memcached with PyOWM

This is just a little demonstration of how you can quickly replace the basic cache provider shipped with the PyOWM library. For this purpose we’ll use Memcached, which – simply put – is a key/value in-memory data store: this makes it a perfect caching mechanism. I had never used Memcached before writing this post: this is a good moment for me to get to know it. This demo requires that you work in a Linux environment, as Memcached is originally shipped for Unix-like systems via package managers (but can nevertheless be compiled from source).

I’ll use Ubuntu, with Memcached 1.4.6 and PyOWM 0.4.0. Let’s dive into it.

First we install Memcached and the related Python bindings:

sudo apt-get install memcached python-memcache

Then we install the PyOWM library and check the installation:

sudo pip install pyowm
ls /usr/local/lib/python2.6/dist-packages # check installation

(on my distro, pip installs Python packages into the /usr/local/lib/python2.6/dist-packages folder: change it according to yours)

In order to “plug” Memcached support into PyOWM we are going to leverage the installed Python bindings by creating an adapter class that maps the interface PyOWM expects onto the Memcached API for getting/setting cache elements. Fortunately, the Memcached API is very close to the interface PyOWM expects (which is defined in the pyowm.abstractions.owmcache.OWMCache class), so chances are that our adapter will be simple enough. Let’s name it “memcachedadapter.py”: you can put it anywhere, provided that this anywhere is “seen” by the Python interpreter: for example, you can put it into a folder listed in the PYTHONPATH variable or you can place it directly into the PyOWM install folder. I did the latter:

cd /usr/local/lib/python2.6/dist-packages/pyowm
sudo vim memcachedadapter.py

The module will contain the MemcachedAdapter class:

#!/usr/bin/env python

class MemcachedAdapter(object):
  """
  Realizes the pyowm.abstractions.owmcache.OWMCache interface
  adapting a memcache.Client object
  """
  # The memcache client expects expiry times in seconds, not milliseconds
  __ITEM_LIFETIME_SECONDS = 60*10 # Ten minutes

  def __init__(self, hostname="127.0.0.1",
    port="11211", item_lifetime=__ITEM_LIFETIME_SECONDS):
    from memcache import Client
    self._memcached = Client([hostname+":"+port])
    self._item_lifetime = item_lifetime

  def get(self, request_url):
    # Returns None on a cache miss
    return self._memcached.get(request_url)

  def set(self, request_url, response_json):
    self._memcached.set(request_url, response_json, self._item_lifetime)

I wrote this adapter in 5 minutes, so please don’t blame me for errors 😉 It can surely be improved. Now what is left to do is to tell the PyOWM library how to use the adapter: this is done via configuration. The library uses a concrete OWMCache instance which is created in a configuration file and injected into the code; a separate configuration file exists for the code supporting each Open Weather Map web API version. Currently only API version 2.5 is supported, so we’ll put our adapter into the pyowm.webapi25.configuration25.py file, commenting out the default cache object:

# Cache provider to be used
# cache = NullCache()  # <-- comment this line
from memcachedadapter import MemcachedAdapter
cache = MemcachedAdapter("127.0.0.1", "11211")

As you can see, we are adapting a local Memcached instance listening on the default 11211 port, but you can change this configuration as needed. Now let’s try it out – let’s start Memcached and use the PyOWM library:

memcached &
python

for example:

>>> from pyowm import OWM
>>> owm = OWM()
>>> f = owm.daily_forecast("London,uk")  # This first call to the API is not cached, obviously
>>> g = owm.daily_forecast("London,uk")  # This second call is cached

The time saving should be evident at a glance.

In a similar way it is possible to write adapters for plugging other cache/storage providers (Redis, MongoDB, etc.) into the PyOWM library.
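To give an idea of how small such an adapter can be, here is a minimal sketch of a dict-backed one. It is an illustration under assumptions: the class name InMemoryCacheAdapter and its constructor parameters are made up for this example, and I am assuming that the get/set pair used by the MemcachedAdapter above is all the library needs from a cache.

```python
import time


class InMemoryCacheAdapter(object):
    """Hypothetical dict-backed cache exposing the same get/set pair
    as the MemcachedAdapter shown earlier."""

    def __init__(self, item_lifetime_seconds=600, clock=time.time):
        self._items = {}
        self._item_lifetime = item_lifetime_seconds
        self._clock = clock  # injectable, so that expiry can be tested

    def get(self, request_url):
        entry = self._items.get(request_url)
        if entry is None:
            return None  # cache miss
        stored_at, response_json = entry
        if self._clock() - stored_at > self._item_lifetime:
            del self._items[request_url]  # expired: behave like a miss
            return None
        return response_json

    def set(self, request_url, response_json):
        self._items[request_url] = (self._clock(), response_json)


cache = InMemoryCacheAdapter()
cache.set("http://example.invalid/forecast?q=London,uk", '{"cod": "200"}')
print(cache.get("http://example.invalid/forecast?q=London,uk"))
print(cache.get("http://example.invalid/never-requested"))  # None
```

Unlike Memcached, this cache lives inside the Python process, so it is lost when the interpreter exits: good enough for a demo, not for sharing data between processes.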

EDIT: this post stimulated me to write more adapters, you can find them here.

My first GitHub project

It’s been a while since I wrote here: lately I’ve spent a lot of my spare time organizing and coding my first GitHub project ever.

Why an open source project on GitHub?

The reasons I decided to set up a GitHub account and launch an open source project are quite simple:

  • I’ve been living on the shoulders of the open source community for years and I’ve always been proud of what it gave me. The best way to be thankful is to give my commitment and code for free to everyone!
  • GitHub is a nice place where programmers can show their skills to the world (friends, fellow programmers, potential new employers). I mean: not only coding skills, but also organizational and communication skills, as well as open-mindedness.
  • My desire is to use GitHub to connect and cooperate with others like me, who share my interests
  • I’m sure that open source cooperation will teach me a lot of things: I have a lot to learn from the code masters
  • Last, but not least, it’s a good chance to practice with a few languages – first of all, Python

The PyOWM library

So the question was: what will my open source project be about? A few minutes after that question arose in my mind I ran into the OpenWeatherMap website, which basically is a web portal disseminating world weather data that are openly contributed by the user community. I noticed that the site provided a web data API that had been created ages before and, of course, lots of code projects built on this API have been popping up since. I took a look at the client wrapper libraries that had been created for the API and noticed that no Python client wrapper was mentioned; I also googled a bit and found that only one attempt at wrapping this API in Python had been made (pretty rough, not supporting the latest API version, and with its last commit dating back to the beginning of the year).

So, it was a deal: a Python client wrapping library that could allow users to interact with the OpenWeatherMap web API via a simple object-oriented model.

The PyOWM library was conceptually born.

State of the art

I worked hard to shape the library, and now most of the web API features are covered. I’ve developed it using a Test-Driven approach and keeping it as minimalist as possible. I hope this work will be useful to as many people as possible.

Now I need to promote my creation among the OpenWeatherMap maintainers and the OWM community users, and gather help to test and improve the library.

How to contribute

Do you want to help my open source project grow? There are countless ways you can help: report issues, submit new feature requests, test on specific architectures, port to different Python versions, mention it in your blogs/user communities… and of course help in coding if you are able to!

Visit the GitHub page of the PyOWM library

Thank you and cheers! 😉

Command-line software design: 5 more pieces of advice

Ok, folks, ready to take off with 5 more pieces of design advice for CLMs (Command-Line Modules)? This is part II of a series of posts; part I (with the first 5 pieces of advice) is in my previous post.

1. Provide meaningful messages

AKA: “What am I doing? I am existing…”

Your CLM should provide insight into what it is currently doing. The difficult part is deciding how much detail you want to provide to the user… and you might argue: “Ok, but you can always use log level filtering and then let the user decide the verbosity” – this is perfectly right, but I’m talking about on-screen messages. My advice is to print out a specific message which conveys what the CLM is currently doing, with a level of detail which should be just enough for the user not to say “It is talking rubbish”! So, what is really vital is that you avoid simple and generic messages like “Computing” or “Executing” and – on the other hand – that you avoid hyper-detailed expressions such as “Inverting matrix – computing determinant of the 3rd 2×2 submatrix” if they are not meaningful to the user. Of course if the focus of your CLM is matrix inversion that is fine, but it isn’t if your CLM is – for example – focused on a higher-level problem which is solved using matrix inversion.

…And, please, never print out the raw counters in nested for loops. Just a couple of days ago I happened to run an image-processing CLM provided by a project partner: this was the output of a successful run

claudio@laptop:~$ python img_processing_clm.py input.tif output.tif
Conversion to 8bit took 23.567 seconds
1
2
3
4
5
6
#2000 or so more lines
The variance computation took 367.145 seconds
...

Each and every row index is printed out… It is just irritating!!!
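One way to keep both worlds apart is to send the stage-level messages and the per-row chatter to different log levels, so that the row counters only show up when somebody explicitly asks for them. The sketch below uses Python’s standard logging module; the logger name, the build_logger and sum_rows helpers and the toy “sum each row” computation are all made up for the example.

```python
import io
import logging
import sys


def build_logger(stream=sys.stdout, verbose=False):
    """On-screen logger: stage messages always, per-row detail only if verbose."""
    logger = logging.getLogger("img_processing_clm")  # hypothetical name
    logger.handlers = []          # start clean if called more than once
    logger.propagate = False
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def sum_rows(rows, logger):
    """Toy stand-in for a real computation over image rows."""
    logger.info("Summing %d rows", len(rows))
    totals = []
    for index, row in enumerate(rows):
        logger.debug("row %d", index)  # hidden unless verbose=True
        totals.append(sum(row))
    logger.info("Done: %d rows summed", len(totals))
    return totals


# Default (quiet) run, captured in a buffer just to show what the user sees
quiet_output = io.StringIO()
sum_rows([[1, 2], [3, 4]], build_logger(quiet_output))
print(quiet_output.getvalue())
```

With verbose=False the user sees only the two stage messages; the row indexes are still there for whoever needs to debug.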

2. Gracefully fail

AKA: “I don’t want to see each blood drop spreading from your wound”

As a CLM user, would you prefer seeing this:

claudio@laptop:~$ python myclm.py /var/clmdata/testoutdir  #we are missing the first parameter
  Traceback (most recent call last):
  File "myclm.py", line 3, in <module>
  inputfile = sys.argv[1]
  IndexError: list index out of range
claudio@laptop:~$ echo $?
1

..or this?

claudio@laptop:~$ python myclm.py /var/clmdata/testoutdir  #we are missing the first parameter
  ERROR: you must specify an input file
  Usage:
    myclm.py <inputfile_path> <output_path>
claudio@laptop:~$ echo $?
1

The correct answer would be: neither of them! But you can’t expect your CLM to work fine every time. So it is important to let users know why the CLM stopped running. A nice design choice is to detect possible error conditions and handle them so that your CLM “says something of interest” and terminates with a known exit status: this can be done quite easily if you use languages (e.g. Java, Python, etc.) that provide formal exception/error handling constructs – in other words, the usual try/catch blocks.
Graceful failures are delightful for the user, but may not be the best approach to handling error situations while you are still writing your CLM, because they may not give you enough information if you need to debug. So my advice is to add them only when you are pretty sure that you won’t make further heavy changes or do any more refactoring on your CLM.
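As a rough sketch of what this can look like in Python: the script below checks its arguments, catches the one error it knows how to explain and maps each condition to an exit status. Exit status 1 for a usage error matches the transcript above; status 2 and the wording of the messages are my own assumptions.

```python
import sys

EXIT_OK = 0
EXIT_USAGE = 1             # same status as in the transcript above
EXIT_UNREADABLE_INPUT = 2  # assumption: any non-zero documented code would do

USAGE = "Usage:\n  myclm.py <inputfile_path> <output_path>\n"


def main(argv, err=sys.stderr):
    """Return an exit status instead of letting a traceback escape."""
    if len(argv) != 3:
        err.write("ERROR: expected an input file and an output path\n")
        err.write(USAGE)
        return EXIT_USAGE
    inputfile = argv[1]
    try:
        with open(inputfile) as handle:
            handle.read()  # the real processing would go here
    except (IOError, OSError) as exc:
        err.write("ERROR: cannot read input file: %s\n" % exc)
        return EXIT_UNREADABLE_INPUT
    return EXIT_OK


# In the real script the last line would be: sys.exit(main(sys.argv))
print(main(["myclm.py", "/var/clmdata/testoutdir"]))  # 1
```

Keeping the logic in a main() function that returns the status, rather than calling sys.exit() all over the place, also makes the failure paths easy to test.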

3. Organize your CLM folder

AKA: “I am the Borg … I bring order to chaos” (Borg Queen – Star Trek: First Contact)

Order in organizing your code is good. A well-structured CLM is easy to understand and modify, and can be put to use in a short time. My advice is to adhere to widely adopted or standard patterns for structuring program folders: I usually lay out my CLM’s folder in this fashion

CLM-folder/
  |--bin/     #Binaries: main CLM program and dependencies
  |--doc/     #Documentation about CLM usage/installation
  |--src/     #Source files
  |--static/  #Static data: config files, static inputs, etc.
  `--test/    #Tests

4. Minimize filesystem usage and leverage temporary folders

AKA: “Forbidden: you don’t have enough permissions to write the file”

As general advice, don’t rely on the safety of filesystem operations. If your CLM needs to store intermediate data, try to do that in memory; if that’s not possible and you are therefore compelled to use the filesystem, your goal should be to put the least complexity between your CLM and your data. Reading data from filesystems is seldom a problem, but writing often is, and the amount of trouble you might face depends on a variety of factors such as the architecture (ever tried to write in a folder for which you don’t have ‘w’ permissions?), possible concurrent modification of the data, the remoteness of the target filesystem and so on.

Another overlooked – but smooth and clever – technique is to leverage the temporary folder support provided by operating systems. In my experience with bash programming, I’ve always seen people doing local computations as follows: input files are copied into the same folder as the executing binaries, then intermediate files are written in that folder (usually, a lot of files), and if the CLM ends successfully the intermediate files are deleted. This always made me angry, because often their programs were buggy and therefore never got to their natural end, which forced me to press CTRL+C… leaving all of those intermediate files undeleted in the folder. And this meant: I myself would have to delete them!!! :-$ To solve this issue, I simply suggested that those people leverage the “mktemp” Linux command, which, when called with the -d option, creates a temporary folder with a pseudo-random name under /tmp and prints its name: one can then use this folder to do whatever she/he likes – e.g. writing the CLM execution’s intermediate rubbish.

It’s as easy as follows:

claudio@laptop:~$ tempdir=$(mktemp -d)
claudio@laptop:~$ echo $tempdir
/tmp/tmp.hyYKY21864
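The same idea is available from Python through the standard tempfile module. In this sketch (the helper names and the “clm-” prefix are arbitrary choices of mine) the scratch folder is removed in a finally clause, so the clean-up also happens when the work fails halfway or the user presses CTRL+C.

```python
import os
import shutil
import tempfile


def run_in_scratch_dir(work):
    """Create a private temporary folder, run work(folder), always clean up."""
    scratch = tempfile.mkdtemp(prefix="clm-")
    try:
        return work(scratch)
    finally:
        # Runs on success, on exceptions and on KeyboardInterrupt alike
        shutil.rmtree(scratch, ignore_errors=True)


def write_intermediate(scratch):
    """Toy workload: writes one intermediate file and returns its path."""
    path = os.path.join(scratch, "step1.tmp")
    with open(path, "w") as handle:
        handle.write("intermediate data")
    return path


leftover = run_in_scratch_dir(write_intermediate)
print(os.path.exists(leftover))  # False: nothing is left behind
```

It won’t survive a hard kill of the process, of course, but in that case the rubbish at least ends up under the system’s temporary folder and not next to your binaries.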

5. Leverage absolute paths

AKA: “Time – as well as folder location – is relative”

When you provide paths as arguments to CLMs it is very good practice to give them in absolute form. If you give absolute paths, there’s a pretty good chance that your CLM addresses files and folders in the right way. And my advice is: always handle absolute paths inside your command-line software… in fact, this will spare you the damage done by terrible solutions like the “cd” (change directory) command, which messes the whole thing up if you are using relative paths, because the folder they are resolved against changes!

A little coding exercise: let us write a small bash script (copier.bash) that reads a file and writes its contents to a file named “result.out”, which will be created in a directory of our choice. We want it to have this interface:

copier.bash <inputfile_path> <output_path>

and here is the code (as you can see I’m using the “cat” executable which lies in the /bin path on my Linux system):

#!/bin/bash

inputfile="$1"
outputdir="$2"

bindir="/bin"

cd "$bindir"
cat "$inputfile" > "$outputdir/result.out"

Now if we setup the environment like this:

claudio@laptop:~$ cd /opt/copier
claudio@laptop:~$ mkdir output  #we create the output folder
claudio@laptop:~$ tree .
.
|-- copier.bash
`-- output
1 directory, 1 file
claudio@laptop:~$ echo "italia has got talent" > input.txt #we create the input file
claudio@laptop:~$ bash copier.bash input.txt output        #we run the script
copier.bash: line 9: output/result.out: No such file or directory

As we expected, the “cd” inside our script is messing up everything and the bash shell is complaining about the fact that after it, it is impossible to find the “output” subfolder (which, in absolute terms, is: “/bin/output” !!!)

The following command line fails as well:

claudio@laptop:~$ bash copier.bash input.txt /opt/copier/output
cat: input.txt: No such file or directory

This time it’s the “cat” executable complaining about the missing input.txt file, which it expects to be here: “/bin/input.txt”

The right way of running this script would be:

claudio@laptop:~$ bash copier.bash /opt/copier/input.txt /opt/copier/output
claudio@laptop:~$ cat output/result.out
italia has got talent

As you can see, one must know in advance that absolute paths must be used. And consider that we were lucky to have a textual CLM: what if we had a compiled one? Lesson learned: never use “cd” and leverage absolute paths!
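If you cannot force your users to type absolute paths, the next best thing is to turn whatever they give you into absolute paths as the very first step, before anything else has a chance to change the working directory. A minimal Python sketch of this (to_absolute is a helper name invented here):

```python
import os


def to_absolute(path, start_dir=None):
    """Resolve path against the folder the CLM was started from."""
    base = start_dir if start_dir is not None else os.getcwd()
    # os.path.join keeps 'path' untouched when it is already absolute
    return os.path.normpath(os.path.join(base, path))


# Freeze the starting folder once, at the top of the program
startup_dir = os.getcwd()
inputfile = to_absolute("input.txt", startup_dir)
outputdir = to_absolute("output", startup_dir)
print(os.path.isabs(inputfile), os.path.isabs(outputdir))
```

From here on the program only uses inputfile and outputdir, so a later change of directory cannot silently redirect them.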

How to deploy Flask applications to Apache webserver

This is a simple guide explaining how I managed to configure the Apache 2.2 httpd server on a Windows platform so that it can serve a Python web application I wrote using the Flask micro-framework. With very few modifications, the guide is also valid on Linux environments (you geeks know what to do).

Why I needed to do this

I developed this application at work and I’ve been serving it from the beginning via Flask’s built-in minimal webserver: unfortunately this is not enough for the production stage, as I need a more robust, production-grade server with SSL capabilities. This was my first time deploying a Python webapp… So, after googling a bit and reading the Flask deployment notes, I came up with the answer: what I needed was a WSGI-compliant server running on my target platform, a Windows Server 2012 machine. The natural choice for me was to enable the WSGI module on the “good ole” Apache webserver, which I’m experienced with.

Steps

Flask app

We choose a folder in which we place the Python code. For instance,

D:\webapps\test

In this folder we create the real Flask webapplication that we want to deploy (file “test.py”):

from flask import Flask, request
app = Flask(__name__)

@app.route('/hello')
def hello_world():
    name = request.args.get('name','')
    return 'Hello ' + name + '!'

if __name__ == '__main__':
    app.run()

The Apache server won’t be aware of “test.py” at all. What you need to do now is to write, in the same folder, a Python file named “test.wsgi” that we will reference in the webserver’s configuration: the code in this file imports the main Flask application object (built in our case as a singleton) and is what actually gets executed by Apache’s WSGI module. In the code, it is vital that you DON’T change the name of the “application” variable, as it is exactly what the server expects to find. Also please note that we are extending the Python module search path to include our own web application’s folder. This is “test.wsgi”:

import sys

#Expand Python classes path with your app's path
sys.path.insert(0, "d:/webapps/test")

from test import app

#Put logging code (and imports) here ...

#Initialize WSGI app object
application = app

As an additional remark, if you want to put any logging code (e.g. file/e-mail/console loggers) into your Flask app, you must put it before the if __name__ == '__main__' block, otherwise it won’t log anything: under Apache the module is imported rather than run as a script, so that block is never executed. Add your loggers to the app object.
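To see why the name “application” in “test.wsgi” matters: a WSGI server just looks up a callable and invokes it with the request environment and a start_response function, and the Flask app object is such a callable. The sketch below does not use Flask or Apache at all; it relies only on the standard library’s wsgiref, with a hand-written callable and a tiny call_wsgi helper (both invented for this example) playing the parts of the app and of the server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable: what the server ultimately invokes per request."""
    body = b"Hello from a bare WSGI callable!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_wsgi(app, path="/"):
    """Play the server's role once: build an environ and call the app."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the mandatory WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)


status, body = call_wsgi(application, "/hello")
print(status)
print(body.decode("ascii"))
```

In “test.wsgi” the line application = app does nothing more than expose the Flask object under the name the server is looking for.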

Apache setup

Ok, what’s next? Now it’s all about installing and properly configuring Apache.

First: install Apache webserver. I downloaded and executed the .msi installer. Apache was installed at

"C:\Program Files (x86)\Apache Software Foundation\Apache2.2"

Second: install the WSGI Apache module. Be careful to download the module compiled for your specific combination of platform, Python version and Apache version: I downloaded this module. Once downloaded, rename the .so file to “mod_wsgi.so” and put it under the “modules” subfolder of your Apache installation folder. Then you have to tell Apache to use it: open the “httpd.conf” file, which is under the “conf” subfolder, in a text editor and add the following line at the bottom:

LoadModule wsgi_module modules/mod_wsgi.so

Third: restart Apache.

Now Apache is ready to serve WSGI web applications. What is left is to tell it where our application is and map it to a URL alias. It’s child’s play: open the “httpd.conf” file we used before in a text editor and add these lines at the bottom:

<Directory d:/webapps/test>
    Order allow,deny
    Allow from all
</Directory>
WSGIScriptAlias /flasktest d:/webapps/test/test.wsgi

(nevertheless, I prefer to place the per-virtual-host or per-alias configurations’ stuff into separate files and then use an Include directive into the main “httpd.conf”)

Now restart Apache again, open a browser and point it to:

http://localhost/flasktest/hello?name=claudio

You should see the webapp’s greeting!

Further references

  • This guide helped me a lot in understanding how to setup Apache WSGI.
  • I also found this tutorial which is far more comprehensive than mine and covers Flask deployment on Apache on Debian/Ubuntu environments

Command-line software design: 5 pieces of advice

Over the last few years I have developed several command-line utility tools, using several languages and for different environments. Attempts, learning and – of course – errors helped me clear my mind and adopt a series of design guidelines which I find very useful for any kind of command-line tool development – ranging from the simplest script to the most elaborate modules – and which I’m willing to share. As you’ll notice, the guidelines can be generalized, as they simply represent common-sense approaches to software design!

Here I’m reporting just a few, in “humorous” terms 🙂 (I’ll share more with you in future posts as they emerge from oblivion).

From now on, CLM = Command Line Module

1. Provide a synopsis describing the module’s purposes

AKA: “What am I supposed to do with you, weird little script?”

It might sound strange, but one of the most recurring difficulties I’ve had when using CLMs written by others (fellow workers, project partners) is understanding what they actually do. Like all lazy users, I hate asking people what the aim of a CLM is, and the last thing on Earth I’d do to find out is look at the code itself! That’s the reason why I always put a “synopsis” in my CLMs’ help messages and comment headers, just like this:

import os, sys
help_msg = """
    WORDSCOUNTER.py
    Synopsis:
        counts the number of words contained in the provided
        input file and prints it on standard output
    Usage:
        python wordscounter.py <input_file>
    [...]
    """

This way, I’m letting users – and you yourself could be among them – know exactly what my CLM is going to do, and saving them a lot of headaches. This statement of intent is also useful for you as a developer, as you can use it as a top-down problem analysis outline to follow when coding your CLM. If your CLM has any side effects (e.g. modifying files, erasing DB tables, etc.), let the users know via the synopsis as well. Provide a short and effective synopsis.

2. Minimize the module’s responsibilities

AKA: “Largo al factotum” [air from Gioacchino Rossini’s “Il Barbiere di Siviglia”, scene II, act I]

As you certainly know, OOP teaches us to identify programming units (classes) by spotting single responsibilities in the program’s overall design. This means that a class should have one – and possibly only one – responsibility: this helps in writing clean, testable and well-designed code. This should be our aim when designing and coding ANY piece of software, CLMs included: the piece of software should do just one thing, and in the best possible way. In the world of CLMs, things tend to get a little bit fuzzy when complexity grows, as CLMs are meant as quick tools to accomplish multiple repetitive and boring tasks – therefore the word “multiple” here does not sit well with OOP dogmas at all.

So, what to do? I firmly believe that our code should not behave like Figaro in “Il Barbiere di Siviglia”: it should not be meant to do everything!!! Please consider the pluses of modular software: reusability, ease of use, composability, testability…in a single word: quality!

My personal advice is that you code complex CLMs using a top-down approach which – in a way – resembles OOP’s. You should first try to break down your main task into sub-steps and then code each sub-step into a separate CLM or into a separate function of your main CLM (it’s up to you to decide which approach is the best one, depending also on the programming language you are using). Functions and small scripts are easy to call and can be tested and documented on their own; functions can be collected into libraries and imported by client code, just as small scripts can be used stand-alone or imported by bigger modules.

By the way, I usually don’t rely on OOP when coding simple or medium-complexity CLMs, but there are cases when this is more than an advantage.

3. Provide open interfaces

AKA: “Don’t work out of my sight”

I recently worked on a Python wrapper for a complicated .exe file, let’s call it example.exe. This executable takes a few parameters, runs an algorithm and finally outputs 3 different curves in a tabular format. This module was provided to me as Commercial-Off-The-Shelf software, which means that I could not modify it, nor did I have its source code.

They told me: “It’s so easy! You just need to invoke the executable using this command-line:”

C:\> example.exe

I could already smell that lots of work would be needed. The following questions came instantly to my mind:

  1. how can I state the CLM’s inputs? what are they, files, strings, directories? how many of them? in which order?
  2. how can I state the CLM’s outputs? what are they, files, strings, directories? how many of them? in which order?
  3. is the CLM going to need additional configuration resources (eg: files)?
  4. is the CLM going to write logfiles or other kinds of additional resources? how can I state them?

Ok, let’s put an end to the tale: I investigated a little bit further and discovered that example.exe was reading an input file containing lots of parameters (many of them were optional) and wrote the output data into a file which was arbitrarily put into the .exe’s folder and whose name was arbitrarily given. This is a complete mess! This crap needs wrapping and its creators need to be publicly humiliated!

This is the typical situation when the CLM does not have an open interface. By “interface” of the CLM I mean the way you can launch it from a certain environment (bash shell, Python interpreter, command prompt, etc.): as a user, your desire is to provide all of the input stuff to your module and obtain all of the output stuff you EXPECT. And this is where many CLMs fall short.

You should always provide open interfaces: this means that your CLM should not read or write anything without letting you explicitly specify it! So, my advice is that you design your CLM’s interface clearly using the following best practices:

  • specify all the parameters (even if you end up with a long command line, don’t worry)
  • when giving names to parameters, try to provide meaningful and specific designations so that users can instantly understand what a parameter name stands for
  • the interface should accept the least information that lets the module work (no useless info!)
  • avoid duplicating parameters: don’t ask for the same value several times (especially under different names: that would be ugly to discover)
  • input parameters come first, output parameters come after inputs
  • logfiles come at the end and could also be omitted – as the runtime environments (e.g. bash, prompt) provide ways to redirect messages to files
  • configfiles come at the end as well: use them only if you have a high number of parameters (tens)
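In Python, the standard argparse module makes it easy to follow most of the practices listed above. The following is only a sketch: the program name curves_clm.py, the parameter names and the descriptions are invented for the example (loosely inspired by the example.exe story), not taken from any real tool.

```python
import argparse


def build_parser():
    """Everything the CLM reads or writes is named on the command line."""
    parser = argparse.ArgumentParser(
        prog="curves_clm.py",  # hypothetical CLM
        description="Reads a parameters file and writes the computed "
                    "curves in tabular format.")
    # Inputs first...
    parser.add_argument("input_file",
                        help="text file containing the algorithm parameters")
    # ...then outputs...
    parser.add_argument("output_file",
                        help="file the curves will be written to")
    # ...and the optional logfile at the end
    parser.add_argument("--log-file", default=None,
                        help="where to write log messages (default: screen only)")
    return parser


args = build_parser().parse_args(["params.txt", "curves.tsv"])
print(args.input_file)
print(args.output_file)
print(args.log_file)
```

As a bonus, argparse generates a -h/--help switch and a usage message from the same declarations, so the interface and its documentation cannot drift apart.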

4. Provide help

AKA: “No one can hear you cry in space”

“Ok, I’m willing to launch this fucking CLM but I really don’t know how to…where are the docs? Oh damn, they just gave me the binary, no documentation…so what do I do now?” How many times did you think something similar to this?

No one should have to cry out loud in the dark in order to get help (which – more often than not – won’t come), because every CLM should have a help switch! It’s such a simple and wise trick: embed in your CLM one or more help strings that tell users how to invoke it. The more verbose the help message, the better for your user. I suggest you include the following sections in your help messages:

  • Synopsis – (see Advice n.1)
  • Usage – how to launch the CLM, in other words the command-line interface along with an explanation of the parameters
  • Usage examples [optional] – two or three command-line invocation examples
  • Prerequisites – anything your CLM is relying on… watch out: don’t overdo them. If something goes wrong and one or more prerequisites are missing, your module must signal this using exit codes
  • Help switch [optional] – tells how to print the help message
  • Exit codes – a list of error conditions your CLM could encounter. Each condition has an associated number (zero is reserved for successful execution)
  • Authors, Copyright [optional] – if you really want/need to sign your “creation”

At this point, one should ask herself/himself: “Ok, when I execute it for the first time, how can I know how to print the help message?”. The answer is to make printing the help message as simple as possible for users. So, I suggest printing the help message whenever a user provides no parameters to your CLM (only – of course – if your CLM does have one or more parameters) or whenever switches such as [ help | -help | h | -h | /? ] are provided.

Example of help message in Python:

import sys
help_msg = """
   WORDSCOUNTER.py
   Synopsis:
     counts the number of words contained in the provided
     input file and prints it on standard output
   Usage:
     python wordscounter.py <input_file>
   Parameters:
     <input_file> = the text file whose words are to be counted
   Help:
     you can print this message using one of the following
     python wordscounter.py
     python wordscounter.py [ help | /? ]
   Exit codes:
    -1 - showed help
     0 - successful execution
     1 - input file does not exist
     2 - input file is not a file
     3 - input file is not a text file
    90 - internal error"""
if len(sys.argv) == 1:
  print(help_msg)
  sys.exit(-1)  # note: POSIX shells report an exit status of -1 as 255
elif len(sys.argv) == 2 and sys.argv[1] in ('help', '/?'):
  print(help_msg)
  sys.exit(-1)

5. Tell the user what is happening

AKA: “It’s thinking, I will have a coffee in the meantime”

How many times have I started a CLM with a terminal looking like this:

claudio@laptop:~$ bash install.bash package.tar.gz

and after minutes or tens of minutes the terminal looked like this:

claudio@laptop:~$ bash install.bash package.tar.gz
claudio@laptop:~$

How many times? Countless! This is because the module is not telling me what it is currently doing. This way, I cannot tell how long it will take to complete the task, I cannot even know whether it’s performing well or not, and I cannot know which stage of the whole computation it has reached… I cannot schedule my time, as I depend on the module’s outputs, therefore I will be less productive!

So the basic advice is: whenever the tool starts to do something new (e.g. enters a specific computational stage, starts parsing parameters, writing output files or inverting matrices or whatever) please print something on screen and/or to a logfile. This will save the CLM’s users a lot of headaches and it will also make it easier to recognize that bugs are showing up (such as execution stuck in infinite loops). I suggest you make your CLM verbose, but not “gossipy”: you don’t have to make it echo out every single line of code that is executed (and if you really need to, use something like: bash -x)

Another idea is to make your CLM print the amount of work done (as a percentage?) against the total, better still along with a rough estimate of the time needed to complete the task: this is very useful when dealing with long-running tasks such as matrix inversion, recursive algorithms, and so on.
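A rough estimate is all that is needed here. The sketch below (format_progress is a name invented for this example) assumes that the work items all take more or less the same time and that total is greater than zero, so the remaining time is simply extrapolated from the time spent so far.

```python
def format_progress(done, total, elapsed_seconds):
    """Build a one-line progress message with a naive time estimate.

    Assumes total > 0 and that every work item costs about the same.
    """
    percent = 100.0 * done / total
    if done == 0:
        return "%5.1f%% done, remaining time unknown" % percent
    # Time per finished item, multiplied by the items still to do
    remaining = elapsed_seconds * (total - done) / float(done)
    return "%5.1f%% done, about %.0f s remaining" % (percent, remaining)


# 25 items out of 100 finished after 30 seconds
print(format_progress(25, 100, 30.0))
```

In a real CLM you would call it every so many items (not at every iteration!) with the elapsed time measured via time.time().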