“नमस्ते, world!” Programming goes multilingual.

I turned in my Perl Programming midterm assignment late last night. It’s hard to believe that the semester is half over after only four weeks . . . not that I’m complaining. I enjoy the class, though I’m still undecided about Perl as a language. It has some serious flaws from a software engineering standpoint, but it does so many things so effortlessly that you can almost forget about them, at least when you’re writing the code the first time around.

But Perl has one feature that every programming language should have: support for multiple character sets as part of the language itself. And not just in strings and comments. That’s so 2002. With the use utf8 pragma in effect, Perl allows letters and digits from across Unicode to be used in variable and function names. So instead of the 60-or-so characters most languages allow (including one dear to my heart), programmers have their choice of thousands.

Here’s an example, which you can download if your browser is challenged. And if your Perl is rusty, those things with “$” in front of them are scalar variables, and “@_” holds the arguments passed to a subroutine. Notice how I can use Hindi characters in variable names alongside Latin characters.


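use utf8;      # the source is UTF-8, so Hindi identifiers and literals are read as characters
use strict;
use warnings;

binmode STDOUT, ":encoding(UTF-8)";   # encode output as UTF-8 (avoids "Wide character" warnings)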

my $नमस्ते = "namaste";
my $सलाम = "salaam";
my $word1 = "नमस्ते";
my $word2 = "सलाम";

print "$नमस्ते ($word1) and $सलाम ($word2)!\n";

findChar("न", $word1);
findChar("े", $word2);
findChar("म", $word1);
findChar("म", $word2);
findChar("स", $word1);
findChar("स", $word2);

sub findChar {

  my $character = shift(@_);
  my $word = shift(@_);

  # \Q...\E makes the regex match $character literally
  if ($word =~ /\Q$character\E/) {
    print "Yes, I found \"$character\" in \"$word\".\n";
  } else {
    print "No, I couldn't find \"$character\" in \"$word\".\n";
  }
}

Here’s what this looks like when you execute hindichars.pl:

namaste (नमस्ते) and salaam (सलाम)!
Yes, I found "न" in "नमस्ते".
No, I couldn't find "े" in "सलाम".
Yes, I found "म" in "नमस्ते".
Yes, I found "म" in "सलाम".
Yes, I found "स" in "नमस्ते".
Yes, I found "स" in "सलाम".

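The example above puts Hindi only in variable names, but the same rule covers subroutine names. Here is a minimal sketch, assuming a reasonably modern Perl with use utf8 in effect; the subroutine name नमस्कार and the greeting it returns are hypothetical illustrations, not part of hindichars.pl:

```perl
use utf8;        # needed so Perl reads the Devanagari identifiers in this file as characters
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical subroutine whose name is written in Devanagari.
sub नमस्कार {
    my ($name) = @_;
    return "नमस्ते, $name!";
}

my $greeting = नमस्कार('world');
print "$greeting\n";    # prints: नमस्ते, world!
```

Declaring the subroutine before calling it, and storing the result in a variable before printing, keeps the parser from mistaking the bareword for a filehandle.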
Consider what this means. As America worried about Y2K, the Perl folks flattened the programming world. (Apologies to Thomas Friedman; but, hey, Nandan Nilekani had to tell him about the rather obvious facts of globalization, so I don’t feel so bad.) Anyway. Software engineers no longer need to learn English (or another language that uses the Roman alphabet) in order to develop software. Of course, they will still need to know enough English or French or German or Spanish to understand other people’s code and use many public APIs; but everyone, everywhere can program in their own language with comments and variables that make sense to everyone in their community. One day, when my job gets outsourced to India or China, the work I do now may be implemented by someone writing the whole thing in Hindi or Tamil or Chinese.

Hmm. This sounds like (a) the continuation of globalization in high tech, and (b) the next step in the evolution of software programming. Companies that have already harnessed English-speaking talent to produce quality software will now have a larger pool to choose from. And programming languages that don’t support Unicode as an essential part of their syntax are going to go the way of the COBOL and Fortran dinosaurs. Maybe not overnight (I don’t see an enormous Hindi comet on the horizon), but think about VMS, people. VMS.

And as long as I’m speculating, I see something else in my crystal ball: a translator that converts software’s source code from one natural language’s lexicon to another without changing the way the code works. It will probably be written in Perl. In India.

This entry was posted in Computing, Software Engineering.