Tcl Programming


Tcl : Tool Command Language

Introduction


The name Tcl is derived from "Tool Command Language" and is pronounced "tickle." Tcl is a radically simple open-source, interpreted programming language that provides common facilities, such as variables, procedures, and control structures as well as many useful features that are not found in any other major language. Tcl runs on almost all modern operating systems such as Unix, Macintosh, and Windows (including Windows Mobile).

While Tcl is flexible enough to be used in almost any application imaginable, it does excel in a few key areas, including: automated interaction with external programs, embedding as a library into application programs, language design, and of course plain old scripting.

Tcl was created in 1988 by John Ousterhout and is distributed under a BSD style license (which allows you everything GPL does, plus commercial use). The current stable version, in September 2005, is 8.4.11, while the alpha of its successor is 8.5a3.

The first major GUI extension that works with Tcl is Tk, a toolkit aimed at rapid GUI development. That is why Tcl is now more commonly called Tcl/Tk.

The language features far-reaching introspection, and the syntax, while simple, is very different from the Fortran/Algol/C++/Java world. Although Tcl is a string based language there are quite a few object-oriented extensions for it like Snit, incr Tcl, and XOTcl to name a few.

Tcl was originally developed as a reusable command language for experimental computer aided design (CAD) tools. The interpreter is implemented as a C library that could be linked into any application. It is very easy to add new functions to the Tcl interpreter, so it is an ideal reusable "macro language" that can be integrated into many applications.

However, Tcl is a programming language in its own right, which can be roughly described as a cross-breed between

  • LISP/Scheme (but with fewer parens),
  • C (control structure keywords, expr syntax) and
  • Unix shells (but with more powerful structuring).

Although a language where "everything is a command" appears as if it must be "imperative" and "procedural", the flexibility of Tcl makes it very easy to use functional or object-oriented styles of programming. See "Tcl examples" below for ideas of what one can do.

Also, it is very easy to implement other programming languages (be they (reverse) polish notation, or whatever) in Tcl for experimenting. One might call Tcl a "CS Lab". For instance, here's how to compute the average of a list of numbers in Tcl (implementing a J-like functional language - see Tacit programming below):

Def mean = fork /. sum llength

or, in a RPN language similar to FORTH or Postscript,

: mean  dup sum swap size double / ;

while a more traditional, "procedural" approach would be

proc mean list {
   set sum 0.
   foreach element $list {set sum [expr {$sum + $element}]}
   return [expr {$sum / [llength $list]}]
}

Note that all of these are in Tcl - the first two just require some additional code to implement Def and ':', respectively.

One might call Tcl the "Play-Doh of languages" - incredibly flexible and colorful :)

Where to get Tcl/Tk

On most Linux systems, Tcl/Tk is already installed. You can find out by typing tclsh at a console prompt (xterm or such). If a "%" prompt appears, you're already set. Just to make sure, type info pa at the % prompt to see the patchlevel (e.g. 8.4.9) and info na to see where the executable is located in the file system.
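
For instance, a quick interactive check might look like this (the patchlevel and path shown are only examples; yours will differ):

% info pa
8.4.9
% info na
/usr/bin/tclsh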

For all major platforms, you can download a binary ActiveTcl distribution from ActiveState.

Alternatively, you can get Tclkit: a Tcl/Tk installation wrapped in a single file, which you don't need to unwrap. When run, the file mounts itself as a virtual file system, allowing access to all its parts.

To see whether your installation works, you might save the following text to a file hello.tcl and run it (type tclsh hello.tcl at a console on Linux, double-click on Windows):

package require Tk
pack [label .l -text "Hello world!"]

It should bring up a little grey window with the greeting.

To make a script directly executable (on Unix/Linux, and Cygwin on Windows), use this first line (the # being flush left):

#!/usr/bin/env tclsh

This way, the shell can determine which executable to run the script with.


Syntax


A Tcl script is a string that is a sequence of commands, separated by newlines or semicolons.

A command is a string that is a list of words, separated by blanks. The first word is the name of the command, the other words are passed to it as its arguments. In Tcl, "everything is a command" - even what in other languages would be called declaration, definition, or control structure. A command can interpret its arguments in any way it wants - in particular, it can implement a different language, like expr.

A word is a string that is a simple word, or one that begins with { and ends with the matching }, or one that begins with " and ends with the matching ". Braced words are not evaluated by the parser. In quoted words, substitutions can occur before the command is called: $[A-Za-z0-9_]+ substitutes the value of the given variable. (Part of) a word can be an embedded script: a string in [] brackets whose contents are evaluated as a script (see above) before the current command is called.

In short: Scripts and commands contain words. Words can again contain scripts and commands.

Arithmetic and logic expressions are not part of the Tcl language itself, but the language of the expr command (also used in some arguments of the if, for, while commands) is basically equivalent to C's expressions, with infix operators and functions. Examples:

set a [expr {($b + sin($c))/2.}]
if {$a > $b && $b > $c} {puts "ordered"}
for {set i 10} {$i >= 0} {incr i -1} {puts $i...} ;# countdown

The 11 rules


Here is the complete manpage for Tcl (8.4) with the "endekalogue", the 11 rules.

The following rules define the syntax and semantics of the Tcl language:

(1) Commands A Tcl script is a string containing one or more commands. Semi-colons and newlines are command separators unless quoted as described below. Close brackets are command terminators during command substitution (see below) unless quoted.

(2) Evaluation A command is evaluated in two steps. First, the Tcl interpreter breaks the command into words and performs substitutions as described below. These substitutions are performed in the same way for all commands. The first word is used to locate a command procedure to carry out the command, then all of the words of the command are passed to the command procedure. The command procedure is free to interpret each of its words in any way it likes, such as an integer, variable name, list, or Tcl script. Different commands interpret their words differently.

(3) Words Words of a command are separated by white space (except for newlines, which are command separators).

(4) Double quotes If the first character of a word is double-quote (") then the word is terminated by the next double-quote character. If semi-colons, close brackets, or white space characters (including newlines) appear between the quotes then they are treated as ordinary characters and included in the word. Command substitution, variable substitution, and backslash substitution are performed on the characters between the quotes as described below. The double-quotes are not retained as part of the word.

(5) Braces If the first character of a word is an open brace ({) then the word is terminated by the matching close brace (}). Braces nest within the word: for each additional open brace there must be an additional close brace (however, if an open brace or close brace within the word is quoted with a backslash then it is not counted in locating the matching close brace). No substitutions are performed on the characters between the braces except for backslash-newline substitutions described below, nor do semi-colons, newlines, close brackets, or white space receive any special interpretation. The word will consist of exactly the characters between the outer braces, not including the braces themselves.

(6) Command substitution If a word contains an open bracket ([) then Tcl performs command substitution. To do this it invokes the Tcl interpreter recursively to process the characters following the open bracket as a Tcl script. The script may contain any number of commands and must be terminated by a close bracket (]). The result of the script (i.e. the result of its last command) is substituted into the word in place of the brackets and all of the characters between them. There may be any number of command substitutions in a single word. Command substitution is not performed on words enclosed in braces.

(7) Variable substitution If a word contains a dollar-sign ($) then Tcl performs variable substitution: the dollar-sign and the following characters are replaced in the word by the value of a variable. Variable substitution may take any of the following forms:

$name

Name is the name of a scalar variable; the name is a sequence of one or more characters that are a letter, digit, underscore, or namespace separators (two or more colons).

$name(index)

Name gives the name of an array variable and index gives the name of an element within that array. Name must contain only letters, digits, underscores, and namespace separators, and may be an empty string. Command substitutions, variable substitutions, and backslash substitutions are performed on the characters of index.

${name}

Name is the name of a scalar variable. It may contain any characters whatsoever except for close braces. There may be any number of variable substitutions in a single word. Variable substitution is not performed on words enclosed in braces.

(8) Backslash substitution If a backslash (\) appears within a word then backslash substitution occurs. In all cases but those described below the backslash is dropped and the following character is treated as an ordinary character and included in the word. This allows characters such as double quotes, close brackets, and dollar signs to be included in words without triggering special processing. The following table lists the backslash sequences that are handled specially, along with the value that replaces each sequence.

\a 
Audible alert (bell) (0x7). 
\b 
Backspace (0x8). 
\f 
Form feed (0xc). 
\n 
Newline (0xa). 
\r 
Carriage-return (0xd). 
\t 
Tab (0x9). 
\v 
Vertical tab (0xb). 
\<newline>whiteSpace 

A single space character replaces the backslash, newline, and all spaces and tabs after the newline. This backslash sequence is unique in that it is replaced in a separate pre-pass before the command is actually parsed. This means that it will be replaced even when it occurs between braces, and the resulting space will be treated as a word separator if it isn't in braces or quotes.

\\ 
Backslash (\). 
\ooo 

The digits ooo (one, two, or three of them) give an eight-bit octal value for the Unicode character that will be inserted. The upper bits of the Unicode character will be 0.

\xhh 

The hexadecimal digits hh give an eight-bit hexadecimal value for the Unicode character that will be inserted. Any number of hexadecimal digits may be present; however, all but the last two are ignored (the result is always a one-byte quantity). The upper bits of the Unicode character will be 0.

\uhhhh 

The hexadecimal digits hhhh (one, two, three, or four of them) give a sixteen-bit hexadecimal value for the Unicode character that will be inserted.

Backslash substitution is not performed on words enclosed in braces, except for backslash-newline as described above.

(9) Comments If a hash character (#) appears at a point where Tcl is expecting the first character of the first word of a command, then the hash character and the characters that follow it, up through the next newline, are treated as a comment and ignored. The comment character only has significance when it appears at the beginning of a command.

(10) Order of substitution Each character is processed exactly once by the Tcl interpreter as part of creating the words of a command. For example, if variable substitution occurs then no further substitutions are performed on the value of the variable; the value is inserted into the word verbatim. If command substitution occurs then the nested command is processed entirely by the recursive call to the Tcl interpreter; no substitutions are performed before making the recursive call and no additional substitutions are performed on the result of the nested script. Substitutions take place from left to right, and each substitution is evaluated completely before attempting to evaluate the next. Thus, a sequence like

set y [set x 0][incr x][incr x]

will always set the variable y to the value 012.

(11) Substitution and word boundaries Substitutions do not affect the word boundaries of a command. For example, during variable substitution the entire value of the variable becomes part of a single word, even if the variable's value contains spaces.

Data types


Tcl used to have only a single data type, namely the string. For performance reasons, versions since 8.0 maintain that impression, while providing in parallel an internal representation. This section will cover the "type" implications for values; any variable as discussed in the next section can hold a value of any of these types.

Strings

The Tcl mantra goes: "Everything is a string". The fundamental data type is the string, a sequence of zero or more characters (where all 16-bit Unicodes are accepted in almost all situations, see in more detail below). The size of strings is managed automatically, so you hardly ever have to worry about it (only if the string length exceeds the virtual memory size :)

In contrast to many other languages, strings in Tcl don't need quotes for markup. Quotes (or braces) are rather used for grouping:

set example "this is one word"
set another {this is another}

The difference is that inside quotes, substitutions (like of variables, embedded commands, or backslashes) are performed, while in braces, they are not (similar to single quotes in shells, but nestable):

set amount 42
puts "You owe me $amount" ;#--> You owe me 42
puts {You owe me $amount} ;#--> You owe me $amount

In source code, quoted or braced strings can span multiple lines, and the physical newlines are part of the string too:

set test "hello
world
in three lines"

Example: To reverse a string, we let an index i first point at its end, and decrementing i until it's zero, append the indexed character to the end of the result res:

proc sreverse str {
   set res ""
   for {set i [string length $str]} {$i > 0} {} {
       append res [string index $str [incr i -1]]
   } 
   set res
}
% sreverse "A man, a plan, a canal - Panama"
amanaP - lanac a ,nalp a ,nam A

Lists

Most strings are also well-formed lists. Every simple word is a list of length one, and elements of longer lists are separated by whitespace. For instance, a list of three elements:

set example {foo bar grill}

Only strings that contain unbalanced braces or quotes cannot directly be used as lists. Lists can contain lists again, to any depth, which makes modelling of matrices and trees easy. Here's a string that represents a 4 x 4 unit matrix as a list of lists (the braces are not part of the lists, just indications for the parser):

{{1 0 0 0}
 {0 1 0 0}
 {0 0 1 0}
 {0 0 0 1}}

The newlines are valid list element separators, too.

Tcl's list operations are demonstrated in some examples:

set      x {foo bar}
llength  $x        ;#--> 2
lappend  x  grill  ;#--> foo bar grill
lindex   $x 1      ;#--> bar (indexing starts at 0)
lsearch  $x grill  ;#--> 2 (the position, counting from 0)
linsert  $x 2 and  ;#--> foo bar and grill
lreplace $x 1 1 bar, ;#--> foo bar, grill
lsort    $x        ;#--> bar foo grill

Example: To draw a random element from a list L, we first determine its length (using llength), multiply that with a random number > 0.0 and < 1.0, truncate that to integer (so it lies between 0 and length-1), and use that for indexing (lindex) into the list:

proc ldraw L {
   lindex $L [expr {int(rand()*[llength $L])}]
}
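
A test call could look like this (the drawn element is random, so repeated calls give varying results):

% ldraw {heads tails}
tails
% ldraw {heads tails}
heads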

Example: Transposing a matrix (swapping rows and columns), using integers as generated variable names:

proc transpose matrix {
   foreach row $matrix {
       set i 0
       foreach el $row {lappend [incr i] $el}
   }
   set res {}
   set i 0
   foreach e [lindex $matrix 0] {lappend res [set [incr i]]}
   set res
}
% transpose {{1 2} {3 4} {5 6}}
{1 3 5} {2 4 6}

Numbers

Numbers are strings that can be parsed as such. Tcl supports integers (32-bit or even 64-bit wide) and "double" floating-point numbers. From Tcl 8.5 on, bignums (integers of arbitrarily large precision) will be supported.

In expressions with numbers of mixed types, integers are coerced to doubles:

% expr 1+2.
3.0

It is important to know that division of two integers is done as integer division:

% expr 1/2
0

You get the (probably expected) floating-point division if at least one operand is double:

% expr 1/2.
0.5

Control the display format of numbers with the format command which does appropriate rounding:

% expr 2/3.
0.666666666667
% format %.2f [expr 2/3.]
0.67

Up to the present 8.4 version, Tcl still honors the C convention that an integer starting with 0 is parsed as octal, so

0377 == 0xFF == 255

This will change in the next major version, though - too often people stumbled over "08", meant as an hour or month, which raised a syntax error because 8 is not a valid octal digit. In the future you'd have to write 0o377 if you really mean octal (few do :). You can do number base conversions with the format command, where the format is %x for hex, %d for decimal, %o for octal, and the input number should have the C-like markup to indicate its base:

% format %x 255
ff
% format %d 0xff
255
% format %o 255
377
% format %d 0377
255

There is no separate boolean type in Tcl values. Instead, like in C, integers are used: 0 is false, all others are true. The canonical true value is 1 (again, like in C). You can force a canonical truth value with twice the not operator (!):

% expr !!42
1

Characters

Characters are abstractions of writing elements (e.g. letters, digits, punctuation characters, Chinese ideographs, ligatures...). In Tcl since 8.1, characters are internally represented with Unicode, which can be seen as unsigned integers between 0 and 65535 (recent Unicode versions have even crossed that boundary, but the Tcl implementation currently uses a maximum of 16 bits). Any Unicode U+XXXX can be specified as a character constant with an \uXXXX escape. It is recommended to only use ASCII characters (\u0000-\u007f) in Tcl scripts directly, and escape all others.

Convert between numeric Unicode and characters with

set char [format %c $int]
set int  [scan $char %c]

Watch out that int values above 65535 produce 'decreasing' characters again, while a negative int even produces two bogus characters. format does not warn, so better test before calling it.

Sequences of characters are called strings (see above). Characters are no separate data type in Tcl, but represented as strings of length one (everything is a string). Represented as UTF-8, a character can be one to three bytes long in memory or file. Find out the bytelength of a character with

string bytelength $c ;# assuming [string length $c]==1

String routines can be applied to single characters too, e.g. [string toupper]. Find out whether a character is in a given set (a character string) with

expr {[string first $char $set]>=0}

As Unicodes for characters fall in distinct ranges, checking whether a character's code lies within a range allows a more or less rough classification of its category:

proc inRange {from to char} {
    # generic range checker
    set int [scan $char %c]
    expr {$int>=$from && $int <= $to}
}
interp alias {} isGreek {}    inRange 0x0386 0x03D6
interp alias {} isCyrillic {} inRange 0x0400 0x04F9
interp alias {} isHangul {}   inRange 0xAC00 0xD7A3
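
Such classifiers can then be used like any other command - for instance (the test characters here are just illustrations):

% isGreek \u03b1
1
% isCyrillic A
0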

This is a useful helper to convert all characters beyond the ASCII set to their \u.... escapes (so the resulting string is strict ASCII):

proc u2x s {
   set res ""
   foreach c [split $s ""] {
     scan $c %c int
     append res [expr {$int<128? $c :"\\u[format %04.4X $int]"}]
   }
   set res
}
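
For example (assuming the input contains a non-ASCII character such as ä):

% u2x "Hätte"
H\u00E4tte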

Internal representation

In the Tcl implementation (written in C), values can have up to two representations at the same time with the Tcl core (or extension code) converting transparently between them as required. One of the representations is as a string (using a modified UTF-8 encoding), and the other one is an internal representation, which can be a signed integer, a double-precision floating point number, a list, or one of many other "types" on the C side (only one of these representations is valid at a time, but an individual value may have many different representations over its lifetime). This way, repeated parsing or "stringification" is avoided if you deal with a value both as a string and as a list, for instance. But to the programmer, the view that "everything is a string" is still maintained.

These values are stored in reference-counted structures termed (somewhat inaccurately) objects. From the perspective of all code that uses values (as opposed to code implementing a particular representation), they are immutable. In practice, this is implemented using a copy-on-write strategy.


Variables


Variables can be local or global, and scalar or array. Their names can be any string not containing a colon (which is reserved for use in namespace separators) but for the convenience of $-dereference one usually uses names of the pattern [A-Za-z0-9_]+, i.e. one or more letters, digits, or underscores.

Variables need not be declared beforehand. They are created when first assigned a value, if they did not exist before, and can be unset when no longer needed:

set foo    42     ;# creates the scalar variable foo
set bar(1) grill  ;# creates the array bar and its element 1
set baz    $foo   ;# assigns to baz the value of foo
set baz [set foo] ;# the same effect
info exists foo   ;# returns 1 if the variable foo exists, else 0
unset foo         ;# deletes the variable foo

Retrieving a variable's value with the $foo notation is only syntactic sugar for [set foo]. The latter is more powerful though, as it can be nested, for deeper dereferencing:

set foo   42
set bar   foo
set grill bar
puts [set [set [set grill]]] ;# gives 42

Some people might expect $$$grill to deliver the same result, but it doesn't, because of the Tcl parser. When it encounters the first and second $ sign, it tries to find a variable name (consisting of one or more letters, digits, or underscores) in vain, so these $ signs are left literally as they are. The third $ allows substitution of the variable grill, but no backtracking to the previous $'s takes place. So the evaluation result of $$$grill is $$bar. Nested [set] commands give the user more control.

Local vs. global

A local variable exists only in the procedure where it is defined, and is freed as soon as the procedure finishes. By default, all variables used in a proc are local.

Global variables exist outside of procedures, as long as they are not explicitly unset. They may be needed for long-living data, or implicit communication between different procedures, but in general it's safer and more efficient to use globals as sparingly as possible. Example of a very simple bank with only one account:

set balance 0 ;# this creates and initializes a global variable
proc deposit {amount} {
   global balance
   set balance [expr {$balance + $amount}]
}
proc withdraw {amount} {
   set ::balance [expr {$::balance - $amount}]
}

This illustrates two ways of referring to global variables - either with the global command, or by qualifying the variable name with the :: prefix. The variable amount is local in both procedures, and its value is that of the first argument to the respective procedure.
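
Continuing the little bank example, an interactive session might look like this:

% deposit 100
100
% withdraw 30
70
% set balance
70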

Introspection:

info vars ;#-- lists all visible variables
info locals
info globals

To make all global variables visible in a procedure (not recommended):

eval global [info globals]

Scalar vs. array

All of the value types discussed above in Data types can be put into a scalar variable, which is the normal kind.

Arrays are collections of variables, indexed by a key that can be any string, and in fact implemented as hash tables. What other languages call "arrays" (vectors of values indexed by an integer) would in Tcl rather be lists. Some illustrations:

#-- The key is specified in parens after the array name
set         capital(France) Paris
#-- The key can also be substituted from a variable:
set                  country France
puts       $capital($country)
#-- Setting several elements at once:
array set   capital         {Italy Rome  Germany Berlin}
#-- Retrieve all keys:
array names capital    ;#-- Germany Italy France -- quasi-random order
#-- Retrieve keys matching a glob pattern:
array names capital F* ;#-- France 

Note that arrays themselves are not values. They can be passed in and out of procedures not as $capital (which would try to retrieve the value), but by reference. The dict type (available from Tcl 8.5) might be better suited for these purposes, while otherwise providing hash table functionality, too.
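
Here is a minimal sketch of passing an array by name (the proc name and the leading-underscore argument are just illustrative conventions; upvar, described below, links the caller's array to a local name):

proc array'size _arr {
   upvar 1 $_arr arr
   llength [array names arr]
}
% array'size capital
3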

System variables

At startup, tclsh provides the following global variables:

argc
number of arguments on the command line
argv
list of the arguments on the command line
argv0
name of the executable or script (first word on command line)
auto_index
array with instructions from where to load further commands
auto_oldpath
(same as auto_path ?)
auto_path
list of paths to search for packages
env
array, mirrors the environment variables
errorCode
type of the last error, or {}, e.g. ARITH DIVZERO {divide by zero}
errorInfo
last error message, or {}
tcl_interactive
1 if interpreter is interactive, else 0
tcl_libPath
list of library paths
tcl_library
path of the Tcl system library directory
tcl_patchLevel
detailed version number, e.g. 8.4.11
tcl_platform
array with information on the operating system
tcl_rcFileName
name of the initial resource file
tcl_version
brief version number, e.g. 8.4
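
A few of these can be inspected like any other variable (the values shown are of course system-dependent examples):

% set tcl_version
8.4
% set tcl_platform(platform)
unix
% set env(HOME)
/home/user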

Dereferencing variables

A reference is something that refers, or points, to another something (if you pardon the scientific expression). In C, references are done with *pointers* (memory addresses); in Tcl, references are strings (everything is a string), namely names of variables, which via a hash table can be resolved (dereferenced) to the "other something" they point to:

puts foo       ;# just the string foo
puts $foo      ;# dereference variable with name of foo
puts [set foo] ;# the same

This can be done more than one time with nested set commands. Compare the following C and Tcl programs, that do the same (trivial) job, and exhibit remarkable similarity:

#include <stdio.h>
int main(void) {
  int    i =      42;
  int *  ip =     &i;
  int ** ipp =   &ip;
  int ***ippp = &ipp;
  printf("hello, %d\n", ***ippp);
  return 0;
}

...and Tcl:

set i    42
set ip   i
set ipp  ip
set ippp ipp
puts "hello, [set [set [set [set ippp]]]]"

The C asterisks correspond to set calls in dereferencing, while in Tcl similar markup is not needed in declaring. But why four sets for three asterisks? Because the first mention of i in C is also a dereference, to pass its value into printf; Tcl makes all four of them explicit (otherwise, you'd see hello, i). One dereference is so frequent that it is typically abbreviated with $varname, e.g.

puts "hello, [set [set [set $ippp]]]"

has set where C uses asterisks, and $ for the last (default) dereference.

The hashtable for variable names is either global, for code evaluated in that scope, or local to a proc. You can still "import" references to variables in scopes that are "higher" in the call stack, with the upvar and global commands. (The latter being automatic in C, given the names are unique; else, the innermost scope wins).


Commands


Commands are basically divided into C-defined ones, procedures, and aliases (and, from Tcl 8.5 onwards, ensembles). You can rename any command with

rename oldname newname

To delete a command (make it no more reachable), use the empty string as new name:

rename oldname {}

Introspection: Get a list of all defined commands with

info commands

C-defined commands

These are implemented in C and registered to be available in the Tcl interpreter. They can come from the Tcl core, or a loaded shared library (DLL) - for instance Tk.

To get a list of built-in commands, subtract the result of info procs from that of info commands:

set builtins {}
set procs [info procs]
foreach cmd [info commands] {
   if {[lsearch -exact $procs $cmd] == -1} {lappend builtins $cmd}
}

The following C-defined commands are available in a fresh tclsh. For detailed documentation, see the respective man pages, e.g. at http://www.tcl.tk/man/tcl8.4/TclCmd/ - I will characterize each only very briefly:

after
group of commands for timed events
after msec ?script?
waits, or executes the script after, some time
append varName arg..
appends the arguments to a string variable
array
group of commands for arrays
binary
group of commands for binary scanning and formatting
break
terminate current loop
case
deprecated, use switch
catch script ?varName?
catch possible error in script
cd path
change working directory
clock
group of commands dealing with date and time
close handle
closes a channel (file, socket, etc.)
concat list..
make one space-separated list of the arguments
continue
start next turn of current loop
encoding
group of commands dealing with character set encoding
eof handle
1 if channel is at end of file, else 0
error message ?info? ?code?
raise an error with the given message
eval arg..
evaluate the arguments as script
exec file arg..
execute a separate process
exit ?int?
terminate this process, return status 0..127
expr arg..
arithmetic and logic engine, using C-like syntax and functions (variables referenced with $name). In addition, from Tcl 8.4 there are eq and ne operators for string equal or not; from 8.5, also in and ni operators for list inclusion or not
expr {"foo" in {foo bar grill}} == 1

The argument to expr should in most cases be {braced}. This prevents the Tcl parser from substituting variables in advance, while expr itself has to parse the value from the string. In a braced expression expr can parse variable references itself, and get their numeric value directly where possible. Much faster, usually. The only exception, where bracing should not be used, is if you want to substitute operators from variables:

foreach op {+ - * /} {puts [expr 1 $op 2]}
fblocked handle
returns 1 if the last input operation exhausted all available input, else 0
fconfigure handle -option value...
configure a channel, e.g. its encoding or line-end translation
fcopy handle1 handle2
copy data from handle1 to handle2
file
group of commands dealing with files
fileevent
group of commands dealing with events from channels (readable, writable) but not files
flush handle
make sure the channel's buffer is written out. Useful after puts -nonewline
for initbody condition stepbody body
loop, somehow similar to C's for
foreach varlist list ?varlist list...? body
loop over one or more lists. The varlists can be a single varName or multiple varNames. Example:
% foreach {x y} {1 0  1 2  0 2  0 0} {puts "x:$x, y:$y"}
x:1, y:0
x:1, y:2
x:0, y:2
x:0, y:0
format fstring arg..
put the arguments %-formatted into fstring, similar to C's sprintf()
gets handle ?varName?
read a line from handle. If variable is given, assigns the line to it and returns the number of characters read; else returns the line. Guaranteed to be safe against buffer overflows
glob ?-options? pattern..
list of files matching the glob pattern (which can contain * and ? wildcards)
global varName..
declare the given variable(s) as global
history
list the last interactive commands
if condition ?then? body1 ?elseif condition body2...? ??else? bodyN?
conditional
incr varName ?amount?
increments the integer variable by given amount (defaults to 1). Use negative amount to decrement
info
group of commands for introspection
interp
group of commands for interpreters
join list ?separator?
turns a list into a string, with separator between elements (defaults to " ")
lappend varName arg..
appends the arguments to the list variable
lindex list int..
retrieve an element from the list by integer index(es)
linsert list int arg..
inserts the arguments at int position into list
list ?arg..?
creates a list from the arguments
llength list
length of the list
load filename ?name?
loads a shared library (DLL)
lrange list from to
returns a sublist at integer indexes from-to
lreplace list from to arg..
replaces the sublist in list with the arguments
lsearch ?-options? list element
searches the list for the element, returns its integer index, or -1 if not found. Can be used to select a subset of elements from a list (using the -all option)
lset varName int.. value
sets an existing element in the named list variable, indexed by integer(s), to the given value
lsort ?-options? list
sorts the list
namespace
group of commands dealing with namespaces
open name ?mode ?permissions??
opens a file or pipe, returns the handle
package
group of commands dealing with packages
pid ?handle?
returns the id of the current process. Can also return the list of pids for a pipeline given the pipeline channel
proc name arglist body
defines a procedure
puts ?-nonewline? ?channel? string
outputs a line to the given channel (default stdout)
pwd
returns the current working directory
read handle ?int?
reads int bytes from handle (all if int not given)
regexp ?-options? re string ?varName...?
regular expression matching of re in string, possibly assigning parenthesized submatches to the given variables
regsub ?-options? re value substring ?varName?
substitutes occurrences of the regular expression re in value with substring. If varName is given, assigns the new value to it, and returns the number of substitutions; else returns the new value
rename cmdName1 cmdName2
renames a command from cmdName1 to cmdName2, or deletes cmdName1 if cmdName2 is {}
return ?value?
exits from the current proc or sourced script
scan string format ?varName...?
extracts values by %-format in string to the given variables. Similar to C's sscanf()
seek channelId offset ?origin?
moves pointer in file to the given position
set varName ?value?
sets variable to value if given, and returns the variable's value
socket ?-myaddr addr? ?-myport myport? ?-async? host port
open the client side of a TCP connection as a channel
socket -server command ?-myaddr addr? port
open the server side of a TCP connection, register a handler callback command for client requests
source filename
evaluate the contents of the given file
split list ?charset?
splits a string into a list, using any of the characters in charset string as delimiters (defaults to " ")
string
group of commands dealing with strings
subst ?-options? string
performs command, variable, and/or backslash substitutions in string
switch ?-options? ?--? value alternatives
performs one of the alternatives if the value matches
tell handle
return the byte position inside a file
time body ?int?
runs the body for int times (default 1), returns how many microseconds per iteration were used
trace
group of commands to tie actions to variables or commands
unset varName..
delete the given variable(s)
update ?idletasks?
services events
uplevel ?level? body
evaluates the body up in the call stack
upvar ?level? varName localVarName...
ties the given variables up in the call stack to the given local variables. Used for calling by reference, e.g. for arrays
variable varName ?value ?varName value...??
declare the variables as being non-local in a namespace
vwait varName
suspend execution until the given variable has changed. Starts event loop, if not active yet
while condition body
performs body as long as condition is not 0

Procedures

Procedures in Tcl cover what other languages call procedures, subroutines, or functions. They always return a result (even if it is only the empty string ""), so calling them functions might be most appropriate. But for historical reasons, the Tcl command to create a function is called proc and thus people most often call them procedures.

proc name argumentlist body

Examples:

proc sum {a b} {return [expr {$a+$b}]}

The return is redundant, as the proc returns anyway when reaching its end, and returning its last result:

proc sum {a b} {expr {$a+$b}}

The following variant is more flexible, as it takes any number of arguments (the special argument name args collects all remaining arguments into a list, which is the value of the parameter args):

proc sum args {
    set res 0
    foreach arg $args {set res [expr {$res + $arg}]}
    return $res
}

An elegant but less efficient alternative builds a string by joining the args with plus signs, and feeds that to expr:

proc sum args {expr [join $args +]}

If an argument in a proc definition is a list of two elements, the second is taken as default value if not given in the call ("Sir" in this example):

proc greet {time {person Sir}} {puts "good $time, $person"}
% greet morning John
good morning, John
% greet evening
good evening, Sir

Introspection: Get the names of all defined procedures with

info procs

There are also info subcommands to get the argument list, and possible default arguments, and body of a proc. The following example combines them to recreate the textual form of a proc, given its name (corp being proc in reverse):

proc corp name {
   set argl {}
   foreach arg [info args $name] {
      if [info default $name $arg def] {lappend arg $def}
      lappend argl $arg
   }
   list proc $name $argl [info body $name]
}
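
For example, applied to the greet procedure defined earlier:

% corp greet
proc greet {time {person Sir}} {puts "good $time, $person"}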

Using rename, you can overload any command, including the C-coded ones. First rename the original command to something else, then reimplement it with the same signature, where ultimately the original is called. Here, for instance, is an overloaded proc that reports if a procedure is defined more than once with the same name:

rename proc _proc
_proc proc {name argl body} {
   if {[info procs $name] eq $name} {
       puts "proc $name redefined in [info script]"
   }
   _proc $name $argl $body
}

Argument passing by name or value

Normally, arguments to commands are passed by value (as constants, or with $ prefixed to a variable name). This securely prevents side-effects, as the command will only get a copy of the value, and not be able to change the variable.

However, sometimes just that is wanted. Imagine you want a custom command to set a variable to zero. In this case, at call time specify the name of the variable (without $), and in the proc use upvar to link the name (in the scope "1 up", i.e. the caller's) to a local variable. I usually put a "_" before arguments that are variable names (e.g. _var), and upvar to the same name without "_" (e.g. var):

% proc zero _var {upvar 1 $_var var; set var 0}
% set try 42
42
% zero try
0
% set try
0

Aliases

One can also define a command as an alias to a sequence of one or more words, which will be substituted for it before execution. (The funny {} arguments are names of the source and target interpreter, which typically is the current one, named by the empty string {} or ""). Examples:

interp alias {} strlen {} string length
interp alias {} cp     {} file copy -force
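
The aliases can then be called like regular commands, with further arguments appended:

% strlen "Hello, world!"
13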

Introspection: Get the names of all defined aliases with

interp aliases

Ensembles

An ensemble is a command that is composed of sub-commands according to a standard pattern. Examples include Tcl's built-in chan and clock commands.
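
From Tcl 8.5 you can also build your own ensemble from the exported commands of a namespace with namespace ensemble. A minimal sketch (the mycmd namespace and its subcommands are made-up examples):

namespace eval mycmd {
    namespace export add sub
    namespace ensemble create
    proc add {a b} {expr {$a + $b}}
    proc sub {a b} {expr {$a - $b}}
}
% mycmd add 3 4
7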


Interpreters


Tcl being an interpreted (plus on-the-fly byte-compiled) language, an interpreter is of course a central kind of object. One can also create further "slave" interpreters to encapsulate data and processes. Examples:

% interp create helper
helper
% helper eval {expr 7*6}
42
% interp delete helper
% helper eval {expr 1+2}
invalid command name "helper"

By deleting an interpreter, all its global (and namespaced) variables are freed too, so you can use this for modularisation and encapsulation, if needed.

In particular, safe interpreters have intentionally limited capabilities (for instance, access to the file system or the Web) so that possibly malicious code from over the Web cannot create major havoc.
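
A short interactive sketch (the exact set of hidden commands depends on how the safe interpreter is configured):

% interp create -safe jail
jail
% jail eval {expr 6*7}
42
% jail eval {open /etc/passwd}
invalid command name "open"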

Introspection: The following command lists the sub-interpreters ("slaves") of the current interpreter:

% interp slaves

Namespaces

Namespaces are containers for procedures, non-local variables, and other namespaces. They form a tree structure, with the root at the global namespace named "::". Their names are built with :: as separators too, so ::foo::bar is a child of ::foo, which is a child of :: (similar to pathnames on Unix, where / is the separator as well as the root).

To create a namespace, just evaluate some script (which may be empty) in it:

namespace eval ::foo {}

Now you can use it to define procedures or variables:

proc ::foo::test {} {puts Hello!}
set  ::foo::var 42
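
These are then addressed by their qualified names:

% ::foo::test
Hello!
% set ::foo::var
42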

Introspection:

namespace children ::

The following code gives an approximate size in bytes consumed by the variables and children of a Tcl namespace (of which ::, the global namespace, is of particular interest - all the (grand)*children are also added). If you call this proc repeatedly, you can observe whether data are piling up:

proc namespace'size ns {
  set sum [expr wide(0)]
  foreach var [info vars ${ns}::*] {
      if {[info exists $var]} {
          upvar #0 $var v
          if {[array exists v]} {
              incr sum [string bytelength [array get v]]
          } else {
              incr sum [string bytelength $v]
          }
      }
  }
  foreach child [namespace children $ns] {
      incr sum [namespace'size $child]
  }
  set sum
}

Usage example:

% puts [namespace'size ::]
179914

Packages

Packages are Tcl's recommended way to modularize software, especially supportive libraries. The user most often uses the command

package require name ?version?

The following script creates the trivial but educational package futil, which in a namespace of the same name implements two procs for reading and writing complete text files, and the little introspection helper function, futil::?. The command to register the package (package provide) is executed only after all went well - this way, buggy source code, which raises an error during package require, will not be registered. (Other bugs you'd have to spot and fix for yourself...)

Common Tcl distribution practice has the good habit of thorough testing, typically in a separate test directory. On the other hand, including a self-test in the same file as the code makes editing easier, so after the package provide comes a section only executed if this file is sourced as a top-level script, which exercises the commands defined in futil. Whether the string read should be equal to the string written is debatable - the present implementation appends \n to the string if it doesn't end in one, because some tools complain or misbehave if they don't see a final newline.

If the tests do not run into an error either, the construction of the required package index is also triggered - assuming the simplified case that the directory contains only one package. Otherwise, you'd better remove these lines and take care of index creation yourself.

A script that uses this package will only have to contain the two lines

lappend ::auto_path <directory of this file>
package require futil

You can even omit the first line, if you install (copy) the directory with the source and pkgIndex.tcl below ${tcl_install_directory}/lib.

namespace eval futil {
    set version 0.1
}
proc futil::read {filename} {
   set fp [open $filename]
   set string [::read $fp] ;# prevent name conflict with itself
   close $fp
   return $string
}
proc futil::write {filename string} {
   set fp [open $filename w]
   if {[string index $string end]!="\n"} {append string \n}
   puts -nonewline $fp $string
   close $fp
}
proc futil::? {} {lsort [info procs ::futil::*]}
# If execution comes this far, we have succeeded ;-)
package provide futil $futil::version
#--------------------------- Self-test code
if {[info exists argv0] && [file tail [info script]] == [file tail $argv0]} {
   puts "package futil contains [futil::?]"
   set teststring {
       This is a teststring
       in several lines...}
   puts teststring:'$teststring'
   futil::write test.tmp $teststring
   set string2 [futil::read test.tmp]
   puts string2:'$string2'
   puts "strings are [expr {$teststring==$string2? {}:{not}}] equal"
   file delete test.tmp ;# don't leave traces of testing
   # Simple index generator, if the directory contains only this package
   pkg_mkIndex -verbose [file dirname [info script]] [file tail [info script]]
}

Tcllib

Tcllib is a collection of packages in pure Tcl. It can be obtained from SourceForge, but is also part of ActiveTcl. The following list of contained packages may not be complete, as Tcllib is steadily growing...

  • aes - Advanced Encryption Standard.
  • asn - asn.1 BER encoder/decoder
  • http/autoproxy - code to automate the use of HTTP proxy servers
  • base64 - Base64 encoding and decoding of strings and files.
  • bee - BitTorrent serialization encoder/decoder.
  • bibtex - Neil Madden's parser for bibtex files. Not fully complete yet, therefore not set for installation.
  • calendar - Calendar operations (see also tcllib calendar module).
  • cmdline - Various forms of command line and option processing.
  • comm - Socket based interprocess communication. Emulates the form of Tk's send command.
  • control - procedures for tcl flow structures such as assert, do/until, do/while, no-op
  • counter - procedures for counters and histograms
  • crc - Computation of various CRC checksums for strings and files.
  • csv - manipulate comma separated value data
  • des - Data Encryption Standard. ::DES::des (not yet installed)
  • dns - interact with the Domain Name Service. dns::address, dns::cleanup, dns::cname, dns::configure, dns::name, dns::reset, dns::resolve, dns::status, dns::wait,
  • doctools - System for writing manpages/documentation in a simple, yet powerful format.
  • exif - exif::analyze exif::fieldnames
  • fileutil - Utilities for operating on files, emulating various unix command line applications (cat, find, file(type), touch, ...).
  • ftp - Client side implementation of FTP (File Transfer Protocol). In dire need of a rewrite.
  • ftpd - Server side implementation of FTP
  • grammar_fa - Operations on finite automatons.
  • html - generate HTML from a Tcl script. html::author, html::author, html::bodyTag, html::cell, html::checkbox, html::checkSet, html::checkValue, html::closeTag, html::default, html::description, html::description, html::end, html::eval, html::extractParam, html::font, html::for, html::foreach, html::formValue, html::getFormInfo, html::getTitle, html::h, html::h1, html::h2, html::h3, html::h4, html::h5, html::h6, html::hdrRow, html::head, html::head, html::headTag, html::if, html::init, html::init, html::keywords, html::keywords, html::mailto, html::meta, html::meta, html::minorList, html::minorMenu, html::openTag, html::paramRow, html::passwordInput, html::passwordInputRow, html::radioSet, html::radioValue, html::refresh, html::refresh, html::row, html::select, html::selectPlain, html::set, html::submit, html::tableFromArray, html::tableFromList, html::tagParam, html::textarea, html::textInput, html::textInputRow, html::title, html::title, html::urlParent, html::varEmpty, html::while,
  • htmldoc - This is not a true module but the place where tcllib 1.3 installed the tcllib documentation in HTML format.
  • htmlparse - procedures to permit limited manipulation of strings containing HTML. ::htmlparse::parse, ::htmlparse::debugCallback, ::htmlparse::mapEscapes, ::htmlparse::2tree, ::htmlparse::removeVisualFluff, ::htmlparse::removeFormDefs,
  • ident - RFC 1413 ident client protocol implementation
  • imap4 - currently undocumented code for interacting with an IMAP server
  • inifile - code to manipulate an initialization file. ::ini::open, ::ini::close, ::ini::commit, ::ini::sections, ::ini::keys, ::ini::value
  • dns/ip - Manipulation of IP addresses. ::ip::version, ::ip::is, ::ip::normalize, ::ip::equal, ::ip::prefix
  • irc - Internet Relay Chat procedures. irc::config, irc::connection,
  • javascript - generate Javascript for including in HTML pages. javascript::BeginJS, javascript::EndJS, javascript::MakeMultiSel, javascript::MakeClickProc, javascript::makeSelectorWidget, javascript::makeSubmitButton, javascript::makeProtectedSubmitButton, javascript::makeMasterButton, javascript::makeParentCheckbox, javascript::makeChildCheckbox
  • jpeg - edit comment blocks, get image dimensions and information, read exif data of images in the JPG format
  • ldap - Client side implementation of LDAP (Lightweight Directory Access Protocol).
  • log - general procedures for adding log entries to files ::log::levels, ::log::logMsg, ::log::lv2longform, ::log::lv2color,::log::lv2priority,
  • logger - ::logger::walk, ::logger::services, ::logger::enable, ::logger::disable (part of the log module)
  • math - general mathematical procedures. ::math::calculus, ::math::combinatorics, ::math::cov, ::math::fibonacci, ::math::integrate, ::math::interpolate, ::math::max, ::math::mean, ::math::min, ::math::optimize, ::math::product, ::math::random, ::math::sigma, ::math::statistics, ::math::stats, ::math::sum
  • md4 -  ::md4::md4, ::md4::hmac, ::md4::MD4Init, ::md4::MD4Update, ::md4::MD4Final
  • md5 - [fill in the description of this module] ::md5::md5, ::md5::hmac, ::md5::test, ::md5::time, ::md5::<<<
  • md5crypt -  ::md5crypt::md5crypt, ::md5crypt::aprcrypt
  • mime - ::mime::initialize, ::mime::parsepart, ::mime::finalize, ::smtp::sendmessage
  • multiplexer - [fill in the external interfaces]
  • ncgi - procedures for use in a CGI application. ::ncgi::reset, ::ncgi::urlStub, ::ncgi::urlStub
  • nntp - routines for interacting with a usenet news server. ::nntp::nntp, ::nntp::NntpProc, ::nntp::NntpProc, ::nntp::okprint, ::nntp::message,
  • ntp - network time protocol procedure ::ntp::time
  • png - edit comment blocks, get image dimensions and information for Portable Network Graphics format.
  • pop3 - Post Office Protocol functions for reading mail from a pop3 server. ::pop3::open, ::pop3::close, ::pop3::status,
  • pop3d - Post Office Protocol Server. pop3d::new
  • profiler -  ::profiler::tZero, ::profiler::tMark, ::profiler::stats, ::profiler::Handler, ::profiler::profProc, ::profiler::init
  • rc4 - stream encryption. ::rc4::rc4
  • report - format text in various report styles. ::report::report , ::report::defstyle, ::report::rmstyle,
  • sha1 -  ::sha1::sha1, ::sha1::hmac
  • smtpd - ::smtpd::start, ::smtpd::stop, ::smtpd::configure, ::smtpd::cget
  • snit - Snit's Not Incr Tcl - OO package. Delegation based. ::snit::type, ::snit::widget, ::snit::widgetadaptor

  • soundex - string matching based on the theoretical sound of the letters. soundex::knuth

  • stooop - OO package. stooop::class, stooop::virtual, stooop::new, stooop::delete, stooop::classof
  • struct1 - Version 1 of struct (see below), provided for backward compatibility.
  • struct - struct::list, ::struct::graph, ::struct::matrix, ::struct::queue, ::struct::stack, ::struct::Tree, ::struct::record, ::struct::skiplist, ::struct::prioqueue, new: ::struct::sets
  • tar - untar, list, and stat files in tarballs and create new tarballs
  • textutil - Utilities for working with larger bodies of texts; textutil::expand is the core for the expand macro processor.

  • tie - Persistence for Tcl arrays.
  • treeql - Tree Query Language, inspired by COST.
  • uri - Handling of uri/urls (splitting, joining, ...)
  • uuid - Creation of unique identifiers.

Interaction and debugging

Tcl itself is quite a good teacher. Don't be afraid to do something wrong - you'll most often get a helpful error message. When tclsh is called with no arguments, it starts in interactive mode and displays a "%" prompt. You type something in, and see what comes out (the result, or an error message).

Trying isolated test cases interactively, and pasting the command into the editor when satisfied, can greatly reduce debugging time (no need to restart your app after every little change - just make sure it's the right one, before restarting).

A quick tour

Here's a commented session transcript:

% hello
invalid command name "hello"

OK, so we're supposed to type in a command. Although it doesn't look so, here's one:

% hi
    1  hello
    2  hi

Interactive tclsh tries to guess what we mean, and "hi" is the unambiguous prefix of the "history" command, whose results we see here. Another command worth remembering is "info":

% info
wrong # args: should be "info option ?arg arg ...?"

The error message tells us there should be at least one option, and optionally more arguments.

% info option
bad option "option": must be args, body, cmdcount, commands, complete, default,
exists, functions, globals, hostname, level, library, loaded, locals, nameofexecutable,
patchlevel, procs, script, sharedlibextension, tclversion, or vars

Another helpful error: "option" is not an option, but the valid ones are listed. To get information about commands, it makes sense to type the following:

% info commands
tell socket subst open eof pwd glob list exec pid auto_load_index time unknown
eval lrange fblocked lsearch auto_import gets case lappend proc break variable llength 
auto_execok return linsert error catch clock info split array if fconfigure concat
join lreplace source fcopy global switch auto_qualify update close cd for auto_load
file append format read package set binary namespace scan trace seek while flush
after vwait uplevel continue foreach lset rename fileevent regexp upvar unset
encoding expr load regsub history interp exit puts incr lindex lsort tclLog string

Oh my, quite many... How many?

% llength [info commands]
86

Now for a more practical task - let's let Tcl compute the value of Pi.

% expr acos(-1)
3.14159265359

Hm.. can we have that with more precision?

% set tcl_precision 17
17
% expr acos(-1)
3.1415926535897931

Back to the first try, where "hello" was an invalid command. Let's just create a valid one:

% proc hello {} {puts Hi!}

Silently acknowledged. Now testing:

% hello
Hi!

The errorInfo variable

This global variable provided by Tcl contains the last error message and the traceback of the last error. Silly example:

% proc foo {} {bar x}
% proc bar {input} {grill$input}
% foo
invalid command name "grillx"
% set errorInfo
invalid command name "grillx"
   while executing
"grill$input"
   (procedure "bar" line 1)
   invoked from within
"bar x"
   (procedure "foo" line 1)
   invoked from within
"foo"

If no error has occurred yet, errorInfo will contain the empty string.

In addition, there is the errorCode variable that returns a list of up to three elements:

  • category (POSIX, ARITH, ...)
  • abbreviated code for the last error
  • human-readable error text

Examples:

% open not_existing
couldn't open "not_existing": no such file or directory
% set errorCode
POSIX ENOENT {no such file or directory}
% expr 1/0
divide by zero
% set errorCode
ARITH DIVZERO {divide by zero}
% foo
invalid command name "foo"
% set errorCode
NONE

Stepping through a procedure

To find out how exactly a proc works (and what goes wrong where), you can also register commands to be called before and after a command inside a procedure is called (going down transitively to all called procs). You can use the following step and interact procedures for this:

proc step {name {yesno 1}} {
   set mode [expr {$yesno? "add" : "remove"}]
   trace $mode execution $name {enterstep leavestep} interact
}
proc interact args {
   if {[lindex $args end] eq "leavestep"} {
       puts ==>[lindex $args 2]
       return
   }
   puts -nonewline "$args --"
   while 1 {
       puts -nonewline "> "
       flush stdout
       gets stdin cmd
       if {$cmd eq "c" || $cmd eq ""} break
       catch {uplevel 1 $cmd} res
       if {[string length $res]} {puts $res}
   }
}
#----------------------------Test case, a simple string reverter:
proc sreverse str {
   set res ""
   for {set i [string length $str]} {$i > 0} {} {
       append res [string index $str [incr i -1]]
   }
   set res
}
#-- Turn on stepping for sreverse:
step sreverse
sreverse hello
#-- Turn off stepping (you can also type this command from inside interact):
step sreverse 0
puts [sreverse Goodbye]

The above code gives the following transcript when sourced into a tclsh:

{set res {}} enterstep -->
==>
{for {set i [string length $str]} {$i > 0} {} {
       append res [string index $str [incr i -1]]
   }} enterstep -->
{string length hello} enterstep -->
==>5
{set i 5} enterstep -->
==>5
{incr i -1} enterstep -->
==>4
{string index hello 4} enterstep -->
==>o
{append res o} enterstep -->
==>o
{incr i -1} enterstep -->
==>3
{string index hello 3} enterstep -->
==>l
{append res l} enterstep -->
==>ol
{incr i -1} enterstep -->
==>2
{string index hello 2} enterstep -->
==>l
{append res l} enterstep -->
==>oll
{incr i -1} enterstep -->
==>1
{string index hello 1} enterstep -->
==>e
{append res e} enterstep -->
==>olle
{incr i -1} enterstep -->
==>0
{string index hello 0} enterstep -->
==>h
{append res h} enterstep -->
==>olleh
==>
{set res} enterstep -->
==>olleh
eybdooG

Debugging

The simplest way to inspect why something goes wrong is inserting a puts command before the place where it happens. Say if you want to see the values of variables x and y, just insert

puts x:$x,y:$y

(if the string argument contains no spaces, it need not be quoted). The output will go to stdout - the console from which you started the script. On Windows or Mac, you might need to add the command

console show

to get the substitute console Tcl creates for you, when no real one is present.

If at some time you want to see details of what your program does, and at others not, you can define and redefine a dputs command that either calls puts or does nothing:

proc d+ {} {proc dputs args {puts $args}}
proc d- {} {proc dputs args {}}
d+ ;# initially, tracing on... turn off with d-
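
A minimal usage sketch (x and y stand for whatever variables you want to watch; the values are made up):

set x 1; set y 2
dputs x: $x y: $y   ;# prints: x: 1 y: 2
d-
dputs x: $x y: $y   ;# now a silent no-op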

For more debugging comfort, add the proc interact from above to your code, and put a call to interact before the place where the error happens. Some useful things to do at such a debugging prompt:

info level 0    ;# shows how the current proc was called
info level      ;# shows how deep you are in the call stack
uplevel 1 ...   ;# execute the ... command one level up, i.e. in the caller of the current proc
set ::errorInfo ;# display the last error message in detail

Assertions

Checking data for certain conditions is a frequent operation in coding. Absolutely intolerable conditions can just throw an error:

  if {$temperature > 100} {error "ouch... too hot!"}

Where the error occurred is evident from ::errorInfo, which will look a bit clearer (no mention of the error command) if you code

  if {$temperature > 100} {return -code error "ouch... too hot!"}

If you don't need hand-crafted error messages, you can factor such checks out to an assert command:

proc assert condition {
   if {![uplevel 1 expr $condition]} {
       return -code error "assertion failed: $condition"
   }
}

Use cases look like this:

  assert {$temperature <= 100}

Note that the condition is inverted compared to the error checks above - as "assert" means roughly "take for granted", the positive case is specified, and the error is raised if it is not satisfied.

Tests for internal conditions (that do not depend on external data) can be used during development, and once the coder is sure they are bullet-proof and will always succeed, they can be turned off centrally in one place by defining

proc assert args {}

This way, assertions are compiled to no bytecode at all, and can remain in the source code as a kind of documentation.

If assertions are tested, it only happens at the position where they stand in the code. Using a trace, it is also possible to specify a condition once, and have it tested whenever a variable's value changes:

proc assertt {varName condition} {
   uplevel 1 [list trace var $varName w "assert $condition ;#"]
}

The ";#" at the end of the trace causes the additional arguments name element op, that are appended to the command prefix when a trace fires, to be ignored as a comment.

Testing:

% assertt list {[llength $list]<10}
% set list {1 2 3 4 5 6 7 8}
1 2 3 4 5 6 7 8
% lappend list 9 10
can't set "list": assertion failed: 10<10

The error message isn't as clear as could be, because the [llength $list] is already substituted in it. But I couldn't find an easy solution to that quirk in this breakfast fun project - backslashing the $condition in the assertt code sure didn't help. Better ideas welcome.

In any case, these few lines of code give us a kind of bounds checking - the size of Tcl's data structures is in principle only bounded by the available virtual memory, but runaway loops may be harder to debug, compared to a few assertt calls for suspicious variables:

assertt aString {[string length $aString]<1024}

or

assertt anArray {[array size anArray] < 1024*1024}

Tcllib has a control::assert with more bells and whistles.
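
For instance (a sketch, assuming Tcllib is installed; see the control package documentation for switching assertions on and off):

package require control
control::assert {$temperature <= 100}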

A tiny testing framework

Bugs happen. The earlier they are found, the easier they are for the coder to fix, so the golden rule "Test early. Test often" should really be applied.

One easy way is adding self-tests to a file of Tcl code. When the file is loaded as part of a library, just the proc definitions are executed. If however you feed this file directly to a tclsh, that fact is detected, and the "e.g." calls are executed. If the result is not the one expected, this is reported on stdout; and in the end, you even get a little statistics.

Here's a file that implements and demonstrates "e.g.":

# PROLOG -- self-test: if this file is sourced at top level:
if {[info exists argv0]&&[file tail [info script]] eq [file tail $argv0]} {
   set Ntest 0; set Nfail 0
   proc e.g. {cmd -> expected} {
       incr ::Ntest
       catch {uplevel 1 $cmd} res
       if {$res ne $expected} {
           puts "$cmd -> $res, expected $expected"
           incr ::Nfail
       }
   }
} else {proc e.g. args {}} ;# does nothing, compiles to nothing
##------------- Your code goes here, with e.g. tests following
proc sum {a b} {expr {$a+$b}}
e.g. {sum 3 4} -> 7
proc mul {a b} {expr {$a*$b}}
e.g. {mul 7 6} -> 42
# testing a deliberate error (this way, it passes):
e.g. {expr 1/0} -> "divide by zero"
## EPILOG -- show statistics:
e.g. {puts "[info script] : tested $::Ntest, failed $::Nfail"} -> ""

Guarded proc

In more complex Tcl software, it may happen that a procedure is defined twice with different body and/or args, causing hard-to-track errors. The Tcl command proc itself doesn't complain if it is called with an existing name. Here is one way to add this functionality. Early in your code, you overload the proc command like this:

 rename proc _proc
 _proc proc {name args body} {
   if {[info commands $name]!=""} {
       puts stderr "warning: [info script] redefines $name"
   }
   _proc $name $args $body
 }

From the time that code is sourced, any attempt to override an existing proc name will be reported to stderr (on a Windows wish, it would show on the console in red). You may make it really strict by adding an "exit" after the "puts stderr ...", or by throwing an error.

Known feature: proc names with wildcards will run into this trap, e.g.

  proc * args {expr [join $args *]*1}

will always lead to a complaint, because "*" matches any proc name. A fix (some regsub or glob-escaping magic on 'name') is left as an exercise; one simple workaround is sketched below.
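
One possible workaround - a sketch that sidesteps the glob problem by comparing the name literally against the full command list instead of using it as a pattern:

   if {[lsearch -exact [info commands] $name] >= 0} {
       puts stderr "warning: [info script] redefines $name"
   }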

Windows wish console

While on Unixes the standard channels stdin, stdout, and stderr are the same as the terminal you started wish from, a Windows wish doesn't typically have these standard channels (and is mostly started with a double-click anyway). To remedy this, a console was added that takes over the standard channels (stderr even coming in red, stdin in blue). The console is normally hidden, but can be brought up with the command

 console show

You can also use the partially documented "console" command. "console eval <script>" evals the given script in the Tcl interpreter that manages the console. The console's text area is actually a text widget created in this interpreter. For example:

       console eval {.console config -font Times}

will change the font of the console to "Times". Since the console is a Tk text widget, you can use all text widget commands and options on it (for example, changing colors, bindings...).

console eval {winfo children .}

tells you more about the console widget: it is a toplevel with children .menu, .console (text), and .sb (scrollbar). You can resize the whole thing with

console eval {wm geometry . ${W}x$H+$X+$Y}

where $W and $H are dimensions in character cells (default 80x24), but $X and $Y are in pixels.

And more again: you can even add widgets to the console - try

console eval {pack [button .b -text hello -command {puts hello}]}

The button appears between the text widget and the scroll bar, and looks and behaves as expected. There is also a way back: the main interpreter is visible in the console interpreter under the name consoleinterp.
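
For example, the following (a small sketch) sets a variable in the main interpreter from code running inside the console interpreter:

console eval {consoleinterp eval {set ::fromConsole 42}}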

Tcl in internationalisation

Characters, glyphs, code points

"Everything is a string", the Tcl mantra goes. A string is a (finite-length) sequence of characters. Now, what is a character? A character is not the same as a glyph, the writing element that we see on screen or paper - that represents it, but the same glyph can stand for different characters, or the same character be represented with different glyphs (think e.g. of a font selector).

Also, a character is not the same as a byte, or sequence of bytes, in memory. That again may represent a character, but not unequivocally, once we leave the safe haven of ASCII.

Let's try the following working definition: "A character is the abstract concept of a small writing unit". This often amounts to a letter, digit, or punctuation sign - but a character can be more or less than that. More: Ligatures, groups of two or more letters, can at times be treated as one character (arranged even in more than one line, as seen in Japanese U+337F ㍿ or Arabic U+FDFA ﷺ). Less: little marks (diacritics) added to a character, like the two dots on ü in Nürnberg (U+00FC), can turn that into a new "precomposed" character, as in German; or they may be treated as a separate, "composing character" (U+0308 in the example) which in rendering is added to the preceding glyph (u, U+0075) without advancing the rendering position - a more sensible treatment of the function of these two dots, "trema", in Spanish, Dutch, or even (older) English orthography: consider the spelling "coöperation" in use before c. 1950. Such composition is the software equivalent of "dead keys" on a typewriter.

Although an abstract concept, a character may of course have attributes, most importantly a name: a string, of course, which describes its function, usage, pronunciation etc. Various sets of names have been formalized in Postscript (/odieresis) or HTML (&ouml;). Very important in technical applications is of course the assignment of a number (typically a non-negative integer) to identify a character - this is the essence of encodings, where the numbers are more formally called code points. Other attributes may be predicates like "is upper", "is lower", "is digit".

The relations between the three domains are not too complicated: an encoding controls how a 1..n sequence of bytes is interpreted as a character, or vice versa; the act of rendering turns an abstract character into a glyph (typically by a pixel or vector pattern). Conversely, making sense of a set of pixels to correctly represent a sequence of characters, is the much more difficult art of OCR, which will not be covered here.

Pre-Unicode encodings

Work on encodings, mapping characters to numbers (code points), has a longer history than electronic computing. Francis Bacon (1561-1626) is reported to have used, around 1580, a five-bit encoding where bits were represented as "a" or "b", of the English/Latin alphabet (without the letters j and u!), long before Leibniz discussed binary arithmetics in 1679. An early encoding in practical use was the 5-bit Baudot/CCIT-2 teletype (punch tape) code standardized in 1932, which could represent digits and some punctuations by switching between two modes. I have worked on Univac machines that used six bits per "Fieldata" character, as hardware words were 36 bits long. While IBM used 8 bits in the EBCDIC code, the more famous American Standard Code for Information Interchange (ASCII) did basically the same job in 7 bits per character, which was sufficient for upper/lowercase basic Latin (English) as well as digits and a number of punctuations and other "special" characters - as hardware tended to 8-bit bytes as smallest memory unit, one was left for parity checks or other purposes.

The most important purpose, outside the US, was of course to accommodate more letters required to represent the national writing system - Greek, Russian, or the mixed set of accented or "umlauted" characters used in virtually every country in Europe. Even England needed a code point for the Pound Sterling sign. The general solution was to use the 128 additional positions available when ASCII was implemented as 8-bit bytes, hex 80..FF. A whole flock of such encodings were defined and used:

  • ISO standard encodings iso8859-.. (1-15)
  • MS/DOS code pages cp...
  • Macintosh code pages mac...
  • Windows code pages cp1...

The East Asian countries China, Japan, Korea all use character sets numbering several thousands, so the "high ASCII" approach was not feasible there. Instead, the ASCII concept was extended to a 2x7 bit pattern, where the 94 printing ASCII characters indicate row and column in a 94x94 matrix. This way, all character codes were in practice two bytes wide, and thousands of Hanzi/Kanji/Hangul could be accommodated, plus hundreds of others (ASCII, Greek, Russian alphabets, many graphic characters). These national multibyte encodings are:

  • JIS C-6226 (Japan 1978)
  • GB 2312-80 (China 1980)
  • KS C-5601 (Korea)

If the 2x7 pattern was directly implemented, files in such encodings could not be told apart from ASCII files, except for unreadability. In order to handle both types of strings transparently, the "high ASCII" approach was extended so that a byte in 00..7F was taken at ASCII face value, while bytes with high bit set (80..FF) were interpreted as halves of multibyte codes. For instance, the first Chinese character in GB2312, row 16 col 1 (decimally 1601 for short), gives the two bytes

16 + 32 + 128 = 176 = 0xB0
 1 + 32 + 128 = 161 = 0xA1

This implementation became known as "Extended UNIX Code" (EUC) in the national flavors euc-cn (China), -jp (Japan), -kr (Korea). Microsoft did it differently and adopted the similar but distinct Shift-JIS (Japan) and "Big 5" (Taiwan, Hong Kong) encodings, adding to the "ideograph soup" confusion.

Unicode as pivot for all other encodings

The Unicode standard is an attempt to unify all modern character encodings into one consistent 16-bit representation. Consider a page with a 16x16 table filled with EuroLatin-1 (ISO 8859-1), the lower half being the ASCII code points. Call that "page 00" and imagine a book of 256 or more such pages (with all kinds of other characters on them, in majority CJK), then you have a pretty clear concept of the Unicode standard, in which a character's code position is "U+" hex (page number*256+cell number), for instance, U+20A4 is the Pound Sterling sign. Initiated by the computer industry (www.unicode.org), the Unicode has grown together with ISO 10646, a parallel standard providing an up-to-31-bits encoding (one left for parity?) with the same scope. Software must allow Unicode strings to be fit for i18n. From Unicode version 3.1, the 16-bit limit was transcended for some rare writing systems, but also for the CJK Unified Ideographs Extension B - apparently, even 65536 code positions are not enough. The total count in Unicode 3.1 is 94,140 encoded characters, of which 70,207 are unified Han ideographs; the next biggest group are over 14000 Korean Hangul. And the number is growing.

Unicode 4.0.0 is the latest version, reported to contain 96,248 Graphic characters, 134 format characters, 65 Control characters, 137,468 "private use", 2,048 surrogates, 66 noncharacters. 878,083 code points are reserved for what the future will bring. From www.unicode.org/versions/Unicode4.0.0 : "1,226 new character assignments were made to the Unicode Standard, Version 4.0 (over and above what was in Unicode 3.2). These additions include currency symbols, additional Latin and Cyrillic characters, the Limbu and Tai Le scripts; Yijing Hexagram symbols, Khmer symbols, Linear B syllables and ideograms, Cypriot, Ugaritic, and a new block of variation selectors (especially for future CJK variants)."

Unicode implementations: UTF-8, UCS-16

UTF-8 is made to cover 7-bit ASCII, Unicode, and ISO 10646. Characters are represented as sequences of 1..6 eight-bit bytes - termed octets in the character set business - (for ASCII: 1, for Unicode: 2..3) as follows:

  • ASCII 0x00..0x7F (Unicode page 0, left half): 0x00..0x7F. Nothing changed.
  • Unicode, pages 00..07: 2 bytes, 110aaabb 10bbbbbb, where aaa are the rightmost bits of page#, bb.. are the bits of the second Unicode byte. These pages cover European/Extended Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic.
  • Unicode, pages 08..FE: 3 bytes, 1110aaaa 10aaaabb 10bbbbbb. These cover all the rest of Unicode, including Hangul, Kanji, and what else. This means that East Asian texts are 50% longer in UTF-8 than in pure 16 bit Unicode.

ISO 10646 codes beyond Unicode: 4..6 bytes. (Never seen one yet). General principle of UTF-8 is that the first byte either is a single-byte character (if below 0x80), or indicates the length of a multi-byte code by the number of 1's before the first 0, and is then filled up with data bits. All other bytes start with bits 10 and are then filled up with 6 data bits. It follows from this that bytes in UTF-8 encoding fall in distinct ranges:

  00..7F - plain old ASCII
  80..BF - non-initial bytes of multibyte code
  C2..FD - initial bytes of multibyte code (C0, C1 are not legal!)
  FE, FF - never used (so, free for byte-order marks).

The distinction between initial and non-initial helps in plausibility checks, or to re-synchronize with missing data. Besides, it's independent of byte order (as opposed to UCS-16, see below). Tcl however shields these UTF-8 details from us: characters are just characters, no matter whether 7 bit, 16 bit, or (in the future) more.

I found out by chance that the byte sequence EF BB BF is the UTF-8 equivalent of \uFEFF, and the humble Notepad editor of Windows 2000 indeed switches to UTF-8 encoding when a file starts with these three bytes. I don't know how widely used this convention is, but I like it - my i18n-aware Tcl code will adopt it for reading and writing files, in addition to FEFF treatment that I already do.

The UCS-16 representation (in Tcl just called the "unicode" encoding) is much easier explained: each character code is written as a 16-bit "short" unsigned integer. The practical complication is that the two memory bytes making up a "short" can be arranged in "big-endian" (Motorola, Sparc) or "little-endian" (Intel) byte order. Hence, the following rules were defined for Unicode:

Code point U+FEFF was defined as Byte Order Mark (BOM), later renamed to "Zero-width non-breaking space"; code point U+FFFE (as well as FFFF) is not a valid Unicode. This way, a Unicode-reading application (even Notepad/W2k) can easily detect that something's wrong when it encounters the byte sequence FFFE, and swap the following byte pairs - a minimal and elegant way of dealing with varying byte orders. For XML, a similar self-identification is defined with the encoding attribute in the leading tag.

Tcl and encodings

From Tcl 8.1, i18n support was brought to string processing, and it was wisely decided to

  • use Unicode as general character set
  • use UTF-8 as standard internal encoding
  • provide conversion support for the many other encodings in use.

However, as unequal-length byte sequences make simple tasks such as indexing into a string, or determining its length in characters, more complex, the internal representation is converted to fixed-length 16-bit UCS-16 in such cases. (This brings new problems with recent Unicode versions that cross the 16-bit barrier... When practical use justifies it, this will have to change to UCS-32, or 4 bytes per character.)

Not all i18n issues are therefore automatically solved for the user. One still has to analyze seemingly simple tasks like uppercase conversion (Turkish dotted/undotted I make an anomaly) or sorting ("collation order" is not necessarily the numeric order of the Unicodes, as lsort would apply by default), and write custom routines if a more correct behavior is required. Other locale-dependent i18n issues like number/currency formatting, date/time handling also belong to this group. I recommend to start from the defaults Tcl provides, and if necessary, customize the appearance as desired. International data exchange is severely hampered if localized numeric data are exchanged, one side using period, the other comma as decimal point...

Strictly speaking, the Tcl implementation "violates the UTF-8 spec, which explicitly forbids non-canonical representation of characters and requires that malformed UTF-8 sequences in the input be errors. ... I think that to be an advantage. But the spec says 'MUST' so we're at least technically non-compliant." (Kevin B. Kenny in the Tcl chat, 2003-05-13)

If textual data are internal to your Tcl script, all you have to know is the \uxxxx notation, which is replaced by the character with Unicode code point U+xxxx (hexadecimal). This notation can be used wherever Tcl substitution takes place, even in braced regexps and string map pairlists; elsewhere you can force it by substing the string in question.
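
For example, in an interactive tclsh (any Tcl 8.1 or later; the sample string is arbitrary):

% set str "M\u00fcnchen"
München
% string map [list \u00fc ue] $str
Muenchen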

To demonstrate that for instance scan works transparently, here's a one-liner to format any Unicode character as HTML hex entity:

proc c2html c {format "&#x%4.4x;" [scan $c %c]}

Conversely it takes a few lines more:

proc html2u string {
   while {[regexp {&#[xX]([0-9A-Fa-f]+);} $string matched hex]} {
       regsub -all $matched $string [format %c 0x$hex] string
   }
   set string
}
% html2u "this is a &x20ac; sign"
this is a € sign

For all other purposes, two commands basically provide all i18n support:

fconfigure $ch -encoding $e

enables conversion from/to encoding e for an open channel (file or socket) if different from system encoding;

encoding convertfrom/to $e $string

does what it says, the other encoding being always Unicode.
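
A channel example (a sketch; the file name is made up): to read a file written in Shift-JIS and work with its contents as a normal Tcl string,

set f [open kanji.txt r]
fconfigure $f -encoding shiftjis
set content [read $f]
close $f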

For instance, I could easily decode the bytes EF BB BF from a hexdump with

format %x [encoding convertfrom utf-8 \xef\xbb\xbf]

in an interactive tclsh, and found that it stood for the famous byte-order mark FEFF. Internally to Tcl, (almost) everything is a Unicode string. All communication with the operating system is done in the "system encoding", which you can query (but best not change) with the [encoding system] command. Typical values are iso8859-1 or -15 on European Linuxes, and cp1252 on European Windowses.

Finally, the msgcat package supports localization ("l10n") of apps by allowing message catalogs for translation of strings, typically for GUI display, from a base language (typically English) to a target language selected by the current locale. For example, an app to be localized for France might contain a file fr.msg with, for simplicity, only the line

msgcat::mcset fr File Fichier

In the app itself, all you need is

package require msgcat
namespace import msgcat::mc
msgcat::mclocale fr ;#(1)
#...
pack [button .b -text [mc File]]

to have the button display the localized text for "File", namely "Fichier", as obtained from the message catalog. For other locales, only a new message catalog has to be produced by translating from the base language. Instead of explicit setting as in (1), typically the locale information might come from an environment (LANG) or registry variable.

Tk: text rendering, fonts

Rendering international strings on displays or printers can pose the biggest problems. First, you need fonts that contain the characters in question. Fortunately, more and more fonts with international characters are available, a pioneer being Bitstream Cyberbit that contains roughly 40000 glyphs and was for some time offered for free download on the Web. Microsoft's Tahoma font also added support for most alphabet writings, including Arabic. Arial Unicode MS delivered with Windows 2000 contains just about all the characters in the Unicode, so even humble Notepad can get truly international with that.

But having a good font is still not enough. While strings in memory are arranged in logical order, with addresses increasing from beginning to end of text, they may need to be rendered in other ways, with diacritics shifted to various positions of the preceding character, or most evident for the languages that are written from right to left ("r2l"): Arabic, Hebrew. (Tk still lacks automatic "bidi"rectional treatment, so r2l strings have to be directed "wrongly" in memory to appear right when rendered - see A simple Arabic renderer on the Wiki).

Correct bidi treatment has consequences for cursor movement, line justification, and line wrapping as well. Vertical lines progressing from right to left are popular in Japan and Taiwan - and mandatory if you had to render Mongolian.

Indian scripts like Devanagari are alphabets with about 40 characters, but the sequence of consonants and vowels is partially reversed in rendering, and consonant clusters must be rendered as ligatures of the two or more characters involved - the pure single letters would look very ugly to an Indian. An Indian font for one writing system already contains several hundred glyphs. Unfortunately, Indian ligatures are not contained in the Unicode (while Arabic ones are), so various vendor standards apply for coding such ligatures.

Input methods in Tcl/Tk

To get outlandish characters not seen on the keyboard into the machine, they may at lowest level be specified as escape sequences, e.g. "\u2345". But most user input will come from keyboards, for which many layouts exist in different countries. In CJK countries, there is a separate coding level between keys and characters: keystrokes, which may stand for the pronunciation or geometric components of a character, are collected in a buffer and converted into the target code when enough context is available (often supported by on-screen menus to resolve ambiguities).

Finally, a "virtual keyboard" on screen, where characters are selected by mouse click, is especially helpful for non-frequent use of rarer characters, since the physical keyboard gives no hints which key is mapped to which other code. This can be implemented by a set of buttons, or minimally with a canvas that holds the provided characters as text items, and bindings to <1>, so clicking on a character inserts its code into the widget which has keyboard focus. See iKey: a tiny multilingual keyboard.

The term "input methods" is often used for operating-system-specific i18n support, but I have no experiences with this, doing i18n from a German Windows installation. So far I'm totally content with hand-crafted pure Tcl/Tk solutions - see taiku on the Wiki.

Transliterations: The Lish family

The Lish family is a set of transliterations, all designed to convert strings in lowly 7-bit ASCII to appropriate Unicode strings in some major non-Latin writing systems. The name comes from the common suffix "lish" as in English, which is actually the neutral element of the family, faithfully returning its input ;-) Some rules of thumb:

  • One *lish character should unambiguously map to one target character, wherever applicable
  • One target letter should be represented by one *lish letter (A-Za-z), wherever applicable. Special characters and digits should be avoided for coding letters
  • Mappings should be intuitive and/or follow established practices

In languages that distinguish case, the corresponding substitutes for upper- and lowercase letters should also correspond casewise in lower ASCII. It all began with Greeklish, which is not my invention, but used by Greeks on the Internet for writing Greek without Greek fonts or character set support. I just extended the practice I found with the convention of marking accented vowels with a trailing apostrophe (so it's not a strict 1:1 transliteration anymore). The Tclers' Wiki http://mini.net/tcl/ has the members of the Lish family available for copy'n'paste. The ones I most frequently use are

  • Arblish, which does context glyph selection and right-to-left conversion;
  • Greeklish;
  • Hanglish for Korean Hangul, which computes Unicodes from initial-vowel-final letters;
  • Ruslish for Cyrillic.

Calling examples that return the Unicode strings for the specified input:

  arblish   dby w Abw Zby
  greeklish Aqh'nai
  hanglish  se-qul
  heblish   irwsliM
  ruslish   Moskva i Leningrad


Tcl Examples

Most of these example scripts first appeared in the Tclers' Wiki http://wiki.tcl.tk . The author (Richard Suchenwirth) declares them to be fully in the public domain. The following scripts are plain Tcl, they don't use the Tk GUI toolkit (there's a separate chapter for those).

The pages in the Tclers' Wiki were designed to be directly savable as executable scripts, therefore comments were wrapped in if 0 {...} blocks.

Sets as lists

Tcl's lists are well suited to represent sets. Here are typical set operations. If you use the tiny testing framework explained earlier, the e.g. lines form the self-test; otherwise they just illustrate how the operations should work.

proc set'contains {set el} {expr {[lsearch -exact $set $el]>=0}}
e.g. {set'contains {A B C} A} -> 1
e.g. {set'contains {A B C} D} -> 0
proc set'add {_set args} {
   upvar 1 $_set set
   foreach el $args {
       if {![set'contains $set $el]} {lappend set $el}
   }
   set set
}
set example {1 2 3}
e.g. {set'add example 4} -> {1 2 3 4}
e.g. {set'add example 4} -> {1 2 3 4}
proc set'remove {_set args} {
   upvar 1 $_set set
   foreach el $args {
       set pos [lsearch -exact $set $el]
       set set [lreplace $set $pos $pos]
   }
   set set
}
e.g. {set'remove example 3} -> {1 2 4}
proc set'intersection {a b} {
   foreach el $a {set arr($el) ""}
   set res {}
   foreach el $b {if {[info exists arr($el)]} {lappend res $el}}
   set res
}
e.g. {set'intersection {1 2 3 4} {2 4 6 8}} -> {2 4}
proc set'union {a b} {
   foreach el $a {set arr($el) ""}
   foreach el $b {set arr($el) ""}
   lsort [array names arr]
}
e.g. {set'union {1 3 5 7} {2 4 6 8}} -> {1 2 3 4 5 6 7 8}
proc set'difference {a b} {
   eval set'remove a $b
}
e.g. {set'difference {1 2 3 4 5} {2 4 6}} -> {1 3 5}

Hex-dumping a file

The following example code opens a file, configures it to binary translation (i.e. line-ends \r\n are not standardized to \n as usual in C), and prints as many lines as needed, each containing 16 bytes in hexadecimal notation, plus, where possible, the ASCII characters.

proc file'hexdump filename {
   set fp [open $filename]
   fconfigure $fp -translation binary
   set n 0
   while {![eof $fp]} {
       set bytes [read $fp 16]
       regsub -all {[^\x20-\xfe]} $bytes . ascii
       puts [format "%04X %-48s %-16s" $n [hexdump $bytes] $ascii]
       incr n 16
   }
   close $fp
}
proc hexdump string {
   binary scan $string H* hex
   regexp -all -inline .. $hex
}

The "main routine" is a single line that dumps all files given on the command line:

foreach file $argv {file'hexdump $file}

Sample output, the script applied to itself:

...> tclsh hexdump.tcl hexdump.tcl
0000 0d 0a 20 70 72 6f 63 20 66 69 6c 65 27 68 65 78  .. proc file'hex
0010 64 75 6d 70 20 66 69 6c 65 6e 61 6d 65 20 7b 0d  dump filename {.
0020 0a 20 20 20 20 73 65 74 20 66 70 20 5b 6f 70 65  .    set fp [ope
0030 6e 20 24 66 69 6c 65 6e 61 6d 65 5d 0d 0a 20 20  n $filename]..
...

Roman numerals

Roman numerals are an additive (and partially subtractive) system with the following letter values:

I=1 V=5 X=10 L=50 C=100 D=500 M=1000; MCMXCIX = 1999

Here's some Tcl routines for dealing with Roman numerals.

Sorting Roman numerals: I, V, X already come in the right order; for the others we have to introduce temporary collation transformations, which we'll undo right after sorting:

proc roman:sort list {
   set map {IX VIIII L Y XC YXXXX C Z D {\^} ZM {\^ZZZZ} M _}
   foreach {from to} $map {
       regsub -all $from $list $to list
   }
   set list [lsort $list]
   foreach {from to} [lrevert $map] {
       regsub -all $from $list $to list
   }
   set list
}
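
The helper lrevert used above just reverses a list (from Tcl 8.5 on, the built-in lreverse does the same job); a minimal version:

proc lrevert list {
   set res {}
   for {set i [llength $list]} {$i > 0} {} {
       lappend res [lindex $list [incr i -1]]
   }
   set res
}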

Roman numerals from integer:

proc roman:numeral {i} {
       set res ""
       foreach {value roman} {
           1000 M 900 CM 500 D 400 CD 100 C 90 XC 50 L 40 XL 
           10 X 9 IX 5 V 4 IV 1 I} {
               while {$i>=$value} {
                       append res $roman
                       incr i -$value
               }
       }
       set res
}

Roman numerals parsed into integer:

proc roman:get {s} {
       array set r_v {M 1000 D 500 C 100 L 50 X 10 V 5 I 1}
       set last 99999; set res 0
       foreach i [split [string toupper $s] ""] {
               if [catch {set val $r_v($i)}] {
                   error "un-Roman digit $i in $s"
               }
               incr res $val
               if {$val>$last} {incr res [expr -2*$last]}
               set last $val
       }
       set res
}

Custom control structures

As "control structures" are really nothing special in Tcl, just a set of commands, it is easier than in most other languages to create one's own. For instance, if you would like to simplify the for loop

for {set i 0} {$i < $max} {incr i} {...}

for the typical simple cases so you can write instead

loop i 0 $max {...}

here is an implementation that even returns a list of the results of each iteration:

proc loop {_var from to body} {
   upvar 1 $_var var
   set res {}
   for {set var $from} {$var < $to} {incr var} {lappend res [uplevel 1 $body]}
   set res
 }

Using this, a string reverse function can be had as a one-liner:

proc sreverse {str} {
   join [loop i 0 [string length $str] {string index $str end-$i}] ""
}

Another example is the following range-aware switch variation. A range (numeric or strings) can be given as from..to, and the associated scriptlet gets executed if the tested value lies inside that range.

Like in switch, fall-through collapsing of several cases is indicated by "-", and "default" as final condition fires if none else did. Different from switch, numbers are compared by numeric value, no matter whether given as decimal, octal or hex.

proc rswitch {value body} {
  set go 0
  foreach {cond script} $body {
     if {[regexp {(.+)\.\.(.+)} $cond -> from to]} {
          if {$value >= $from && $value <= $to} {incr go}
     } else {
         if {$value == $cond} {incr go}
     }
     if {$go && $script ne "-"} { #(2)
         uplevel 1 $script
         break
     }
  }
  if {$cond eq "default" && !$go} {uplevel 1 $script} ;#(1)
}

Testing:

% foreach i {A K c z 0 7} {
     puts $i
     rswitch $i {
        A..Z {puts upper} 
        a..z {puts lower} 
        0..9 {puts digit}
     }
}
A
upper
K
upper
c
lower
z
lower
0
digit
7
digit
% rswitch 0x2A {42 {puts magic} default {puts df}}
magic

Rational numbers

Rational numbers, a.k.a. fractions, can be thought of as pairs of integers {numerator denominator}, such that their "real" numerical value is numerator/denominator (evaluated as an exact fraction, not as integer or "double" division!). They can be more precise than any "float" or "double" numbers on computers, as those can't exactly represent fractions whose denominator isn't a power of 2 - consider 1/3, which cannot at any precision be exactly represented as a floating-point number to base 2, nor as a decimal fraction (base 10), no matter how many digits you use.

An obvious string representation of a rational is of course "n/d". The following "constructor" does that, plus it normalizes the signs, reduces to lowest terms, and returns just the integer n if d==1:}

proc rat {n d} {
  if {!$d} {error "denominator can't be 0"}
  if {$d<0} {set n [- $n]; set d [- $d]}
  set g [gcd $n $d]
  set n [/ $n $g]
  set d [/ $d $g]
  expr {$d==1? $n: "$n/$d" }
}

if 0 {Conversely, this "deconstructor" splits zero or more rational or integer strings into num and den variables, such that [ratsplit 1/3 a b] assigns 1 to a and 3 to b:}

proc ratsplit args {
   foreach {r _n _d} $args {
      upvar 1 $_n n  $_d d
      foreach {n d} [split $r /] break
      if {$d eq ""} {set d 1}
   }
}
#-- Four-species math on "rats":
proc rat+ {r s} {
   ratsplit $r a b $s c d
   rat [+ [* $a $d] [* $c $b]] [* $b $d]
}
proc rat- {r s} {
   ratsplit $r a b $s c d
   rat [- [* $a $d] [* $c $b]] [* $b $d]
}
proc rat* {r s} {
   ratsplit $r a b $s c d
   rat [* $a $c] [* $b $d]
}
proc rat/ {r s} {
   ratsplit $r a b $s c d
   rat [* $a $d] [* $b $c]
}

Arithmetical helper functions can be wrapped with func if they only consist of one call of expr:

proc func {name argl body} {proc $name $argl [list expr $body]}
#-- Greatest common denominator:
func gcd {u v} {$u? [gcd [% $v $u] $u]: abs($v)}
#-- Binary expr operators exported:
foreach op {+ * / %} {func $op {a b} \$a$op\$b}
#-- "-" can have 1 or 2 operands:
func - {a {b ""}} {$b eq ""? -$a: $a-$b}
#-- a little tester reports the unexpected:
proc ? {cmd expected} {
   catch {uplevel 1 $cmd} res
   if {$res ne $expected} {puts "$cmd -> $res, expected $expected"}
}
#-- The test suite should silently pass when this file is sourced:
? {rat 42 6} 7
? {rat 1 -2} -1/2
? {rat -1 -2} 1/2
? {rat 1 0} "denominator can't be 0"
? {rat+ 1/3 1/3} 2/3
? {rat+ 1/2 1/2} 1
? {rat+ 1/2 1/3} 5/6
? {rat+ 1 1/2}    3/2
? {rat- 1/2 1/8} 3/8
? {rat- 1/2 1/-8} 5/8
? {rat- 1/7 1/7} 0
? {rat* 1/2 1/2} 1/4
? {rat/ 1/4 1/4} 1
? {rat/ 4 -6} -2/3

Docstrings

Languages like Lisp and Python have the docstring feature, where a string in the beginning of a function can be retrieved for on-line (or printed) documentation. Tcl doesn't have this mechanism built-in (and it would be hard to do it exactly the same way, because everything is a string), but a similar mechanism can easily be adopted, and it doesn't look bad in comparison:

  • Common Lisp: (documentation 'foo 'function)
  • Python: foo.__doc__
  • Tcl: docstring foo

If the docstring is written in comments at the top of a proc body, it is easy to parse it out. In addition, for all procs, even without docstring, you get the "signature" (proc name and arguments with defaults). The code below also serves as usage example: }

proc docstring procname {
   # reports a proc's args and leading comments.
   # Multiple documentation lines are allowed.
   set res "{usage: $procname [uplevel 1 [list args $procname]]}"
   # This comment should not appear in the docstring
   foreach line [split [uplevel 1 [list info body $procname]] \n] {
       if {[string trim $line] eq ""} continue
       if ![regexp {\s*#(.+)} $line -> line] break
       lappend res [string trim $line]
   }
   join $res \n
}
proc args procname {
   # Signature of a proc: arguments with defaults
   set res ""
   foreach a [info args $procname] {
       if [info default $procname $a default] {
           lappend a $default
       }
       lappend res $a
   }
   set res
}

Testing:

% docstring docstring
usage: docstring procname
reports a proc's args and leading comments.
Multiple documentation lines are allowed.
% docstring args
usage: args procname
Signature of a proc: arguments with defaults

Factorial

Factorial (n!) is a popular function with super-exponential growth. Mathematically put,

  0! = 1
  n! = n (n-1)! if n >0, else undefined

In Tcl, we can have it pretty similarly:

proc fact n {expr {$n<2? 1: $n * [fact [incr n -1]]}}

But this very soon crosses the limits of integers, giving wrong results.

A math book showed me the Stirling approximation to n! for large n (at Tcl's precisions, "large" is > 20 ...), so I built that in:

proc fact n {expr {
    $n<2? 1:
    $n>20? pow($n,$n)*exp(-$n)*sqrt(2*acos(-1)*$n):
           wide($n)*[fact [incr n -1]]}
}

Just in case somebody needs approximated large factorials... But for n>143 we reach the domain limit of floating point numbers. In fact, the float limit is at n>170, so an intermediate result in the Stirling formula must have overflowed already at n=144. For so few values it is most efficient to just look them up in a pre-built table, as Tcllib's math::factorial does.
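
For instance (a sketch, assuming Tcllib's math package is installed):

package require math
math::factorial 30   ;# table lookup, no homemade recursion needed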

How big is A4?

Letter and Legal paper formats are popular in the US and other places. In Europe and elsewhere, the most widely used paper format is called A4. To find out how big a paper format is, one can measure an instance with a ruler, or look up appropriate documentation. The A formats can also be deduced from the following axioms:

  • A0 has an area of one square meter
  • A(n) has half the area of A(n-1)
  • The ratio between the longer and the shorter side of an A format is constant

How much this ratio is, can easily be computed if we consider that A(n) is produced from A(n-1) by halving it parallel to the shorter side, so

2a : b = b : a,
2 a^2 = b^2,
b = sqrt(2) a, hence
b : a = sqrt(2) : 1

So here is my Tcl implementation, which returns a list of height and width in centimeters (10000 cm^2 = 1 m^2) with two fractional digits, which delivers a sufficient precision of 1/10 mm: }

proc paperA n {
   set w [expr {sqrt(10000/(pow(2,$n) * sqrt(2)))}]
   set h [expr {$w * sqrt(2)}]
   format "%.2f %.2f" $h $w
}
% paperA 4
29.73 21.02

Bit vectors

Here is a routine for querying or setting single bits in vectors, where bits are addressed by non-negative integers. Implementation is as a "little-endian" list of integers, where bits 0..31 are in the first list element, 32..63 in the second, etc.

Usage: bit varName position ?bitval?

If bitval is given, sets the bit at numeric position position to 1 if bitval != 0, else to 0; in any case returns the bit value at specified position. If variable varName does not exist in caller's scope, it will be created; if it is not long enough, it will be extended to hold at least $position+1 bits, e.g. bit foo 32 will turn foo into a list of two integers, if it was only one before. All bits are initialized to 0.

proc bit {varName pos {bitval {}}} {
   upvar 1 $varName var
   if {![info exist var]} {set var 0}
   set element [expr {$pos/32}]
   while {$element >= [llength $var]} {lappend var 0}
   set bitpos [expr {1 << $pos%32}]
   set word [lindex $var $element]
   if {$bitval != ""} {
       if {$bitval} {
           set word [expr {$word | $bitpos}]
       } else {
           set word [expr {$word & ~$bitpos}]
       }
       lset var $element $word
   }
   expr {($word & $bitpos) != 0}
}
#---------------------- now testing...
if {[file tail [info script]] == [file tail $argv0]} {
   foreach {test      expected} {
       {bit foo 5 1}  1
       {set foo}      32
       {bit foo 32 1} {32 1}
   } {
       catch {eval $test} res
       puts $test:$res/$expected
   }
}

This may be used for Boolean properties of numerically indexed sets of items. Example: an existence map of ZIP codes between 00000 and 99999 can be kept in a list of 3125 integers (where each element requires about 15 bytes overall), while implementing the map as an array would take 100000 * 42 bytes in the worst case, and still more than the bit vector unless the population is extremely sparse - in that case, a list of the 1-bit positions, retrieved with lsearch, might be the most memory-efficient representation. Runtime of bit vector accesses is constant, except when a vector has to be extended to a much larger length.
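
A little sketch of the ZIP code idea (the codes are made up):

% foreach zip {10001 60601 90210} {bit zipcodes $zip 1}
% bit zipcodes 90210
1
% bit zipcodes 90211
0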

Bit vectors can also be used to indicate set membership (set operations would run faster if processing 32 bits in one go with bitwise operators (&, |, ~, ^)) - or pixels in a binary image, where each row could be implemented by a bit vector.

Here's a routine that returns the numeric indices of all set bits in a bit vector:

proc bits bitvec {
   set res {}
   set pos 0
   foreach word $bitvec {
       for {set i 0} {$i<32} {incr i} {
           if {$word & 1<<$i} {lappend res $pos}
           incr pos
       }
   }
   set res
}
% bit foo 47 1
1
% bit foo 11 1
1
% set foo
2048 32768
% bits $foo
11 47

Sieve of Eratosthenes: The following procedure exercises the bit vector functions by letting bits represent integers, and unsetting all that are divisible. The numbers of the bits still set at the end are supposed to be primes, and are returned:

proc sieve max {
   set maxroot [expr {sqrt($max)}]
   set primes [string repeat " 0xFFFFFFFF" [expr {($max+31)/32}]]
   bit primes 0 0; bit primes 1 0
   for {set i [expr $max+1]} {$i<(($max+31)/32)*32} {incr i} {
       bit primes $i 0 ;# mask out excess bits
   }
   for {set i 2} {$i<=$maxroot} {incr i} {
      if {[bit primes $i]} {
          for {set j [expr $i<<1]} {$j<=$max} {incr j $i} {
              bit primes $j 0
          }
      }
   }
   bits $primes
}
% time {set res [sieve 10000]}
797000 microseconds per iteration

Here's code to count the number of 1-bits in a bit vector, represented as an integer list. It does so by adding up the number of 1-bits in each hex digit:

proc bitcount intlist {
   array set bits {
      0 0  1 1  2 1  3 2  4 1  5 2  6 2  7 3
      8 1  9 2  a 2  b 3  c 2  d 3  e 3  f 4
   }
   set sum 0
   foreach int $intlist {
      foreach nybble [split [format %x $int] ""] {
         incr sum $bits($nybble)
      }
   }
   set sum
}


Stacks and queues

Stacks and queues are containers for data objects with typical access methods:

  • push: add one object to the container
  • pop: retrieve and remove one object from the container

In Tcl it is easiest to implement stacks and queues with lists, and the push method is most naturally lappend, so we only have to code a single generic line for all stacks and queues:

interp alias {} push {} lappend

It is pop operations in which stacks, queues, and priority queues differ:

  • in a stack, the most recently pushed object is retrieved and removed (last in first out, LIFO)
  • in a (normal) queue, it is the least recently pushed object (first in first out, FIFO)
  • in a priority queue, the object with the highest priority comes first.

Priority (a number) has to be assigned at pushing time - by pushing a list of two elements, the item itself and the priority, e.g.:

push toDo [list "go shopping" 2]
push toDo {"answer mail" 3}
push toDo {"Tcl coding" 1}  ;# most important thing to do

In common parlance, priority 1 is the "highest", and the number increases for "lower" priorities - but you could push in an item with 0 for "ultrahigh" ;-) Popping a stack can be done like this:

proc pop name {
   upvar 1 $name stack
   set res [lindex $stack end]
   set stack [lrange $stack 0 end-1]
   set res
}

Popping a queue is similarly structured, but with so different details that I found no convenient way to factor out things:

proc qpop name {
   upvar 1 $name queue
   set res [lindex $queue 0]
   set queue [lrange $queue 1 end]
   set res
}

Popping a priority queue requires sorting out which item has the highest priority. Sorting can be done when pushing, or when popping, and since our push is so nicely generic I prefer the second choice (as the number of pushes and pops should be about equal, it does not really matter). Tcl's lsort is stable, so items with equal priority will remain in the order in which they were queued:

proc pqpop name {
   upvar 1 $name queue
   set queue [lsort -real -index 1 $queue]
   qpop queue ;# fall back to standard queue, now that it's sorted
}


A practical application is e.g. in state space searching, where the kind of container of the to-do list determines the strategy:

  • stack is depth-first
  • (normal) queue is breadth-first
  • priority queue is any of the more clever ways: A*, Greedy, ...

Recent-use lists: A variation that can be used both in a stack or queue fashion is a list of values in order of their last use (which may come handy in an editor to display the last edited files, for instance). Here, pushing has to be done by dedicated code because a previous instance would have to be removed:

proc rupush {listName value} {
     upvar 1 $listName list
     if {![info exist list]} {set list {}}
     set pos [lsearch $list $value]
     set list [lreplace $list $pos $pos]
     lappend list $value
}
% rupush tmp hello
hello
% rupush tmp world
hello world
% rupush tmp again
hello world again
% rupush tmp world
hello again world

The first element is the least recently used, the last the most recently used. Elements are not removed by popping, but (if necessary) when re-pushing. (One might truncate the list at the front if it gets too long, as sketched below.)
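
Such truncation could, for example, keep only the ten most recently used elements (a sketch; list is the recent-use list):

if {[llength $list] > 10} {set list [lrange $list end-9 end]}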

Functions

Functions in Tcl are typically written with the proc command. But I notice more and more that, on my way to functional programming, my proc bodies are a single call to expr which does all the rest (often with the powerful x?y:z operator). So what about a thin abstraction (wrapper) around this recurring pattern?

proc func {name argl body} {proc $name $argl [list expr $body]}

(I might have called it fun as well... it sure is.) That's all. A collateral advantage is that all expressions are braced, without me having to care. But to not make the page look so empty, here's some examples for func uses:

func fac n     {$n<2? 1: $n*[fac [incr n -1]]}
func gcd {u v} {$u? [gcd [expr $v%$u] $u]: $v}
func min {a b} {$a<$b? $a: $b}
func sgn x     {($x>0)-($x<0)} ;# courtesy rmax

Pity we have to make expr explicit again, in nested calls like in gcd ... But func isn't limited to math functions (which, especially when recursive, come out nice), but for expr uses in testing predicates as well:

func atomar list          {[lindex $list 0] eq $list}
func empty  list          {[llength $list] == 0}
func in    {list element} {[lsearch -exact $list $element] >= 0}
func limit {x min max}    {$x<$min? $min: $x>$max? $max: $x}
func ladd  {list e}       {[in $list $e]? $list: [lappend list $e]}

Exposing expr binary arithmetic operators as Tcl commands goes quite easy too:

foreach op {+ * / %} {func $op {a b} "\$a $op \$b"}

For "-", we distinguish unary and binary form:

func - {a {b ""}} {$b eq ""? -$a: $a-$b}

Having the modulo operator exposed, gcd now looks nicer:

func gcd {u v} {$u? [gcd [% $v $u] $u]: abs($v)}

For unary not I prefer that name to "!", as it might also stand for factorial - and see the shortest function body I ever wrote :^) :

func not x {!$x}

Without big mention, functions implemented by recursion have a pattern for which func is well suited (see fac and gcd above). Another example is this integer range generator (starts from 1, and is inclusive, so [iota1 5] == {1 2 3 4 5}):

func iota1 n {$n == 1? 1: [concat [iota1 [- $n 1]] $n]}

Experiments with Boolean functions

"NAND is not AND." Here are some Tcl codelets to demonstrate how all Boolean operations can be expressed in terms of the single NAND operator, which returns true if not both his two inputs are true (NOR would have done equally well). We have Boolean operators in expr, so here goes:}

proc nand {A B} {expr {!($A && $B)}}

The only unary operator NOT can be written in terms of nand:

proc not {A} {nand $A $A}

.. and everything else can be built from them too:

proc and {A B} {not [nand $A $B]}
proc or {A B} {nand [not $A] [not $B]}
proc nor {A B} {not [or $A $B]}
proc eq {A B} {or [and $A $B] [nor $A $B]}
proc ne {A B} {nor [and $A $B] [nor $A $B]}

Here are some testing tools - to see whether an implementation is correct, look at its truth table, here done as the four results for the A,B combinations 0,0 0,1 1,0 1,1 - side note: observe how easily functions can be passed in as arguments:}

proc truthtable f {
   set res {}
   foreach A {0 1} {
       foreach B {0 1} {
           lappend res [$f $A $B]
       }
   }
   set res
}
% truthtable and
0 0 0 1
% truthtable nand
1 1 1 0
% truthtable or
0 1 1 1
% truthtable nor
1 0 0 0
% truthtable eq
1 0 0 1

To see how efficient the implementation is (in terms of NAND units used), try this, which relies on the fact that Boolean functions contain no lowercase letters apart from the operator names:}

proc nandcount f {
   regsub -all {[^a-z]} [info body $f] " " list
   set nums [string map {nand 1 not 1 and 2 nor 4 or 3 eq 6} $list]
   expr [join $nums +]
}

As a very different idea, having nothing to do with NAND as elementary function, the following generic code "implements" Boolean functions very intuitively, by just giving their truth table for look-up at runtime:

proc booleanFunction {truthtable a b} {
   lindex $truthtable [expr {!!$a+!!$a+!!$b}]
}
interp alias {} and  {} booleanFunction {0 0 0 1}
interp alias {} or   {} booleanFunction {0 1 1 1}
interp alias {} nand {} booleanFunction {1 1 1 0}
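
Quick check in an interactive session (assuming the aliases above were just created):

% and 1 1
1
% and 0 1
0
% nand 1 1
0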

Solving cryptarithms

Cryptarithms are puzzles where digits are represented by letters, and the task is to find out which. The following "General Problem Solver" (for small values of General) uses heavy metaprogramming: it builds up a nest of foreachs suiting the problem, uses quick kills (with continue) to force unique values for the variables, and returns the first solution found, or else an empty string:

proc solve {problem {domain0 {0 1 2 3 4 5 6 7 8 9}}} {
   set vars [lsort -u [split [regsub -all {[^A-Z]} $problem ""] ""]]
   set map {= ==}
   set outers {}
   set initials [regexp -all -inline {[^A-Z]([A-Z])} /$problem]
   set pos [lsearch $domain0 0]
   set domain1 [lreplace $domain0 $pos $pos]
   foreach var $vars {
       append body "foreach $var \$domain[expr [lsearch $initials $var]>=0] \{\n"
       lappend map $var $$var
       foreach outer $outers {
           append body "if {$$var eq $$outer} continue\n"
       }
       lappend outers $var
       append epilog \}
   }
   set test [string map $map $problem]
   append body "if {\[expr $test\]} {return \[subst $test\]}" $epilog
   if 1 $body
}

This works fine on some well-known cryptarithms:

% solve SEND+MORE=MONEY
9567+1085==10652
% solve SAVE+MORE=MONEY
9386+1076==10462
% solve YELLOW+YELLOW+RED=ORANGE
143329+143329+846==287504

A simple database

There are lots of complex databases around. Here I want to explore how a database can be implemented in the Tcl spirit of simplicity, and how far that approach takes us. Consider the following model:

  • A database is a set of records
  • A record is a nonempty set of fields with a unique ID
  • A field is a pair of tag and nonempty value, both being strings

Fields may well be implemented as array entries, so we could have an array per record, or better one array for the whole database, where the key is composed of ID and tag. Unique IDs can be had by just counting up (incrementing the highest ID so far). The process of creating a simple database consists only of setting an initial value for the ID:

set db(lastid) 0

Let's consider a library application for an example. Adding a book to the database can be simply done by

set id [incr db(lastid)]
set db($id,author) "Shakespeare, William"
set db($id,title) "The Tempest"
set db($id,printed) 1962
set db($id,label) S321-001

Note that, as we never specified what fields a record shall contain, we can add whatever we see fit. For easier handling, it's a good idea to classify records somehow (we'll want to store more than books), so we add

set db($id,isa) book

Retrieving a record is as easy as this (though the fields come in undefined order):

array get db $id,*

and deleting a record is only slightly more involved:

foreach i [array names db $id,*] {unset db($i)}

or, even easier and faster from Tcl 8.3 on:

array unset db $id,*

Here's how to get a "column", all fields of a given tag:

array get db *,title

But real columns may have empty fields, which we don't want to store. Retrieving fields that may not physically exist needs a tolerant access function:

proc db'get {_db id field} {
   upvar $_db db
   if {[array names db $id,$field]=="$id,$field"} {
       return $db($id,$field)
   } else {return ""}
}
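
Assuming the Shakespeare record above received ID 1, a session might look like this:

% db'get db 1 title
The Tempest
% db'get db 1 publisher
%

The second call returns the empty string for a field that was never set, instead of raising an error.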

In a classical database we have to define tables: which fields of what type and of which width. Here we can do what we want, even retrieve which fields we have used so far (using a temporary array to keep track of field names):

proc db'fields {_db} {
  upvar $_db db
  foreach i [array names db *,*] {
     set tmp([lindex [split $i ,] 1]) ""
  }
  lsort [array names tmp]
}

Searching for records that meet a certain condition can be done sequentially. For instance, we want all books printed before 1980:

foreach i [array names db *,printed] {
   if {$db($i)<1980} {
       set id [lindex [split $i ,] 0]
       puts "[db'get db $id author]: [db'get db $id title] $db($i)"
   }
}

We might also store our patrons in the same database (here in a different style):

set i [incr db(lastid)]
array set db [list $i,name "John F. Smith" $i,tel (123)456-7890 $i,isa patron]

Without a concept of "tables", we can now introduce structures like in relational databases. Assume John Smith borrows "The Tempest". We have the patron's and book's ID in variables and do double bookkeeping:

lappend db($patron,borrowed) $book ;# might have borrowed other books
set db($book,borrower) $patron
set db($book,dueback) 2001-06-12

When he returns the book, the process is reversed:

set pos [lsearch $db($patron,borrowed) $book]
set db($patron,borrowed) [lreplace $db($patron,borrowed) $pos $pos]
unset db($book,borrower) ;# we're not interested in empty fields
unset db($book,dueback)

The dueback field (the %Y-%m-%d format is good for sorting and comparing) is useful for checking which books have not been returned in time:

set today [clock format [clock seconds] -format %Y-%m-%d]
foreach i [array names db *,dueback] {
   if {$db($i)<$today} {
       set book [lindex [split $i ,] 0] ;# or: set book [idof $i] - see below
       set patron $db($book,borrower)
       #write a letter
       puts "Dear $db($patron,name), "
       puts "please return $db($book,title) which was due on\
       $db($book,dueback)"
   }
}

Likewise, parts of the accounting (e.g. orders to, and bills from, booksellers) can be added with little effort, and cross-related also to external files (just set the value to the filename).

Indexes: As shown, we can retrieve all data by sequential searching over array names. But if the database grows in size, it's a good idea to create indexes which cross-reference tags and values to IDs. For instance, here's how to make an authors' index in four lines:

foreach i [array names db *,author] {
   set book [lindex [split $i ,] 0]
   lappend db(author=[string toupper $db($i)]) $book
}
# and then..
foreach i [lsort [array names db author=SHAK*]] {
   puts "[lindex [split $i =] 1]:" ;# could be wrapped as 'valueof'
   foreach id $db($i) {
       puts "[db'get db $id title] - [db'get db $id label]"
   }
}

gives us a list of books for all authors matching the given glob pattern (we reuse Tcl's functionality, instead of reinventing it...). Indexes are useful for repeated information that is likely to be searched. Especially, indexing the isa field allows iterating over "tables" (which we still don't explicitly have!;-):

regsub -all isa= [array names db isa=*] "" tables
foreach patron $db(isa=patron) {...}

And beyond industry-standard SQL, we can search multiple indices in one query:

array names db *=*MARK*

gives you all (case-independent) occurrences of MARK, be it in patron's names, book's authors or titles. As versatile as good old grep...

Persistence: Databases are supposed to exist between sessions, so here's how to save a database to a file:

set fp [open Library.db w]
puts $fp [list array set db [array get db]]
close $fp

and loading a database is even easier (on re-loading, better unset the array before):

source Library.db

If you use characters outside your system encoding (it's no problem to write Japanese book titles in Kanji), you'll have to fconfigure (e.g. -encoding utf-8) on saving and loading, but that's just a few more LOC. Saving also goes a good way towards what is ceremonially called "committing" (you'll need write-locking for multi-user systems), while loading (without saving before) might be called a "one-level rollback", where you want to discard your latest changes.
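
For instance, a utf-8 aware save is just one extra line (loading works the same way: open the file with the matching encoding, read it, and eval the script):

set fp [open Library.db w]
fconfigure $fp -encoding utf-8
puts $fp [list array set db [array get db]]
close $fp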

Notice that so far we have only defined one short proc, all other operations were done with built-in Tcl commands only. For clearer code, it is advisable to factor out frequent operations into procs, e.g.

proc idof {index} {lindex [split $index ,] 0}
proc db'add {_db data} {
   upvar $_db db
   set id [incr db(lastid)]
   foreach {tag value} $data {set db($id,$tag) $value}
   # might also update indexes here
}
proc db'tablerow {_db id tags} {
   upvar $_db db
   set res {}
   foreach tag $tags {lappend res [db'get db $id $tag]}
   set res
}
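
A quick sketch of how these might be used together (the field values are just examples):

db'add db {isa book author "Austen, Jane" title Emma printed 1816}
puts [db'tablerow db $db(lastid) {author title printed}] ;# {Austen, Jane} Emma 1816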

Of course, with growing databases we may reach memory limits: arrays need some extra storage for administration. On the other hand, the present approach is pretty economic, since it does not use field widths (all strings are "shrink-wrapped"), and omits empty fields, while at the same time allowing to add whatever fields you wish. A further optimization could be to tally value strings, and replace the frequent ones with "@$id", where db(@$id) holds the value once, and only db'get has to be adapted to redirect the query.

Also, memory limits on modern computers are somewhere up high... so only at some time in the future you might have (but maybe not want) to change to a complex database ;-)

On the limits: Tcl arrays may get quite large (one app was reported to store 800000 keys in Greek characters), and at some point enumerating all keys with array names db (which produces one long list) may exceed your available memory, causing the process to swap. In that situation, you can fall back to the (otherwise slower, and uglier) use of a dedicated iterator:

set search [array startsearch db]
while {[array anymore db $search]} {
   set key [array nextelement db $search]
   # now do something with db($key) - but see below!
}
array donesearch db $search

But neither can you filter the keys you will get with a glob pattern, nor may you add or delete array elements in the loop - the search will be immediately terminated.
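
If you need filtering anyway, a workaround (a sketch, not part of the original) is to test each key inside the loop, e.g. with string match:

set search [array startsearch db]
while {[array anymore db $search]} {
   set key [array nextelement db $search]
   if {[string match *,author $key]} {
       puts "$key: $db($key)"
   }
}
array donesearch db $search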

Tables as lists of lists

Tables are understood here as rectangular (matrix) arrangements of data in rows (one row per "item"/"record") and columns (one column per "field"/"element"). They are for instance the building blocks of relational databases and spreadsheets. In Tcl, a sensible implementation for compact data storage would be as a list of lists. This way, they are "pure values" and can be passed e.g. through functions that take a table and return a table. No con-/destructors are needed, in contrast to the heavier-weight matrix in Tcllib. I know there are many table implementations in Tcl, but like so often I wanted to build one "with my bare hands" and as simple as possible. As you see below, many functionalities can be "implemented" by just using Tcl's list functions.

A nice table also has a header line, that specifies the field names. So to create such a table with a defined field structure, but no contents yet, one just assigns the header list:

set tbl {{firstname lastname phone}}

Note the double bracing, which makes sure tbl is a 1-element list. Adding "records" to the table is as easy as

lappend tbl {John Smith (123)456-7890}

Make sure the fields (cells) match those in the header. Here single bracing is correct. If a field content contains spaces, it must be quoted or braced too:

lappend tbl {{George W} Bush 234-5678}

Sorting a table can be done with lsort -index, taking care that the header line stays on top:

proc tsort args {
   set table [lindex $args end]
   set header [lindex $table 0]
   set res [eval lsort [lrange $args 0 end-1] [list [lrange $table 1 end]]]
   linsert $res 0 $header
}
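
For example, with the two rows added above, sorting by the lastname column (index 1) keeps the header on top:

% tsort -index 1 $tbl
{firstname lastname phone} {{George W} Bush 234-5678} {John Smith (123)456-7890}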

Removing a row (or contiguous sequence of rows) by numeric index is a job for lreplace:

set tbl [lreplace $tbl $from $to]

Simple printing of such a table, a row per line, is easy with

puts [join $tbl \n]

Accessing fields in a table is more fun with the field names than the numeric indexes, which is made easy by the fact that the field names are in the first row:

proc t@ {tbl field} {lsearch [lindex $tbl 0] $field}
% t@ $tbl phone
2

You can then access cells:

puts [lindex $tbl $rownumber [t@ $tbl lastname]]

and replace cell contents like this:

lset tbl $rownumber [t@ $tbl phone] (222)333-4567

Here is how to filter a table by giving pairs of field name and glob-style expression - in addition to the header line, all rows that satisfy at least one of those come through (you can force AND behavior by just nesting such calls):

proc trows {tbl args} {
   set conditions {}
   foreach {field condition} $args {
       lappend conditions [t@ $tbl $field] $condition
   }
   set res [list [lindex $tbl 0]]
   foreach row [lrange $tbl 1 end] {
       foreach {index condition} $conditions {
           if [string match $condition [lindex $row $index]] {
              lappend res $row
              break; # one hit is sufficient
           }
       }
   }
   set res
}
% trows $tbl lastname Sm*
{firstname lastname phone} {John Smith (123)456-7890}

This filters (and, if wanted, rearranges) columns, sort of what is called a "view":

proc tcols {tbl args} {
   set indices {}
   foreach field $args {lappend indices [t@ $tbl $field]}
   set res {}
   foreach row $tbl {
       set newrow {}
       foreach index $indices {lappend newrow [lindex $row $index]}
       lappend res $newrow
   }
   set res
}
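
For example, a two-column "view" of the table from above:

% tcols $tbl lastname phone
{lastname phone} {Smith (123)456-7890} {Bush 234-5678}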

Programming Languages Laboratory


In the following few chapters you'll see how easy it is to emulate or explore other programming languages with Tcl.


Playing Assembler


In this weekend fun project to emulate machine language, I picked those parts of Intel 8080A/8085 Assembler (because I had a detailed reference handy) that are easily implemented and still somewhat educational (or nostalgic ;-).

Of course this is no real assembler. The memory model is constant-size instructions (strings in array elements), which are implemented as Tcl procs. So an "assembler" program in this plaything will run even slower than in pure Tcl, and consume more memory - while normally you associate speed and conciseness with "real" assembler code. But it looks halfway like the real thing: you get sort of an assembly listing with symbol table, and can run it - I'd hardly start writing an assembler in C, but in Tcl it's fun for a sunny Sunday afternoon...

namespace eval asm {
   proc asm body {
       variable mem
       catch {unset mem} ;# good for repeated sourcing
       foreach line [split $body \n] {
           foreach i {label op args} {set $i ""}
           regexp {([^;]*);} $line -> line ;# strip off comments
           regexp {^ *(([A-Z0-9]+):)? *([A-Z]*) +(.*)} [string toupper $line]\
                ->  -   label           op       args
                puts label=$label,op=$op,args=$args
           if {$label!=""} {set sym($label) $PC}
           if {$op==""}     continue
           if {$op=="DB"}  {set mem($PC) [convertHex $args]; incr PC; continue}
           if {$op=="EQU"} {set sym($label) [convertHex $args]; continue}
           if {$op=="ORG"} {set PC [convertHex $args]; continue}
           regsub -all ", *" $args " " args ;# normalize commas
           set mem($PC) "$op $args"
           incr PC
       }
       substituteSymbols sym
       dump   sym
   }
   proc convertHex s {
       if [regexp {^([0-9A-F]+)H$} [string trim $s] -> s] {set s [expr 0x$s]}
       set s
   }
   proc substituteSymbols {_sym} {
       variable mem
       upvar $_sym sym
       foreach i [array names mem] {
           set tmp [lindex $mem($i) 0]
           foreach j [lrange $mem($i) 1 end] {
               if {[array names sym $j] eq $j} {set j $sym($j)}
               lappend tmp $j
           }
           set mem($i) $tmp
       }
   }
   proc dump {_sym} {
       variable mem
       upvar $_sym sym
       foreach i [lsort -integer [array names mem]] {
           puts [format "%04d %s" $i $mem($i)]
       }
       foreach i [lsort [array names sym]] {
           puts [format "%-10s: %04x" $i $sym($i)]
       }
   }
   proc run { {pc 255}} {
       variable mem
       foreach i {A B C D E Z} {set ::$i 0}
       while {$pc>=0} {
           incr pc
           #puts "$mem($pc)\tA:$::A B:$::B C:$::C D:$::D E:$::E Z:$::Z"
           eval $mem($pc)
       }
   }

"machine opcodes" implemented as procs

   proc ADD  {reg reg2}  {set ::Z [incr ::$reg [set ::$reg2]]}
   proc ADI  {reg value} {set ::Z [incr ::$reg $value]}
   proc CALL {name}      {[string tolower $name] $::A}
   proc DCR  {reg}       {set ::Z [incr ::$reg -1]}
   proc INR  {reg}       {set ::Z [incr ::$reg]}
   proc JMP  where       {uplevel 1 set pc [expr $where-1]}
   proc JNZ  where       {if $::Z {uplevel 1 JMP $where}}
   proc JZ   where       {if !$::Z {uplevel 1 JMP $where}}
   proc MOV  {reg adr}   {variable mem; set ::$reg $mem($adr)}
   proc MVI  {reg value} {set ::$reg $value}
}

Now testing:

asm::asm {
       org  100     ; the canonical start address in CP/M
       jmp  START   ; idiomatic: get over the initial variable(s)
DONE:  equ  0       ; warm start in CP/M ;-)
MAX:   equ  5
INCR:  db   2       ; a variable (though we won't vary it)
;; here we go...
START: mvi  c,MAX   ; set count limit
       mvi  a,0     ; initial value
       mov  b,INCR
LOOP:  call puts    ; for now, fall back to Tcl for I/O
       inr  a
       add  a,b     ; just to make adding 1 more complicated
       dcr  c       ; counting down..
       jnz  LOOP    ; jump on non-zero to LOOP
       jmp  DONE    ; end of program
       end
}

The mov b,INCR part is an oversimplification. For a real 8080, one would have to say

LXI H,INCR ; load double registers H+L with the address INCR
MOV B,M    ; load byte to register B from the address pointed to in HL

Since the pseudo-register M can also be used for writing back, it cannot be implemented by simply copying the value. Rather, one could use read and write traces on variable M, causing it to load from, or store to, mem($HL). Maybe another weekend...
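
Such traces might look roughly like this - an untested sketch, assuming a global HL variable holds the current address (which the emulator above does not yet provide):

proc asm::M'load  args {variable mem; set ::M $mem($::HL)}
proc asm::M'store args {variable mem; set mem($::HL) $::M}
trace add variable ::M read  asm::M'load
trace add variable ::M write asm::M'store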


Functional programming (Backus 1977)


John Backus turned 80 these days. For creating FORTRAN and the BNF style of language description, he received the ACM Turing Award in 1977. In his Turing Award lecture,

Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs. (Comm. ACM 21.8, Aug. 1978, 613-641)

he developed an amazing framework for functional programming, from theoretical foundations to implementation hints, e.g. for installation, user privileges, and system self-protection. In a nutshell, his FP system comprises

  • a set O of objects (atoms or sequences)
  • a set F of functions that map objects into objects (f : O |-> O)
  • an operation, application (very roughly, eval)
  • a set FF of functional forms, used to combine functions or objects to form new functions in F
  • a set D of definitions that map names to functions in F

I'm far from having digested it all, but like so often, interesting reading prompts me to do Tcl experiments, especially on weekends. I started with Backus' first Functional Program example,

Def Innerproduct = (Insert +) o (ApplyToAll x) o Transpose

and wanted to bring it to life - slightly adapted to Tcl style, especially by replacing the infix operator "o" with a Polish prefix style:

Def Innerproduct = {o {Insert +} {ApplyToAll *} Transpose}

Unlike procs or lambdas, more like APL or RPN, this definition needs no variables - it declares (from right to left) what to do with the input; the result of each step is the input for the next step (to the left of it). In an RPN language, the example might look like this:

/Innerproduct {Transpose * swap ApplyToAll + swap Insert} def

which has the advantage that execution goes from left to right, but requires some stack awareness (and some swaps to set the stack right ;^)

Implementing Def, I took an easy route by just creating a proc that adds an argument and leaves it to the "functional" to do the right thing (with some quoting heaven :-)

proc Def {name = functional} {
   proc $name x "\[$functional\] \$x"
}

For functional composition, where, say for two functions f and g,

[{o f g} $x] == [f [g $x]]

again a proc is created that does the bracket nesting:

proc o args {
   set body return
   foreach f $args {append body " \[$f"}
   set name [info level 0]
   proc $name x "$body \$x [string repeat \] [llength $args]]"
   set name
}
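
A quick check with two throwaway helpers (twice and succ are not part of the original code):

% proc twice x {expr {$x * 2}}
% proc succ  x {expr {$x + 1}}
% [o twice succ] 5
12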

Why Backus used Transpose on the input wasn't clear to me at first, but as he (like us Tclers) represents a matrix as a list of rows, which are again lists (also known as vectors), it later made much sense to me. This code for transposing a matrix uses the fact that variable names can be any string, including those that look like integers, so the column contents are collected into variables named 0 1 2 ... and finally turned into the result list:

proc Transpose matrix {
   set cols [iota [llength [lindex $matrix 0]]]
   foreach row $matrix {
       foreach element $row col $cols {
           lappend $col $element
       }
   }
   set res {}
   foreach col $cols {lappend res [set $col]}
   set res
}

An integer range generator produces the variable names, e.g. iota 3 => {0 1 2}:

proc iota n {
   set res {}
   for {set i 0} {$i<$n} {incr i} {lappend res $i}
   set res
}
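#-- (added check, not in the original) Transpose on Backus' example input:
puts [Transpose {{1 2 3} {6 5 4}}] ;# -> {1 6} {2 5} {3 4}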
  1. -- This "functional form" is mostly called map in more recent FP:
proc ApplyToAll {f list} {
   set res {}
   foreach element $list {lappend res [$f $element]}
   set res
}

...and Insert is better known as fold, I suppose. My oversimple implementation assumes that the operator is one that expr understands:

proc Insert {op arguments} {expr [join $arguments $op]}
#-- Prefix multiplication comes as a special case of this:
interp alias {} * {} Insert *
#-- Now to try out the whole thing:
Def Innerproduct = {o {Insert +} {ApplyToAll *} Transpose}
puts [Innerproduct {{1 2 3} {6 5 4}}]

which returns 28 just as Dr. Backus ordered (= 1*6 + 2*5 + 3*4). Ah, the joys of weekend Tcl'ing... - and belatedly, Happy Birthday, John! :)

Another example, cooked up by myself this time, computes the average of a list. For this we need to implement the construction operator, which is sort of the inverse of mapping - while mapping a function over a sequence of inputs produces a sequence of outputs of that function applied to each input, Backus' construction maps a sequence of functions over one input to produce the sequence of results of each function applied to that input, e.g.

[f,g](x) == <f(x),g(x)>

Of course I can't use circumfix brackets as operator name, so let's call it constr:

proc constr args {
   set functions [lrange $args 0 end-1]
   set x [lindex $args end]
   set res {}
   foreach f $functions {lappend res [eval $f [list $x]]}
   set res
}
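#-- (added check, not in the original) constr with two functions we already have:
puts [constr llength {Insert +} {1 2 3}] ;# -> 3 6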
#-- Testing:
Def mean = {o {Insert /} {constr {Insert +} llength}}
puts [mean {1 2 3 4 5}]

which correctly returns 3. However, as integer division takes place, it would be better to make that

proc double x {expr {double($x)}}
Def mean    = {o {Insert /} {constr {Insert +} dlength}}
Def dlength = {o double llength}
puts [mean {1 2 3 4}]

giving the correct result 2.5. However, the auxiliary definition for dlength cannot be inlined into the definition of mean - so this needs more work... But this version, which maps double first, works:

Def mean = {o {Insert /} {constr {Insert +} llength} {ApplyToAll double}}

One more experiment, just to get the feel:

Def hypot  = {o sqrt {Insert +} {ApplyToAll square}}
Def square = {o {Insert *} {constr id id}}
proc sqrt x {expr {sqrt($x)}}
proc id x   {set x}
puts [hypot {3 4}]

which gives 5.0. Compared to an RPN language, hypot would be

/hypot {dup * swap dup * + sqrt} def

which is shorter and simpler, but meddles more directly with the stack.

An important functional form is the conditional, which in Backus' notation looks like

p1 -> f; p2 -> g; h

meaning, translated to Tcl,

if {[p1 $x]} then {f $x} elseif {[p2 $x]} then {g $x} else {h $x}

Let's try that, rewritten Polish-ly to:

cond p1 f p2 g h


proc cond args {
   set body ""
   foreach {condition function} [lrange $args 0 end-1] {
       append body "if {\[$condition \$x\]} {$function \$x} else"
   }
   append body " {[lindex $args end] \$x}"
   set name [info level 0]
   proc $name x $body
   set name
}
#-- Testing, with K in another role as Konstant function :)
Def abs = {cond {> 0} -- id}
proc > {a b} {expr {$a>$b}}
proc < {a b} {expr {$a<$b}}
proc -- x {expr -$x}
puts [abs -42],[abs 0],[abs 42]
Def sgn = {cond {< 0} {K 1} {> 0} {K -1} {K 0}}
proc K {a b} {set a}
puts [sgn 42]/[sgn 0]/[sgn -42]
#-- Another famous toy example, reading a file's contents:
Def readfile = {o 1 {constr read close} open}
#-- where Backus' selector (named just as integer) is here:
proc 1 x {lindex $x 0}
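
A self-contained check (test.txt is just a scratch file written first):

set fp [open test.txt w]; puts $fp "hello, world"; close $fp
puts [readfile test.txt] ;# -> hello, world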

Reusable functional components

Say you want to make a multiplication table for an elementary school kid near you. Easily done in a few lines of Tcl code:

proc multable {rows cols} {
   set res ""
   for {set i 1} {$i <= $rows} {incr i} {
       for {set j 1} {$j <= $cols} {incr j} {
           append res [format %4d [expr {$i*$j}]]
       }
       append res \n
   }
   set res
}

The code does not puts its results directly, but returns them as a string - you might want to do other things with it, e.g. save it to a file for printing. Testing:

% multable 3 10
  1   2   3   4   5   6   7   8   9  10
  2   4   6   8  10  12  14  16  18  20
  3   6   9  12  15  18  21  24  27  30

Or print the result directly from wish:

 catch {console show}
 puts "[multable 3 10]"

Here's a different way to do it à la functional programming:

proc multable2 {rows cols} {
   formatMatrix %4d [outProd * [iota 1 $rows] [iota 1 $cols]]
}

The body is nice and short, but consists of all unfamiliar commands. They are however better reusable than the multable proc above. The first formats a matrix (a list of lists to Tcl) with newlines and aligned columns for better display:

proc formatMatrix {fm matrix} {
   join [lmap row $matrix {join [lmap i $row {format $fm $i}] ""}] \n
}

Short again, and slightly cryptic, as is the "outer product" routine, which takes a function f and two vectors, and produces a matrix where f was applied to every pair of a x b - in APL they had special compound operators for this job, in this case "°.x":

proc outProd {f a b} {
   lmap i $a {lmap j $b {$f $i $j}}
}

Again, lmap (the collecting foreach) figures prominently, so here it is in all its simplicity:

proc lmap {_var list body} {
   upvar 1 $_var var
   set res {}
   foreach var $list {lappend res [uplevel 1 $body]}
   set res
}
#-- We need multiplication from expr exposed as a function:
proc * {a b} {expr {$a * $b}}

#-- And finally, iota is an integer range generator:
proc iota {from to} {
   set res {}
   while {$from <= $to} {lappend res $from; incr from}
   set res
}

With these parts in place, we can see that multable2 works as we want:

% multable2 3 10
  1   2   3   4   5   6   7   8   9  10
  2   4   6   8  10  12  14  16  18  20
  3   6   9  12  15  18  21  24  27  30

So why write six procedures, where one did the job already? A matter of style and taste, in a way - multable is 10 LOC and depends on nothing but Tcl, which is good; multable2 describes quite concisely what it does, and builds on a few other procs that are highly reusable.

Should you need a unit matrix (where the main diagonal is 1, and the rest is 0), just call outProd with a different function (equality, ==):

% outProd == [iota 1 5] [iota 1 5]
{1 0 0 0 0} {0 1 0 0 0} {0 0 1 0 0} {0 0 0 1 0} {0 0 0 0 1}

which just requires expr's equality to be exposed too:

proc == {a b} {expr {$a == $b}}

One of the fascinations of functional programming is that one can do the job in a simple and clear way (typically a one-liner), while using a collection of reusable building-blocks like lmap and iota. And formatMatrix and outProd are so general that one might include them in some library, while the task of producing a multiplication table may not come up any more for a long time...


Modelling an RPN language


Tcl strictly follows Polish notation, where an operator or function always precedes its arguments. It is however easy to build an interpreter for a language in Reverse Polish Notation (RPN) like Forth, Postscript, or Joy, and experiment with it.

The "runtime engine" is just called "r" (not to be confused with the R language), and it boils down to a three-way switch done for each word, in eleven lines of code:

  • "tcl" evaluates the top of stack as a Tcl script
  • known words in the ::C array are recursively evaluated in "r"
  • other words are just pushed

Joy's rich quoting for types ([list], {set}, "string", 'char) conflicts with the Tcl parser, so lists in "r" are {braced} if their length isn't 1, and (parenthesized) if it is - but the word shall not be evaluated now. This looks better to me than /slashing as in Postscript.

As everything is a string, and to Tcl "a" is {a} is a, Joy's polymorphy has to be made explicit. I added converters between characters and integers, and between strings and lists (see the dictionary below). For Joy's sets I haven't bothered yet - they are restricted to the domain 0..31, probably implemented with bits in a 32-bit word.

Far as this is from Joy, it was mostly triggered by the examples in Manfred von Thun's papers, so I tongue-in-cheek still call it "Pocket Joy" - it was for me, at last, on the iPaq... The test suite at the end should give many examples of what one can do in "r".

proc r args {
   foreach a $args {
     dputs [info level]:$::S//$a
     if {$a eq "tcl"} {
             eval [pop]
     } elseif [info exists ::C($a)] {
             eval r $::C($a)
     } else {push [string trim $a ()]}
   }
   set ::S
}
# That's it. Stack (list) and Command array are global variables:
set S {}; catch {unset C}
#-- A tiny switchable debugger:
proc d+ {} {proc dputs s {puts $s}}
proc d- {}  {proc dputs args {}}
d- ;#-- initially, debug mode off

Definitions are in Forth style - ":" as initial word, as they look much more compact than Joy's DEFINE n == args;

proc : {n args} {set ::C($n) $args}

expr functionality is exposed for binary operators and one-arg functions:

proc 2op op {
   set t [pop]
   push [expr {[pop]} $op {$t}]
}
foreach op {+ - * / > >= != <= <} {: $op [list 2op $op] tcl}
: =    {2op ==} tcl
proc 1f  f {push [expr $f\([pop])]}
foreach f {abs double exp int sqrt sin cos acos tan} {: $f [list 1f $f] tcl}
interp alias {} pn {} puts -nonewline
#----- The dictionary has all one-liners:
: .      {pn "[pop] "} tcl
: .s    {puts $::S} tcl
: '      {push [scan [pop] %c]} tcl   ;# char -> int
: `     {push [format %c [pop]]} tcl  ;# int -> char
: and  {2op &&} tcl
: at     1 - swap {push [lindex [pop] [pop]]} tcl
: c      {set ::S {}} tcl ;# clear stack
: choice {choice [pop] [pop] [pop]} tcl
: cleave {cleave [pop] [pop] [pop]} tcl
: cons {push [linsert [pop] 0 [pop]]} tcl
: dup  {push [set x [pop]] $x} tcl
: dupd {push [lindex $::S end-1]} tcl
: emit {pn [format %c [pop]]} tcl
: even  odd not
: explode  {push [split [pop] ""]} tcl  ;# string -> char list
: fact  1 (*) primrec
: filter  split swap pop
: first  {push [lindex [pop] 0]} tcl
: fold  {rfold [pop] [pop] [pop]} tcl
: gcd  swap {0 >} {swap dupd rem swap gcd} (pop) ifte
: has  swap in
: i      {eval r [pop]} tcl
: ifte   {rifte [pop] [pop] [pop]} tcl
: implode  {push [join [pop] ""]} tcl ;# char list -> string
: in  {push [lsearch [pop] [pop]]} tcl 0 >=
: map  {rmap [pop] [pop]} tcl
: max  {push [max [pop] [pop]]} tcl
: min  {push [min [pop] [pop]]} tcl
: newstack  c
: not   {1f !} tcl
: odd  2 rem
: of  swap at
: or    {2op ||} tcl
: pop  (pop) tcl
: pred 1 -
: primrec {primrec [pop] [pop] [pop]} tcl
: product 1 (*) fold
: qsort (lsort) tcl
: qsort1 {lsort -index 0} tcl
: rem  {2op %} tcl
: rest  {push [lrange [pop] 1 end]} tcl
: reverse {} swap (swons) step
: set  {set ::[pop] [pop]} tcl
: $     {push [set ::[pop]]} tcl
: sign  {0 >}  {0 <} cleave -
: size  {push [llength [pop]]} tcl
: split  {rsplit [pop] [pop]} tcl
: step  {step [pop] [pop]} tcl
: succ  1 +
: sum   0 (+) fold
: swap  {push [pop] [pop]} tcl
: swons  swap cons
: xor  !=

Helper functions written in Tcl:

proc rifte {else then cond} {
   eval r dup $cond
   eval r [expr {[pop]? $then: $else}]
}
proc choice {z y x} {
   push [expr {$x? $y: $z}]
}
proc cleave { g f x} {
   eval [list r $x] $f [list $x] $g
}
proc max {x y} {expr {$x>$y?$x:$y}}
proc min {x y} {expr {$x<$y? $x:$y}}
proc rmap {f list} {
   set res {}
   foreach e $list {
      eval [list r $e] $f
      lappend res [pop]
   }
   push $res
}
proc step {f list} {
   foreach e $list {eval [list r ($e)] $f}
}
proc rsplit {f list} {
   foreach i {0 1} {set $i {}}
   foreach e $list {
      eval [list r $e] $f
      lappend [expr {!![pop]}] $e
   }
   push $0 $1
}
proc primrec {f init n} {
   if {$n>0} {
      push $n
      while {$n>1} {
          eval [list r [incr n -1]] $f
      }
   } else {push $init}
}
proc rfold {f init list} {
   push $init
   foreach e $list {eval [list r $e] $f}
}
#------------------ Stack routines
proc push args {
  foreach a $args {lappend ::S $a}
}
proc pop {} {
   if [llength $::S] {
      K [lindex $::S end] \
         [set ::S [lrange $::S 0 end-1]]
   } else {error "stack underflow"}
}
proc K {a b} {set a}
#------------------------ The test suite:
proc ? {cmd expected} {
   catch {uplevel 1 $cmd} res
   if {$res ne $expected} {puts "$cmd->$res, not $expected"}
}
? {r 2 3 +} 5
? {r 2 *}   10
? {r c 5 dup *} 25
: sqr dup *
: hypot sqr swap sqr + sqrt
? {r c 3 4 hypot} 5.0
? {r c {1 2 3} {dup *} map} {{1 4 9}}
? {r size} 3
? {r c {2 5 3} 0 (+) fold} 10
? {r c {3 4 5} product} 60
? {r c {2 5 3} 0 {dup * +} fold} 38
? {r c {1 2 3 4} dup sum swap size double /} 2.5
? {r c {1 2 3 4} (sum)  {size double} cleave /} 2.5
: if0 {1000 >} {2 /} {3 *} ifte
? {r c 1200 if0} 600
? {r c 600 if0}  1800
? {r c 42 sign}   1
? {r c 0 sign}     0
? {r c -42 sign} -1
? {r c 5 fact} 120
? {r c 1 0 and} 0
? {r c 1 0 or}   1
? {r c 1 0 and not} 1
? {r c 3 {2 1} cons} {{3 2 1}}
? {r c {2 1} 3 swons} {{3 2 1}}
? {r c {1 2 3} first} 1
? {r c {1 2 3} rest} {{2 3}}
? {r c {6 1 5 2 4 3} {3 >} filter} {{6 5 4}}
? {r c 1 2 {+ 20 * 10 4 -} i} {60 6}
? {r c 42 succ} 43
? {r c 42 pred} 41
? {r c {a b c d} 2 at} b
? {r c 2 {a b c d} of} b
? {r c 1 2 pop} 1
? {r c A ' 32 + succ succ `} c
? {r c {a b c d} reverse} {{d c b a}}
? {r c 1 2 dupd} {1 2 1}
? {r c 6 9 gcd} 3
? {r c true yes no choice} yes
? {r c false yes no choice} no
? {r c {1 2 3 4} (odd) split} {{2 4} {1 3}}
? {r c a {a b c} in} 1
? {r c d {a b c} in} 0
? {r c {a b c} b has} 1
? {r c {a b c} e has} 0
? {r c 3 4 max} 4
? {r c 3 4 min}  3
? {r c hello explode reverse implode} olleh
: palindrome dup explode reverse implode =
? {r c hello palindrome} 0
? {r c otto palindrome}  1
#-- reading (varname $) and setting (varname set) global Tcl vars
set tv 42
? {r c (tv) $ 1 + dup (tv) set} 43
? {expr $tv==43} 1

Tacit programming


The J programming language is the "blessed successor" to APL, where "every function is an infix or prefix operator": x?y (dyadic) or ?y (monadic), with ? being any pre- or user-defined function.

"Tacit programming" (tacit: implied; indicated by necessary connotation though not expressed directly) is one of the styles possible in J, and means coding by combining functions, without reference to argument names. This idea may have been first brought up in Functional programming (Backus 1977), if not in Forth and Joy, and it's an interesting simplification compared to the lambda calculus.

For instance, here's a breathtakingly short J program to compute the mean of a list of numbers:

mean=.+/%#

Let's chew this, byte by byte :)

=.   is assignment to a local variable ("mean") which can be called
+/%# is the "function body"
+    (dyadic) is addition
/    folds the operator on its left over the list on its right
+/   hence being the sum of a list
%    (dyadic) is division, going double on integer arguments when needed
#    (monadic) is tally, like Tcl's [llength] resp. [string length]

Only implicitly present is a powerful function combinator called "fork". When J parses three operators in a row, gfh, where f is dyadic and g and h are monadic, they are combined like the following Tcl version does:}

proc fork {f g h x} {$f [$g $x] [$h $x]}

In other words, f is applied to the results of applying g and h to the single argument. Note that +/ is considered one operator, which applies the "adverb" folding to the "verb" addition (one might well call it "sum"). When two operands occur together, the "hook" pattern is implied, which might in Tcl be written as:

proc hook {f g x} {$f $x [$g $x]}

As KBK pointed out in the Tcl chatroom, the "hook" pattern corresponds to Schönfinkel/Curry's S combinator (see Hot Curry and Combinator Engine), while "fork" is called S' there.

Unlike in earlier years when I was playing APL, this time my aim was not to parse and emulate J in Tcl - I expected hard work for a dubitable gain, and this is a weekend fun project after all. I rather wanted to explore some of these concepts and how to use them in Tcl, so that in slightly more verbose words I could code (and call)

Def mean = fork /. sum llength

following Backus' FP language with the "Def" command. So let's get the pieces together. My "Def" creates an interp alias, which is a good and simple Tcl way to compose partial scripts (the definition, here) with one or more arguments, also known as "currying":

proc Def {name = args} {eval [list interp alias {} $name {}] $args}

The second parameter, "=", is for better looks only and evidently never used.

Testing early and often is a virtue, as is documentation - to make the following code snippets clearer, I tuned my little tester for better looks, so that the test cases in the source code also serve as well readable examples - they look like comments but are code! The cute name "e.g." was inspired by the fact that in J, "NB." is used as comment indicator, both being well known Latin abbreviations:

proc e.g. {cmd -> expected} {
   catch {uplevel 1 $cmd} res
   if {$res != $expected} {puts "$cmd -> $res, not $expected"}
}

Again, the "->" argument is for eye-candy only - but it feels better to me at least. See the examples soon to come.

For recursive functions and other arithmetics, func makes better reading, by accepting expr language in the body:

proc func {name argl body} {proc $name $argl [list expr $body]}

We'll use this to turn expr's infix operators into dyadic functions, plus the "slashdot" operator that makes division always return a real number, hence the dot:

foreach op {+ - * /} {func $op {a b} "\$a $op \$b"}
        e.g. {+ 1 2} -> 3
        e.g. {/ 1 2} -> 0        ;# integer division
func /. {a b} {double($a)/$b}
        e.g. {/. 1 2} -> 0.5     ;# "real" division
#-- Two abbreviations for frequently used list operations:
proc head list {lindex $list 0}
          e.g. {head {a b c}} -> a
proc tail list {lrange $list 1 end}
          e.g. {tail {a b c}} -> {b c}

if 0 {For "fold", this time I devised a recursive version:}

func fold {neutral op list} {
   $list eq [] ? $neutral
   : [$op [head $list] [fold $neutral $op [tail $list]]]
}
        e.g. {fold 0 + {1 2 3 4}} -> 10
#-- A "Def" alias does the same job:
Def sum = fold 0 +
        e.g. {sum      {1 2 3 4}} -> 10
#-- So let's try to implement "mean" in tacit Tcl!
Def mean = fork /. sum llength
         e.g. {mean {1 2 3 40}} -> 11.5

Tacit enough (one might have picked fancier names like +/ for "sum" and # as alias for llength), but in principle it is equivalent to the J version, and doesn't name a single argument. Also, the use of llength demonstrates that any good old Tcl command can go in here, not just the artificial Tacit world that I'm just creating...

In the next step, I want to reimplement the "median" function, which for a sorted list returns the central element if its length is odd, or the mean of the two elements adjacent to the (virtual) center for even length. In J, it looks like this:

median=.(mean@:\{~medind@#)@sortu
medind=.((<.,>.)@half) ` half @.(2&|)
half=.-:@<:                        NB. halve one less than rt. argument
sortu=.\{~/:                       NB. sort upwards

which may better explain why I wouldn't want to code in J :^) J has ASCIIfied the zoo of APL strange character operators, at the cost of using braces and brackets as operators too, without regard for balancing, and extending them with dots and colons, so e.g.

-   monadic: negate; dyadic: minus
-.  monadic: not
-:  monadic: halve

J code sometimes really looks like an accident in a keyboard factory... I won't go into all details of the above code, just some:

@ ("atop") is strong linkage, sort of functional composition
<. (monadic) is floor()
>. (monadic) is ceil()

(<.,>.) is building a list of the floor and the ceiling of its single argument, the comma being the concatenation operator here, comparable to Backus' "construction" or Joy's cleave. The pattern

a ` b @. c

is a kind of conditional in J, which could in Tcl be written

if {[$c $x]} {$a $x} else {$b $x}

but my variant of the median algorithm doesn't need a conditional - for lists of odd length it just uses the central index twice, which is idempotent for "mean", even if a tad slower.

J's "from" operator \{ (without the backslash, but unbalanced braces are troublesome in Tcl code, which this page is) takes zero or more elements from a list, possibly repeatedly. For porting this, lmap is a good helper, even though not strictly functional:}

proc lmap {_v list body} {
   upvar 1 $_v v
   set res {}
   foreach v $list {lappend res [uplevel 1 $body]}
   set res
}
e.g. {lmap i {1 2 3 4} {* $i $i}} -> {1 4 9 16}
#-- So here's my 'from':
proc from {indices list} {lmap i $indices {lindex $list $i}}
          e.g. {from {1 0 0 2} {a b c}} -> {b a a c}

We further borrow some more functionality from expr:

func ceil  x {int(ceil($x))}
func floor x {int(floor($x))}
   e.g. {ceil 1.5}  -> 2
   e.g. {floor 1.5} -> 1
   e.g. {fork list floor ceil 1.5} -> {1 2}

We'll need functional composition, and here's a recursive de-luxe version that takes zero or more functions, hence the name o*:

func o* {functions x} {
   $functions eq []? $x
   : [[head $functions] [o* [tail $functions] $x]]
}
e.g. {o* {} hello,world} -> hello,world

Evidently, identity, which could be written as

proc I x {set x}

is the neutral element of variadic functional composition, when called with no functions at all.

If composite functions like 'fork' are arguments to o*, we'd better let unknown know that we want auto-expansion of the first word:

proc know what {proc unknown args $what\n[info body unknown]}
know {
   set cmd [head $args]
   if {[llength $cmd]>1} {return [eval $cmd [tail $args]]}
}
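
A quick interactive check (not from the original code): with this in place, a braced composite can be invoked directly as the first word of a command:

% {fork list floor ceil} 1.5
1 2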

Also, we need a numeric sort that's good for integers as well as reals ("Def" serves for all kinds of aliases, not just combinations of functions):

Def sort = lsort -real
         e.g. {sort {2.718 10 1}} -> {1 2.718 10}
         e.g. {lsort {2.718 10 1}} -> {1 10 2.718} ;# lexicographic
#-- And now for the median test:
Def median = o* {mean {fork from center sort}}
Def center = o* {{fork list floor ceil} {* 0.5} -1 llength}
func -1 x {$x - 1}
        e.g. {-1 5} -> 4 ;# predecessor function, for integers
#-- Trying the whole thing out:
e.g. {median {1 2 3 4 5}} -> 3
e.g. {median {1 2 3 4}}   -> 2.5

As this file gets tacitly sourced, I am pretty confident that I've reached my goal for this weekend - even though my median doesn't remotely look like the J version: it is as "wordy" as Tcl usually is. But the admittedly still very trivial challenge was met in truly function-level style, concerning the definitions of median, center and mean - no variable left behind. And that is one, and not the worst, Tcl way of Tacit programming...

Vector arithmetics

APL and J (see Tacit programming) have the feature that arithmetics can be done with vectors and arrays as well as scalar numbers, in the varieties (for any operator @):

  • scalar @ scalar -> scalar (like expr does)
  • vector @ scalar -> vector
  • scalar @ vector -> vector
  • vector @ vector -> vector (all of same dimensions, element-wise)

Here are some experiments on how to do this in Tcl. First, lmap is a collecting foreach - it maps the specified body over a list:

proc lmap {_var list body} {
    upvar 1 $_var var
    set res {}
    foreach var $list {lappend res [uplevel 1 $body]}
    set res
}
#-- We need basic scalar operators from expr factored out:
foreach op {+ - * / % ==} {proc $op {a b} "expr {\$a $op \$b}"}

The following generic wrapper takes one binary operator (could be any suitable function) and two arguments, which may be scalars, vectors, or even matrices (lists of lists), as it recurses as often as needed. Note that as my lmap above only takes one list, the two-list case had to be made explicit with foreach.

proc vec {op a b} {
    if {[llength $a] == 1 && [llength $b] == 1} {
        $op $a $b
    } elseif {[llength $a]==1} {
        lmap i $b {vec $op $a $i}
    } elseif {[llength $b]==1} {
        lmap i $a {vec $op $i $b}
    } elseif {[llength $a] == [llength $b]} {
        set res {}
        foreach i $a j $b {lappend res [vec $op $i $j]}
        set res
    } else {error "length mismatch [llength $a] != [llength $b]"}
}

Tests are done with this minimal "framework":

proc e.g. {cmd -> expected} {
    catch $cmd res
    if {$res ne $expected} {puts "$cmd -> $res, not $expected"}
}

Scalar + Scalar

e.g. {vec + 1 2} -> 3

Scalar + Vector

e.g. {vec + 1 {1 2 3 4}} -> {2 3 4 5}

Vector / Scalar

e.g. {vec / {1 2 3 4} 2.} -> {0.5 1.0 1.5 2.0}

Vector + Vector

e.g. {vec + {1 2 3} {4 5 6}} -> {5 7 9}

Matrix * Scalar

e.g. {vec * {{1 2 3} {4 5 6}} 2} -> {{2 4 6} {8 10 12}}

Multiplying a 3x3 matrix with another:

e.g. {vec * {{1 2 3} {4 5 6} {7 8 9}} {{1 0 0} {0 1 0} {0 0 1}}} -> \
    {{1 0 0} {0 5 0} {0 0 9}}

The dot product of two vectors is a scalar. That's easily had too, given a sum function:

proc sum list {expr [join $list +]+0}
sum [vec * {1 2} {3 4}]

should result in 11 (= (1*3)+(2*4)).

Here's a little application for this: a vector factorizer, that produces the list of divisors for a given integer. For this we again need a 1-based integer range generator:

proc iota1 x {
    set res {}
    for {set i 1} {$i<=$x} {incr i} {lappend res $i}
    set res
}
e.g. {iota1 7}           -> {1 2 3 4 5 6 7}
#-- We can compute the modulo of a number by its index vector:
e.g. {vec % 7 [iota1 7]} -> {0 1 1 3 2 1 0}
#-- and turn all elements where the remainder is 0 to 1, else 0:
e.g. {vec == 0 [vec % 7 [iota1 7]]} -> {1 0 0 0 0 0 1}

At this point, a number is prime if the sum of the latest vector is 2. But we can also multiply out the 1s with the divisors from the index vector:

e.g. {vec * [iota1 7] [vec == 0 [vec % 7 [iota1 7]]]} -> {1 0 0 0 0 0 7}
#-- Hence 7 is only divisible by 1 and itself, so it is prime.
e.g. {vec * [iota1 6] [vec == 0 [vec % 6 [iota1 6]]]} -> {1 2 3 0 0 6}

So 6 is divisible by 2 and 3; the non-zero elements in (lrange $divisors 1 end-1) give the "proper" divisors. And three nested calls to vec are sufficient to produce the divisors list :)

Just for comparison, here's how it looks in J:

   iota1=.>:@i.
   iota1 7
1 2 3 4 5 6 7
   f3=.iota1*(0&=@|~iota1)
   f3 7
1 0 0 0 0 0 7
   f3 6
1 2 3 0 0 6

Let unknown know

To extend Tcl, i.e. to make it understand and do things that before raised an error, the easiest way is to write a proc. Any proc must however be called in compliance with Tcl's fundamental syntax: first word is the command name, then the arguments separated by whitespace. Deeper changes are possible with the unknown command, which is called if a command name is, well, unknown, and in the standard version tries to call executables, to auto-load scripts, or do other helpful things (see the file init.tcl). One could edit that file (not recommended), or rename unknown to something else and provide one's own unknown handler, that falls through to the original proc if unsuccessful, as shown in Radical language modification.

Here is a simpler way that allows us to extend unknown "in place" and incrementally: we let unknown "know" what action it shall take under what conditions. The know command is called with a condition that should result in an integer when given to expr, and a body that will be executed if cond results in nonzero, returning the last result if not terminated with an explicit return. In both cond and body you may use the variable args that holds the problem command unknown was invoked with.

proc know what {
   if ![info complete $what] {error "incomplete command(s) $what"}
   proc unknown args $what\n[info body unknown]
} ;# RS

The extending code, what, is prepended to the previous unknown body. This means that subsequent calls to know stack up, the last condition being tried first, so if you have several conditions that fire on the same input, let them be "known" from generic to specific.

Here's a little debugging helper, to find out why "know" conditions don't fire:

proc know? {} {puts [string range [info body unknown] 0 511]}

Now testing what new magic this handful of code allows us to do. This simple example invokes expr if the "command" is digestible for it:

% know {if {![catch {expr $args} res]} {return $res}}
% 3+4
7

If we had no if

Imagine the makers of Tcl had failed to provide the if command. All the rest would be there. While taking more steps towards functional programming, I came upon this interesting problem, and will shortly demonstrate that it can easily be solved in pure Tcl.

We still have the canonical truth values 0 and 1 as returned from expr with a comparison operator. The idea in the paper I read is to use them as names of very simple functions:

proc 0 {then else} {uplevel 1 $else}
proc 1 {then else} {uplevel 1 $then} ;# the famous K combinator

Glory be to the 11 rules of the Tcl man page that this is already a crude though sufficient reimplementation:

set x 42
[expr $x<100] {puts Yes} {puts No}

The bracketed expr command is evaluated first, returning 0 or 1 as result of the comparison. This result (0 or 1) is substituted for the first word of this command. The other words (arguments) are not substituted because they're curly-braced, so either 0 or 1 is invoked, and does its simple job. (I used uplevel instead of eval to keep all side effects in caller's scope). Formally, what happened to the bracketed call is that it went through "applicative order" evaluation (i.e., do it now), while the braced commands wait for "normal order" evaluation (i.e., do when needed, maybe never - the need is expressed through eval/upvar or similar commands).

Though slick at first sight, we actually have to type more. As a second step, we create the If command that wraps the expr invocation:

proc If {cond then else} {
   [uplevel 1 [list expr ($cond)!=0]] {uplevel 1 $then} {uplevel 1 $else}
}
If {$x>40} {puts Indeed} {puts "Not at all"}

This again passes impromptu tests, and adds the feature that any non-zero value counts as true and returns 1 - if we neglect the other syntactic options of if, especially the elseif chaining. However, this is no fundamental problem - consider that

if A then B elseif C then D else E

can be rewritten as

if A then B else {if C then D else E}

so the two-way If is about as mighty as the real thing, give or take a few braces and redundant keywords (then, else).
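
For example (a sketch), with x still set to 42 from above, a three-way decision just nests two If calls and prints "positive":

If {$x < 0} {puts negative} {
   If {$x == 0} {puts zero} {puts positive}
}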

Luckily we have an if in Tcl (and it certainly fares better in byte-code compilation), but on leisurely evenings it's not the microseconds that count (for me at least) - it's rather reading about the most surprising (or fundamental) ideas, and demonstrating how easily Tcl can bring them to life...

Brute force meets Goedel

Never afraid of anything (as long as everything is a string), a discussion in the Tcl chatroom brought me to try the following: let the computer write ("discover") its own software, only given specifications of input and output. In truly brute force, up to half a million programs are automatically written and (a suitable subset of them) tested to find the one that passes the tests.

To make things easier, this flavor of "software" is in a very simple RPN language similar to, but much smaller than, the one presented in Playing bytecode: stack-oriented like Forth, each operation being one byte (ASCII char) wide, so we don't even need whitespace in between. Arguments are pushed on the stack, and the result of the "software", the stack at end, is returned. For example, in

ebc ++ {1 2 3}

execution of the script "++" should sum the three numbers on the stack (1+(2+3)), and return 6.

Here's the "bytecode engine" (ebc: execute byte code), which retrieves the implementations of bytecodes from the global array cmd:}

proc ebc {code argl} {
   set ::S $argl
   foreach opcode [split $code ""] {
       eval $::cmd($opcode)
   }
   set ::S
}

Let's now populate the bytecode collection. The set of all defined bytecodes will be the alphabet of this little RPN language. It may be interesting to note that this language has truly minimal syntax - the only rule is: each script ("word") composed of any number of bytecodes is well-formed. It just remains to check whether it does what we want.

Binary expr operators can be treated generically:

foreach op {+ - * /} {
   set cmd($op) [string map "@ $op" {swap; push [expr {[pop] @ [pop]}]}]
}
#-- And here's some more hand-crafted bytecode implementations
set cmd(d) {push [lindex $::S end]} ;# dup
set cmd(q) {push [expr {sqrt([pop])}]}
set cmd(^) {push [swap; expr {pow([pop],[pop])}]}
set cmd(s) swap
#-- The stack routines imply a global stack ::S, for simplicity
interp alias {} push {} lappend ::S
proc pop {}  {K [lindex $::S end] [set ::S [lrange $::S 0 end-1]]}
proc K {a b} {set a}
proc swap {} {push [pop] [pop]}
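
With the engine, the bytecodes and the stack routines in place, the introductory example can be verified at the prompt:

% ebc ++ {1 2 3}
6
% ebc dq {16}
16 4.0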

Instead of enumerating all possible bytecode combinations beforehand (which grows exponentially by alphabet and word length), I use this code from Mapping words to integers to step over their sequence, uniquely indexed by an increasing integer. This is something like the Goedel number of the corresponding code. Note that with this mapping, all valid programs (bytecode sequences) correspond to one unique non-negative integer, and longer programs have higher integers associated:

proc int2word {int alphabet} {
   set word ""
   set la [llength $alphabet]
   while {$int > 0} {
       incr int -1
       set word  [lindex $alphabet [expr {$int % $la}]]$word
       set int   [expr {$int/$la}]
   }
   set word
}
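
A quick check with a three-letter alphabet - the integers enumerate all words, shorter ones first:

% int2word 1 {a b c}
a
% int2word 4 {a b c}
aa
% int2word 5 {a b c}
ab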

Now out for discovery! The toplevel proc takes a paired list of inputs and expected output. It tries in brute force all programs up to the specified maximum Goedel number and returns the first one that complies with all tests:

proc discover0 args {
   set alphabet [lsort [array names ::cmd]]
   for {set i 1} {$i<10000} {incr i} {
       set code [int2word $i $alphabet]
       set failed 0
       foreach {inputs output} $args {
           catch {ebc $code $inputs} res
           if {$res != $output} {incr failed; break}
       }
       if {!$failed} {return $code}
   }
}

But iterating over many words is still pretty slow, at least on my 200 MHz box, and many useless "programs" are tried. For instance, if the test has two inputs and wants one output, the stack balance is -1 (one less out than in). This is provided e.g. by one of the binary operators +-*/. But the program "dd" (which just duplicates the top of stack twice) has a stack balance of +2, and hence can never pass the example test. So, on a morning dogwalk, I thought out this strategy:

  • measure the stack balance for each bytecode
  • iterate once over very many possible programs, computing their stack balance
  • partition them (put into distinct subsets) by stack balance
  • perform each 'discovery' call only on programs of matching stack balance

Here's this version. Single bytecodes are executed, only to measure their effect on the stack. The balance of longer programs can be computed by just adding the balances of their individual bytecodes:

proc bc'stack'balance bc {
   set stack {1 2} ;# a bytecode will consume at most two elements
   expr {[llength [ebc $bc $stack]]-[llength $stack]}
}
proc stack'balance code {
   set res 0
   foreach bc [split $code ""] {incr res $::balance($bc)}
   set res
}
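
For instance, duplication grows the stack by one element, while a binary operator shrinks it by one:

% bc'stack'balance d
1
% bc'stack'balance +
-1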

The partitioning will run for some seconds (depending on nmax - I tried with several tens of thousands), but it's needed only once. The size of partitions is further reduced by excluding programs which contain redundant code that will have no effect, like swapping the stack twice, or swapping before an addition or multiplication. A program without such extravaganzas is shorter and yet does the same job, so it will have been tested earlier anyway.

proc partition'programs nmax {
   global cmd partitions balance
   #-- make a table of bytecode stack balances
   set alphabet [array names cmd]
   foreach bc $alphabet {
       set balance($bc) [bc'stack'balance $bc]
   }
   array unset partitions ;# for repeated sourcing
   for {set i 1} {$i<=$nmax} {incr i} {
       set program [int2word $i $alphabet]
       #-- "peephole optimizer" - suppress code with redundancies
       set ok 1
       foreach sequence {ss s+ s*} {
           if {[string first $sequence $program]>=0} {set ok 0}
       }
       if {$ok} {
           lappend partitions([stack'balance $program]) $program
       }
   }
   set program ;# see how far we got
}

The discoverer, Second Edition, determines the stack balance of the first text, and tests only those programs of the same partition:

proc discover args {
   global partitions
   foreach {in out} $args break
   set balance [expr {[llength $out]-[llength $in]}]
   foreach code $partitions($balance) {
       set failed 0
       foreach {input output} $args {
           catch {ebc $code $input} res
           if {$res != $output} {incr failed; break}
       }
       if {!$failed} {return $code}
   }
}

But now for the trying. The partitioning helps very much in reducing the number of candidates. For the 1000 programs with Goedel numbers 1..1000, it retains only a fraction for each stack balance:

-2: 75 
-1: 155 (this and 0 will be the most frequently used) 
0: 241 
1: 274 
2: 155 
3: 100

Simple starter - discover the successor function (add one):

% discover 5 6  7 8
dd/+

Not bad: duplicate the number twice, divide by itself to get the constant 1, and add that to the original number. However, it fails to work if we add the successor of 0 as another test case:

% discover 5 6  7 8  0 1

Nothing coming - because zero division made the last test fail. If we give only this test, another solution is found:

% discover 0 1
d^

"Take x to the x-th" power" - pow(0,0) gives indeed 1, but that's not the generic successor function.

More experiments to discover the hypot() function:

% discover {4 3} 5
d/+

Hm - the 3 is duplicated, divided by itself (=1), which is added to 4. Try to swap the inputs:

% discover {3 4} 5
q+

Another dirty trick: get square root of 4, add to 3 - presto, 5. The correct hypot() function would be

d*sd*+q

but my program set (nmax=30000) ends at 5-byte codes, so even by giving another test to force discovery of the real thing, it would never reach a 7-byte code. OK, I bite the bullet, set nmax to 500000, wait 5 minutes for the partitioning, and then:

% discover {3 4} 5  {11 60}  61
sd/+

Hm.. cheap trick again - it was discovered that the solution is just the successor of the second argument. Like in real life, test cases have to be carefully chosen. So I tried with another a^2+b^2=c^2 set, and EUREKA! (after 286 seconds):

% discover {3 4} 5  {8 15} 17
d*sd*+q

After partitioning, 54005 programs had the -1 stack balance, and the correct result was on position 48393 in that list...

And finally, with the half-million set of programs, here's a solution for the successor function too:

% discover  0 1  4711 4712
ddd-^+

"d-" subtracts top of stack from itself, pushing 0; the second duplicate to the 0-th power gives 1, which is added to the original argument. After some head-scratching, I find it plausible, and possibly it is even the simplest possible solution, given the poorness of this RPN language.

Lessons learned:

  • Brute force is simple, but may demand very much patience (or faster hardware)
  • The sky, not the skull, is the limit to what we can do with Tcl :)

Bare-bones OO

OO (Object Orientation) is a style in programming languages popular since Smalltalk, and especially C++, Java, etc. For Tcl, there have been several OO extensions/frameworks (incr Tcl, XOTcl, stooop, Snit to name a few) in different flavors, but none can be considered a standard followed by a majority of users. However, most of these share the following features:

  • classes can be defined, with variables and methods
  • objects are created as instances of a class
  • objects are called with messages to perform a method

Quite a bunch of what is called OO can be done in pure Tcl without a "framework", except that the code might look clumsy and distracting. Just choose how to implement instance variables:

  • in global variables or namespaces
  • or just as parts of a transparent value, with TOOT

The task of frameworks, be they written in Tcl or C, is just to hide away the gory details of the implementation - in other words, to sugar it :) On the other hand, one understands a clockwork best when it's outside the clock, and all parts are visible - so to get a good understanding of OO, it might be most instructive to look at a simple implementation.

As an example, here's a Stack class with push and pop methods, and an instance variable s - a list that holds the stack's contents:

namespace eval Stack {set n 0}
proc Stack::Stack {} { #-- constructor
  variable n
  set instance [namespace current]::[incr n]
  namespace eval $instance {variable s {}}
  interp alias {} $instance {} ::Stack::do $instance
}

The interp alias makes sure that calling the object's name, like

::Stack::1 push hello

is understood and rerouted as a call to the dispatcher below:

::Stack::do ::Stack::1 push hello

The dispatcher imports the object's variables (only s here) into local scope, and then switches on the method name:

proc Stack::do {self method args} { #-- Dispatcher with methods
  upvar #0 ${self}::s s
  switch -- $method {
      push {eval lappend s $args}
      pop  {
          if ![llength $s] {error "stack underflow"}
          K [lindex $s end] [set s [lrange $s 0 end-1]]
      }
      default {error "unknown method $method"}
  }
}
proc K {a b} {set a}

A framework would just have to make sure that the above code is functionally equivalent to, e.g. (in a fantasy OO style):

class Stack {
   variable s {}
   method push args {eval lappend s $args}
   method pop {} {
          if ![llength $s] {error "stack underflow"}
          K [lindex $s end] [set s [lrange $s 0 end-1]]
   }
}

which, I admit, reads definitely better. But bare-bones has its advantages too: in order to see how a clockwork works, you'd better have all parts visible :)

Now testing in an interactive tclsh:

% set s [Stack::Stack] ;#-- constructor
::Stack::1             ;#-- returns the generated instance name
% $s push hello
hello
% $s push world
hello world
% $s pop
world
% $s pop
hello
% $s pop
stack underflow       ;#-- clear enough error message
% namespace delete $s ;#-- "destructor"
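
Since every instance lives in its own namespace (and gets its own alias), several stacks can coexist without interfering with each other. Continuing the session above, the output should look roughly like this (the instance counter just keeps incrementing):

% set a [Stack::Stack]
::Stack::2
% set b [Stack::Stack]
::Stack::3
% $a push foo
foo
% $b push bar grill
bar grill
% $a pop
foo
% $b pop
grill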

A little deterministic Turing machine

At university, I never learned much about Turing machines. Only decades later, a hint in the Tcl chatroom pointed me to http://csc.smsu.edu/~shade/333/project.txt , an assignment to implement a Deterministic Turing Machine (i.e. one with at most one rule per state and input character), which gives clear instructions and two test cases for input and output, so I decided to try my hand in Tcl.

Rules in this little challenge are of the form a bcD e, where

  • a is the state in which they can be applied
  • b is the character that must be read from tape if this rule is to apply
  • c is the character to write to the tape
  • D is the direction to move the tape after writing (R(ight) or L(eft))
  • e is the state to transition to after the rule was applied

Here's my naive implementation, which takes the tape just as the string it initially is. I only had to take care that, when moving beyond either end, a blank (written as _) is attached on that side, and the position pointer adjusted when at the beginning. Rules are also taken as strings, whose parts can easily be extracted with string index - as it's used so often here, I alias it to @. For example, the rule {1 00R 1} in the first test set below reads: in state 1, seeing 0, write 0, move Right, and stay in state 1. }

proc dtm {rules tape} {
   set state 1
   set pos 0
   while 1 {
       set char [@ $tape $pos]
       foreach rule $rules {
           if {[@ $rule 0] eq $state && [@ $rule 2] eq $char} {
               #puts rule:$rule,tape:$tape,pos:$pos,char:$char
               #-- Rewrite tape at head position.
               set tape [string replace $tape $pos $pos [@ $rule 3]]
               #-- Move tape Left or Right as specified in rule.
               incr pos [expr {[@ $rule 4] eq "L"? -1: 1}]
               if {$pos == -1} {
                   set pos 0
                   set tape _$tape
               } elseif {$pos == [string length $tape]} {
                   append tape _
               }
               set state [@ $rule 6]
               break
           }
       }
       if {$state == 0} break
   }
   #-- Highlight the head position on the tape.
   string trim [string replace $tape $pos $pos \[[@ $tape $pos]\]] _
}
interp alias {} @ {} string index

Test data from http://csc.smsu.edu/~shade/333/project.txt

set rules {
   {1 00R 1}
   {2 01L 0}
   {1 __L 2}
   {2 10L 2}
   {2 _1L 0}
   {1 11R 1}
}
set tapes {
   0
   10011
   1111
}
set rules2 {
   {3 _1L 2}
   {1 _1R 2}
   {1 11L 3}
   {2 11R 2}
   {3 11R 0}
   {2 _1L 1}
}
set tapes2 _

Testing:

foreach tape $tapes {puts [dtm $rules $tape]}
puts *
puts [dtm $rules2 $tapes2]

if 0 {This reports the results as wanted in the assignment, on stdout:

>tclsh turing.tcl
[_]1
1[0]100
[_]10000
*
1111[1]1


Streams

Streams are a powerful concept in (not only functional) programming. In SICP chapter 3.5, streams are introduced as data structures characterized as "delayed lists", whose elements are produced and returned only on demand (deferred evaluation). This way, a stream can promise to be a potentially endless source of data, while taking only finite time to process and deliver what's really wanted. Other streams may provide a finite but very large number of elements, which would be impractical to process in one go. In Tcl, the two ways of reading a file are a good example:

  • read $fp returns the whole contents, which then can be processed;
  • while {[gets $fp line]>-1} {...} reads line by line, interleaved with processing

The second construct may be less efficient, but is robust for gigabyte-sized files. A simpler example is pipes in Unix/DOS (use TYPE for cat there):

  cat foo.bar | more

where the "cat" delivers lines of the file as long as "more" will take them, and waits otherwise (after all, stdin and stdout are just streams...). Such process chains can be emulated in Tcl with the following rules:

A stream is modelled here as a procedure that returns one stream item on each call. The special item "" (the empty string) indicates that the stream is exhausted. Streams are interesting if they don't deliver the same result on every call, which requires them to maintain state between calls e.g. in static variables (here implemented with the fancy remember proc) - examples are intgen that delivers ever increasing integers, or gets $fp where the file pointer advances at each call, so potentially all lines of the file are returned over time.

A filter takes one or more streams, and possibly other arguments, and reacts like a stream too. Hence, streams can be (and typically are) nested for processing purposes. If a filter meets end-of-stream, it should return that too. Filters may be characterized as "selectors" (who may return only part of their input, like "grep") and/or "appliers" who call a command on their input and return the result. Note that on infinite streams, selectors may never return, e.g. if you want the second even prime... Streams in general should not be written in brackets (then the Tcl parser would eagerly evaluate them before evaluating the command), but braced, and stream consumers eval the stream at their discretion.

Before we start, a word of warning: maintaining state of a procedure is done with default arguments that may be rewritten. To prevent bugs from procedures whose defaults have changed, I've come up with the following simple architecture - procs with static variables are registered as "sproc"s, which remembers the initial defaults, and with a reset command you can restore the initial values for one or all sprocs:

proc sproc {name head body} {
   set ::sproc($name) $head
   proc $name $head $body
}
proc reset { {what *}} {
   foreach name [array names ::sproc $what] {
       proc $name $::sproc($name) [info body $name]
   }
}

Now let's start with a simple stream source, "cat", which, as a wrapper around gets, returns the lines of a file one by one until exhausted (EOF), in which case an empty string is returned (this requires that empty lines in the file, which would otherwise look the same, are represented as a single blank):

sproc cat {filename {fp {}} } {
   if {$fp==""} {
       remember fp [set fp [open $filename]]
   }
   if {[gets $fp res]<0} {
       remember fp [close $fp] ;# which returns an empty string ;-)
   } elseif {$res==""} {set res " "} ;# not end of stream!
   set res
}
proc remember {argn value} {
   # - rewrite a proc's default arg with given value
   set procn [lindex [info level -1] 0] ;# caller's name
   set argl {}
   foreach arg [info args $procn] {
       if [info default $procn $arg default] {
           if {$arg==$argn} {set default $value}
           lappend argl [list $arg $default]
       } else {
           lappend argl $arg
       }
   }
   proc $procn $argl [info body $procn]
   set value
}
# This simple but infinite stream source produces all positive integers:
sproc intgen { {seed -1}} {remember seed [incr seed]}
# This produces all (well, very many) powers of 2:
sproc powers-of-2 { {x 0.5}} {remember x [expr $x*2]}
# A filter that reads and displays a stream until user stops it:
proc more {stream} {
   while 1 {
       set res [eval $stream]
       if {$res==""} break ;# encountered end of stream
       puts -nonewline $res; flush stdout
       if {[gets stdin]=="q"} break
   }
}
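
To see sproc and reset working together, here is a quick interactive sketch with intgen (the outputs are what the code above should produce):

% intgen
0
% intgen
1
% intgen
2
% reset intgen
% intgen
0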

Usage example:

more {cat streams.tcl}

which crudely emulates the Unix/DOS pipe mentioned above (you'll have to hit <Return> after every line, and "q<Return>" to quit...). more is the most important "end-user" of streams, especially if they are infinite. Note however that this implementation needs stdin, which excludes wish on Windows (one might easily write a UI-more that reacts to mouse clicks, though).
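
Note the braces in that call - they hand the words "cat streams.tcl" to more unevaluated, so more can eval the stream once per element. For contrast (a sketch, with the wrong form commented out):

more {cat streams.tcl}   ;# braced: more evals the stream repeatedly, on demand
# more [cat streams.tcl] ;# wrong: cat runs just once, and more would try to eval that single line as a stream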

A more generic filter takes a condition and a stream, and on each call returns an element of the input stream where the condition holds - if ever one comes along:

proc filter {cond stream} {
   while 1 {
       set res [eval $stream]
       if {$res=="" || [$cond $res]} break
   }
   set res
}
# Here is a sample usage with famous name:
proc grep {re stream} {
   filter [lambda [list x [list re $re]] {regexp $re $x}] $stream
}
#.... which uses the (less) famous function maker:
proc lambda {args body} {
   set name [info level 0]
   proc $name $args $body
   set name
}
# Usage example: more {grep this {cat streams.tcl}}

Friends of syntactic sugar might prefer shell style:

$ cat streams.tcl | grep this | more

and guess what, we can have that in Tcl too (and not in Scheme !-), by writing a proc with the fancy name "$" which also resets all sprocs (in Unix, this could be the shell prompt that you don't type, but in Tcl we always have to have the command name as the first word):

proc $ args {
    reset
    set cmd {}
    foreach arg $args {
       if {$arg != "|"} {
           lappend tmp $arg
       } else {
           set cmd [expr {$cmd==""? $tmp: [lappend tmp $cmd]}]
           set tmp {}
       }
   }
   uplevel 1 [lappend tmp $cmd]
}

To prove that we haven't cheated by using exec, let's introduce a line counter filter:

sproc -n {stream {n 0}} {
   set res [eval $stream]
   if {$res!=""} {set res [remember n [incr n]]:$res}
}

This can be added to filter chains, to count lines in the original file, or only the results from grep:

$ cat streams.tcl | -n | grep this | more
$ cat streams.tcl | grep this | -n | more

We further observe that more has a similar structure to filter, so we could also rewrite it in terms of that:

proc more2 stream {
   filter [lambda x {
       puts -nonewline $x; flush stdout
       expr {[gets stdin]=="q"}
   }] $stream
}
# Here is another stream producer that returns elements from a list:
sproc streamlist {list {todo {}} {firstTime 1} } {
   if $firstTime {set todo $list; remember firstTime 0}
   remember todo [lrange $todo 1 end]
   lindex $todo 0
}
# This one repeats its list endlessly, so better use it with 'more':
sproc infinite-streamlist {list {todo {}} } {
   initially todo $list
   remember  todo [lrange $todo 1 end]
   lindex   $todo 0
}
# This is sugar for first-time assignment of static variables:
proc initially {varName value} {
   upvar 1 $varName var
   if {$var==""} {set var $value}
}
# But for a simple constant stream source, just use [subst]:
# more {subst 1} ;# will produce as many ones as you wish
# This filter collects its input (should be finite ;-) into a list:
proc collect stream {
   set res {}
   while 1 {
       set element [eval $stream]
       if {$element==""} break
       lappend res $element
   }
   set res
}

The sort filter is unusual in that it consumes its whole (finite!) input, sorts it, and acts as a stream source on the output:

sproc sort {stream {todo {}} {firstTime 1}} {
   if $firstTime {
       set todo [lsort [collect $stream]]
       remember firstTime 0
   }
   remember todo [lrange $todo 1 end]
   lindex $todo 0
}
# $ streamlist {foo bar grill a} | sort | collect => a bar foo grill
proc apply {f stream} {$f [eval $stream]}
#... This can be plugged into a filter chain to see what's going on:
proc observe stream {apply [lambda y {puts $y; set y}] $stream}
# ... or, to get a stream of even numbers, starting from 0:
more {apply [lambda x {expr $x*2}] intgen}

Now for the example in SICP: find the second prime in the interval between 10000 and 1000000.

sproc interval {from to {current {}} } {
   initially current $from
   if {$current<=$to} {
       remember current [expr $current+1]
   }
}
proc prime? x {
   if {$x<2} {return 0}
   set max [expr sqrt($x)]
   set try 2
   while {$try<=$max} {
       if {$x%$try == 0} {return 0}
       incr try [expr {2-($try==2)}]
   }
   return 1
}
proc stream-index {stream index} {
   for {set i 0} {$i<=$index} {incr i} {
       set res [eval $stream]
   }
   set res
}
sproc stream-range {stream from to {pos 0}} {
   while {$pos<$from} {
       set res [eval $stream] ;# ignore elements before 'from'
       if {$res==""} return   ;# might be end-of-stream
       incr pos
   }
   if {$to!="end" && $pos > $to} return
   remember pos [incr pos]
   eval $stream
}
stream-index {filter prime? {interval 10000 1000000}} 1 ==> 10009

Another idea from SICP is a "smoothing" function, that averages each pair of values from the input stream. For this we need to introduce a short-term memory also in the filter:

sproc average {stream {previous {}} } {
   if {$previous==""} {set previous [eval $stream]}
   remember previous [set current [eval $stream]]
   if {$current!=""} {expr {($previous+$current)/2.}}
}

which, tested on an n-element stream, returns n-1 averages:

collect {average {streamlist {1 2 3 4 5}}} ==> 1.5 2.5 3.5 4.5

Yet another challenge was to produce an infinite stream of pairs {i j} of positive integers, i <= j, ordered by their sum, so that more pairs produces consecutively

{1 1} {1 2} {1 3} {2 2} {1 4} {2 3} {1 5} {2 4} {3 3} {1 6} ...

Here's my solution which does that:

sproc pairs { {last {}} } {
   if {$last==""} {
       set last [list 1 1] ;# start of iteration
   } else {
       foreach {a b} $last break
       if {$a >= $b-1} {
           set last [list 1 [expr {$a+$b}]] ;# next sum level
       } else {
           set last [list [incr a] [incr b -1]]
       }
   }
   remember last $last
}

Ramanujan numbers: The pairs generator can be used to find Ramanujan numbers, which can be represented as the sum of two integer cubes in more than one way. Here I use a global array for recording results:

sproc Ramanujan {stream {firstTime 1}} {
   if $firstTime {unset -nocomplain ::A; remember firstTime 0} ;# -nocomplain: ::A may not exist yet
   while 1 {
       set pair [eval $stream]
       foreach {a b} $pair break
       set n [expr {$a*$a*$a + $b*$b*$b}]
       if [info exists ::A($n)] {
           lappend ::A($n) $pair
           break
       } else {set ::A($n) [list $pair]}
   }
   list $n $::A($n)
}
more {Ramanujan pairs} ;# or: $ pairs | Ramanujan | more

delivers in hardly noticeable time the Ramanujan numbers 1729, 4104, 13832... Or how about this infinite Fibonacci number generator, which on more fibo produces all the Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, 13, 21, ...) you might want?

sproc fibo { {a ""} {b ""}} {
   if {$a==""} {
       remember a 0
   } elseif {$b==""} {
       remember b 1
   } else {
       if {$b > 1<<30} {set b [expr double($b)]}
       remember a $b
       remember b [expr $a+$b]
   }
}

Discussion: With the above code, it was possible to reproduce quite a bit of the behavior of streams as documented in SICP, not as data structures but with Tcl procs (though procs are data too, in some sense...). What's missing is the capability to randomly address parts of a stream, as is possible in Scheme (and of course their claim to do without assignment, or mutable data...). Tcl lists just don't follow LISP's CAR/CDR model (though KBK demonstrated in Tcl and LISP that this structure can be emulated, also with procs), but rather C's flat array-of-pointers (Tcl_Obj *[]) style. The absence of lexical scoping also led to constructs like sproc/reset, which fill a gap but aren't exactly elegant - Tcl's clear line between local and global variables allows something like closures only by rewriting default arguments, as done in remember (similar to Python's default-argument trick).

Don't take this as a fundamental critique of Tcl, though - its underlying model is far more simple and elegant than LISP's (what with "special forms", "reader macros"...), and yet powerful enough to do just about everything possible...

Playing with Laws of Form

After many years, I re-read

G. Spencer-Brown, "Laws of Form". New York: E.P. Dutton 1979

which is sort of a mathematical thriller, if you will. Bertrand Russell commented that the author "has revealed a new calculus, of great power and simplicity" (somehow sounds like Tcl ;^). In a very radical simplification, a whole world is built up from two operators: juxtaposition without a visible symbol (which could be likened to or), and an overbar-hook (with the meaning of not) that I can't type here - a horizontal stroke over zero or more operands, continued at the right by a vertical stroke going down to the baseline. In these Tcl experiments, I use the empty string "" for the unmarked state (the void) and angle-brackets <> for the overbar-hook (with zero or more operands in between).

One point that was new for me is that the distinction between operators and operands is not cast in stone. Especially constants (like "true" and "false" in Boolean algebras) can be equally well expressed as neutral elements of operators, if these are considered variadic, and having zero arguments. This makes sense, even in Tcl, where one might implement them as

proc and args {
   foreach arg $args {if {![uplevel 1 expr $arg]} {return 0}}
   return 1
}
proc or args {
   foreach arg $args {if {[uplevel 1 expr $arg]} {return 1}}
   return 0
}

which, when called with no arguments, return 1 and 0, respectively. So [or] == 0 and [and] == 1. In Spencer-Brown's terms, "" (the empty string, an operator with no arguments) is false ("nil" in LISP), and <>, the negation of "", is true. His two axioms are:

<><> == <> "to recall is to call       -- (1 || 1) == 1"
<<>> ==    "to recross is not to cross -- !!0 == 0"

and these can be implemented by a string map that is repeated as long as it makes any difference (sort of a trampoline) to simplify any expression consisting only of operators and constants (which are operators with zero arguments):

proc lf'simplify expression {
   while 1 {
       set res [string map {<><> <> <<>> ""} $expression]
       if {$res eq $expression} {return $res}
       set expression $res
   }
}

Testing:

% lf'simplify <<><>><>
<>

which maps <><> to <>, <<>> to "", and returns <> for "true".

% lf'simplify <a>a
<a>a

In the algebra introduced here, with a variable "a", no further simplification was so far possible. Let's change that - "a" can have only two values, "" or <>, so we might try to solve the expression by assuming all possible values for a, and see if they differ. If they don't, we have found a fact that isn't dependent on the variable's value, and the resulting constant is returned, otherwise the unsolved expression:

proc lf'solve {expression var} {
   set results {}
   foreach value {"" <>} {
       set res [lf'simplify [string map [list $var $value] $expression]]
       if {![in $results $res]} {lappend results $res}
       if {[llength $results] > 1} {return $expression}
   }
   set results
}

with a helper function in that reports containment of an element in a list:

proc in {list element} {expr {[lsearch -exact $list $element] >= 0}}

Testing:

% lf'solve <a>a a
<>

which means, in expr terms, {(!$a || $a) == 1}, for all values of a. In other words, a tautology. All of Boole's algebra can be expressed in this calculus:

* (1) not a       == !$a       == <a>
* (2) a or b      == $a || $b  == ab
* (3) a and b     == $a && $b  == <<a><b>>
* (4) a implies b == $a <= $b  == <a>b
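
As a quick check of encoding (3) in the interpreter, "true and true" should simplify to true, while "true and false" collapses to the empty (false) form:

% lf'simplify <<<>><<>>>
<>

(lf'simplify <<<>><>> correspondingly returns the empty string.)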

We can test it with the classic "ex contradictione quodlibet" (ECQ) example - "if p and not p, then q" for any q:

% lf'solve <<p><<p>>>q p
q

So formally, q is true, whatever it is :) If this sounds overly theoretical, here's a tricky practical example in puzzle solving: Lewis Carroll's last sorites (pp. 123f.). The task is to conclude something from the following premises:

  • The only animals in this house are cats
  • Every animal is suitable for a pet, that loves to gaze at the moon
  • When I detest an animal, I avoid it
  • No animals are carnivorous, unless they prowl at night
  • No cats fail to kill mice
  • No animals ever take to me, except what are in this house
  • Kangaroos are not suitable for pets
  • None but carnivora kill mice
  • I detest animals that do not take to me
  • Animals that prowl at night always love to gaze at the moon

These are encoded to the following one-letter predicates:

  • a - avoided by me
  • c - cat
  • d - detested by me
  • h - house, in this
  • k - kill mice
  • m - moon, love to gaze at
  • n - night, prowl at
  • p - pet, suitable for
  • r - (kanga)roo
  • t - take to me
  • v - (carni)vorous

So the problem set can be restated, in Spencer-Brown's terms, as

  • <h>c <m>p <d>a <v>n <c>k <t>h <r><p> <k>v td <n>m

At first I didn't understand why all premises can simply be written in a row, which amounts to an implicit "or", but it seems to work out well. As we've seen that <x>x is true for any x, we can cancel out such tautologies. For this, we reformat the expression into a list of values of type x or !x, which is in turn dumped into a local array for existence checking. When both x and !x exist, they are removed from the expression:


proc lf'cancel expression {
   set e2 [string map {"< " ! "> " ""} [split $expression ""]]
   foreach term $e2 {if {$term ne ""} {set a($term) ""}}
   foreach var [array names a ?] {
       if [info exists a(!$var)] {
           set expression [string map [list <$var> "" $var ""] $expression]
       }
   }
   set expression
}
puts [lf'cancel {<h>c <m>p <d>a <v>n <c>k <t>h <r><p> <k>v td <n>m}]

which results in:

  • a <r>

translated back: "I avoid it, or it's not a kangaroo", or, reordered, "<r> a" which by (4) means, "All kangaroos are avoided by me".


Tk: the cross-platform GUI toolkit

For doing graphical user interfaces (GUIs) on Macintosh, Unix/Linux or Windows, Tk (Tool Kit) is the most popular Tcl extension. With little effort it allows you to put together useful windows consisting of widgets, which are managed by geometry managers. Also, you can define bindings of mouse or keyboard events to trigger the actions you want.

Here is a very simple, complete Tcl/Tk script that implements a calculator:

package require Tk
pack [entry .e -textvar e -width 50]
bind .e <Return> {
   set e  [regsub {=.*} $e ""] ;# remove evaluation (Chris)
   catch {expr [string map {/ *1./} $e]} res
   append e " = $res"
} 

It creates an entry widget named .e, into which you can type from the keyboard, with an associated variable e (which will mirror the entry's content), and manages it with pack.

One binding is defined: if, with keyboard focus on .e, the <Return> key is hit, then

  • the result of any previous evaluation (everything from "=" on) is first removed from the variable e,
  • all division operators (/) in e are mapped to "*1./" (this forces floating-point division),
  • the resulting string is fed to expr to evaluate it as an arithmetic/logic expression,
  • as there may be errors in the user input, the expr call is wrapped into a catch, which assigns either the result of expr or the error message (if one occurred) to the variable res,
  • finally, an equal sign and the value of res are appended to e, making input and result immediately visible in the entry.

Widgets

Widgets are GUI objects that, when mapped by a geometry manager, correspond to a rectangular area on the screen, with different properties and capabilities.

Widgets are named with path names, somewhat resembling file system path names, except that the separator for widgets is a period ".". For example, .foo.bar is a child of .foo, which is a child of ".". Parent-child relations typically occur with containers, e.g. toplevel, frame, canvas, text. The name parts (i.e. between the dots) can be almost any string, as long as it does not contain a "." or start with a capital letter (widget class names with a capital initial have a special meaning: they denote class-wide options).

Widgets are created by a command named after their class, which takes the path name (requiring that the parent widgets already exist at that time) and any number of -key value options that make up the initial configuration, e.g.

button .b -text "Hello!" -command {do something}

After creation, communication with the widget goes via a created command that corresponds to its name. For instance, the configuration options can be queried

set text [.b cget -text]

or changed

.b configure -text Goodbye! -background red

Some "methods" like cget, configure are generic to all widget classes, others are specific to one or a few. For instance, both 'text and entry accept the insert method. See the manual pages for all the details.

Here's a list of the widgets available in Tk:

button
with text and/or image, calls a configurable command when clicked
canvas
scrollable graphic surface for line, rectangle, polygon, oval, and text items, as well as bitmaps and photo images and whole embedded windows. See for example "A tiny drawing program" below
entry
one-line editable text field, horizontally scrollable (see example above)
frame
container for several widgets, often used with pack, or for wrapping "megawidgets"
label
one- or multiline field for text display, can be tied to a text variable to update when that variable changes
listbox
multiline display of a list, scrollable
scrollbar
can be tied to canvas, entry, listbox or text widgets
text
scrollable editable multiline text with many formatting options. Can also contain images and embedded widgets
toplevel
standalone frame window, mostly with decorations (title bar, buttons) from the window manager

Geometry managers


The purpose of geometry managers is to compute the required height, width, and location of widgets, and map them on screen. Besides grid, pack and place, the canvas and text widgets can also manage embedded widgets.

Calls to geometry managers always start with the manager's name, then (mostly) one or more widget names, then any number of -name value options.

grid

This geometry manager is best suited for tabular window layouts consisting of rows and columns. For example, to put three widgets in a horizontal row:

grid .1 .2 .3 -sticky news

The -sticky option indicates which sides of its box the widget should stick to, in compass directions; "news" means north-east-west-south, i.e. all four sides.

Here's a minimal solution to display a table (list of lists) in a grid of labels:

package require Tk
proc table {w content args} {
    frame $w -bg black
    set r 0
    foreach row $content {
        set fields {}
        set c 0
        foreach col $row {
            lappend fields [label $w.$r/$c -text $col]
            incr c
        }
        eval grid $fields -sticky news -padx 1 -pady 1
        incr r
    }
    set w
}
#--- Test:
table .t {
   {Row Head1 Head2}
   {1   foo   42}
   {2   bar   1234}
   {3   grill testing}
}
pack .t
#--- Changing the contents, given row and column number:
after 2000 .t.3/2 config -text Coucou

pack

This used to be the workhorse manager, but in recent years it has become less popular than grid. Anyway, it is still good for cases where you have widgets aligned in only one direction (horizontally or vertically). For more complex layouts, one used to insert intermediate frames, but grid just makes such jobs easier. Example:

pack .x -fill both -expand 1 -side left

place

This geometry manager is not used often; it is mostly for special applications, like highlighting the current tab of a tab notebook. It allows pixel-precise placement of widgets, but is less dynamic in reaction to resizing of the toplevel or inner widgets.
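
A minimal sketch of what place is good for - keeping a label centered over a canvas, whatever size the window is (the widget names are made up):

package require Tk
pack [canvas .c -bg white] -fill both -expand 1
place [label .hint -text "centered"] -in .c -relx 0.5 -rely 0.5 -anchor center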

BWidget

BWidget is an extension to Tk written in pure Tcl (therefore it can even run on Windows Mobile-driven cell phones). It offers mega-widgets (all class names starting with Uppercase) like

  • ComboBox
  • NoteBook
  • Tree

Screenshot of a NoteBook and a Tree, on a PocketPC under Windows/CE

Tree examples

Here is a "hello world" example of a Tree widget (this is a complete script). The root node is constaly called root, for the others you have to make up names:

package require BWidget
pack [Tree .t]
.t insert end root n1  -text hello
.t insert end root n2  -text world
.t insert end n2   n21 -text (fr:monde)
.t insert end n2   n22 -text (de:Welt)

The famous typewriter test sentence represented as syntax tree:

package require BWidget
pack [Tree .t -height 16] -fill both -expand 1
foreach {from to text} {
   root S S
   S   np1  NP
   S   vp   VP
   np1 det1 Det:The
   np1 ap1  AP
   ap1 adj1 Adj:quick
   ap1 adj2 Adj:brown
   ap1 n1   N:fox
   vp  v    V:jumps
   vp  pp   PP
   pp  prep Prep:over
   pp  np2  NP
   np2 det2 Det:the
   np2 ap2  AP
   ap2 adj3 Adj:lazy
   ap2 n2   N:dog
   
} {.t insert end $from $to -text $text}
.t opentree S

Images: photos and bitmaps

Tk allows simple yet powerful operations on images. These come in two varieties: bitmaps and photos. Bitmaps are rather limited in functionality: they can be specified in XBM format and rendered in configurable colors.
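
A minimal bitmap sketch (the XBM data is just an 8x8 checkerboard-like pattern made up for this example):

package require Tk
image create bitmap check8 -foreground red -background yellow -data {
#define check8_width 8
#define check8_height 8
static unsigned char check8_bits[] = {
   0xaa, 0x55, 0xaa, 0x55, 0xaa, 0x55, 0xaa, 0x55};
}
pack [label .b -image check8]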

Photos offer many more possibilities - you can load and save them in different file formats (Tk itself supports PPM and GIF - for others, use the Img extension), copy them with many options, like scaling/subsampling or mirroring, and get or set the color of single pixels.

Setting can also be done for rectangular regions in one go, as the following example demonstrates: it creates a photo image of a tricolore flag (three evenly spaced vertical bands - like France, Italy, Belgium, Ireland and many more). The default is France. You can specify the width; the height will be 2/3 of that. The procedure returns the image name - just save it in -format GIF if you want:

proc tricolore {w {colors {blue white red}}} {
   set im [image create photo]
   set fromx 0
   set dx [expr $w/3]
   set tox $dx
   set toy [expr $w*2/3]
   foreach color $colors {
       $im put $color -to $fromx 0 $tox $toy
       incr fromx $dx; incr tox $dx
   }
   set im
}
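
A possible way to use it (the file name is arbitrary): display the flag in a label and save it as GIF, as suggested above:

package require Tk
set im [tricolore 300 {green white red}]   ;# an Italian flag, 300 pixels wide
pack [label .flag -image $im]
$im write tricolore.gif -format gif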


Debugging Tk programs

A Tk program under development can be very rapidly debugged by adding such bindings:

bind . <F1> {console show}

This works only on Windows and Macintosh (pre-OS-X) and brings up a console in which you can interact with the Tcl interpreter, inspect or modify global variables, configure widgets, etc.

bind . <Escape> {eval [list exec wish $argv0] $argv &; exit}

This starts a new instance of the current program (assuming you edited a source file and saved it to disk), and then terminates the current instance.

Tcl/Tk examples

The following examples originally appeared in the Tcler's Wiki http://wiki.tcl.tk . They are all in the public domain - no rights reserved.

A funny cookbook

This funny little program produces random cooking recipes. Small as it is, it can produce 900 different recipes, though they might not be to everybody's taste...

The basic idea is to pick an arbitrary element from a list, which is easily done in Tcl with the following:}

proc ? L {
 lindex $L [expr {int(rand()*[llength $L])}]
}
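
For instance, calling ? on a short list returns one of its elements at random:

? {heads tails}   ;# gives either "heads" or "tails"
? {1 2 3 4 5 6}   ;# rolls a die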

if 0 {This is used several times in:}

proc recipe {} {
  set a {
    {3 eggs} {an apple} {a pound of garlic}
    {a pumpkin} {20 marshmallows}
  }
  set b {
    {Cut in small pieces} {Dissolve in lemonade} {Bury in the ground for 3 months}
    {Bake at 300 degrees} {Cook until tender}
  }
  set c {parsley snow nutmeg curry raisins cinnamon}
  set d {
     ice-cream {chocolate cake} spinach {fried potatoes} rice {soy sprouts}
  }
  return "   Take [? $a].
  [? $b].
  Top with [? $c].
  Serve with [? $d]."
}

And as modern programs always need a GUI, here is a minimal one that appears when you source this file at top level, and shows a new recipe every time you click on it:

if {[file tail [info script]]==[file tail $argv0]} {
  package require Tk
  pack [text .t -width 40 -height 5]
  bind .t <1> {showRecipe %W; break}
  proc showRecipe w {
    $w delete 1.0 end
    $w insert end [recipe]
  }
  showRecipe .t
}

A little A/D clock

This is a clock that shows time either analog or digital - just click on it to toggle.

#!/usr/bin/env tclsh
package require Tk
proc every {ms body} {eval $body; after $ms [info level 0]}
proc drawhands w {
    $w delete hands
    set secSinceMidnight [expr {[clock sec]-[clock scan 00:00:00]}]
    foreach divisor {60 3600 43200} length {45 40 30} width {1 3 7} {
       set angle [expr {$secSinceMidnight * 6.283185 / $divisor}]
       set x [expr {50 + $length * sin($angle)}]
       set y [expr {50 - $length * cos($angle)}]
       $w create line 50 50 $x $y -width $width -tags hands
    }
}
proc toggle {w1 w2} {
    if [winfo ismapped $w2] {
        foreach {w2 w1} [list $w1 $w2] break ;# swap
    }
    pack forget $w1
    pack $w2
}
#-- Creating the analog clock:
canvas .analog -width 100 -height 100 -bg white
every 1000 {drawhands .analog}
pack .analog
#-- Creating the digital clock:
label .digital -textvar ::time -font {Courier 24}
every 1000 {set ::time [clock format [clock sec] -format %H:%M:%S]}
bind . <1> {toggle .analog .digital}

A little calculator

Here is a small calculator in Tcl/Tk. In addition to the buttons on screen, you can use any of expr's other functionalities via keyboard input.

package require Tk
wm title . Calculator
grid [entry .e -textvar e -just right] -columnspan 5
bind .e <Return> =
set n 0
foreach row {
   {7 8 9 + -}
   {4 5 6 * /}
   {1 2 3 ( )}
   {C 0 . =  }
} {
   foreach key $row {
       switch -- $key {
           =       {set cmd =}
           C       {set cmd {set clear 1; set e ""}}
           default {set cmd "hit $key"}
       }
       lappend keys [button .[incr n] -text $key -command $cmd]
   }
   eval grid $keys -sticky we ;#-padx 1 -pady 1
   set keys [list]
}
grid .$n -columnspan 2 ;# make last key (=) double wide
proc = {} {
   regsub { =.+} $::e "" ::e ;# maybe clear previous result
   if [catch {set ::res [expr [string map {/ *1.0/} $::e]]}] {
       .e config -fg red
   }
   append ::e = $::res 
   .e xview end
   set ::clear 1
}
proc hit {key} {
   if $::clear {
       set ::e ""
       if ![regexp {[0-9().]} $key] {set ::e $::res}
       .e config -fg black
       .e icursor end
       set ::clear 0
   }
   .e insert end $key
}
set clear 0
focus .e           ;# allow keyboard input
wm resizable . 0 0

And, as Cameron Laird noted, this thingy is even programmable: enter for example

[proc fac x {expr {$x<2? 1: $x*[fac [incr x -1]]}}]

into the entry, disregard warnings; now you can do

[fac 10]

and receive [fac 10] = 3628800.0 as result...

A little slide rule

The slide rule was an analog, mechanical device for approximate engineering computations, made obsolete by the pocket calculator in the 1970s-80s. The basic principle is that multiplication is done by adding logarithms, hence most of the scales are logarithmic, with uneven increments.
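
The principle in expr terms, just as an illustration (values rounded in the comments):

expr {log10(3.0) + log10(7.0)}            ;# ≈ 1.3222 - the combined distance set on the scales
expr {pow(10, log10(3.0) + log10(7.0))}   ;# ≈ 21.0   - reading the result back off the scale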

This fun project recreates a slide rule (roughly an Aristo-Rietz Nr. 89 with 7 scales - top-notch ones had up to 24) with a white "body" and a beige "slide" which you can move left or right with mouse button 1 held down, or in pixel increments with the <Shift-Left>/<Shift-Right> cursor keys. Finally, the blue line represents the "mark" (how is that correctly called? "runner"? "slider"?) which you can move with the mouse over the whole thing to read a value. Fine movements with <Left>/<Right>.

Due to rounding errors (integer pixels), this plaything is even less precise than a physical slide rule was, but maybe you still enjoy the memories... The screenshot shows how I found out that 3 times 7 is approx. 21... (check the A and B scales).

proc ui {} {
   set width 620
   pack [canvas .c -width $width -height 170 -bg white]
   pack [label .l -textvariable info -fg blue] -fill x
   .c create rect 0 50 $width 120 -fill grey90
   .c create rect 0 50 $width 120 -fill beige -outline beige \
       -tag {slide slidebase}
   .c create line 0 0 0 120 -tag mark -fill blue
   drawScale .c K  x3    10 5    5 log10 1 1000 186.6666667
   drawScale .c A  x2    10 50  -5 log10 1 100 280
   drawScale .c B  x2    10 50   5 log10 1 100 280 slide
   drawScale .c CI 1/x   10 90 -5 -log10 1 10  560 slide
   drawScale .c C  x     10 120 -5 log10 1 10  560 slide
   drawScale .c D  x     10 120  5 log10 1 10  560
   drawScale .c L "lg x" 10 160 -5 by100  0 10   5600
   bind .c <Motion> {.c coords mark %x 0 %x 170; set info [values .c]}
   bind .c <1> {set x %x}
   bind .c <B1-Motion> {%W move slide [expr {%x-$x}] 0; set x %x}
   bind . <Shift-Left>  {.c move slide -1 0; set info [values .c]}
   bind . <Shift-Right> {.c move slide  1 0; set info [values .c]}
   bind . <Left>  {.c move mark -1 0; set info [values .c]}
   bind . <Right> {.c move mark  1 0; set info [values .c]}
}
proc drawScale {w name label x y dy f from to fac {tag {}}} {
   set color [expr {[string match -* $f]? "red": "black"}]
   $w create text $x [expr $y+2*$dy] -text $name -tag $tag -fill $color
   $w create text 600 [expr $y+2*$dy] -text $label -tag $tag -fill $color
   set x [expr {[string match -* $f]? 580: $x+10}]
   set mod 5
   set lastlabel ""
   set lastx 0
   for {set i [expr {$from*10}]} {$i<=$to*10} {incr i} {
       if {$i>100} {
           if {$i%10} continue ;# coarser increments
           set mod 50
       }
       if {$i>1000} {
           if {$i%100} continue ;# coarser increments
           set mod 500
       }
       set x0 [expr $x+[$f [expr {$i/10.}]]*$fac]
       set y1 [expr {$i%(2*$mod)==0? $y+2.*$dy: $i%$mod==0? $y+1.7*$dy: $y+$dy}]
       set firstdigit [string index $i 0]
       if {$y1==$y+$dy && abs($x0-$lastx)<2} continue
       set lastx $x0
       if {$i%($mod*2)==0 && $firstdigit != $lastlabel} {
           $w create text $x0 [expr $y+3*$dy] -text $firstdigit \
              -tag $tag -font {Helvetica 7} -fill $color
           set lastlabel $firstdigit
       }
       $w create line $x0 $y $x0 $y1 -tag $tag -fill $color
   }
}
proc values w {
   set x0 [lindex [$w coords slidebase] 0]
   set x1 [lindex [$w coords mark] 0]
   set lgx [expr {($x1-20)/560.}]
   set x [expr {pow(10,$lgx)}]
   set lgxs [expr {($x1-$x0-20)/560.}]
   set xs [expr {pow(10,$lgxs)}]
   set res     K:[format %.2f [expr {pow($x,3)}]]
   append res "  A:[format %.2f [expr {pow($x,2)}]]"
   append res "  B:[format %.2f [expr {pow($xs,2)}]]"
   append res "  CI:[format %.2f [expr {pow(10,-$lgxs)*10}]]"
   append res "  C:[format %.2f $xs]"
   append res "  D:[format %.2f $x]"
   append res "  L:[format %.2f $lgx]"
}
proc pow10 x {expr {pow(10,$x)}}
proc log10 x {expr {log10($x)}}
proc -log10 x {expr {-log10($x)}}
proc by100  x {expr {$x/100.}}
#--------------------------------
ui
bind . <Escape> {exec wish $argv0 &; exit}

A minimal doodler

Here is a tiny but complete script that allows doodling (drawing with the mouse) on a canvas widget:

proc doodle {w {color black}} {
   bind $w <1>         [list doodle'start %W %x %y $color]
   bind $w <B1-Motion> {doodle'move %W %x %y}
}
proc doodle'start {w x y color} {
   set ::_id [$w create line $x $y $x $y -fill $color]
}
proc doodle'move {w x y} {
   $w coords $::_id [concat [$w coords $::_id] $x $y]
}
pack [canvas .c -bg white] -fill both -expand 1
doodle       .c
bind .c <Double-3> {%W delete all}

And here it comes again, but this time with explanations:

The "Application Program Interface" (API) for this, if you want such ceremonial language, is the doodle command, where you specify which canvas widget should be enabled to doodle, and in which color (defaults to black):}

proc doodle {w {color black}} {
   bind $w <1>         [list doodle'start %W %x %y $color]
   bind $w <B1-Motion> {doodle'move %W %x %y}
}

It registers two bindings for the canvas, one (<1>) when the left mouse-button is clicked, and the other when the mouse is moved with button 1 (left) down. Both bindings just call one internal function each.

On left-click, a line item is created on the canvas in the specified fill color, but with no extent yet, as start and end points coincide. The item ID (a number assigned by the canvas) is kept in a global variable, as it will have to persist long after this procedure has returned:

proc doodle'start {w x y color} {
   set ::_id [$w create line $x $y $x $y -fill $color]
}

if 0 {The left-motion procedure obtains the coordinates (alternating x and y) of the globally known doodling line object, appends the current coordinates to it, and makes these the new coordinates - in other words, extends the line to the current mouse position:}

proc doodle'move {w x y} {
   $w coords $::_id [concat [$w coords $::_id] $x $y]
}

if 0 {That's all we need to implement doodling - now let's create a canvas to test it, and pack it so it can be drawn as big as you wish:}

pack [canvas .c -bg white] -fill both -expand 1

if 0 {And this line turns on the doodle functionality created above (defaulting to black):}

doodle       .c

if 0 {Add a binding for double-right-click/double-button-3, to clear the canvas (added by MG, Apr 29 04)}

bind .c <Double-3> {%W delete all}

A tiny drawing program

Here is a tiny drawing program on a canvas. Radio buttons on top allow choice of draw mode and fill color. In "Move" mode, you can of course move items around. Right-click on an item to delete it.

A radio is an obvious "megawidget" to hold a row of radiobuttons. This simple one allows text or color mode: }

proc radio {w var values {col 0}} {
   frame $w
   set type [expr {$col? "-background" : "-text"}]
   foreach value $values {
       radiobutton $w.v$value $type $value -variable $var -value $value \
           -indicatoron 0
       if $col {$w.v$value config -selectcolor $value -borderwidth 3}
   }
   eval pack [winfo children $w] -side left
   set ::$var [lindex $values 0]
   set w
}

if 0 {Depending on draw mode, the mouse events "Down" and "Motion" have different handlers, which are dispatched by names that look like array elements. So for a mode X, we need a pair of procs, down(X) and move(X). Values used between calls are kept in global variables.

First, the handlers for free-hand line drawing:}

proc down(Draw) {w x y} {
   set ::ID [$w create line $x $y $x $y -fill $::Fill]
}
proc move(Draw) {w x y} {
   $w coords $::ID [concat [$w coords $::ID] $x $y]
}
#-- Movement of an item
proc down(Move) {w x y} {
   set ::ID [$w find withtag current]
   set ::X $x; set ::Y $y
}
proc move(Move) {w x y} {
   $w move $::ID [expr {$x-$::X}] [expr {$y-$::Y}]
   set ::X $x; set ::Y $y
}
#-- Clone an existing item
proc serializeCanvasItem {c item} {
   set data [concat [$c type $item] [$c coords $item]]
   foreach opt [$c itemconfigure $item] {
       # Include any configuration that deviates from the default
       if {[lindex $opt end] != [lindex $opt end-1]} {
           lappend data [lindex $opt 0] [lindex $opt end]
       }
   }
   return $data
}
proc down(Clone) {w x y} {
   set current [$w find withtag current]
   if {[string length $current] > 0} {
       set itemData [serializeCanvasItem $w [$w find withtag current]]
       set ::ID [eval $w create $itemData]
       set ::X $x; set ::Y $y
   }
}
interp alias {} move(Clone) {} move(Move)
#-- Drawing a rectangle
proc down(Rect) {w x y} {
   set ::ID [$w create rect $x $y $x $y -fill $::Fill]
}
proc move(Rect) {w x y} {
   $w coords $::ID [lreplace [$w coords $::ID] 2 3 $x $y]
}
#-- Drawing an oval (or circle, if you're careful)
proc down(Oval) {w x y} {
   set ::ID [$w create oval $x $y $x $y -fill $::Fill]
}
proc move(Oval) {w x y} {
   $w coords $::ID [lreplace [$w coords $::ID] 2 3 $x $y]
}

if 0 {Polygons are drawn by clicking the corners. When a corner is close enough to the first one, the polygon is closed and drawn. }

proc down(Poly) {w x y} {
   if [info exists ::Poly] {
       set coords [$w coords $::Poly]
       foreach {x0 y0} $coords break
       if {hypot($y-$y0,$x-$x0)<10} {
           $w delete $::Poly
           $w create poly [lrange $coords 2 end] -fill $::Fill
           unset ::Poly
       } else {
           $w coords $::Poly [concat $coords $x $y]
       }
   } else {
       set ::Poly [$w create line $x $y $x $y -fill $::Fill]
   }
}
proc move(Poly) {w x y} {#nothing}
#-- With little more coding, the Fill mode allows changing an item's fill color:
proc down(Fill) {w x y} {$w itemconfig current -fill $::Fill}
proc move(Fill) {w x y} {}
#-- Building the UI
set modes {Draw Move Clone Fill Rect Oval Poly}
set colors {
   black white magenta brown red orange yellow green green3 green4
   cyan blue blue4 purple
}
grid [radio .1 Mode $modes] [radio .2 Fill $colors 1] -sticky nw
grid [canvas .c -relief raised -borderwidth 1] - -sticky news
grid rowconfig . 0 -weight 0
grid rowconfig . 1 -weight 1
#-- The current mode is retrieved at runtime from the global Mode variable:
bind .c <1>         {down($Mode) %W %x %y}
bind .c <B1-Motion> {move($Mode) %W %x %y}
bind .c <3>         {%W delete current}

if 0 {For saving the current image, you need the Img extension, so just omit the following binding if you don't have Img:}

bind . <F1> {
   package require Img
   set img [image create photo -data .c]
   set name [tk_getSaveFile -filetypes {{GIF .gif} {"All files" *}}\
       -defaultextension .gif]
   if {$name ne ""} {$img write $name; wm title . $name}
}
#-- This is an always useful helper in development:
bind . <Escape> {exec wish $argv0 &; exit}

A minimal editor

Here's an utterly simple editor, in 26 lines of code, which just allows you to load and save files, and of course edit, cut and paste, and whatever else is built into the text widget anyway. And it has a bit of "online help"... ;-)

It is always a good idea to start a source file with some explanations of the name, purpose, author, and date. I have recently picked up the habit of putting this information into a string variable (which in Tcl can easily span multiple lines), so the same info is presented to the reader of the source code and can be displayed as online help: }

set about "minEd - a minimal editor
Richard Suchenwirth 2003
F1: help
F2: load
F3: save
"

if 0 {The visible part of a Graphical User Interface (GUI) consists of widgets. For this editor, I of course need a text widget, and a vertical scrollbar. With the option "-wrap word" for the text widget, another horizontal scrollbar is not needed - lines longer than the window just wrap at word boundaries.

Tk widgets come on the screen in two steps: first, they are created with an initial configuration; then, handed to a "geometry manager" for display. As widget creation commands return the pathname, they can be nested into the manager command (pack in this case), to keep all settings for a widget in one place. This may lead to over-long lines, though.

Although the scrollbar comes to the right of the text, I create and pack it first. The reason is that when a window is made smaller by the user, the widgets packed last are the first to lose visibility.

These two lines also illustrate the coupling between a scrollbar and the widget it controls:

  • the scrollbar sends a yview message to it when moved
  • the widget sends a set message to the scrollbar when the view changed, for instance from cursor keys

And these two lines already give us an editor for arbitrarily long text, with built-in capabilities of cut, copy, and paste - see the text man page. Only file I/O has to be added by us to make it really usable. }

pack [scrollbar .y -command ".t yview"] -side right -fill y
pack [text .t -wrap word -yscrollc ".y set"] -side right -fill both -expand 1

if 0 {The other important part of a GUI are the bindings - what event shall trigger what action. For simplicity, I've limited the bindings here to a few of the function keys on top of typical keyboards:}

bind . <F1> {tk_messageBox -message $about}

if 0 {Online help is done with a no-frills tk_messageBox with the "about" text defined at start of file. - The other bindings call custom commands, which get a filename argument from Tk's file selector dialogs:}

bind . <F2> {loadText .t [tk_getOpenFile]}
bind . <F3> {saveText .t [tk_getSaveFile]}

if 0 {These dialogs can also be configured in a number of ways, but even in this simple form they are quite powerful - they allow navigation around the file system, etc. On Windows they call the native file selectors, which offer a history of previously opened files, a detail view (size/date etc.), and more.

When this editor is called with a filename on the command line, that file is loaded on startup (simple as it is, it can only handle one file at a time):}

if {$argv != ""} {loadText .t [lindex $argv 0]}

if 0 {The procedures for loading and saving text both start with a sanity check of the filename argument - if it's an empty string, as produced by file selector dialogs when the user cancels, they return immediately. Otherwise, they transfer file content to text widget or vice-versa. loadText adds the "luxury" that the name of the current file is also put into the window title. Then it opens the file, clears the text widget, reads all file contents in one go, and puts them into the text widget.}

proc loadText {w fn} {
   if {$fn==""} return
   wm title . [file tail $fn]
   set fp [open $fn]
   $w delete 1.0 end
   $w insert end [read $fp]
   close $fp
}

if 0 {saveText takes care not to save the extra newline that text widgets append at end, by limiting the range to "end - 1 c"(haracter).}

proc saveText {w fn} {
   if {$fn==""} return
   set fp [open $fn w]
   puts -nonewline $fp [$w get 1.0 "end - 1 c"]
   close $fp
}

File watch

Some editors (e.g. PFE, MS Visual Studio) pop up an alert dialog when a file was changed on disk while being edited - that might lead to edit conflicts. Emacs shows a more subtle warning at the first attempt to change a file that has changed on disk.

Here I try to emulate this feature. It is oversimplified because it does not update the mtime (file modification time) it checks against once you save the file from the editor itself. So make sure to call text'watch'file again after saving.

Using the global variable ::_twf it is at least possible to avoid false alarms - for a more serious implementation one might use a namespaced array of watchers, indexed by file name, in case you want multiple edit windows. }

proc text'watch'file {w file {mtime -}} {
   set checkinterval 1000 ;# modify as needed
   if {$mtime eq "-"} {
       if [info exists ::_twf] {after cancel $::_twf}
       set file [file join [pwd] $file]
       text'watch'file $w $file [file mtime $file]
   } else {
       set newtime [file mtime $file]
       if {$newtime != $mtime} {
           set answer [tk_messageBox -type yesno -message \
               "The file\n$file\nhas changed on disk. Reload it?"]
           if {$answer eq "yes"} {text'read'file $w $file}
           text'watch'file $w $file
       } else {set ::_twf [after $checkinterval [info level 0]]}
   }
}
proc text'read'file {w file} {
   set f [open $file]
   $w delete 1.0 end
   $w insert end [read $f]
   close $f
}
#-- Testing:
pack [text .t -wrap word] -fill both -expand 1
set file textwatch.tcl
text'read'file  .t $file
text'watch'file .t $file

if 0 {The dialog should come up when you change the file externally - say by touch-ing it, editing it in another editor, or in pure Tcl with

file mtime $filename [clock seconds]

Tiny presentation graphics

This is a crude little canvas-based presentation graphics tool that runs on PocketPCs, but also on bigger boxes (one might scale fonts and dimensions there). Switch pages with the Left/Right cursor keys, or with left/right mouse clicks (though a stylus cannot right-click).

Not many features, but the code is very compact, and it comes with a cute little language for content specification - see the example at the end (which shows what I presented at the 2003 Euro-Tcl convention in Nuremberg...)}

proc slide args {
  global slides
  if {![info exist slides]} slide'init
  incr slides(N)
  set slides(title,$slides(N)) [join $args]
}
proc slide'line {type args} {
  global slides
  lappend slides(body,$slides(N)) [list $type [join $args]]
}
foreach name {* + -} {interp alias {} $name {} slide'line $name}
proc slide'init {} {
  global slides
  array set slides {
     canvas .c  N 0  show 1 dy 20
     titlefont {Tahoma 22 bold} * {Tahoma 14 bold} + {Tahoma 12}
     - {Courier 10}
  }
  pack [canvas .c -bg white] -expand 1 -fill both
  foreach e {<1> <Right>} {bind . $e {slide'show 1}}
  foreach e {<3> <Left>} {bind . $e {slide'show -1}}
  wm geometry . +0+0
  after idle {slide'show 0}
}
proc slide'show delta {
  upvar #0 slides s
  incr s(show) $delta
  if {$s(show)<1 || $s(show)>$s(N)} {
     incr s(show) [expr -$delta]
  } else {
     set c $s(canvas)
     $c delete all
     set x 10; set y 20
     $c create text $x $y -anchor w -text $s(title,$s(show))\
        -font $s(titlefont) -fill blue
     incr y $s(dy)
     $c create line $x $y 2048 $y -fill red -width 4
     foreach line $s(body,$s(show)) {
        foreach {type text} $line break
        incr y $s(dy)
        $c create text $x $y -anchor w -text $text \
        -font $s($type)
     }
  }
}
bind . <Up> {exec wish $argv0 &; exit} ;# dev helper

The rest is data - or is it code? Anyway, here's my show:

slide i18n - Tcl for the world
+ Richard Suchenwirth, Nuremberg 2003
+
* i18n: internationalization
+ 'make software work with many languages'
+
* l10n: localization
+ 'make software work with the local language'
slide Terminology
* Glyphs:
+ visible elements of writing
* Characters:
+ abstract elements of writing
* Byte sequences:
+ physical text data representation
* Rendering: character -> glyph
* Encoding: character <-> byte sequence
slide Before Unicode
* Bacon (1580), Baudot: 5-bit encodings
* Fieldata (6 bits), EBCDIC (8 bits)
* ASCII (7 bits)
+ world-wide "kernel" of encodings
* 8-bit codepages: DOS, Mac, Windows
* ISO 8859-x: 16 varieties
slide East Asia
* Thousands of characters/country
+ Solution: use 2 bytes, 94x94 matrix
+ Japan: JIS C-6226/0208/0212
+ China: GB2312-80
+ Korea: KS-C 5601
+
* coexist with ASCII in EUC encodings
slide Unicode covers all
* Common standard of software industry
* kept in synch with ISO 10646
+ Used to be 16 bits, until U 3.1
+ Now needs up to 31 bits
* Byte order problem:
+ little-endian/big-endian
+ U+FEFF "Byte Order Mark"
+ U+FFFE --illegal--
slide UTF-8
* Varying length: 1..3(..6) bytes
+ 1 byte: ASCII
+ 2 bytes: pages 00..07, Alphabets
+ 3 bytes: pages 08..FF, rest of BMP
+ >3 bytes: higher pages
+
* Standard in XML, coming in Unix
slide Tcl i18n
* Everything is a Unicode string (BMP)
+ internal rep: UTF-8/UCS-2
* Important commands:
- fconfigure \$ch -encoding \$e
- encoding convertfrom \$e \$s
- encoding convertto   \$e \$s
+
* msgcat supports l10n:
- {"File" -> [mc "File"]}
slide Tk i18n
* Any widget text is Unicoded
* Automatic font finding
+ Fonts must be provided by system
+
* Missing: bidi treatment
+ right-to-left conversion (ar,he)
slide Input i18n
* Keyboard rebinding (bindtags)
* East Asia: keyboard buffering
+ Menu selection for ambiguities
+
* Virtual keyboard (buttons, canvas)
* String conversion: *lish family
- {[ruslish Moskva]-[greeklish Aqh'nai]}
slide i18n - Tcl for the world
+
+
+ Thank you.

Timeline display

Yet another thing to do with a canvas: history visualisation on a horizontal time-line, with a year scale displayed on top. The following kinds of objects are available so far:

  • "eras", displayed in yellow below the timeline in boxes
  • "background items" that are grey and stretch over all the canvas in height
  • normal items, which get displayed as stacked orange bars

You can zoom in with <1>, out with <3> (both only in x direction). On mouse motion, the current year is displayed in the toplevel's title. Normal items can be a single year, like the Columbus example, or a range of years, for instance for lifetimes of persons. (The example shows that Mozart didn't live long...)

namespace eval timeliner {
   variable ""
   array set "" {-zoom 1  -from 0 -to 2000}
}
proc timeliner::create {w args} {
   variable ""
   array set "" $args
   #-- draw time scale
   for {set x [expr ($(-from)/50)*50]} {$x<=$(-to)} {incr x 10} {
       if {$x%50 == 0} {
           $w create line $x 8 $x 0
           $w create text $x 8 -text $x -anchor n
       } else {
           $w create line $x 5 $x 0
       }
   }
   bind $w <Motion> {timeliner::title %W %x ; timeliner::movehair %W %x}
   bind $w <1> {timeliner::zoom %W %x 1.25}
   bind $w <2> {timeliner::hair %W %x}
   bind $w <3> {timeliner::zoom %W %x 0.8}
}
proc timeliner::movehair {w x} {
   variable ""
   if {[llength [$w find withtag hair]]} {
       set x [$w canvasx $x]
       $w move hair [expr {$x - $(x)}] 0
       set (x) $x
   }
}
proc timeliner::hair {w x} {
   variable ""
   if {[llength [$w find withtag hair]]} {
       $w delete hair
   } else {
       set (x) [$w canvasx $x]
       $w create line $(x) 0 $(x) [$w cget -height] \
                 -tags hair -width 1 -fill red
   }
}
proc timeliner::title {w x} {
   variable ""
   wm title . [expr int([$w canvasx $x]/$(-zoom))]
}
proc timeliner::zoom {w x factor} {
   variable ""
   $w scale all 0 0 $factor 1
   set (-zoom) [expr {$(-zoom)*$factor}]
   $w config -scrollregion [$w bbox all]
   if {[llength [$w find withtag hair]]} {
       $w delete hair
       set (x) [$w canvasx $x]
       $w create line $(x) 0 $(x) [$w cget -height] \
                 -tags hair -width 1 -fill red
   }
}

This command adds an object to the canvas. The code for "item" took me some effort, as it had to locate a free "slot" on the canvas, searching top-down:

proc timeliner::add {w type name time args} {
   variable ""
   regexp {(\d+)(-(\d+))?} $time -> from - to
   if {$to eq ""} {set to $from}
   set x0 [expr {$from*$(-zoom)}]
   set x1 [expr {$to*$(-zoom)}]
   switch -- $type {
       era    {set fill yellow; set outline black; set y0 20; set y1 40}
       bgitem {set fill gray; set outline {}; set y0 40; set y1 1024}
       item   {
           set fill orange
           set outline yellow
           for {set y0 60} {$y0<400} {incr y0 20} {
               set y1 [expr {$y0+18}]
               if {[$w find overlap [expr $x0-5] $y0 $x1 $y1] eq ""} break
           }
       }
   }
   set id [$w create rect $x0 $y0 $x1 $y1 -fill $fill -outline $outline]
   if {$type eq "bgitem"} {$w lower $id}
   set x2 [expr {$x0+5}]
   set y2 [expr {$y0+2}]
   set tid [$w create text $x2 $y2 -text $name -anchor nw]
   foreach arg $args {
       if {$arg eq "!"} {
           $w itemconfig $tid -font "[$w itemcget $tid -font] bold"
       }
   }
   $w config -scrollregion [$w bbox all]
}

Here's a sample application, featuring a concise history of music in terms of composers:

scrollbar .x -ori hori -command {.c xview}
pack      .x -side bottom -fill x
canvas    .c -bg white -width 600 -height 300 -xscrollcommand {.x set}
pack      .c -fill both -expand 1
timeliner::create .c -from 1400 -to 2000

These nifty shorthands for adding items (defined just below) make data specification a breeze - compare the original call with the shorthand:

   timeliner::add .c item Purcell 1659-1695
   - Purcell 1659-1695

The shorthands are created by aliasing one character to each item type; an additional "!" argument makes an item's text bold:

foreach {shorthand type} {* era  x bgitem - item} {
   interp alias {} $shorthand {} timeliner::add .c $type
}

Now for the data to display (written pretty readably):

* {Middle Ages} 1400-1450
- Dufay 1400-1474
* Renaissance    1450-1600
- Desprez 1440-1521
- Luther 1483-1546
- {Columbus discovers America} 1492
- Palestrina 1525-1594 !
- Lasso 1532-1594
- Byrd 1543-1623
* Baroque        1600-1750
- Dowland 1563-1626
- Monteverdi 1567-1643
- Schütz 1585-1672
- Purcell 1659-1695
- Telemann 1681-1767
- Rameau 1683-1764
- Bach,J.S. 1685-1750 !
- Händel 1685-1759
x {Thirty Years' War} 1618-1648
* {Classic era}  1750-1810
- Haydn 1732-1809 !
- Boccherini 1743-1805
- Mozart 1756-1791 !
- Beethoven 1770-1828 !
* {Romantic era} 1810-1914
- {Mendelssohn Bartholdy} 1809-1847
- Chopin 1810-1849
- Liszt 1811-1886
- Verdi 1813-1901
x {French revolution} 1789-1800
* {Modern era}   1914-2000
- Ravel 1875-1937 !
- Bartók 1881-1945
- Stravinskij 1882-1971
- Varèse 1883-1965
- Prokof'ev 1891-1953
- Milhaud 1892-1974
- Honegger 1892-1955
- Hindemith 1895-1963
- Britten 1913-1976
x WW1 1914-1918
x WW2 1939-1945

Fun with functions

My teenage daughter hates math. In order to motivate her, I beefed up an earlier little function plotter, which previously took only one function, in strict Tcl (expr) notation, from the command line. Now there's an entry widget, and the accepted language has also been enriched: beyond expr's rules, you can omit dollar and multiplication signs, like 2x+1; powers can be written as x3 instead of ($x*$x*$x); and in simple cases you can omit parentheses around function arguments, like sin x2. Hitting <Return> in the entry widget displays the function's graph.
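
For illustration, here is roughly what the rewrite rules in plotf (below) make of a few inputs before they are handed to expr - just a sketch of the effect:

 2x+1        ->  2*$x+1
 x3/5        ->  $x*$x*$x/5
 sin x2      ->  sin($x*$x)
 sqrt(1-x2)  ->  sqrt(1-$x*$x)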

If you need some ideas, click on the "?" button to cycle through a set of demo functions, from boring to bizarre (e.g. if rand() is used). Besides default scaling, you can zoom in or out. Moving the mouse pointer over the canvas displays x and y coordinates, and the display changes to white if you're on a point on the curve.

The target was not reached: my daughter still hates math. But at least I had hours of Tcl (and function) fun again, surfing in the Cartesian plane... hope you enjoy it too!

proc main {} {
   canvas .c -bg white -borderwidth 0
   bind   .c <Motion> {displayXY .info %x %y}
   frame  .f
     label  .f.1 -text "f(x) = "
     entry  .f.f -textvar ::function -width 40
       bind .f.f <Return> {plotf .c $::function}
     button .f.demo -text " ? " -pady 0 -command {demo .c}
     label  .f.2 -text " Zoom: "
     entry  .f.fac -textvar ::factor -width 4
       set                  ::factor 32
       bind .f.fac <Return>               {zoom .c 1.0}
     button .f.plus  -text " + " -pady 0 -command {zoom .c 2.0}
     button .f.minus -text " - " -pady 0 -command {zoom .c 0.5}
     eval pack [winfo children .f] -side left -fill both
   label  .info -textvar ::info -just left
   pack .info .f -fill x -side bottom
   pack .c -fill both -expand 1
   demo .c
}
set ::demos {
       "cos x3" 2 1-x 0.5x2 x3/5 "sin x" "sin x2" 1/x sqrt(x)
       "tan x/5" x+1/x x abs(x) "exp x" "log x" "log x2"
       round(x) "int x%2" "x-int x" "0.2tan x+1/tan x" x*(rand()-0.5)
       x2/5-1/(2x) "atan x" sqrt(1-x2) "abs(x-int(x*2))" (x-1)/(x+1)
       "sin x-tan x" "sin x-tan x2" "x-abs(int x)" 0.5x-1/x
       -0.5x3+x2+x-1 3*sin(2x) -0.05x4-0.2x3+1.5x2+2x-3 "9%int x"
       0.5x2/(x3-3x2+4) "abs x2-3 int x" "int x%3"
}
proc displayXY {w cx cy} {
       set x [expr {double($cx-$::dx)/$::factor}]
       set y [expr {double(-$cy+$::dy)/$::factor}]
       set ::info [format "x=%.2f y=%.2f" $x $y]
       catch {
       $w config -fg [expr {abs([expr $::fun]-$y)<0.01?"white":"black"}]
       } ;# may divide by zero, or other illegal things
}
proc zoom {w howmuch} {
   set ::factor [expr round($::factor*$howmuch)]
   plotf $w $::function
}
proc plotf {w function} {
   foreach {re subst} {
       {([a-z]) +(x[0-9]?)} {\1(\2)}   " " ""   {([0-9])([a-z])} {\1*\2}
       x2 x*x   x3 x*x*x    x4 x*x*x*x   x \$x   {e\$xp} exp
   } {regsub -all $re $function $subst function}
   set ::fun $function
   set ::info "Tcl: expr $::fun"
   set color [lpick {red blue purple brown green}]
   plotline $w [fun2points $::fun] -fill $color
}
proc lpick L {lindex $L [expr {int(rand()*[llength $L])}]}
proc fun2points {fun args} {
   array set opt {-from -10.0 -to 10.0 -step .01}
   array set opt $args
   set res "{"
   for {set x $opt(-from)} {$x<= $opt(-to)} {set x [expr {$x+$opt(-step)}]} {
       if {![catch {expr $fun} y]} {
           if {[info exists lasty] && abs($y-$lasty)>100} {
                append res "\} \{" ;# discontinuity: start a new line segment
           }
           append res " $x $y"
           set lasty $y
       } else {append res "\} \{"}
   }
   append res "}"
}
proc plotline {w points args} {
   $w delete all
   foreach i $points {
       if {[llength $i]>2} {eval $w create line $i $args -tags f}
   }
   set fac $::factor
   $w scale all 0 0 $fac -$fac
   $w create line -10000 0 10000 0      ;# X axis
   $w create line 0 -10000 0 10000      ;# Y axis
   $w create line $fac 0     $fac -3    ;# x=1 tick
   $w create line -3   -$fac 0    -$fac ;# y=1 tick
   set ::dx [expr {[$w cget -width]/2}]
   set ::dy [expr {[$w cget -height]/2}]
   $w move all $::dx $::dy
   $w raise f
}
proc demo {w} {
   set ::function [lindex $::demos 0] ;# cycle through...
   set ::demos [concat [lrange $::demos 1 end] [list $::function]]
   set ::factor 32
   plotf $w $::function
}
main

Functional imaging

In Conal Elliott's Pan project ("Functional Image Synthesis", [1]), images (of arbitrary size and resolution) are produced and manipulated in an elegant functional way. Functions written in Haskell (see Playing Haskell) are applied to pixels, mostly via functional composition, to return their color values. FAQ: "Can we have that in Tcl too?"

As the funimj demo below shows, in principle yes; but it takes some patience (or a very fast CPU) - for a 200x200 image the function is called 40000 times, which takes 9..48 seconds on my P200 box. Still, the output is often worth waiting for... and the time needed to write this code was negligible, as the Haskell original could be represented in Tcl with few modifications. Functional composition had to be rewritten into Tcl's prefix (Polish) notation - Haskell's

foo 1 o bar 2 o grill

(where "o" is the composition operator) would in Tcl look like

o {foo 1} {bar 2} grill

As the example shows, additional arguments can be specified; only the last argument is passed through the generated "function nest":

proc f {x} {foo 1 [bar 2 [grill $x]]}

But the name of the generated function is much nicer than "f": namely, the complete call to "o" is used, so the example proc has the name

"o {foo 1} {bar 2} grill"

which is pretty self-documenting ;-) I implemented "o" like this:

proc o args {
   # combine the functions in args, return the created name
   set name [info level 0]
   set body "[join $args " \["] \$x"
   append body [string repeat \] [expr {[llength $args]-1}]]
   proc $name x $body
   set name
}
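# For illustration (not part of the original code): composing three of the
# functions defined further below creates a proc whose name is the complete
# call and whose body is the nested application - roughly:
#
#   o {bPaint brown beige} checker toPolars
#   -> proc {o {bPaint brown beige} checker toPolars} x {
#          bPaint brown beige [checker [toPolars $x]]
#      }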
# Now for the rendering framework:
proc fim {f {zoom 100} {width 200} {height -}} {
   # produce a photo image by applying function f to pixels
   if {$height=="-"} {set height $width}
   set im [image create photo -height $height -width $width]
   set data {}
   set xs {}
   for {set j 0} {$j<$width} {incr j} {
       lappend xs [expr {($j-$width/2.)/$zoom}]
   }
   for {set i 0} {$i<$height} {incr i} {
       set row {}
       set y [expr {($i-$height/2.)/$zoom}]
       foreach x $xs {
           lappend row [$f [list $x $y]]
       }
       lappend data $row
   }
   $im put $data
   set im
}

Basic imaging functions ("drawers") share the signature point -> color, where point is a pair {x y} (or, after applying a polar transform, {r a}...) and color is a Tk color name, like "green" or #010203:

proc  vstrip p {
   # a simple vertical bar
   b2c [expr {abs([lindex $p 0]) < 0.5}]
}
proc udisk p {
   # unit circle with radius 1
   foreach {x y} $p break
   b2c [expr {hypot($x,$y) < 1}]
}
proc xor {f1 f2 p} {
   lappend f1 $p; lappend f2 $p
   b2c [expr {[eval $f1] != [eval $f2]}]
}
proc and {f1 f2 p} {
   lappend f1 $p; lappend f2 $p
   b2c [expr {[eval $f1] == "#000" && [eval $f2] == "#000"}]
}
proc checker p {
   # black and white checkerboard
   foreach {x y} $p break
   b2c [expr {int(floor($x)+floor($y)) % 2 == 0}]
}
proc gChecker p {
   # greylevels correspond to fractional part of x,y
   foreach {x y} $p break
   g2c [expr {(fmod(abs($x),1.)*fmod(abs($y),1.))}]
}
proc bRings p {
   # binary concentric rings
   foreach {x y} $p break
   b2c [expr {round(hypot($x,$y)) % 2 == 0}]
}
proc gRings p {
   # grayscale concentric rings
   foreach {x y} $p break
   g2c [expr {(1 + cos(3.14159265359 * hypot($x,$y))) / 2.}]
}
proc radReg {n p} {
   # n wedge slices starting at (0,0)
   foreach {r a} [toPolars $p] break
   b2c [expr {int(floor($a*$n/3.14159265359))%2 == 0}]
}
proc xPos p {b2c [expr {[lindex $p 0]>0}]}
proc cGrad p {
   # color gradients - best watched at zoom=100
   foreach {x y} $p break
   if {abs($x)>1.} {set x 1.}
   if {abs($y)>1.} {set y 1.}
   set r [expr {int((1.-abs($x))*255.)}]
   set g [expr {int((sqrt(2.)-hypot($x,$y))*180.)}]
   set b [expr {int((1.-abs($y))*255.)}]
   c2c $r $g $b
}

Beyond the examples in Conal Elliott's paper, I found out that functional imaging can also be abused as a (slow and imprecise) function plotter, which displays the graph of y = f(x) if you call it with $y + f($x) as the first argument (the + rather than - compensates for the image's y coordinate growing downward):

proc fplot {expr p} {
   foreach {x y} $p break
   b2c [expr abs($expr)<=0.04] ;# double eval required here!
}
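
For example, composed via "o" just like the samples further below, the following call (an expression of my own choosing) plots the parabola y = 0.5*x*x:

o {fplot {$y + 0.5*$x*$x}}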

Here is a combinator for two binary images that shows, in different colors, for which points both, only one, or neither of them are "true" - nice but slow:

proc bin2 {f1 f2 p} {
   set a [eval $f1 [list $p]]
   set b [eval $f2 [list $p]]
   expr {
       $a == "#000" ?
           $b == "#000" ? "green"
           : "yellow"
       : $b == "#000" ? "blue"
       : "black"
   }
}
#--------------------------------------- Pixel converters:
proc g2c {greylevel} {
   # convert 0..1 to #000000..#FFFFFF
   set hex [format %02X [expr {round($greylevel*255)}]]
   return #$hex$hex$hex
}
proc b2c {binpixel} {
   # 0 -> white, 1 -> black
   expr {$binpixel? "#000" : "#FFF"}
}
proc c2c {r g b} {
   # make Tk color name: {0 128 255} -> #0080FF
   format #%02X%02X%02X $r $g $b
}
proc bPaint {color0 color1 pixel} {
   # convert a binary pixel to one of two specified colors
   expr {$pixel=="#000"? $color0 : $color1}
}

if 0 {This painter colors a grayscale image in hues of the given color. It normalizes the given color by dividing by the corresponding values for "white", but appears pretty slow too:}

proc gPaint {color pixel} {
   set abspixel [lindex [rgb $pixel] 0]
   set rgb [rgb $color]
   set rgbw [rgb white]
   foreach var {r g b} in $rgb ref $rgbw {
       set $var [expr {round(double($abspixel)*$in/$ref/$ref*255.)}]
   }
   c2c $r $g $b
}

if 0 {This proc caches the results of [winfo rgb] calls, because these are quite expensive, especially on remote X displays - rmax}

proc rgb {color} {
   upvar "#0" rgb($color) rgb
   if {![info exists rgb]} {set rgb [winfo rgb . $color]}
   set rgb
}
#------------------------------ point -> point transformers
proc fromPolars p {
   foreach {r a} $p break
   list [expr {$r*cos($a)}] [expr {$r*sin($a)}]
}
proc toPolars p {
   foreach {x y} $p break
   # for Sun, we have to make sure atan2 gets no two 0's
   list [expr {hypot($x,$y)}] [expr {$x||$y? atan2($y,$x): 0}]
}
proc radInvert p {
   foreach {r a} [toPolars $p] break
   fromPolars [list [expr {$r? 1/$r: 9999999}] $a]
}
proc rippleRad {n s p} {
   foreach {r a} [toPolars $p] break
   fromPolars [list [expr {$r*(1.+$s*sin($n*$a))}] $a]
}
proc slice {n p} {
   foreach {r a} $p break
   list $r [expr {$a*$n/3.14159265359}]
}
proc rotate {angle p} {
   foreach {x y} $p break
   set x1 [expr {$x*cos(-$angle) - $y*sin(-$angle)}]
   set y1 [expr {$y*cos(-$angle) + $x*sin(-$angle)}]
   list $x1 $y1
}
proc swirl {radius p} {
   foreach {x y} $p break
   set angle [expr {hypot($x,$y)*6.283185306/$radius}]
   rotate $angle $p
}

if 0 {Now comes the demo program. It shows the predefined basic image operators, and some combinations, on a button bar. Click on one, have some patience, and the corresponding image will be displayed on the canvas to the right. You can also experiment with image operators in the entry widget at the bottom - hit <Return> to try. The text of the sample buttons is also copied to the entry widget, so you can play with the parameters, or rewrite it as you wish. Note that a well-formed funimj composition consists of (see the annotated example after this list):

  • the composition operator "o"
  • zero or more "painters" (color -> color)
  • one "drawer" (point -> color)
  • zero or more "transformers" (point -> point)

}
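
# For instance, a composition combining parts of the samples below can be read
# against that checklist like this (an annotation of mine, not new functionality):
#
#   o   {bPaint brown beige}   checker            {swirl 16}
#       painter                drawer             transformer
#       (color -> color)       (point -> color)   (point -> point)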

proc fim'show {c f} {
   $c delete all
   set ::try $f ;# prepare for editing
   set t0 [clock seconds]
   . config -cursor watch
   update ;# to make the cursor visible
   $c create image 0 0 -anchor nw -image [fim $f $::zoom]
   wm title . "$f: [expr [clock seconds]-$t0] seconds"
   . config -cursor {}
}
proc fim'try {c varName} {
   upvar #0 $varName var
   $c delete all
   if [catch {fim'show $c [eval $var]}] {
       $c create text 10 10 -anchor nw -text $::errorInfo
   }
}

if 0 {Composed functions need only be mentioned once, which creates them, and they can later be picked up by info procs. The o looks nicely bullet-ish here..}

o bRings
o cGrad
o checker
o gRings
o vstrip
o xPos
o {bPaint brown beige} checker
o checker {slice 10} toPolars
o checker {rotate 0.1}
o vstrip {swirl 1.5}
o checker {swirl 16}
o {fplot {$y + exp($x)}}
o checker radInvert
o gRings {rippleRad 8 0.3}
o xPos {swirl .75}
o gChecker
o {gPaint red} gRings
o {bin2 {radReg 7} udisk}
#----------------------------------------------- testing
frame .f2
set c [canvas .f2.c]
set e [entry .f2.e -bg white -textvar try]
bind $e <Return> [list fim'try $c ::try]
scale .f2.s -from 1 -to 100 -variable zoom -ori hori -width 6
#--------------------------------- button bar:
frame .f
set n 0
foreach imf [lsort [info procs "o *"]] {
   button .f.b[incr n] -text $imf -anchor w -pady 0 \
       -command [list fim'show $c $imf]
}
set ::zoom 25
eval pack [winfo children .f] -side top -fill x -ipady 0
eval pack [winfo children .f2] -side top -fill x
pack .f .f2 -side left -anchor n
bind . <Escape> {exec wish $argv0 &; exit} ;# dev helper
bind . ? {console show} ;# dev helper, Win/Mac only

TkPhotoLab

The following code can be used for experiments in image processing, including

  • convolutions (see below)
  • conversion from color to greylevel
  • conversion from greylevel to faux color
  • brightness and contrast modification

Tcl is not the fastest at heavy number-crunching, as needed when going over many thousands of pixels, but I wouldn't consider C for a fun project ;) So take your time, or get a real CPU. At least you can watch the progress, as the target image is updated after every row.

(Screenshot: TkPhotoLab - edge enhancement by the Laplace5 filter)

The demo UI shows two images, the original on the left and the processing result on the right. You can push the result to the left with Options/Accept. See the menus for what goodies I have supplied. But what most interested me were "convolutions", for which you can edit the matrix (fixed at 3x3 - slow enough..) and click "Apply" to run it over the input image. Click "C" to reset the matrix to all zeroes.

Convolution is a technique where a target pixel is colored according to the weighted sum of itself and its neighbors, with the weights taken from a given matrix. As an example, the convolution matrix

1 1 1
1 1 1
1 1 1

colors the pixel in the middle with the average of itself and its eight neighbors, which will myopically blur the picture.

0 0 0
0 1 0
0 0 0

should just faithfully repeat the input picture. These

0  -1  0       -1 -1 -1
-1  5 -1  or:  -1  9 -1
0  -1  0       -1 -1 -1

enhance horizontal and vertical edges, and make the image look "crispier".
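
As a quick worked example of the per-pixel arithmetic (the channel values here are made up): suppose the red channel of a 3x3 neighbourhood is 100 everywhere except 190 in the centre. The all-ones matrix is normalized to 1/9 per cell, while the left ("Laplace5") matrix already sums to 1:

expr {(8*100 + 190) / 9.0}   ;# blur matrix: 110.0 - the outlier is averaged away
expr {5*190 - 4*100}         ;# Laplace5: 550, clipped to 255 - the outlier stands out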

proc convolute {inimg outimg matrix} {
   set w [image width  $inimg]
   set h [image height $inimg]
   set matrix [normalize $matrix]
   set shift  [expr {[matsum $matrix]==0? 128: 0}]
   set imat [photo2matrix $inimg]
   for {set i 1} {$i<$h-1} {incr i} {
       set row {}
       for {set j 1} {$j<$w-1} {incr j} {
          foreach var {rsum gsum bsum} {set $var 0.0}
          set y [expr {$i-1}]
          foreach k {0 1 2} {
             set x [expr {$j-1}]
             foreach l {0 1 2} {
                if {[set fac [lindex $matrix $k $l]]} {
                    foreach {r g b} [lindex $imat $y $x] {}
                    set rsum [expr {$rsum + $r * $fac}]
                    set gsum [expr {$gsum + $g * $fac}]
                    set bsum [expr {$bsum + $b * $fac}]
                }
                incr x
             }
             incr y
           }
           if {$shift} {
               set rsum [expr {$rsum + $shift}]
               set gsum [expr {$gsum + $shift}]
               set bsum [expr {$bsum + $shift}]
           }
           lappend row [rgb [clip $rsum] [clip $gsum] [clip $bsum]]
       }
       $outimg put [list $row] -to 1 $i
       update idletasks
   }
}
proc alias {name args} {eval [linsert $args 0 interp alias {} $name {}]} ;# convenience wrapper around [interp alias]
alias rgb   format #%02x%02x%02x ;# e.g. [rgb 0 128 255] -> #0080ff
proc lambda {argl body} {K [set n [info level 0]] [proc $n $argl $body]} ;# proc named after its own definition; returns that name
proc K      {a b} {set a} ;# K combinator: return the first argument
proc clip   x {expr {$x>255? 255: $x<0? 0: int($x)}} ;# clamp a value into 0..255
proc photo2matrix image {
   set w [image width  $image]
   set h [image height $image]
   set res {}
   for {set y 0} {$y<$h} {incr y} {
       set row {}
       for {set x 0} {$x<$w} {incr x} {
           lappend row [$image get $x $y]
       }
       lappend res $row
   }
   set res
}
proc normalize matrix {
    #-- make sure all matrix elements add up to 1.0
    set sum [matsum $matrix]
    if {$sum==0} {return $matrix} ;# no-op on zero sum
    set res {}
    foreach inrow $matrix {
        set row {}
        foreach el $inrow {lappend row [expr {1.0*$el/$sum}]}
        lappend res $row
    }
    set res
}
proc matsum matrix {expr [join [join $matrix] +]} ;# flatten the matrix and add up all elements

The following routines could also be generified into one:

proc color2gray image {
   set w [image width  $image]
   set h [image height $image]
   for {set i 0} {$i<$h} {incr i} {
       set row {}
       for {set j 0} {$j<$w} {incr j} {
           foreach {r g b} [$image get $j $i] break
           set y [expr {int(0.299*$r + 0.587*$g + 0.114*$b)}]
           lappend row [rgb $y $y $y]
       }
       $image put [list $row] -to 0 $i
       update idletasks
   }
}
proc color2gray2 image {
   set i -1
   foreach inrow [photo2matrix $image] {
       set row {}
       foreach pixel $inrow {
           foreach {r g b} $pixel break
           set y [expr {int(($r + $g + $b)/3.)}]
           lappend row [rgb $y $y $y]
       }
       $image put [list $row] -to 0 [incr i]
       update idletasks
   }
}

An experiment in classifying graylevels into unreal colors:

proc gray2color image {
   set i -1
   set colors {black darkblue blue purple red orange yellow white}
   set n [llength $colors]
   foreach inrow [photo2matrix $image] {
       set row {}
       foreach pixel $inrow {
           set index [expr {[lindex $pixel 0]*$n/256}]
           lappend row [lindex $colors $index]
       }
       $image put [list $row] -to 0 [incr i]
       update idletasks
   }
}
proc grayWedge image {
   $image blank
   for {set i 0} {$i<256} {incr i} {
       $image put [rgb $i $i $i] -to $i 0 [expr {$i+1}] 127
   }
}

A number of algorithms are very similar, distinguished only by a few commands in the center. Hence I made them generic: they take the name of a function that is applied to every pixel's RGB values, or to a pair of corresponding pixels' RGB values, respectively. They are instantiated as aliases that set the function, fancily, as a lambda:

proc generic_1 {f target source} {
   set w [image width  $source]
   set h [image height $source]
   for {set i 0} {$i<$h} {incr i} {
       set row {}
       for {set j 0} {$j<$w} {incr j} {
           foreach {r g b} [$source get $j $i] break
           lappend row [rgb [$f $r] [$f $g] [$f $b]]
       }
       $target put [list $row] -to 0 $i
       update idletasks
   }
}
alias invert    generic_1 [lambda x {expr {255-$x}}]
alias contrast+ generic_1 [lambda x {clip [expr {128+($x-128)*1.25}]}]
alias contrast- generic_1 [lambda x {clip [expr {128+($x-128)*0.8}]}]
proc generic_2 {f target with} {
   set w [image width  $target]
   set h [image height $target]
   for {set i 0} {$i<$h} {incr i} {
       set row {}
       for {set j 0} {$j<$w} {incr j} {
           foreach {r g b} [$target get $j $i] break
           foreach {r1 g1 b1} [$with get $j $i] break
           lappend row [rgb [$f $r $r1] [$f $g $g1] [$f $b $b1]]
       }
       $target put [list $row] -to 0 $i
       update idletasks
   }
}
alias blend      generic_2 [lambda {a b} {expr {($a+$b)/2}}]
alias difference generic_2 [lambda {a b} {expr {255-abs($a-$b)}}]
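
In the same style, further per-pixel operators can be added with one line each. For example, a gamma correction might look like this (my own illustration, not part of the original set; the name gamma0.5 is made up):

alias gamma0.5 generic_1 [lambda x {clip [expr {int(255*pow($x/255.0, 0.5))}]}]
# used like the other operators, e.g. in a menu entry:  gamma0.5 $im2 $im1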

A histogram is a count of how often each color value occurs in the current image, computed separately for red, green and blue. For graylevel images, the displayed "curves" should overlap exactly, so you see only the blue dots, which are drawn last.

proc histogram {image {channel 0}} {
   set w [image width  $image]
   set h [image height $image]
   for {set i 0} {$i<256} {incr i} {set hist($i) 0}
   for {set i 0} {$i<$h} {incr i} {
       for {set j 0} {$j<$w} {incr j} {
           incr hist([lindex [$image get $j $i] $channel])
       }
   }
   set res {}
   for {set i 0} {$i<256} {incr i} {lappend res $hist($i)}
   set res
}
proc drawHistogram {target input} {
   $target blank
   set a [expr {6000./([image height $input]*[image width $input])}]
   foreach color {red green blue} channel {0 1 2} {
       set i -1
       foreach val [histogram $input $channel] {
           $target put $color -to [incr i] \
               [clip [expr {int(128-$val*$a)}]]
       }
       update idletasks
   }
}

Demo UI:

if {[file tail [info script]] eq [file tail $argv0]} {
   package require Img ;# for JPEG etc.
   proc setFilter {w matrix} {
       $w delete 1.0 end
       foreach row $matrix {$w insert end [join $row \t]\n}
       set ::info "Click 'Apply' to use this filter"
   }
   label .title -text TkPhotoLab -font {Helvetica 14 italic} -fg blue
   label .( -text ( -font {Courier 32}
   set txt [text .t -width 20 -height 3]
   setFilter .t {{0 -1 0} {-1 5 -1} {0 -1 0}}
   label .) -text ) -font {Courier 32}
   button .c -text C -command {setFilter .t {{0 0 0} {0 0 0} {0 0 0}}}
   grid .title .( .t .) .c -sticky news
   button .apply -text Apply -command applyConv
   grid x ^ ^ ^ .apply -sticky ew
   grid [label .0 -textvar info] - - -sticky w
   grid [label .1] - [label .2] - - -sticky new
   proc loadImg { {fn ""}} {
       if {$fn==""} {set fn [tk_getOpenFile]}
       if {$fn != ""} {
           cd [file dirname [file join [pwd] $fn]]
           set ::im1 [image create photo -file $fn]
           .1 config -image $::im1
           set ::im2 [image create photo]
           .2 config -image $::im2
           $::im2 copy $::im1 -shrink
           set ::info "Loaded image 1 from $fn"
       }
   }
   proc saveImg { {fn ""}} {
       if {$fn==""} {set fn [tk_getSaveFile]}
       if {$fn != ""} {
           $::im2 write $fn -format JPEG
           set ::info "Saved image 2 to $fn"
       }
   }
   proc applyConv {} {
       set ::info "Convolution running, have patience..."
       set t0 [clock clicks -milliseconds]
       convolute $::im1 $::im2 [split [$::txt get 1.0 end] \n]
        set dt [expr {([clock clicks -milliseconds]-$t0)/1000.}]
       set ::info "Ready after $dt sec"
   }

A little wrapper for simplified menu creation - see below for its use:

   proc m+ {head name {cmd ""}} {
       if {![winfo exists .m.m$head]} {
           .m add cascade -label $head -menu [menu .m.m$head -tearoff 0]
       }
       if [regexp ^-+$ $name] {
           .m.m$head add separator
       } else {.m.m$head add command -label $name -comm $cmd}
   }
   . config -menu [menu .m]
   m+ File Open.. loadImg
   m+ File Save.. saveImg
   m+ File ---
   m+ File Exit   exit
   m+ Edit Blend      {blend $im2 $im1}
   m+ Edit Difference {difference $im2 $im1}
   m+ Edit ---
   m+ Edit Negative   {invert     $im2 $im1}
   m+ Edit Contrast+  {contrast+  $im2 $im1}
   m+ Edit Contrast-  {contrast-  $im2 $im1}
   m+ Edit ---
   m+ Edit Graylevel  {$im2 copy $im1 -shrink; color2gray  $im2}
   m+ Edit Graylevel2 {$im2 copy $im1 -shrink; color2gray2 $im2}
   m+ Edit "Add Noise" {
       generic_1 [lambda x {expr {rand()<.01? int(rand()*255):$x}}] $im2 $im1
   }
   m+ Edit gray2color {$im2 copy $im1 -shrink; gray2color $im2}
   m+ Edit Octary {generic_1 [lambda x {expr {$x>127? 255:0}}] $im2 $im1}
   m+ Edit ---
   m+ Edit HoriMirror {$im2 copy $im1 -shrink -subsample -1 1}
   m+ Edit VertMirror {$im2 copy $im1 -shrink -subsample 1 -1}
   m+ Edit "Upside down" {$im2 copy $im1 -shrink -subsample -1 -1}
   m+ Edit ---
   m+ Edit "Zoom x 2" {$im2 copy $im1 -shrink -zoom 2}
   m+ Edit "Zoom x 3" {$im2 copy $im1 -shrink -zoom 3}
   m+ Options "Accept (1<-2)" {$im1 copy $im2 -shrink}
   m+ Options ---
   m+ Options "Gray wedge" {grayWedge $im2}
   m+ Options Histogram  {drawHistogram $im2 $im1}
   m+ Filter Clear {setFilter .t {{0 0 0} {0 0 0} {0 0 0}}}
   m+ Filter ---
   m+ Filter Blur0  {setFilter .t {{1 1 1} {1 0 1} {1 1 1}}}
   m+ Filter Blur1  {setFilter .t {{1 1 1} {1 1 1} {1 1 1}}}
   m+ Filter Gauss2 {setFilter .t {{1 2 1} {2 4 2} {1 2 1}}}
   m+ Filter ---
   m+ Filter Laplace5 {setFilter .t {{0 -1 0} {-1 5 -1} {0 -1 0}}}
   m+ Filter Laplace9 {setFilter .t {{-1 -1 -1} {-1 9 -1} {-1 -1 -1}}}
   m+ Filter LaplaceX {setFilter .t {{1 -2 1} {-2 5 -2} {1 -2 1}}}
   m+ Filter ---
   m+ Filter Emboss   {setFilter .t {{2 0 0} {0 -1 0} {0 0 -1}}}
   m+ Filter HoriEdge {setFilter .t {{-1 -1 -1} {0 0 0} {1 1 1}}}
   m+ Filter VertEdge {setFilter .t {{-1 0 1} {-1 0 1} {-1 0 1}}}
   m+ Filter SobelH   {setFilter .t {{1 2 1} {0 0 0} {-1 -2 -1}}}
   m+ Filter SobelV   {setFilter .t {{1 0 -1} {2 0 -2} {1 0 -1}}}
   bind . <Escape> {exec wish $argv0 &; exit}
   bind . <F1> {console show}
   loadImg aaa.jpg
}

Appendix

Resources

For Web development, there are:

  • pure-Tcl Web servers like Tclhttpd, which can be embedded in your Tcl-enabled applications to provide Web-based interfaces,
  • Tcl-based Web servers like AOLServer, which are much faster than their pure-Tcl cousins and able to support a heavy load, and
  • Tcl Web server modules (see Apache Tcl for examples) that enable Tcl Web applications with bog-standard Web servers.

ActiveState Community License

This is the license of the "Batteries-included" ActiveTcl distribution. Note that you cannot redistribute ActiveTcl "outside your organization" without written permission. The part between "AGREEMENT" and "Definitions:" is the more free, BSD-style license of Tcl and Tk itself.

Preamble:

The intent of this document is to state the conditions under which the Package (ActiveTcl) may be copied and distributed, such that ActiveState maintains control over the development and distribution of the Package, while allowing the users of the Package to use the Package in a variety of ways.

The Package may contain software covered by other licenses:

TCL LICENSE AGREEMENT

This software is copyrighted by the Regents of the University of California, Sun Microsystems, Inc., Scriptics Corporation, and other parties. The following terms apply to all files associated with the software unless explicitly disclaimed in individual files.

The authors hereby grant permission to use, copy, modify, distribute, and license this software and its documentation for any purpose, provided that existing copyright notices are retained in all copies and that this notice is included verbatim in any distributions. No written agreement, license, or royalty fee is required for any of the authorized uses. Modifications to this software may be copyrighted by their authors and need not follow the licensing terms described here, provided that the new terms are clearly indicated on the first page of each file where they apply.

IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE, ITS DOCUMENTATION, OR ANY DERIVATIVES THEREOF, EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

GOVERNMENT USE: If you are acquiring this software on behalf of the U.S. government, the Government shall have only "Restricted Rights" in the software and related documentation as defined in the Federal Acquisition Regulations (FARs) in Clause 52.227.19 (c) (2). If you are acquiring the software on behalf of the Department of Defense, the software shall be classified as "Commercial Computer Software" and the Government shall have only "Restricted Rights" as defined in Clause 252.227-7013 (c) (1) of DFARs. Notwithstanding the foregoing, the authors grant the U.S. Government and others acting in its behalf permission to use and distribute the software in accordance with the terms specified in this license.

Definitions:

"ActiveState" refers to ActiveState Corp., the Copyright Holder of the Package.

"Package" refers to those files, including, but not limited to, source code, binary executables, images, and scripts, which are distributed by the Copyright Holder.

"You" is you, if you are thinking about copying or distributing this Package.

Terms:

1. You may use this Package for commercial or non-commercial purposes without charge.

2. You may make and give away verbatim copies of this Package for personal use, or for use within your organization, provided that you duplicate all of the original copyright notices and associated disclaimers. You may not distribute copies of this Package, or copies of packages derived from this Package, to others outside your organization without specific prior written permission from ActiveState (although you are encouraged to direct them to sources from which they may obtain it for themselves).

3. You may apply bug fixes, portability fixes, and other modifications derived from ActiveState. A Package modified in such a way shall still be covered by the terms of this license.

4. ActiveState's name and trademarks may not be used to endorse or promote packages derived from this Package without specific prior written permission from ActiveState.

5. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

ActiveState Community License Copyright (C) 2001-2003 ActiveState Corp. All rights reserved.