Introduction to newLISP/Strings

From Wikibooks, open books for an open world

Strings

String-handling tools are an important part of a programming language. newLISP has many easy-to-use and powerful string-handling tools, and you can easily add more tools to your toolbox if your particular needs aren't met.

Here's a guided tour of newLISP's string orchestra.

Strings in newLISP code

You can write strings in three ways:

  • enclosed in double quotes
  • embraced by curly braces
  • marked-up by markup codes

like this:

(set 's "this is a string")
(set 's {this is a string})
(set 's [text]this is a string[/text])

Strings enclosed in quotes or braces can hold up to 2048 characters. For strings longer than 2048 characters, always use the [text] and [/text] tags to enclose the string.

Use the first method, quotation marks, whenever you want escape sequences such as \n and \t, or character codes such as \046, to be processed.

(set 's "this is a string \n with two lines")
(println s)
this is a string 
with two lines
(println "\110\101\119\076\073\083\080")    ; decimal ASCII
newLISP
(println "\x6e\x65\x77\x4c\x49\x53\x50")    ; hex ASCII
newLISP

Inside a quoted string, a double quotation mark must be escaped with a backslash (\"), and so must a backslash itself (\\), if you want them to appear in the string.

Use the second method, braces (or 'curly brackets'), for strings shorter than 2048 characters when you don't want any escaped characters to be processed:

(set 's {strings can be enclosed in \n"quotation marks" \n })
(println s)
strings can be enclosed in \n"quotation marks" \n

This is a really useful way of writing strings, because you don't have to worry about putting backslashes before every quotation character, or backslashes before other backslashes. You can nest pairs of braces inside a braced string, but you can't have an unmatched brace. I like to use braces for strings, because they face the correct way (which plain dumb quotation marks don't) and because your text editor might be able to balance and match them.
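As a quick check that the two notations can produce exactly the same text, here is a small sketch (the symbols quoted and braced are just illustrative names): the quoted version needs \" and \\, while the braced version contains the same characters literally.

```newlisp
; quoted string: " and \ must be escaped
(set 'quoted "She said \"hi\" and left a \\ behind")
; braced string: no escape processing, characters are taken literally
(set 'braced {She said "hi" and left a \ behind})
(println quoted)        ; prints: She said "hi" and left a \ behind
(= quoted braced)
;-> true
```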

The third method, using [text] and [/text] markup tags, is intended for longer text strings running over many lines, and is used automatically by newLISP when it outputs large amounts of text. Again, you don't have to worry about which characters you can and can't include - you can put anything you like in, with the obvious exception of [/text]. Escape characters such as \n or \046 aren't processed either.

(set 'novel (read-file {my-latest-novel.txt}))
 
;->
[text]
It was a dark and "stormy" night...
...
The End.
[/text]


If you want to know the length of a string, use length:

(length novel)
;-> 575196


Strings of millions of characters can be handled easily by newLISP.

length counts bytes, not characters, so use utf8len to get the number of characters in a Unicode (UTF-8) string:

(utf8len (char 955))
;-> 1
 
(length (char 955))
;-> 2


Making strings

Many functions, such as the file-reading ones, return strings or lists of strings for you. But if you want to build a string from scratch, one way is to start with the char function. This converts the supplied number to the equivalent character string with that code number. It can also reverse the operation, converting the supplied character string to its equivalent code number.

(char 33)
;-> "!"
(char "!")
;-> 33
(char 955)       ; Unicode lambda character, decimal code
;-> "\206\187"
(char 0x2643)    ; Unicode symbol for Jupiter, hex code
;-> "\226\153\131"


These last two examples work only in the Unicode-capable (UTF-8) version of newLISP. Because Unicode code points are usually written in hexadecimal, you can give char a hex number, starting with 0x. To see the actual characters, use a printing command:

(println (char 955))
λ
;-> "\206\187"

(println (char 0x2643))
♃
;-> "\226\153\131"

(println (char (int (string "0x" "2643"))))    ; equivalent
♃
;-> "\226\153\131"


println prints the glyph itself. The backslashed numbers are println's return value, the same string, which the REPL displays as the decimal values of the bytes in the glyph's multi-byte UTF-8 encoding.

You can use char to build strings in other ways:

(join (map char (sequence (char "a") (char "z"))))
;-> "abcdefghijklmnopqrstuvwxyz"


This uses char to find out the ASCII code numbers for a and z, and then uses sequence to generate a list of code numbers between the two. Then the char function is mapped onto every element of the list, producing a list of one-character strings. Finally, this list is converted to a single string by join.

join can also take a separator when building strings:

(join (map char (sequence (char "a") (char "z"))) "-")
;-> "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z"
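The same char arithmetic can transform text, not just generate it. Here is a sketch of a hypothetical helper, shift-letters (not a built-in function), that explodes a string into characters, shifts each character's code number by n, and joins the result back together:

```newlisp
; hypothetical helper, not part of newLISP:
; shift every character's code number by n
(define (shift-letters str n)
  (join (map (fn (c) (char (+ (char c) n))) (explode str))))

(shift-letters "HAL" 1)
;-> "IBM"
(shift-letters "IBM" -1)
;-> "HAL"
```

It's only meant for ASCII text; shifting past the end of the alphabet produces punctuation rather than wrapping around.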

Similar to join is append, which works directly on strings:

(append "con" "cat" "e" "nation")
;-> "concatenation"


but even more useful is string, which turns any collection of numbers, lists, and strings into a single string.

(string '(sequence 1 10) { produces } (sequence 1 10) "\n")
;-> "(sequence 1 10) produces (1 2 3 4 5 6 7 8 9 10)\n"


Notice that the first list wasn't evaluated (because it was quoted) but that the second list was evaluated to produce a list of numbers, and the resulting list - including the parentheses - was converted to a string.
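Because string accepts any mix of arguments, it's also handy for gluing numbers and symbol names into text without converting them first. A small illustration:

```newlisp
(string 1 "+" 2 "=" (+ 1 2))
;-> "1+2=3"
(string 'x " = " 42)      ; a quoted symbol contributes its name
;-> "x = 42"
```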

The string function, combined with the various string markers such as braces and markup tags, is one way to include the values of variables inside strings:

(set 'x 42)
(string {the value of } 'x { is } x) 
;-> "the value of x is 42"

You can also use format to combine strings and symbol values. See Formatting strings.

dup makes copies:

(dup "spam" 10)
;-> "spamspamspamspamspamspamspamspamspamspam"

And date makes a date string:

(date)
;-> "Wed Jan 25 15:04:49 2006"

or you can give it a number of seconds since 1970 to convert:

(date 1230000000)      ; date uses your local time zone; this output is for UTC
;-> "Tue Dec 23 02:40:00 2008"


See Working with dates and times.

String surgery

Once you've got your string, there are plenty of functions for operating on it. Some of these are destructive functions: they change the string permanently, possibly losing information forever. Others are constructive, producing a new string and leaving the old one unharmed. See Destructive functions.

reverse is destructive:

(set 't "a hypothetical one-dimensional subatomic particle")
(reverse t)
;-> "elcitrap cimotabus lanoisnemid-eno lacitehtopyh a"

Now t has changed forever. However, the case-changing functions aren't destructive; they produce new strings without harming the old ones:

(set 't "a hypothetical one-dimensional subatomic particle")
 
(upper-case t)
;-> "A HYPOTHETICAL ONE-DIMENSIONAL SUBATOMIC PARTICLE"
 
(lower-case t)
;-> "a hypothetical one-dimensional subatomic particle"
 
(title-case t)
;-> "A hypothetical one-dimensional subatomic particle"
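If you want a reversed version but need to keep the original, one approach is to reverse a copy, using the copy function described under Changing substrings. A minimal sketch (w is just an illustrative name):

```newlisp
(set 'w "stressed")
(reverse (copy w))     ; reverse destroys only the copy
;-> "desserts"
w
;-> "stressed"         ; the original is untouched
```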

Substrings

If you know which part of a string you want to extract, use one of the following constructive functions:

(set 't "a hypothetical one-dimensional subatomic particle")
(first t)
;-> "a"
 
(rest t)
;-> " hypothetical one-dimensional subatomic particle"
 
(last t)
;-> "e"
 
(t 2)            ; counting from 0
;-> "h"

You can also use this technique with lists. See Selecting items from lists.
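Index numbers can also be negative, in which case they count back from the end of the string. A short illustration:

```newlisp
(set 's "newLISP")
(s 0)
;-> "n"
(s -1)       ; the last character
;-> "P"
(s -4)       ; fourth from the end
;-> "L"
```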

String slices

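slice gives you a new slice of an existing string. The start position counts either forwards from the beginning (positive integers) or backwards from the end (negative integers). The second number is either the number of characters to take (positive) or, if negative, a position counted back from the end at which the slice stops: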

(set 't "a hypothetical one-dimensional subatomic particle")
(slice t 15 13)
;-> "one-dimension"
 
(slice t -8 8)
;-> "particle"
 
(slice t 2 -9)
;-> "hypothetical one-dimensional subatomic"
 
(slice "schwarzwalderkirschtorte" 19 -1)
;-> "tort"

There's a shortcut to do this, too. Put the required start and length before the string in a list:

(15 13 t)
;-> "one-dimension"
 
(0 14 t)
;-> "a hypothetical"
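If you leave out the length, the slice runs to the end of the string, and the shortcut form accepts a negative start as well. A few more sketches on a short string:

```newlisp
(set 's "newLISP")
(slice s 3)        ; no length: run to the end
;-> "LISP"
(3 s)              ; the same thing, as an implicit slice
;-> "LISP"
(-4 2 s)           ; start 4 from the end, take 2 characters
;-> "LI"
```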

If you don't want a continuous run of characters, but want to cherry-pick some of them for a new string, use select followed by a sequence of character index numbers:

(set 't "a hypothetical one-dimensional subatomic particle")
(select t 3 5 24 48 21 10 44 8)
;-> "yosemite"
 
(select t (sequence 1 49 12)) ; every 12th char starting at 1
;-> " lime"

which is good for finding secret coded messages buried in text.

Changing the ends of strings

trim and chop are both constructive string-editing functions that work from the ends of the original strings inwards.

chop works from the end:

(chop t)       ; defaults to the last character
;-> "a hypothetical one-dimensional subatomic particl"
 
(chop t 9)     ; chop 9 characters off
;-> "a hypothetical one-dimensional subatomic"

trim can remove characters from both ends:

(set 's "        centred       ")
(trim s)            ; defaults to removing spaces
;-> "centred"
 
(set 's "------centred------")
(trim s "-")
;-> "centred"
 
(set 's "------centred******")
(trim s "-" "*")    ; front and back
;-> "centred"
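Because chop is constructive, it combines nicely with a test. Here's a sketch of a hypothetical helper, strip-suffix (not a built-in), that removes a known ending such as a file extension only when it's actually there, using ends-with (described under Testing and comparing strings):

```newlisp
; hypothetical helper, not part of newLISP:
; remove suffix from name if name ends with it
(define (strip-suffix name suffix)
  (if (ends-with name suffix)
      (chop name (length suffix))   ; chop returns a new, shorter string
      name))                        ; otherwise leave name as it is

(strip-suffix "report.txt" ".txt")
;-> "report"
(strip-suffix "notes.md" ".txt")
;-> "notes.md"
```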

push and pop work on strings too

You've seen push and pop adding and removing items from lists. They work on strings too: use push to add characters to a string, and pop to remove characters from it (one character by default, or more if you supply a length). Unless you specify an index, characters are added to or removed from the start of the string.

(set 't "some ")
(push "this is " t)
(push "text" t -1)
;-> t is now "this is some text"

pop always returns what was popped, but push returns the modified target of the action. pop is useful when you want to break up a string and process the pieces as you go. For example, to print the newLISP version number, which (sys-info -2) returns as a 4- or 5-digit integer, use something like this:

(set 'version-string (string (sys-info -2)))
; eg: version-string is now "10303"
(set 'dev-version (pop version-string -2 2))   ; always two digits
; dev-version is "03", version-string is "103"
(set 'point-version (pop version-string -1))   ; always one digit
; point-version is "3", version-string is now "10"
(set 'version version-string)                  ; one or two digits
(println version "." point-version "." dev-version)
10.3.03

It's easier to work from the right-hand side of the string, because the last three digits always have the same layout while the major version can have one or two digits. pop extracts each piece and removes it in one operation.
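Here are a few smaller sketches of the same functions working from the front of a string. Because push inserts at index 0 by default, pushing each character in turn builds the string in reverse:

```newlisp
(set 's "newLISP")
(pop s)            ; one character from the front
;-> "n"
(pop s 0 2)        ; two characters, starting at index 0
;-> "ew"
s
;-> "LISP"

(set 'out "")
(dolist (c (explode "abc"))
  (push c out))    ; each character goes in at the front
out
;-> "cba"
```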

Modifying strings

There are two approaches to changing characters inside a string. Either use the index numbers of the characters, or specify the substring you want to find or change.

Using index numbers in strings

To change characters by their index numbers, use setf, the general purpose function for changing strings, lists, and arrays:

(set 't "a hypothetical one-dimensional subatomic particle")
(setf (t 0) "A")
;-> "A"
t
;-> "A hypothetical one-dimensional subatomic particle"

You could also use nth with setf to specify the location:

(set 't "a hypothetical one-dimensional subatomic particle")
;-> "a hypothetical one-dimensional subatomic particle"
(setf (nth 0 t) "A")
;-> "A"
t
;-> "A hypothetical one-dimensional subatomic particle"

Here's how to 'increment' the first (zeroth) letter of a string:

(set 'wd "cream")
;-> "cream"
(setf (wd 0) (char (+ (char $it) 1)))
;-> "d"
wd
;-> "dream"

Inside setf, the system variable $it holds the current value of the place being changed (here "c"). char converts it to its code number, which is incremented and converted back into a character to form the new value.
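The same $it idiom works with any function of the old value. A short sketch (word is just an illustrative symbol name) that capitalizes the first and then the last character in place:

```newlisp
(set 'word "lisp")
(setf (word 0) (upper-case $it))    ; $it is "l"
;-> "L"
word
;-> "Lisp"
(setf (word -1) (upper-case $it))   ; $it is "p"
word
;-> "LisP"
```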

Changing substrings

If you don't want to - or can't - use index numbers or character positions, use replace, a powerful destructive function that does all kinds of useful operations on strings. Use it in the form:

(replace old-string source-string replacement)

So:

(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" t "theor")
;-> "a theoretical one-dimensional subatomic particle"

replace is destructive. If you want to use replace, or any other destructive function, without modifying the original string, apply it to a copy made with the copy function:

(set 't "a hypothetical one-dimensional subatomic particle")
(replace "hypoth" (copy t) "theor")
;-> "a theoretical one-dimensional subatomic particle"
t
;-> "a hypothetical one-dimensional subatomic particle"

The copy is modified by replace. The original string t is unaffected.

Regular expressions

replace is one of a group of newLISP functions that accept regular expressions for defining patterns in text. For most of them, you add an extra number at the end of the expression which specifies options for the regular expression operation: 0 means basic matching, 1 means case-insensitive matching, and so on.

(set 't "a hypothetical one-dimensional subatomic particle")
(replace {h.*?l(?# h followed by l but not too greedy)} t {} 0) 
 
;-> "a  one-dimensional subatomic particle"

Sometimes I put comments inside regular expressions, so that I know what I was trying to do when I read the code some days later. Text between (?# and the following closing parenthesis is ignored.

If you're happy working with Perl-compatible Regular Expressions (PCRE), you'll be happy with replace and its regex-using cousins (find, regex, find-all, parse, starts-with, ends-with, directory, and search). Full details are in the newLISP reference manual.

You have to steer your pattern through both the newLISP reader and the regular expression processor. Remember the difference between strings enclosed in quotes and strings enclosed in braces? Quotes allow the processing of escaped characters, whereas braces don't. Braces have some advantages: they face each other visually, they don't have smart and dumb versions to confuse you, your text editor might balance them for you, and they let you use the more commonly occurring quotation characters in strings without having to escape them all the time. But if you use quotes, you must double the backslashes, so that a single backslash survives intact as far as the regular expression processor:

(set 'str "\s")
(replace str "this is a phrase" "|" 0)  ; oops, not searching for \s (white space) ...
;-> "thi| i| a phra|e"                  ; but for the letter s
 
(set 'str "\\s")
(replace str "this is a phrase" "|" 0)
;-> "this|is|a|phrase"                  ; ah, better!
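With braces, no doubling is needed, because the backslash reaches the regular expression processor untouched. The second line below also shows option 1, case-insensitive matching:

```newlisp
(replace {\s} "this is a phrase" "|" 0)        ; braces: \s arrives intact
;-> "this|is|a|phrase"
(replace "[AEIOU]" "this is a phrase" "*" 1)   ; option 1: ignore case
;-> "th*s *s * phr*s*"
```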

System variables: $0, $1 ...

replace updates a set of system variables $0, $1, $2, up to $15, with the matches. These refer to the parenthesized expressions in the pattern, and are the equivalent of the \1, \2 that you might be familiar with if you've used grep. For example:

(set 'quotation {"I cannot explain." She spoke in a low, eager voice,
with a curious lisp in her utterance. "But for God's sake do what I 
ask you. Go back and never set foot upon the moor again."})
 
(replace {(.*?),.*?curious\s*(l.*p\W)(.*?)(moor)(.*)} 
    quotation 
    (println {$1 } $1 { $2 } $2 { $3 } $3 { $4 } $4 { $5 } $5)
    4)
$1 "I cannot explain." She spoke in a low $2 lisp  $3 in her utterance.
"But for God's sake do what I ask you. Go back and never set foot upon 
the $4 moor $5 again."

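The pattern contains five parenthesized sub-expressions; between the first and the second it skips any text that starts with a comma and ends with the word curious. Option 4 lets . match newlines, so the pattern can span the line breaks in the quotation. $0 stores the whole matched expression, $1 stores the first parenthesized sub-expression, and so on.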

If you prefer to use quotation marks rather than the braces I used here, remember that certain characters have to be escaped with a backslash.
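A smaller example of the same idea uses $1 and $2 to swap two captured pieces of text, turning "Surname, Forename" round (author is just an illustrative name):

```newlisp
(set 'author "Truss, Lynn")
(replace {(\w+), (\w+)} author (string $2 " " $1) 0)
;-> "Lynn Truss"
```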

The replacement expression

The previous example shows an important feature of replace: the replacement doesn't have to be just a simple string or list; it can be any newLISP expression. Each time the pattern is found, the replacement expression is evaluated. You can use this to provide a replacement value that's calculated dynamically, or you could do anything else you wanted to with the found text. It's even possible to evaluate an expression that's got nothing to do with found text at all.

Here's another example: search for the letter t followed either by the letter h or by any vowel, and print out the combinations that replace found:

(set 't "a hypothetical one-dimensional subatomic particle")
(replace {t[h]|t[aeiou]} t (println $0) 0)
th
ti
to
ti
;-> "a hypothetical one-dimensional subatomic particle"

For every matching piece of text found, the third expression

(println $0)

was evaluated. This is a good way of seeing what the regular expression engine is up to while the function is running. In this example, the original string appears to be unchanged, but in fact it did change, because (println $0) did two things: it printed the string, and it returned the value to replace, thus replacing the found text with itself. Invisible mending! If the replacement expression doesn't return a string, no replacement occurs.

You could do other useful things too, such as building a list of matches for later processing. The replacement expression can use the system variables and any other function to work with the text that was found.
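Here is a sketch of collecting matches that way (found and text are illustrative names). The replacement expression pushes each match onto the end of a list and then returns $0, so the text itself is left as it was:

```newlisp
(set 'found '())
(set 'text "10 green bottles, 9 left, 1 fell")
(replace {\d+} text
  (begin
    (push $0 found -1)   ; append the match to the list
    $0)                  ; return it, so the text is unchanged
  0)
found
;-> ("10" "9" "1")
```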

In the next example, we look for the letters a, e, or c, and force each occurrence to upper-case:

(replace "a|e|c" "This is a sentence" (upper-case $0) 0)
;-> "This is A sEntEnCE"

As another example, here's a simple search and replace operation that keeps count of how many times the letter 'o' has been found in a string, and replaces each occurrence in the original string with the count so far. The replacement is a block of expressions grouped into a single begin expression. This block is evaluated every time a match is found:

(set 't "a hypothetical one-dimensional subatomic particle")
(set 'counter 0)
(replace "o" t 
 (begin 
  (inc counter)
  (println {replacing "} $0 {" number } counter) 
  (string counter))         ; the replacement text should be a string 
 0)
replacing "o" number 1
replacing "o" number 2
replacing "o" number 3
replacing "o" number 4
;-> "a hyp1thetical 2ne-dimensi3nal subat4mic particle"


The output from println doesn't appear in the string; the final value of the entire begin expression is a string version of the counter, so that gets inserted into the string.

Here's yet another example of replace in action. Suppose I have a text file, consisting of the following:

1 a = 15
2 another_variable = "strings"
4 x2 = "another string"
5 c = 25 
3x=9


I want to write a newLISP script that re-numbers the lines in multiples of 10, starting at 10, and aligns the text so that the equals signs line up, like this:

10 a                   = 15
20 another_variable    = "strings"
30 x2                  = "another string"
40 c                   = 25 
50 x                   = 9

(I don't know what language this is!)

The following script will do this:

(set 'file (open ((main-args) 2) "read"))
(set 'counter 0)
(while (read-line file)
 (set 'temp 
   (replace {^(\d*)(\s*)(.*)}        ; the numbering
     (current-line)
     (string (inc counter 10) " " $3) 
     0))
 (println 
   (replace {(\S*)(\s*)(=)(\s*)(.*)}  ; the spaces around =
    temp 
    (string $1 (dup " " (- 20 (length $1))) $3 " " $5) 
    0)))
(exit)

I've used two replace operations inside the while loop, to keep things clearer. The first one sets a temporary variable to the result of a replace operation. The search pattern ({^(\d*)(\s*)(.*)}) is a regular expression that looks for any number at the start of a line, followed by some space, followed by anything. The replacement expression ((string (inc counter 10) " " $3)) consists of the counter, incremented by 10, followed by a space and the third match (ie the anything I just looked for). The final 0 is the regex option.

The result of the second replace operation is printed. This time I'm searching the temporary variable temp for a run of non-space characters, optional spaces, an equals sign, more optional spaces, and the rest of the line:

{(\S*)(\s*)(=)(\s*)(.*)}

The replacement expression is built up from the important found elements ($1, $3, $5), but it also includes a quick calculation of the padding needed: (- 20 (length $1)) spaces after the variable name, so that the equals sign always starts 20 characters after the beginning of the name (20 is an arbitrary choice). The new line number isn't part of this match, so replace leaves it in place.

Regular expressions aren't very easy for the newcomer, but they're very powerful, particularly with newLISP's replace function, so they're worth learning.

Testing and comparing strings

There are various tests that you can run on strings. newLISP's comparison operators work by finding and comparing the code numbers of the characters until a decision can be made:

(> {Higgs Boson} {Higgs boson})         ; nil
(> {Higgs Boson} {Higgs})               ; true
(< {dollar} {euro})                     ; true
(> {newLISP} {LISP})                    ; true
(= {fred} {Fred})                       ; nil
(= {fred} {fred})                       ; true

and of course newLISP's flexible argument handling lets you test loads of strings at the same time:

(< "a" "c" "d" "f" "h") 
;-> true

You can also call these comparison functions with a single argument, in which case newLISP helpfully assumes that the second argument is 0 or "", depending on the type of the first argument:

(> 1)                               ; true - assumes > 0
(> "fred")                          ; true - assumes > ""
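Since strings are compared by character code numbers, sorting a list of strings puts every upper-case initial before any lower-case one:

```newlisp
(sort '("pear" "Apple" "banana" "apple"))
;-> ("Apple" "apple" "banana" "pear")    ; "A" is 65, "a" is 97
```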

To check whether two strings share common features, you can either use starts-with and ends-with, or the more general pattern matching commands member, regex, find, and find-all. starts-with and ends-with are simple enough:

(starts-with "newLISP" "new")       ; does newLISP start with new?
;-> true
(ends-with "newLISP" "LISP")
;-> true

They can also accept regular expressions, using one of the regex options (0 being the most commonly used):

(starts-with {newLISP} {[a-z][aeiou](?# lc followed by lc vowel)} 0)
;-> true
(ends-with {newLISP} {[aeiou][A-Z](?# lc vowel followed by UCase)} 0)
;-> nil

find, find-all, member, and regex look everywhere in a string. find returns the index of the matching substring:

(set 't "a hypothetical one-dimensional subatomic particle")
(find "atom" t)
;-> 34
 
(find "l" t)
;-> 13
 
(find "L" t)
;-> nil                             ; search is case-sensitive

member looks to see if one string is in another. It returns the rest of the string, including the search string, rather than the index of the first occurrence.

(member "rest" "a good restaurant")
;-> "restaurant"

Both find and member let you use regular expressions:

(set 'quotation {"I cannot explain." She spoke in a low,
eager voice, with a curious lisp in her utterance. "But for
Gods sake do what I ask you. Go back and never set foot upon
the moor again."})
 
(find "lisp" quotation)            ; without regex
;-> 69                             ; character 69
 
(find {i} quotation 0)             ; with regex
;-> 15                             ; character 15
 
(find {s} quotation 1)             ; case insensitive regex
;-> 20                             ; character 20
 
(println "character " 
 (find {(l.*?p)} quotation 0) ": " $0)  ; l followed by a p
;-> character 13: lain." She sp

find-all works like find, but returns a list of all matching strings, rather than the index of just the first match. It always takes regular expressions, so - for once - you don't have to put regex option numbers at the end.

(set 'quotation {"I cannot explain." She spoke in a low,
eager voice, with a curious lisp in her utterance. "But for
Gods sake do what I ask you. Go back and never set foot upon
the moor again."})
 
(find-all "[aeiou]{2,}" quotation $0)       ; two or more vowels
;-> ("ai" "ea" "oi" "iou" "ou" "oo" "oo" "ai")
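Because find-all returns a list, length turns it into a quick way of counting occurrences of a pattern:

```newlisp
(set 't "a hypothetical one-dimensional subatomic particle")
(length (find-all "o" t))         ; how many letter o's?
;-> 4
(length (find-all {[aeiou]} t))   ; how many vowels?
;-> 19
```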

Or you could use regex. This returns nil if the string doesn't contain the pattern, but, if it does contain the pattern, it returns a list with the matched strings and substrings, and the start and length of each string. The results can be quite complicated:

(set 'quotation 
 {She spoke in a low, eager voice, with a curious lisp in her utterance.})
 
(println (regex {(.*)(l.*)(l.*p)(.*)} quotation 0))
("She spoke in a low, eager voice, with a curious lisp in
her utterance." 0 70 "She spoke in a " 0 15 "low, eager
voice, with a curious " 15 33 "lisp" 48 4 " in her
utterance." 52 18)


This results list can be interpreted as 'the whole match runs from character 0 for 70 characters, the first sub-expression from character 0 for 15 characters, the second from character 15 for 33 characters', and so on.

The matches are also stored in the system variables ($0, $1, ...) which you can inspect easily with a simple loop:

(for (x 1 4)
 (println {$} x ": " ($ x)))
$1: She spoke in a 
$2: low, eager voice, with a curious 
$3: lisp 
$4: in her utterance.

Strings to lists

Two functions let you convert strings to lists, ready for manipulation with newLISP's extensive list-processing powers. The well-named explode function cracks open a string and returns a list of single characters:

(set 't "a hypothetical one-dimensional subatomic particle")
(explode t)
 
;-> ("a" " " "h" "y" "p" "o" "t" "h" "e" "t" "i" "c" "a" "l"
" " "o" "n" "e" "-" "d" "i" "m" "e" "n" "s" "i" "o" "n" "a"
"l" " " "s" "u" "b" "a" "t" "o" "m" "i" "c" " " "p" "a" "r"
"t" "i" "c" "l" "e")


The explosion is easily reversed with join. explode can also take an integer. This defines the size of the fragments. For example, to divide up a string into cryptographer-style 5 letter groups, remove the spaces and use explode like this:

(explode (replace " " (copy t) "") 5)    ; copy, because replace is destructive
;-> ("ahypo" "theti" "calon" "e-dim" "ensio" "nalsu" "batom" "icpar" "ticle")
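Combining explode with join's separator argument regroups a string in one step, and joining an exploded string gives back the original:

```newlisp
(set 's "abcdef")
(join (explode s 2) "-")       ; pairs of characters, dash-separated
;-> "ab-cd-ef"
(= (join (explode s)) s)       ; explode and join are inverses
;-> true
```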

You can do similar tricks with find-all. Watch the end, though: a fixed-length pattern silently drops any leftover characters (here the final e):

(find-all ".{3}" t)                 ; this regex drops chars!
;-> ("a h" "ypo" "the" "tic" "al " "one" "-di" "men" 
; "sio" "nal" " su" "bat" "omi" "c p" "art" "icl")

Parsing strings

parse is a powerful way of breaking strings up and returning the pieces. Used on its own, it breaks strings apart, usually at word boundaries, eats the boundaries, and returns a list of the remaining pieces:

(parse t)                               ; defaults to spaces...
;-> ("a" "hypothetical" "one-dimensional" "subatomic" "particle")

Or you can supply a delimiting character, and parse breaks the string whenever it meets that character:

(set 'pathname {/System/Library/Fonts/Courier.dfont})
(parse pathname {/})
;-> ("" "System" "Library" "Fonts" "Courier.dfont")

By the way, I could eliminate that first empty string from the list by filtering it out:

(clean empty? (parse pathname {/}))
;-> ("System" "Library" "Fonts" "Courier.dfont")

You can also specify a delimiter string rather than a delimiter character:

(set 't (dup "spam" 8))
;-> "spamspamspamspamspamspamspamspam"
 
(parse t {am})                          ; break on "am"
;-> ("sp" "sp" "sp" "sp" "sp" "sp" "sp" "sp" "")
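A delimiter string keeps empty fields, which matters for simple comma-separated data; clean can drop them afterwards if you don't want them:

```newlisp
(parse "alpha,beta,,gamma" ",")
;-> ("alpha" "beta" "" "gamma")
(clean empty? (parse "alpha,beta,,gamma" ","))
;-> ("alpha" "beta" "gamma")
```

This is only a sketch for simple data; it doesn't handle quoted fields that themselves contain commas.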

Best of all, though, you can specify a regular expression delimiter. Make sure you supply the options flag (0 or whatever), as with most of the regex functions in newLISP:

(set 't {/System/Library/Fonts/Courier.dfont})
(parse t {[/aeiou]} 0)                  ; split at slashes and vowels
;-> ("" "Syst" "m" "L" "br" "ry" "F" "nts" "C" "" "r" "" "r.df" "nt")

Here's that well-known quick and not very reliable HTML-tag stripper:

(set 'html (read-file "/Users/Sites/index.html"))
(println (parse html {<.*?>} 4))        ; option 4: dot matches newline

For parsing XML strings, newLISP provides the function xml-parse. See Working with XML.

Take care when using parse on text. Unless you specify exactly what you want, it thinks you're passing it newLISP source code. This can produce surprising results:

(set 't {Eats, shoots, and leaves ; a book by Lynn Truss})
(parse t)
;-> ("Eats" "," "shoots" "," "and" "leaves")    ; she's gone!

The semicolon is considered a comment character in newLISP, so parse has ignored it and everything that followed on that line. Tell it what you really want, using delimiters or regular expressions:

(set 't {Eats, shoots, and leaves ; a book by Lynn Truss})
(parse t " ")
;-> ("Eats," "shoots," "and" "leaves" ";" "a" "book" "by" "Lynn" "Truss")

or

(parse t "\\s" 0)                   ; white space
;-> ("Eats," "shoots," "and" "leaves" ";" "a" "book" "by" "Lynn" "Truss")

If you want to chop strings up in other ways, consider using find-all, which returns a list of strings that match a pattern. If you can specify the chopping operation as a regular expression, you're in luck. For example, if you want to split a number into groups of three digits, use this technique:

(set 'a "1212374192387562311")
(println (find-all {\d{3}|\d{2}$|\d$} a))
;-> ("121" "237" "419" "238" "756" "231" "1")

; alternatively
(explode a 3)
;-> ("121" "237" "419" "238" "756" "231" "1")

The pattern has to consider cases where there are 2 or 1 digits left over at the end.

parse eats the delimiters once they've done their work - find-all finds things and returns what it finds.

(find-all {\w+} t )                     ; word characters
;-> ("Eats" "shoots" "and" "leaves" "a" "book" "by" "Lynn" "Truss")
 
(parse t {\w+} 0 )                      ; eats and leaves delimiters
;-> ("" ", " ", " " " " ; " " " " " " " " " "")

Other string functions

There are other functions that work with strings. search looks for a string inside a file on disk:

(set 'f (open {/private/var/log/system.log} {read}))
(search f {kernel})
(seek f (- (seek f) 64))                ; rewind file pointer
(dotimes (n 3)
 (println (read-line f)))
(close f)

This example looks in system.log for the string kernel. If it's found, newLISP rewinds the file pointer by 64 bytes, then prints out three lines, showing the line in context.

There are also functions for base64 encoding and decoding of strings (base64-enc and base64-dec), and for encrypting strings (encrypt).
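A brief sketch of both. encrypt performs one-time-pad (XOR) encryption, so calling it again with the same pad decrypts; the pad "pad" here is only an illustration, and a short pad like this offers no real security:

```newlisp
(base64-enc "newLISP")
;-> "bmV3TElTUA=="
(base64-dec (base64-enc "newLISP"))
;-> "newLISP"

(set 'secret (encrypt "attack at dawn" "pad"))   ; XOR with the pad
(encrypt secret "pad")                          ; the same call decrypts
;-> "attack at dawn"
```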

Formatting strings

It's worth mentioning the format function, which lets you insert the values of newLISP expressions into a pre-defined template string. Use %s to represent the location of a string expression inside the template, and other % codes to include numbers. For example, suppose you want to display a list of files like this:

folder: Library
  file: mach

A suitable template for folders (directories) looks like this:

"folder: %s" ; or
"  file: %s"

Give the format function a template string, followed by the expression (here just the symbol f) whose value is a file or folder name:

(format "folder: %s" f) ; or
(format "  file: %s" f)

When this is evaluated, the value of f is inserted into the string where the %s is. The code to generate a directory listing in this format, using the directory function, looks like this:

(dolist (f (directory)) 
 (if (directory? f)
  (println (format "folder: %s" f))
  (println (format "  file: %s" f))))

I'm using the directory? function to choose the right template string. A typical listing looks like this:

folder: .
folder: ..
  file: .DS_Store
  file: .hotfiles.btree
folder: .Spotlight-V100
folder: .Trashes
folder: .vol
  file: .VolumeIcon.icns
folder: Applications
folder: Applications (Mac OS 9)
folder: automount
folder: bin
folder: Cleanup At Startup
folder: cores
...

There are lots of formatting codes, modeled on those of C's printf, that you can use to produce the output you want. Numbers inside the codes control the alignment, width, and precision of strings and numbers. Just make sure that the % constructions in the format string match the expressions or symbols that appear after it, and that there are the same number of each.
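A small sketch of three common codes: %-10s left-justifies a string in a 10-character field, %5.2f prints a number 5 characters wide with 2 decimal places, and %03d pads an integer with leading zeros:

```newlisp
(format "%-10s|%5.2f|%03d" "name" 3.14159 7)
;-> "name      | 3.14|007"
```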

Here's another example. We'll display the first 400 or so Unicode characters in decimal, hexadecimal, and binary. We'll use the bits function to generate a binary string. We feed a list of three values to format after the format string, which has three entries:

(for (x 32 0x01a0)
 (println (char x)                 ; the character, then
   (format "%4d\t%4x\t%10s"        ; decimal \t hex \t binary-string
    (list x x (bits x)))))
   32       20     100000
!  33       21     100001
"  34       22     100010
#  35       23     100011
$  36       24     100100
%  37       25     100101
&  38       26     100110
'  39       27     100111
(  40       28     101000
)  41       29     101001
...

Strings that make newLISP think

Lastly, I must mention eval and eval-string. Both of these let you give newLISP code to newLISP for evaluation. If it's valid newLISP, you'll see the result of the evaluation. eval wants an expression:

(set 'expr '(+ 1 2))
(eval expr)
;-> 3

eval-string wants a string:

(set 'expr "(+ 1 2)")
(eval-string expr)
;-> 3

This means that you can build newLISP code, using any of the functions we've met, and then have it evaluated by newLISP. eval is particularly useful when you're defining macros - functions that delay evaluation until you choose to do it. See Macros.
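As a small sketch of building code before running it (op is just an illustrative name), here the same operator is used once to build a list for eval and once to build a string for eval-string:

```newlisp
(set 'op '+)
(set 'expr (list op 1 2 3))               ; the list (+ 1 2 3)
(eval expr)
;-> 6
(eval-string (string "(" op " 10 20)"))   ; the string "(+ 10 20)"
;-> 30
```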

You could use eval and eval-string to write programs that write programs.

The following curious piece of newLISP continually and mindlessly rearranges a few strings and tries to evaluate the result. Unsuccessful attempts are safely caught. When it finally becomes valid newLISP, it will be evaluated successfully and the result will satisfy the finishing condition and finish the loop.

(set 'code '(")" "set" "'valid" "true" "("))
(set 'valid nil)
(until valid
 (set 'code (randomize code))
 (println (join code " "))
 (catch (eval-string (join code " ")) 'result))
true 'valid set ) (
) ( set true 'valid
'valid ( set true )
set 'valid true ( )
'valid ) ( true set
set true ) ( 'valid
true ) ( set 'valid
'valid ( true ) set
true 'valid ( ) set
'valid ) ( true set
true ( 'valid ) set
set ( 'valid ) true
set true 'valid ( )
( set 'valid true )

I've used programs that were obviously written using this programming technique...