Programming with ooc/Strings

From Wikibooks, open books for an open world

String type

Usage

The String type is defined in the core module lang/types. String literals, such as "abc", are of this type.

str := "This is a string literal"

The string literal on the right-hand side creates a String object, and a reference to that object is assigned to the str variable.

A few operators are overloaded for strings by default. For example, ( + ) is used as the string concatenation operator:

str := "I love" + " french cheese"
// is the same as
str := "I love french cheese"

( + ) is overloaded for pretty much every type inherited from C (strings, chars, numeric types), so you can write:

str := "The answer is " + 42

The standard SDK also overloads [ ], which is used as a slicing operator:

"Sand" println()
// is the same as
"Sandwich"[0..4] println()

Note: the slicing operator works on bytes, not characters. See the UTF-8 support section below for more information.

Many more operators are overloaded on strings, such as == for comparison, * for repeating, [ ]= for bound-checked modification, etc.

Strings in ooc aren't immutable, but every method in the standard type String returns a copy of the given string, and never modifies the original.

The standard type String provides a nice set of methods for string manipulation. Since they all return copies, you should assign the result back, like this:

 name = name trim() // remove whitespace at both ends

But never:

 name trim() // wrong!

This would create a new trimmed string and throw it away. It is good to study the String type and know its methods in order not to reinvent the wheel.
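The copy-returning behavior described above can be sketched as follows. This is a minimal illustration that uses only constructs shown on this page (trim(), + and println); the variable names are hypothetical.

```ooc
name := "  Ada  "
trimmed := name trim()         // a new String; name is left untouched

println("[" + name + "]")      // still has the surrounding spaces
println("[" + trimmed + "]")   // whitespace removed at both ends

name = name trim()             // keep the result by reassigning it
println("[" + name + "]")
```

The brackets are only there to make the leading and trailing whitespace visible in the output.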

UTF-8 support

At the time of this writing, there is no built-in UTF-8 support in ooc. The length() function returns the number of bytes used to store the String, never the number of characters.

The reason is that there are no clearly defined boundaries between characters in the Unicode standard. One can roughly determine 'grapheme clusters', i.e. associate modifiers with the glyphs that correspond to characters, but it is a very difficult problem, and there are edge cases with non-European/non-American languages.

Therefore, for now, there is no notion of a UTF-8 character or code point in the language, but that doesn't prevent one from using UTF-8 in ooc programs. The ICU and utf8proc libraries seem especially interesting for handling such encoding matters in an ooc codebase.

Note that the language design on this issue is not final and is subject to change in the future, once other, more pressing matters are decided upon.

Length in bytes

As a consequence of the lack of UTF-8 support, the length() method returns the number of bytes, so that:

"o/" length()

is 2, but

"漢字" length()

is 6, because each of the two characters that make up the Japanese word "Kanji" takes 3 bytes in UTF-8.
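To see the two results side by side, the snippet below prints both lengths. It is a small sketch that assumes the source file is saved as UTF-8, and it reuses the toString() println() pattern that appears elsewhere on this page to print a number.

```ooc
ascii := "o/"
kanji := "漢字"

// toString() turns the numeric result into a String so it can be printed
ascii length() toString() println()   // 2 bytes for 2 characters
kanji length() toString() println()   // 6 bytes for 2 characters
```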

Creating new strings

You can create a new String from a char:

str := String new('\n')

or just allocate a fixed number of chars:

str := String new(128)

A String literal, such as "abc", is also of type String:

str := "Curiosity killed the cat."

Iterating through a string

You can iterate over the bytes of a String, because it implements the iterator() method.

for (c in "Hello, vertical world!") {
  c println()
}
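Because iteration goes byte by byte, a multi-byte string yields more loop steps than it has characters. The sketch below counts the steps; the variable name count is hypothetical, and, based on the description of length() on this page, the printed value is expected to match "漢字" length().

```ooc
count := 0
for (c in "漢字") {
  count += 1   // one step per byte, not per character
}
count toString() println()
```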

Comparing strings

You can use the == operator to compare two Strings, because it's overloaded in the SDK. It calls the equals() method, so that, in ooc:

 name == "JFK"
 // is the same as
 name equals("JFK")

This behavior greatly enhances the readability of ooc code, as opposed to, say, Java.

You can still compare the addresses of Strings by casting them to Pointers first:

 // equals won't be called here
 name as Pointer == otherName

The compare method can be used to test parts of strings for equality, for example:

 "awesome" compare("we", 1, 2) // is true
 "Turn right" compare("Turn left", 0, 6) // is false

Substrings and slicing

The [ ] operator can be used with a range to obtain the same effect as calling substring():

 // both these statements are true
 "happiness"[3..6] == "pin"
 "happiness" substring(3, 6) equals("pin")

Searching in a string

The indexOf() and lastIndexOf() methods search for the first and last occurrence, respectively, of a byte or a string in another string.

 str := "Proud of you, son."
 // returns 6
 str indexOf("of") toString() println()
 // returns 15
 str lastIndexOf('o') toString() println()

Repeating a string

A String can be repeated multiple times using the overloaded ( * ) operator, or with the times() method:

 // these lines print the same thing
 println("The cake is a lie!" * 5)
 "The cake is a lie!" times(5) println()

Note that because of precedence, we can't write:

 // wrong!
 "The cake is a lie!" * 5 println()

Because the compiler would read that as:

 "The cake is a lie!" * (5 println())

Which is definitely not what we intended.
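One way around the precedence problem, besides the two forms shown earlier, is to parenthesize the repetition so that println() applies to the whole result. This is a sketch that assumes a parenthesized expression can be used as the receiver of a method call.

```ooc
// the parentheses make the repeated string, not the number 5, the receiver
("The cake is a lie!" * 5) println()
```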

Reversing a string

A String can be reversed using the reverse method.

 // prints 'lebon nob el'
 "le bon nobel" reverse() println()

Be aware that reverse() works on bytes, not characters. See the UTF-8 Support section for more information.

Appending strings

You can use either the ( + ) operator or the append() and prepend() methods:

 // results in the string "indirect"
 "in" + "direct"
 "in" append("direct")
 "direct" prepend("in")

As with all other string methods, a copy is returned; the original string is not modified.

However, if you are building a string from many smaller parts, it is better to use a Buffer instead, as detailed below.

StringTokenizer - splitting a string

Buffer - the efficient way to concatenate strings
