The Way of the Java/Strings and things

Strings and things

Invoking methods on objects

In the section on graphics we used a Graphics object to draw circles in a window, and I used the phrase "invoke a method on an object" to refer to statements like

g.drawOval (0, 0, width, height);

In this case drawOval is the method being invoked on the object named g. At the time I didn't provide a definition of object, and I still can't provide a complete definition, but it is time to try.

object

In Java and other object-oriented languages, objects are collections of related data that come with a set of methods. These methods operate on the objects, performing computations and sometimes modifying the object's data.

So far we have only seen one object, g, so this definition might not mean much yet. Another example is Strings. Strings are objects (and ints and doubles are not). Based on the definition of object, you might ask "What is the data contained in a String object?" and "What are the methods we can invoke on String objects?"

The data contained in a String object are the letters of the string. There are quite a few methods that operate on Strings, but I will only use a few in this book. The rest are documented on the Java site.

The first method we will look at is charAt, which allows you to extract letters from a String. In order to store the result, we need a variable type that can store individual letters (as opposed to strings). Individual letters are called characters, and the variable type that stores them is called char.

char

chars work just like the other types we have seen:

   char fred = 'c';
   if (fred == 'c') 
     System.out.println (fred);

Character values appear in single quotes ('c'). Unlike string values (which appear in double quotes), character values can contain only a single letter or symbol.

Here's how the charAt method is used:

   String fruit = "banana";
   char letter = fruit.charAt(1);
   System.out.println (letter);

The syntax fruit.charAt indicates that I am invoking the charAt method on the object named fruit. I am passing the argument 1 to this method, which indicates that I would like to know the first letter of the string. The result is a character, which is stored in a char named letter. When I print the value of letter, I get a surprise:

a

a is not the first letter of "banana". Unless you are a computer scientist. For perverse reasons, computer scientists always start counting from zero. The 0th letter ("zeroeth") of "banana" is b. The 1th letter ("oneth") is a and the 2th ("twoeth") letter is n.

If you want the zeroeth letter of a string, you have to pass zero as an argument:

   char letter = fruit.charAt(0);

Length

The second String method we'll look at is length, which returns the number of characters in the string. For example:

   int length = fruit.length();

length takes no arguments, as indicated by (), and returns an integer, in this case 6. Notice that it is legal to have a variable with the same name as a method (although it can be confusing for human readers).

To find the last letter of a string, you might be tempted to try something like:

   int length = fruit.length();
   char last = fruit.charAt (length);       // WRONG!!

That won't work. The reason is that there is no 6th letter in "banana". Since we started counting at 0, the 6 letters are numbered from 0 to 5. To get the last character, you have to subtract 1 from length.
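Here is a minimal sketch of the corrected version. The class name LastLetter and the helper lastChar are names I made up for illustration, and the helper assumes the string is not empty.

```java
class LastLetter {
    // Hypothetical helper: returns the last character of a non-empty string.
    static char lastChar(String s) {
        int length = s.length();
        return s.charAt(length - 1);   // the last valid index is length - 1
    }

    public static void main(String[] args) {
        String fruit = "banana";
        System.out.println(lastChar(fruit));   // prints a
    }
}
```

If the string were empty, length - 1 would be -1, which is itself out of range, so a real program would need to check for that case first.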

Traversal

A common thing to do with a string is start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal. A natural way to encode a traversal is with a while statement:

   int index = 0;
   while (index < fruit.length()) {
     char letter = fruit.charAt (index);
     System.out.println (letter);
     index = index + 1;
   }

This loop traverses the string and prints each letter on a line by itself. Notice that the condition is index < fruit.length(), which means that when index is equal to the length of the string, the condition is false and the body of the loop is not executed. The last character we access is the one with the index fruit.length()-1.

The name of the loop variable is index. An index is a variable or value used to specify one member of an ordered set (in this case the set of characters in the string). The index indicates (hence the name) which one you want. The set has to be ordered so that each letter has an index and each index refers to a single character.

As an exercise, write a method that takes a String as an argument and that prints the letters backwards, all on one line.

Run-time errors

Way back in the section on run-time errors I talked about errors that don't appear until a program has started running. In Java run-time errors are called exceptions.

So far, you probably haven't seen many run-time errors, because we haven't been doing many things that can cause one. Well, now we are. If you use the charAt method and you provide an index that is negative or greater than length-1, you will get an exception: specifically, a StringIndexOutOfBoundsException. Try it and see how it looks.

If your program causes an exception, it prints an error message indicating the type of exception and where in the program it occurred. Then the program terminates.
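If you would rather probe the boundaries without crashing the program each time, here is one possible sketch. It relies on a try/catch statement, which this chapter has not explained, so treat it as a peek ahead. The class OutOfRange and the helper throwsAt are hypothetical names of my own.

```java
class OutOfRange {
    // Hypothetical helper: reports whether s.charAt(index) throws an exception.
    static boolean throwsAt(String s, int index) {
        try {
            s.charAt(index);
            return false;              // the index was in range
        } catch (StringIndexOutOfBoundsException e) {
            return true;               // the index was out of range
        }
    }

    public static void main(String[] args) {
        String fruit = "banana";
        System.out.println(throwsAt(fruit, 5));    // false: 5 is the last valid index
        System.out.println(throwsAt(fruit, 6));    // true: 6 equals the length
        System.out.println(throwsAt(fruit, -1));   // true: negative indices are invalid
    }
}
```

Because the exception is caught here, the program keeps running. Without the try/catch, the first out-of-range call would print the error message and end the program, as described above.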

Reading documentation

If you go to http://java.sun.com/j2se/1.4/docs/api/java/lang/String.html and click on charAt, you get the following documentation (or something like it):

public char charAt(int index)

Returns the character at the specified index.
An index ranges from 0 to length() - 1. 

Parameters: index - the index of the character. 

Returns: the character at the specified index of this string.
        The first character is at index 0. 

Throws: StringIndexOutOfBoundsException if the index is out of range.

The first line is the method's prototype (see the section on prototypes), which indicates the name of the method, the type of the parameters, and the return type.

The next line describes what the method does. The next two lines explain the parameters and return values. In this case the explanations are a bit redundant, but the documentation is supposed to fit a standard format. The last line explains what exceptions, if any, can be caused by this method.

The indexOf method

In some ways, indexOf is the opposite of charAt. charAt takes an index and returns the character at that index. indexOf takes a character and finds the index where that character appears.

charAt fails if the index is out of range, and causes an exception. indexOf fails if the character does not appear in the string, and returns the value -1.

   String fruit = "banana";
   int index = fruit.indexOf('a');

This finds the index of the letter 'a' in the string. In this case, the letter appears three times, so it is not obvious what indexOf should do. According to the documentation, it returns the index of the first appearance.

In order to find subsequent appearances, there is an alternate version of indexOf (for an explanation of this kind of overloading, see the section on overloading). It takes a second argument that indicates where in the string to start looking. If we invoke

   int index = fruit.indexOf('a', 2);

it will start at the twoeth letter (the first n) and find the second a, which is at index 3. If the letter happens to appear at the starting index, the starting index is the answer. Thus,

   int index = fruit.indexOf('a', 5);

returns 5. Based on the documentation, it is a little tricky to figure out what happens if the starting index is out of range:

indexOf returns the index of the first occurrence of the character in the character sequence represented by this object that is greater than or equal to fromIndex, or -1 if the character does not occur.

One way to figure out what this means is to try out a couple of cases. Here are the results of my experiments:

  • If the starting index is greater than or equal to length(), the result is -1, indicating that the letter does not appear at any index greater than or equal to the starting index.
  • If the starting index is negative, the result is 1, indicating the first appearance of the letter at an index greater than the starting index.

If you go back and look at the documentation, you'll see that this behavior is consistent with the definition, even if it was not immediately obvious. Now that we have a better idea how indexOf works, we can use it as part of a program.
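The experiments above can be repeated with a short program like the following sketch. The class name IndexOfDemo is my own invention; the calls themselves are the two versions of indexOf discussed in this section.

```java
class IndexOfDemo {
    public static void main(String[] args) {
        String fruit = "banana";

        System.out.println(fruit.indexOf('a'));       // 1: first appearance
        System.out.println(fruit.indexOf('a', 2));    // 3: first appearance at or after index 2
        System.out.println(fruit.indexOf('a', 5));    // 5: the starting index itself matches
        System.out.println(fruit.indexOf('a', 6));    // -1: starting index equals length()
        System.out.println(fruit.indexOf('a', -1));   // 1: a negative start behaves like 0
        System.out.println(fruit.indexOf('x'));       // -1: the letter does not appear
    }
}
```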

Looping and counting

The following program counts the number of times the letter 'a' appears in a string:

   String fruit = "banana";
   int length = fruit.length();
   int count = 0;

   int index = 0;
   while (index < length) {
     if (fruit.charAt(index) == 'a') {
       count = count + 1;
     }
     index = index + 1;
   }
   System.out.println (count);

This program demonstrates a common idiom, called a counter. The variable count is initialized to zero and then incremented each time we find an 'a' (to increment is to increase by one; it is the opposite of decrement, and unrelated to excrement, which is a noun). When we exit the loop, count contains the result: the total number of a's.

As an exercise, encapsulate this code in a method named countLetters, and generalize it so that it accepts the string and the letter as arguments.

As a second exercise, rewrite the method so that it uses indexOf to locate the a's, rather than checking the characters one by one.

Increment and decrement operators

Incrementing and decrementing are such common operations that Java provides special operators for them. The ++ operator adds one to the current value of a numeric variable, such as an int, a char or a double. The -- operator subtracts one. Neither operator works on booleans or Strings.

Technically, it is legal to increment a variable and use it in an expression at the same time. For example, you might see something like:

   System.out.println (i++);

Looking at this, it is not clear whether the increment will take effect before or after the value is printed. Because expressions like this tend to be confusing, I would discourage you from using them. In fact, to discourage you even more, I'm not going to tell you what the result is. If you really want to know, you can try it.

Using the increment operators, we can rewrite the letter-counter:

   int index = 0;
   while (index < length) {
     if (fruit.charAt(index) == 'a') {
       count++;
     }
     index++;
   }

It is a common error to write something like

   index = index++;             // WRONG!!

Unfortunately, this is syntactically legal, so the compiler will not warn you. The effect of this statement is to leave the value of index unchanged. This is often a difficult bug to track down.

Remember, you can write index = index +1;, or you can write index++;, but you shouldn't mix them.
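If you want to see the bug for yourself, here is a small sketch. The class IncrementPitfall and the two helper methods are hypothetical names I chose for the demonstration.

```java
class IncrementPitfall {
    // Hypothetical helper: applies the buggy statement and returns the result.
    static int mixed(int index) {
        index = index++;    // WRONG: the value of index is left unchanged
        return index;
    }

    // Hypothetical helper: applies the correct statement and returns the result.
    static int plain(int index) {
        index++;            // adds one, as intended
        return index;
    }

    public static void main(String[] args) {
        System.out.println(mixed(0));   // prints 0
        System.out.println(plain(0));   // prints 1
    }
}
```

A loop that uses the first form to advance its index never makes progress, which is one reason the bug can be hard to spot.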

Character arithmetic

It may seem odd, but you can do arithmetic with characters! The expression 'a' + 1 yields the value 'b'. Similarly, if you have a variable named letter that contains a character, then letter - 'a' will tell you where in the alphabet it appears (keeping in mind that 'a' is the zeroeth letter of the alphabet and 'z' is the 25th).
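As a rough illustration of the second idea, the following sketch computes where a lowercase letter falls in the alphabet. The class AlphabetPosition and the helper position are names I made up, and the helper assumes its argument is one of the letters 'a' through 'z'.

```java
class AlphabetPosition {
    // Hypothetical helper: zero-based position of a lowercase letter
    // in the alphabet. Assumes letter is between 'a' and 'z'.
    static int position(char letter) {
        return letter - 'a';
    }

    public static void main(String[] args) {
        System.out.println(position('a'));   // prints 0
        System.out.println(position('e'));   // prints 4
        System.out.println(position('z'));   // prints 25
    }
}
```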

This sort of thing is useful for converting between the characters that contain numbers, like '0', '1' and '2', and the corresponding integers. They are not the same thing. For example, if you try this

   char letter = '3';
   int x = (int) letter;
   System.out.println (x);

you might expect the value 3, but you will get 51, which is the Unicode value (the same as the ASCII code) that Java uses to represent the character '3'. To convert '3' to the corresponding integer value you can subtract '0':

   int x = (int)(letter - '0');

Technically, in both of these examples the typecast ((int)) is unnecessary, since Java will convert type char to type int automatically. I included the typecasts to emphasize the difference between the types, and because I'm a stickler about that sort of thing.

Since this conversion can be a little ugly, it is preferable to use the digit method in the Character class. For example:

   int x = Character.digit (letter, 10);

converts letter to the corresponding digit, interpreting it as a base 10 number.
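To compare the three conversions side by side, you could run a sketch like this one. The class name DigitDemo is hypothetical; Character.digit is the library method mentioned above, and it returns -1 when the character is not a valid digit in the given base.

```java
class DigitDemo {
    public static void main(String[] args) {
        char letter = '3';

        System.out.println((int) letter);                  // 51: the character code
        System.out.println(letter - '0');                  // 3: character arithmetic
        System.out.println(Character.digit(letter, 10));   // 3: library method
        System.out.println(Character.digit('x', 10));      // -1: 'x' is not a base 10 digit
    }
}
```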

Another use for character arithmetic is to loop through the letters of the alphabet in order. For example, in Robert McCloskey's book Make Way for Ducklings, the names of the ducklings form an abecedarian series, something like Jack, Kack, Lack, Mack, Nack, Oack, Pack and Qack. Here is a loop that prints these names in order:

   char letter = 'J';
   while (letter <= 'Q') {
     System.out.println (letter + "ack");
     letter++;
   }

Notice that in addition to the arithmetic operators, we can also use the comparison operators on characters. The output of this program is:

Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack

Of course, that's not quite right because I've misspelled Ouack and Quack. As an exercise, modify the program to correct this error.

Typecasting for experts

Here's a puzzler: normally, the statement x++ is exactly equivalent to x = x + 1. Unless x is a char! In that case, x++ is legal, but x = x + 1 causes an error.

Try it out and see what the error message is, then see if you can figure out what is going on.

Strings are immutable

As you look over the documentation of the String methods, you might notice toUpperCase and toLowerCase. These methods are often a source of confusion, because it sounds like they have the effect of changing (or mutating) an existing string. Actually, neither these methods nor any others can change a string, because strings are immutable.

When you invoke toUpperCase on a String, you get a new String as a return value. For example:

   String name = "Alan Turing";
   String upperName = name.toUpperCase ();

After the second line is executed, upperName contains the value "ALAN TURING", but name still contains "Alan Turing".
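One way to convince yourself is to invoke toUpperCase and throw away the result, as in this sketch. The class name ImmutableDemo is my own, and the upper-case output shown assumes an English-language locale.

```java
class ImmutableDemo {
    public static void main(String[] args) {
        String name = "Alan Turing";

        name.toUpperCase();                      // the new String is discarded
        System.out.println(name);                // prints Alan Turing

        String upperName = name.toUpperCase();   // keep the new String this time
        System.out.println(upperName);           // prints ALAN TURING (in an English locale)
        System.out.println(name);                // still prints Alan Turing
    }
}
```

The first call has no visible effect at all, which is exactly the confusion this section warns about.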

Strings are incomparable

It is often necessary to compare strings to see if they are the same, or to see which comes first in alphabetical order. It would be nice if we could use the comparison operators, like == and >, but we can't.

In order to compare Strings, we have to use the equals and compareTo methods. For example:

   String name1 = "Alan Turing";
   String name2 = "Ada Lovelace";

   if (name1.equals (name2)) {
     System.out.println ("The names are the same.");
   }

   int flag = name1.compareTo (name2);
   if (flag == 0) {
     System.out.println ("The names are the same.");
   } else if (flag < 0) {
     System.out.println ("name1 comes before name2.");
   } else if (flag > 0) {
     System.out.println ("name2 comes before name1.");
   }

The syntax here is a little weird. To compare two things, you have to invoke a method on one of them and pass the other as an argument.

The return value from equals is straightforward enough; true if the strings contain the same characters, and false otherwise.

The return value from compareTo is a little odd. It is the difference between the first characters in the strings that differ. If the strings are equal, it is 0. If the first string (the one on which the method is invoked) comes first in the alphabet, the difference is negative. Otherwise, the difference is positive. In this case the return value is positive 8, because the second letter of Ada comes before the second letter of Alan by 8 letters.
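To check these return values, you could print them directly, as in the following sketch. The class name CompareDemo is hypothetical; the strings are the ones from the example above.

```java
class CompareDemo {
    public static void main(String[] args) {
        String name1 = "Alan Turing";
        String name2 = "Ada Lovelace";

        System.out.println(name1.equals(name2));              // false
        System.out.println(name1.compareTo(name2));           // 8: 'l' minus 'd'
        System.out.println(name2.compareTo(name1));           // -8: the same difference, reversed
        System.out.println(name1.compareTo("Alan Turing"));   // 0: the strings are equal
    }
}
```

Swapping the two strings flips the sign of the result, which is a handy way to remember which way is which.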

Using compareTo is often tricky, and I never remember which way is which without looking it up, but the good news is that the interface is pretty standard for comparing many types of objects, so once you get it you are all set.

Just for completeness, I should admit that it is legal, but very seldom correct, to use the == operator with Strings. But what that means will not make sense until later, so for now, don't do it.

Glossary

  • object: A collection of related data that comes with a set of methods that operate on it. The objects we have used so far are the Graphics object provided by the system, and Strings.
  • index: A variable or value used to select one of the members of an ordered set, like a character from a string.
  • traverse: To iterate through all the elements of a set performing a similar operation on each.
  • counter: A variable used to count something, usually initialized to zero and then incremented.
  • increment: Increase the value of a variable by one. The increment operator in Java is ++.
  • decrement: Decrease the value of a variable by one. The decrement operator in Java is --.
  • exception: A run-time error. Exceptions cause the execution of a program to terminate.