Modern Fortran has a wide range of facilities for handling string or text data but some of these language-defined facilities have not been widely implemented by the compiler developers. It should be remembered that Fortran is designed for scientific computing and is probably not a good choice for writing a new word processor.
The main feature in Fortran that supports strings is the intrinsic data type CHARACTER. A CHARACTER literal constant can be delimited by either single or double quotes, and, where necessary, these can be escaped by using two consecutive single or double quotes. The concatenation operator is // (but this cannot be used to concatenate CHARACTER entities of different KIND). CHARACTER scalar variables and arrays are allowed. CHARACTER variables have a sub-string notation to refer to and extract sub-strings.
    program string_1
       implicit none
       !declaration
       character(len=6) :: word1
       character(len=2) :: word2
       !assignment
       word1 = "abcdef"
       !substring
       word2 = word1(5:6)
       !escape a single quote by doubling it
       word1 = 'Don''t '
       !concatenation
       write(*,*) word2//word1
    end program string_1
In the above example, the two CHARACTER variables WORD1 and WORD2 are declared to have length 6 and 2 characters respectively.
In CHARACTER assignment operations, if the right hand side of the assignment is shorter than the left hand side, the remaining characters on the left hand side are filled with blanks. If the right hand side is longer than the left hand side, then the right hand side is truncated. In neither case is an error raised either by the compiler or at run time.
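The padding and truncation rules can be seen in a minimal sketch such as the following. The program and variable names are purely illustrative. Some compilers can be asked to warn about the truncation, but it is not an error.

```fortran
program assign_demo
   implicit none
   character(len=8) :: long_var
   character(len=3) :: short_var

   long_var  = 'abc'        ! shorter source: padded on the right with blanks
   short_var = 'abcdefgh'   ! longer source: silently truncated to 'abc'

   ! Brackets make the trailing blanks visible in the output
   write(*,'(3a)') '[', long_var,  ']'   ! prints [abc     ]
   write(*,'(3a)') '[', short_var, ']'   ! prints [abc]
end program assign_demo
```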
CHARACTER arrays and coarrays are permitted and can be declared and accessed in the same way as any other Fortran array. Where the array index and substring notations are to be combined, the array indices appear first and the substring expression appears second as illustrated in the final line of the following example:
    implicit none
    character(len=120), dimension(10) :: text
    text(1) = 'This is the first element of the array "text"'
    text(2:3) = ' '          !elements 2 and 3 are blank
    text(4)(20:20) = '!'     !character 20 of element 4
Unlike some programming languages, Fortran CHARACTER data and variables do not require an explicit character to terminate a string. Also, unlike C-type languages, Fortran CHARACTER data do not accommodate embedded and escaped control characters (e.g. \n) and all processing of output control is done via an extensive FORMAT sub-system.
CHARACTER Collating Sequence
Internally, Fortran maintains a collating sequence for all the permitted characters. Non-printing characters may be included in the collating sequence. The collating sequence is not specified by the language standard but most vendors support either ASCII or EBCDIC. This collating sequence means that lexical comparisons can be performed to ascertain whether e.g. 'a' < 'b', but the outcome is essentially vendor specific. Hence there is a difference between functions such as ICHAR and IACHAR that is described below.
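As a small sketch of why the distinction matters, the following program (names are illustrative) prints the position of a character in the processor collating sequence and in ASCII, and then performs a comparison whose result depends on the processor. The values in the comments assume a processor whose default character set is ASCII.

```fortran
program collating_demo
   implicit none
   ! ICHAR uses the processor's own collating sequence,
   ! IACHAR always uses the ASCII positions
   write(*,*) 'ichar  A: ', ichar('A')    ! 65 only if the processor uses ASCII
   write(*,*) 'iachar A: ', iachar('A')   ! 65 on every processor
   ! Comparison operators follow the processor collating sequence
   write(*,*) 'a < B ?   ', 'a' < 'B'     ! false under ASCII ordering
end program collating_demo
```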
CHARACTER can also have KIND, but this is vendor-specific. It can allow compilers to support Unicode, or the Russian alphabet or Japanese characters etc. It is not necessary to specify the length or kind of a character variable. If a CHARACTER variable is declared with neither, the result is a variable of default kind and one character long. A single unnamed number indicates the length, and two unnamed numbers indicate the length and kind in that order. It is generally much clearer, though slightly more verbose, to be explicit by using the LEN= and KIND= keywords, as shown in the last three declarations of the following example. The compiler vendor has control over which kinds of character are supported and the integer values assigned to access the corresponding character sets.
    program string_2
       implicit none
       character :: one
       character(5) :: english_name
       character(5,2) :: japanese_name
       character(len=80) :: line
       character(len=120, kind=3) :: unicode_line
       character(kind=4, len=256) :: ebcdic_string
       ...
    end program string_2
The intrinsic function selected_char_kind(name) returns the positive integer kind value of the character set with the corresponding name (e.g. default, ascii, kanji, iso_10646 etc.), but the only character set that must be supported is default, and if the name is not supported then -1 will be returned. Disappointingly, vendors generally have been slow to implement more than the default kind but gfortran, for instance, is a notable exception.
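A short sketch of how the available kinds might be queried is given below. The program and constant names are illustrative, and the values printed depend entirely on the compiler in use.

```fortran
program kind_demo
   implicit none
   ! Kind values are resolved at compile time; -1 means "not supported"
   integer, parameter :: ascii = selected_char_kind('ascii')
   integer, parameter :: ucs4  = selected_char_kind('iso_10646')

   write(*,*) 'default   kind: ', selected_char_kind('default')
   write(*,*) 'ascii     kind: ', ascii
   write(*,*) 'iso_10646 kind: ', ucs4
   if (ucs4 < 0) write(*,*) 'ISO 10646 is not supported by this compiler'
end program kind_demo
```

Testing the result before using it in a declaration avoids a compile-time failure on processors that return -1.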
Language-defined Intrinsic Functions and Subprograms
Fortran has a fairly limited set of intrinsic functions to support character manipulation, searching and conversion. But the basic set is enough to construct some powerful features as required. There are some strange absences such as the ability to convert from lower-case to upper-case but this can be understood and forgiven since these concepts may not exist in many of the languages or character sets that may be represented by different character kinds. Functions such as SIZE, LBOUND and UBOUND which apply to arrays of any data type, including CHARACTER type, are not described here.
ACHAR(i, kind) returns the character at position i in the ASCII collating sequence, as a character of the specified kind. The integer i should be in the range 0 <= i <= 127; outside this range the result is processor dependent. Kind is an optional integer. If kind is not specified the default kind is assumed. ACHAR(72) has the value 'H'. One really useful feature of ACHAR is that it permits access to the non-printing ASCII characters such as carriage return (ACHAR(13)). ACHAR will always return the ASCII character even if the processor's collating sequence is not ASCII. If kind is present, the kind parameter of the result is that specified by kind; otherwise, the kind parameter of the result is that of default character. If the processor cannot represent the result value in the kind of the result, the result is undefined. Using ACHAR is highly recommended in preference to CHAR, described below, because it is portable from one processor to another.
ADJUSTL(string) left justifies by removing leading (left) blanks from string and filling the right of string with blanks so that the result has the same length as the input string.
ADJUSTR(string) right justifies by removing trailing (right) blanks from string and filling the left of the string with blanks so that the result has the same length as the input string.
CHAR(i, kind) returns the character at position i in the processor collating sequence for the characters of the specified kind. The integer i is not restricted to the ASCII range 0 <= i <= 127; it must lie in the range 0 <= i <= n-1, where n is the number of characters in the collating sequence. Kind is an optional integer. If kind is not specified the default kind is assumed. If the processor cannot represent the result value in the kind of the result, the result is undefined. Using CHAR is not recommended because it is not portable from one processor to another.
IACHAR(c, kind) is the inverse of ACHAR described above. c is a single input character and IACHAR(c) returns the position of c in the ASCII character set as a default integer. Kind is an optional input integer and if kind is specified, it specifies the kind of the integer returned by IACHAR.
ICHAR(c, kind) is the inverse of CHAR described above. c is a single input character and ICHAR(c) returns the position of c in the processor collating sequence for the kind of c, as a default integer. Kind is an optional input integer and if kind is specified, it specifies the kind of the integer returned by ICHAR.
INDEX(string, substring) returns a default integer representing the position of the first instance of substring in string searching from left to right. There are two optional arguments: back and kind. If the logical back is set true the search is conducted from right to left, and if the integer kind is specified, then the integer returned by INDEX will be of that kind. If substring does not appear in string the result is 0.
LEN(c, kind) returns an integer representing the declared length of CHARACTER c. This can be extremely useful in subprograms which receive CHARACTER dummy arguments. c can be a CHARACTER array. Kind is an optional integer which controls the kind of the integer returned by LEN.
LEN_TRIM(c, kind) returns the length of c excluding any trailing blanks (but including leading blanks). If c is only blanks the result is 0. Hence expressions like LEN_TRIM(ADJUSTL(c)) can be used to count the number of characters in c between the first and last non-blank characters. Kind is an optional integer which controls the kind of the integer returned by LEN_TRIM.
NEW_LINE(c) is a CHARACTER function that returns the new line character for the current processor. The kind of the returned character will be the same as the kind of c. A blank character may be returned if the character kind from which c is drawn does not contain a relevant newline character. This function is not likely to be used except in some very specific circumstances.
REPEAT(string, ncopies) concatenates integer ncopies of the string. Hence REPEAT('=',72) is a string of 72 equals signs. String must be scalar but can be of any length. Trailing blanks in string are included in the result.
SCAN(string, set, back, kind) returns a default integer (or an integer of the optional kind) that represents the first position at which any character in set appears in string. If no character of set appears in string the result is 0. To search from right to left, the optional logical back must be set true. string can be an array, in which case the result is an integer array. If string is an array then set can be an array of the same size and shape as string, and each element of set is scanned for in the corresponding element of string. SCAN differs from INDEX, described above, in that SCAN matches any single character of set, whereas INDEX requires the whole of substring to be found with its characters in order.
SELECTED_CHAR_KIND(name) is an integer function that returns the kind value of the character set named. The only set that must be supported by the language standard is name='DEFAULT'. If name is not supported the result is -1.
TRIM(string) is a CHARACTER valued function that returns STRING with the trailing blanks removed. If string is all blanks the result has zero length.
VERIFY(string, set, back, kind) is an integer function that returns the position of the first character in string that is not in set. So VERIFY is roughly the complement of SCAN. In VERIFY back and kind are both optional and have the same role as described in SCAN above. If every character in string is also in set (or string has zero length), then the function returns 0.
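Several of the intrinsics described above can be exercised together in a short sketch. The program name and the test string are illustrative only; the values in the comments follow from the definitions above.

```fortran
program intrinsics_demo
   implicit none
   character(len=20) :: s

   s = '  x = 42.5'                       ! two leading blanks, ten trailing

   write(*,*) len(s)                      ! 20: the declared length
   write(*,*) len_trim(s)                 ! 10: trailing blanks ignored
   write(*,*) len_trim(adjustl(s))        !  8: leading blanks moved to the right
   write(*,*) index(s, '42')              !  7: start of the substring '42'
   write(*,*) scan(s, '0123456789')       !  7: first character that is a digit
   write(*,*) verify(trim(s), ' x=')      !  7: first character not in ' x='
   write(*,*) iachar('H'), achar(72)      ! 72 and H
   write(*,'(a)') '['//trim(adjustl(s))//']'   ! [x = 42.5]
   write(*,'(a)') repeat('=-', 5)              ! =-=-=-=-=-
end program intrinsics_demo
```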
Fortran does not have any language-defined REGEX or sorting capability for CHARACTER data. Fortran does not have a language-defined text tokenizer but, with a little ingenuity, list directed input can provide a partial solution.
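One possible form of the list-directed approach is sketched below. It is an assumption about how such a tokenizer might be written rather than an established recipe, and the names and the limit max_tokens are hypothetical. A trailing slash is appended so that the read stops cleanly when the text contains fewer tokens than the array has elements.

```fortran
program tokenize_demo
   implicit none
   integer, parameter :: max_tokens = 8     ! hypothetical upper limit
   character(len=80) :: buffer
   character(len=16) :: tokens(max_tokens)
   integer :: i, ios

   ! A slash ends list-directed input and leaves the remaining items unchanged
   buffer = 'alpha beta,gamma   42' // ' /'
   tokens = ' '
   read(buffer, *, iostat=ios) tokens       ! the variable acts as an internal file

   do i = 1, max_tokens
      if (len_trim(tokens(i)) == 0) exit    ! first blank entry marks the end
      write(*,'(i0,2a)') i, ': ', trim(tokens(i))
   end do
end program tokenize_demo
```

The solution is only partial because list-directed input gives special meaning to commas, slashes, quotes and repeat counts such as 3*x, so text containing these will not be split on blanks alone.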
I/O of CHARACTER Data
READ for CHARACTER data can be list-directed or formatted using the "a" or "an" forms of this edit descriptor. In the "a" form, the width is taken from the length of the corresponding item in the list. In the "an" form, the integer n specifies the number of characters to transfer. The general edit descriptor "gn" can also be used.
    implicit none
    character(120) :: line
    open(10, file="test.dat", iostat=...)
    read(10,'(a)', iostat=...) line         !read up to 120 characters into line
    read(10,'(a5)', iostat=...) line(116:)  !read 5 characters and put them at the end of line
The a and g edit descriptors exist for WRITE as described above. The "a" form will write the whole CHARACTER variable including all the trailing blanks so it is common to use TRIM or ADJUSTL or both.
    implicit none
    character(len=512) :: line
    ...
    write(10,'(a)', iostat=...) trim(adjustl(line))
Internal Read and Write
Fortran has many hidden secrets and one of the most useful is that READ and WRITE statements can be used on CHARACTER variables as if they were files. This explains the otherwise mystifying lack of functions to convert numbers to strings and vice versa. The CHARACTER variable is treated as an 'internal file'.
    implicit none
    character(120) :: text_in, text_out
    integer :: i
    real :: x
    ...
    write(text_out,'(A,I0)', iostat=...) 'i = ', i  !formatted write into text_out
    ...
    read(text_in,*, iostat=...) x                   !list-directed read from text_in
In addition to type conversion, this internal read/write can be used as a very flexible and robust method of reading files where the contents may be of uncertain format. The external file is read line by line into a character variable, SCAN and VERIFY can be used on the line to determine what is present, and then an internal file read is done on the character variable to convert to REAL, INTEGER, COMPLEX etc. as appropriate.
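The following sketch illustrates the idea under some simplifying assumptions: an array of strings stands in for the lines of an external file, and the character sets passed to VERIFY are a rough classification chosen for illustration, not a complete test for valid numbers. IOSTAT catches anything that passes the rough test but still fails to convert.

```fortran
program parse_demo
   implicit none
   character(len=40) :: lines(3)       ! stands in for lines read from a file
   character(len=40) :: field
   integer :: i, k, ios
   real    :: x

   lines(1) = '  1234'
   lines(2) = ' -3.5e2'
   lines(3) = 'n/a'

   do i = 1, size(lines)
      field = adjustl(lines(i))
      if (verify(trim(field), '+-0123456789') == 0) then
         ! only sign and digit characters: try an integer conversion
         read(field, *, iostat=ios) k
         if (ios == 0) write(*,'(a,i0)') 'integer: ', k
      else if (verify(trim(field), '+-.0123456789eEdD') == 0) then
         ! sign, digits, decimal point or exponent letter: try a real
         read(field, *, iostat=ios) x
         if (ios == 0) write(*,*) 'real:    ', x
      else
         write(*,'(2a)') 'text:    ', trim(field)
      end if
   end do
end program parse_demo
```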
One restriction on Fortran CHARACTER data that has been relaxed since Fortran 2003 is the requirement to declare a specific length. A CHARACTER scalar can now be declared with deferred length and the ALLOCATABLE attribute. The resulting scalar can then be formally allocated, or it can be automatically allocated on assignment as shown in the following example.
    implicit none
    character(:), allocatable :: string
    ...
    string = 'abcdef'
    ...
    string = '1234567890'
    ...
    string = trim(line)
    ...
It is even possible to declare an array of deferred-length elements, as illustrated below.
implicit none character(:), dimension(:), allocatable :: strings
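However, this feature should be used carefully and some restrictions apply. In particular, every element of such an array has the same length, and that length is fixed for the whole array when it is allocated, for example with allocate(character(len=20) :: strings(10)).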
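A minimal sketch of such an array in use is shown below; the program name and the values are illustrative. It shows that the elements share one length and that assignment to an element follows the usual padding and truncation rules.

```fortran
program deferred_array_demo
   implicit none
   character(len=:), allocatable :: strings(:)
   integer :: i

   ! Length and shape are fixed together at allocation time
   allocate(character(len=10) :: strings(3))
   strings(1) = 'short'              ! padded with blanks to 10 characters
   strings(2) = 'exactly10!'
   strings(3) = 'this is too long'   ! truncated to 'this is to'

   do i = 1, size(strings)
      write(*,'(3a)') '[', strings(i), ']'
   end do
   write(*,*) len(strings), size(strings)   ! 10 and 3
end program deferred_array_demo
```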
Actual/Dummy Arguments of type CHARACTER
It is frequently the case that a procedure may be written with a CHARACTER dummy argument where the length of that argument is not known in advance. Modern Fortran allows dummy arguments to be declared with assumed length using LEN=*. Functions of type CHARACTER can be written so that the result has a length related to the length of the dummy arguments.
    ...
    call this('Hello')
    call this('Goodbye')
    ...
    subroutine this(string)
       implicit none
       character(len=*), intent(in) :: string
       character(len=len(string)+5) :: temp
       ...
In the above example, the CHARACTER variable temp is declared to have 5 more characters than string, no matter how long the actual argument is. In the next example, a function returns a string, the length of which is related to the length of one or more arguments.
    ...
    string = that('thing', 7)
    ...
    function that(in_string, n) result(out_string)
       implicit none
       character(len=*), intent(in) :: in_string
       integer, intent(in) :: n
       character(len=len(in_string)*n) :: out_string
       ...
In circumstances where the CHARACTER function has to return a string and the length of this string is not simply related to the inputs, the deferred-length, allocatable form described above can be used, and is illustrated in the case conversion examples below.
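As a further sketch of a result whose length cannot be computed from the argument lengths alone, the hypothetical function remove_blanks below builds its result one character at a time, relying on automatic reallocation on assignment. The module, function and program names are all illustrative.

```fortran
module compact_mod
   implicit none
contains
   function remove_blanks(in) result(out)
      character(len=*), intent(in)  :: in
      character(len=:), allocatable :: out
      integer :: i
      out = ''                          ! start with a zero-length string
      do i = 1, len(in)
         ! out is reallocated to its new length on each assignment
         if (in(i:i) /= ' ') out = out // in(i:i)
      end do
   end function remove_blanks
end module compact_mod

program compact_demo
   use compact_mod
   implicit none
   write(*,'(3a)') '[', remove_blanks(' a b  c '), ']'   ! prints [abc]
end program compact_demo
```

Growing the result by repeated concatenation keeps the sketch short; for long strings it may be preferable to count the non-blank characters first and allocate the result once.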
Character parameters can be declared without explicitly stating the length, for example:
    implicit none
    character (*), parameter :: place = 'COEFF_LIST_initialise'
Approaches to Case Conversion
Here are some further examples of the ideas above, directed at case conversion for languages in which the concept of case exists. In the first example, the ASCII character set functions IACHAR and ACHAR are used to check each character in a string in turn.
    function up_case(in) result(out)
       implicit none
       character(*), intent(in) :: in
       character(:), allocatable :: out
       integer :: i, j
       out = in                            !transfer the whole string
       do i = 1, len_trim(out)             !each character
          j = iachar(out(i:i))             !get the ASCII position
          select case (j)
          case (97:122)                    !the lower case characters
             out(i:i) = achar(j-32)        !offset to the upper case
          end select
       end do
    end function up_case
An alternative approach that does not rely on the ASCII representation function could be as follows:
    function to_upper(in) result(out)
       implicit none
       character(*), intent(in) :: in
       character(:), allocatable :: out
       integer :: i, j
       character(*), parameter :: upp = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
       character(*), parameter :: low = 'abcdefghijklmnopqrstuvwxyz'
       out = in                                 !transfer all characters
       do i = 1, len_trim(out)                  !all non-blanks
          j = index(low, out(i:i))              !is ith character in low
          if (j > 0) out(i:i) = upp(j:j)        !yes, then subst with upp
       end do
    end function to_upper
Which routine is quicker will depend on the relative speed of the INDEX and IACHAR intrinsics. In one informal test, the first method above seemed to be slightly more than twice as fast as the second method, but this will vary from vendor to vendor.