Strings and Pointers


Because of the ``equivalence'' of arrays and pointers, it is extremely common to refer to and manipulate strings as character pointers, or char *'s. It is so common, in fact, that it is easy to forget that strings are arrays, and to imagine that they're represented by pointers. (Actually, in the case of strings, it may not even matter that much if the distinction gets a little blurred; there's certainly nothing wrong with referring to a character pointer, suitably initialized, as a ``string.'') Let's look at a few of the implications:

Any function that manipulates a string will actually accept it as a char * argument. The caller may pass an array containing a string, but the function will receive a pointer to the array's (string's) first element (character).  The %s format in printf expects a character pointer. 

Although you have to use strcpy to copy a string from one array to another, you can use simple pointer assignment to assign a string to a pointer. The string being assigned might either be in an array or pointed to by another pointer. In other words, given

   char string[] = "Hello, world!";
   char *p1, *p2;
   p1 = string;
   p2 = p1;


both assignments are legal. (Remember, though, that when you assign a pointer, you're making a copy of the pointer but not of the data it points to. After the first assignment, p1 ends up pointing to the string in string. After the second, p2 ends up pointing to the same string as p1. In any case, after a pointer assignment, if you ever change the string (or other data) pointed to, the change is ``visible'' to both pointers.)
Many programs manipulate strings exclusively using character pointers, never explicitly declaring any actual arrays. As long as these programs are careful to allocate appropriate memory for the strings, they're perfectly valid and correct.
When you start working heavily with strings, however, you have to be aware of one subtle fact.
When you initialize a character array with a string constant:


 char string[] = "Hello, world!";


you end up with an array containing the string, and you can modify the array's contents to your heart's content:


 string[0] = 'J';


However, it's possible to use string constants (the formal term is string literals) at other places in your code. Since they're arrays, the compiler generates pointers to their first elements when they're used in expressions, as usual. That is, if you say

  char *p1 = "Hello";
  int len = strlen("world");

it's almost as if you'd said

  char internal_string_1[] = "Hello";
  char internal_string_2[] = "world";
  char *p1 = &internal_string_1[0];
  int len = strlen(&internal_string_2[0]);


Here, the arrays named internal_string_1 and internal_string_2 are supposed to suggest the fact that the compiler is actually generating little unnamed arrays every time you use a string constant in your code. However, the subtle fact is that the arrays which are ``behind'' the string constants are not necessarily modifiable. In particular, the compiler may store them in read-only memory. Therefore, if you write

 
 char *p3 = "Hello, world!";
 p3[0] = 'J';

your program may crash, because it may try to store a value (in this case, the character 'J') into nonwritable memory. (The C standard leaves the result of modifying a string literal undefined.)

The moral is that whenever you're building or modifying strings, you have to make sure that the memory you're building or modifying them in is writable. That memory should either be an array you've declared, or some memory which you've dynamically allocated by the techniques which we'll see in the next chapter. Make sure that no part of your program will ever try to modify a string which is actually one of the unnamed, unwritable arrays which the compiler generated for you in response to one of your string constants. (The only exception is array initialization, because if you write to such an array, you're writing to the array, not to the string literal which you used to initialize the array.)

Breaking a Line into ``Words''


In an earlier assignment, an ``extra credit'' version of a problem asked you to write a little checkbook balancing program that accepted a series of lines of the form

   deposit 1000
   check 10
   check 12.34
   deposit 50
   check 20

It was a surprising nuisance to do this in an ad hoc way, using only the tools we had at the time. It was easy to read each line, but it was cumbersome to break it up into the word (``deposit'' or ``check'') and the amount. 

I find it very convenient to use a more general approach: first, break lines like these into a series of whitespace-separated words, then deal with each word separately. To do this, we will use an array of pointers to char, which we can also think of as an ``array of strings,'' since a string is an array of char, and a pointer-to-char can easily point at a string. Here is the declaration of such an array: 


 char *words[10];


This is the first complicated C declaration we've seen: it says that words is an array of 10 pointers to char. We're going to write a function, getwords, which we can call like this: 

 int nwords;
 nwords = getwords(line, words, 10);


where line is the line we're breaking into words, words is the array to be filled in with the (pointers to the) words, and nwords (the return value from getwords) is the number of words which the function finds. (As with getline, we tell the function the size of the array so that if the line should happen to contain more words than that, it won't overflow the array).

Here is the definition of the getwords function. It finds the beginning of each word, places a pointer to it in the array, finds the end of that word (which is signified by at least one whitespace character) and terminates the word by placing a '\0' character after it. (The '\0' character will overwrite the first whitespace character following the word.) Note that the original input string is therefore modified by getwords: if you were to try to print the input line after calling getwords, it would appear to contain only its first word (because of the first inserted '\0'). 


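 #include <stddef.h>
 #include <ctype.h>

 /* Split line into whitespace-separated words, storing a pointer to each
    word in words[]. At most maxwords pointers are stored. The line is
    modified: a '\0' replaces the first whitespace after each word.
    Returns the number of words found. */
 int getwords(char *line, char *words[], int maxwords)
 {
     char *p = line;
     int nwords = 0;

     while(1)
     {
         /* skip leading or separating whitespace
            (the cast keeps isspace's argument in its valid range) */
         while(isspace((unsigned char)*p))
             p++;

         if(*p == '\0')
             return nwords;          /* end of line: no more words */

         words[nwords++] = p;        /* record the start of this word */

         /* advance to the end of the word */
         while(!isspace((unsigned char)*p) && *p != '\0')
             p++;

         if(*p == '\0')
             return nwords;          /* word ran to the end of the line */

         *p++ = '\0';                /* terminate the word, step past it */

         if(nwords >= maxwords)
             return nwords;          /* words[] is full */
     }
 }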


Each time through the outer while loop, the function tries to find another word. First it skips over whitespace (which might be leading spaces on the line, or the space(s) separating this word from the previous one). The isspace function is new: it's in the standard library, declared in the header file <ctype.h>, and it returns nonzero (``true'') if the character you hand it is a space character (a space or a tab, or any other whitespace character there might happen to be).

When the function finds a non-whitespace character, it has found the beginning of another word, so it places the pointer to that character in the next cell of the words array. Then it steps through the word, looking at non-whitespace characters, until it finds another whitespace character, or the \0 at the end of the line. If it finds the \0, it's done with the entire line; otherwise, it changes the whitespace character to a \0, to terminate the word it's just found, and continues. (If it's found as many words as will fit in the words array, it returns prematurely.)

Each time it finds a word, the function increments the number of words (nwords) it has found. Since arrays in C start at [0], the number of words the function has found so far is also the index of the cell in the words array where the next word should be stored. The function actually assigns the next word and increments nwords in one expression:


 words[nwords++] = p;


You should convince yourself that this arrangement works, and that (in this case) the preincrement form


  words[++nwords] = p; /* WRONG */


would not behave as desired. When the function is done (when it finds the \0 terminating the input line, or when it runs out of cells in the words array) it returns the number of words it has found. Here is a complete example of calling getwords: 


  char line[] = "this is a test";
  char *words[10];
  int nwords;
  int i;
  nwords = getwords(line, words, 10);
  for(i = 0; i < nwords; i++)
    printf("%s\n", words[i]);
