Lexical Elements - Constants

A constant does not change its value while the program is running. The value of any constant must be in the range of representable values for its type.

The C and C++ languages contain the following types of constants (also called literals):

Integer constants
Floating-point constants
Character constants
String literals


Integer Constants
Integer constants can represent decimal, octal, or hexadecimal values.

Data Types for Integer Constants
The data type of an integer constant is determined by the form, value, and suffix of the constant. The following table lists the forms of integer constant and the possible data types for each. The constant takes the first type in its list that can represent its value.

Data Types for Integer Constants

Constant                                 Possible Data Types
unsuffixed decimal                       int, long int, unsigned long int
unsuffixed octal                         int, unsigned int, long int, unsigned long int
unsuffixed hexadecimal                   int, unsigned int, long int, unsigned long int
suffixed by u or U                       unsigned int, unsigned long int
suffixed by l or L                       long int, unsigned long int
suffixed by both u or U, and l or L      unsigned long int
suffixed by ll or LL                     long long int, unsigned long long int
suffixed by both u or U, and ll or LL    unsigned long long int
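As a rough illustration of how a suffix selects the type, the following sketch compares the size of suffixed constants with the size of the types the table lists first for each suffix, and shows that a constant suffixed by U takes part in unsigned arithmetic. The function names are invented for this example, and the ll suffix assumes a compiler that supports long long int.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if each small suffixed constant has the
   size of the first type listed for its suffix. */
int suffix_sizes_match(void)
{
    return sizeof(10U)   == sizeof(unsigned int)
        && sizeof(10L)   == sizeof(long int)
        && sizeof(10UL)  == sizeof(unsigned long int)
        && sizeof(10LL)  == sizeof(long long int)
        && sizeof(10ULL) == sizeof(unsigned long long int);
}

/* Hypothetical helper: 0U - 1 wraps around to a large positive value
   because the constant 0U is unsigned. */
int unsigned_suffix_wraps(void)
{
    return (0U - 1) > 0;
}
```

Both functions are expected to return 1. Equal sizes do not prove equal types, so this is only a quick check, not a substitute for the rules in the table.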

A plus (+) or minus (-) symbol can precede the constant. It is treated as a unary operator rather than as part of the constant value.

Note that -2147483648 is not a single integer constant: it is the unary minus operator applied to the constant 2147483648. Where int and long int are both 32 bits wide, 2147483648 is too large for either type, so according to the table above it has type unsigned long int, and negating it yields an unsigned value rather than the negative number intended. Instead, code this value as (-2147483647 - 1). To avoid such problems with the most negative integral values, use the identifiers INT_MIN (for int), SHRT_MIN (for short int), and SCHAR_MIN (for signed char). These and other limits for integer values are defined in the limits.h include file.
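The following sketch shows the limits.h identifiers used in place of a hand-written minimum value. It assumes a two's complement implementation, where INT_MIN is one less than -INT_MAX; the function names are illustrative only.

```c
#include <limits.h>

/* Hypothetical helper: returns the most negative int without writing it
   as an integer constant. */
int smallest_int(void)
{
    return INT_MIN;
}

/* On a two's complement implementation (an assumption here), INT_MIN is
   one below -INT_MAX. The expression never overflows. */
int int_min_is_below_negated_max(void)
{
    return smallest_int() == -INT_MAX - 1;
}
```

Writing the expression as -INT_MAX - 1 keeps every intermediate result inside the range of int.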

Decimal Constants
A decimal constant contains any of the digits 0 through 9. The first digit cannot be 0.

An integer constant that begins with the digit 0 is interpreted as an octal constant rather than as a decimal constant.

The following are examples of decimal constants:

485976
-433132211
+20
5

Hexadecimal Constants
A hexadecimal constant begins with the 0 digit followed by either an x or X, followed by any combination of the digits 0 through 9 and the letters a through f or A through F. The letters A (or a) through F (or f) represent the values 10 through 15, respectively.

The following are examples of hexadecimal constants:

0x3b24
0XF96
0x21
0x3AA
0X29b
0X4bD

Octal Constants
An octal constant begins with the digit 0 and contains any of the digits 0 through 7.

The following are examples of octal constants:

0
0125
034673
03245
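A short sketch of how the three notations relate, using made-up function names. It shows one value written in each base and the common pitfall of a leading 0 changing a constant from decimal to octal.

```c
/* Hypothetical helper: the same value written in decimal, octal, and
   hexadecimal. Returns 1 when all three forms compare equal. */
int same_value_in_three_bases(void)
{
    return 83 == 0123 && 83 == 0x53;
}

/* A leading 0 changes the meaning: 010 is octal, so its value is
   eight, not ten. */
int leading_zero_value(void)
{
    return 010;
}
```

This is why padding decimal constants with leading zeros to align source columns changes their values.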


Floating-Point Constants
A floating-point constant consists of:

An integral part
A decimal point
A fractional part
An exponent part
An optional suffix

Both the integral and fractional parts are made up of decimal digits. You can omit either the integral part or the fractional part, but not both. You can omit either the decimal point or the exponent part, but not both.

A suffix of f or F indicates a type of float, and a suffix of l or L indicates a type of long double. If a suffix is not specified, the floating-point constant has a type double.
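The sketch below illustrates the suffix rules and the optional parts of a floating-point constant. The function names are hypothetical. The size comparison is only a quick check: it confirms that each constant occupies as much storage as the type its suffix selects.

```c
/* Hypothetical helper: returns 1 if each constant has the size of the
   type selected by its suffix (no suffix means double). */
int float_suffix_sizes_match(void)
{
    return sizeof(1.5f) == sizeof(float)
        && sizeof(1.5)  == sizeof(double)
        && sizeof(1.5L) == sizeof(long double);
}

/* Hypothetical helper: the integral part, the fractional part, or the
   decimal point can be omitted as long as the rules above are met. */
int omitted_parts_are_equal(void)
{
    return 6.e10 == 60000000000.0   /* no fractional part           */
        && .5    == 0.5             /* no integral part             */
        && 1e+5  == 100000.0;       /* no decimal point, exponent only */
}
```

Both functions are expected to return 1 on implementations that convert these constants to the nearest representable value.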

A plus (+) or minus (-) symbol can precede a floating-point constant. However, it is not part of the constant; it is interpreted as a unary operator.

The limits for floating-point values are set in the float.h include file.

The following are examples of floating-point constants:

Floating-Point Constant    Value
5.3876e4                   53876
4e-11                      0.00000000004
1e+5                       100000
7.321E-3                   0.007321
3.2E+4                     32000
0.5e-6                     0.0000005
0.45                       0.45
6.e10                      60000000000

If a floating-point constant is too large in magnitude in C, it is set to the largest value representable by the type. If it is too small in magnitude, it is set to zero.

In C++, constant values that are too large or too small in magnitude cause a compile-time error.


Character Constants
A character constant contains a sequence of characters or escape sequences enclosed in single quotation mark symbols.

At least one character or escape sequence must appear in the character constant. The characters can be any from the source program character set, excluding the single quotation mark, backslash and new-line symbols. The prefix L indicates a wide character constant. A character constant must appear on a single logical source line.

The value of a character constant containing a single character is the numeric representation of the character in the character set used at run time. The value of a wide character constant containing a single multibyte character is the code for that character, as defined by the mbtowc function. If the character constant contains more than one character, the last 4 bytes represent the character constant. In C++, a character constant can contain only one character.

In C, a character constant has type int. In C++, a character constant has type char. A wide character constant is represented by a double-byte character of type wchar_t, an integral type defined in the <stddef.h> include file. On AIX, each multibyte character can contain up to 4 bytes. On Intel, each multibyte character can contain up to 2 bytes.
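The difference in type can be observed with sizeof. The sketch below is written for C, with invented function names; the same source compiled as C++ would report the size of char for the first function.

```c
#include <stddef.h>

/* In C, 'a' has type int, so its size is that of int rather than char. */
size_t char_constant_size(void)
{
    return sizeof('a');
}

/* A wide character constant has type wchar_t. */
size_t wide_char_constant_size(void)
{
    return sizeof(L'a');
}
```

In a C program, char_constant_size() is expected to equal sizeof(int), and wide_char_constant_size() is expected to equal sizeof(wchar_t).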

To represent the single quotation symbol, backslash, and new-line characters, you must use the corresponding escape sequence.

The following are examples of character constants:

'a' '0' 'x' '7' 'C'
'\'' '(' '\n' '\117'

Notes:

In extended mode, a character constant longer than 2 characters causes a warning to be issued by the C compiler. Only the rightmost 4 characters are used. A character constant with 4 characters has an unsigned int value.

In ansi mode, a character constant longer than 1 character causes a warning to be issued. Only the rightmost 4 characters are used. For example, the character constant 'too_long' causes the following message:

1506-076 (W) Character constant has more than one character.
Rightmost four characters are used.

String Literals
A string constant or literal contains a sequence of characters or escape sequences enclosed in double quotation mark symbols.

The prefix L indicates a wide-character string literal.

A null ('\0') character is appended to each string. For a wide character string (a string prefixed by the letter L), the value '\0' of type wchar_t is appended. By convention, programs recognize the end of a string by finding the null character.

Multiple spaces contained within a string constant are retained.

To continue a string on the next line, use the line continuation sequence (a \ symbol immediately followed by a new-line character). Nothing, not even a space or a comment, can come between the backslash and the new-line character. In the following example, the declaration of second causes a compile-time error.

char *first = "This string continues onto the next\
  line, where it ends.";                /* compiles successfully. */
char *second = "The comment makes the \ /* continuation symbol    */
  invisible to the compiler.";          /* compilation error.     */

Another way to continue a string is to have two or more consecutive strings. Adjacent string literals are concatenated to produce a single string. You cannot concatenate a wide string constant with a character string constant. For example:

"hello " "there"     /* is equivalent to "hello there"  */
"hello " L"there"    /* is not valid                    */
"hello" "there"      /* is equivalent to "hellothere"   */

Characters in concatenated strings remain distinct. For example, the strings "\xab" and "3" are concatenated to form "\xab3". However, the characters \xab and 3 remain distinct and are not merged to form the hexadecimal character \xab3.
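A small sketch, with hypothetical function names, that checks this: the concatenated literal contains two characters, and the second one is still the digit 3.

```c
#include <string.h>

/* Hypothetical helper: length of the concatenated literal, not counting
   the terminating null character. */
size_t concatenated_length(void)
{
    return strlen("\xab" "3");   /* two characters: \xab, then 3 */
}

/* Hypothetical helper: the second character is still the digit 3. */
char second_character(void)
{
    return ("\xab" "3")[1];
}
```

Splitting a literal this way is a common technique for ending a hexadecimal escape sequence before a character that would otherwise be read as another hexadecimal digit.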

Following any concatenation, '\0' of type char is appended at the end of each string. Programs find the end of a string by scanning for this value. For a wide-character string literal, '\0' of type wchar_t is appended. For example:

char *first = "Hello ";            /* stored as "Hello \0"       */
char *second = "there";            /* stored as "there\0"        */
char *third = "Hello " "there";    /* stored as "Hello there\0"  */
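The appended null character can be seen by comparing sizeof, which counts it, with strlen, which does not. The function names in this sketch are illustrative.

```c
#include <string.h>

/* sizeof applied to a string literal counts the terminating null. */
size_t hello_there_size(void)
{
    return sizeof("Hello " "there");   /* 11 characters plus '\0' */
}

/* strlen stops at the terminating null and does not count it. */
size_t hello_there_length(void)
{
    return strlen("Hello " "there");
}
```

Only one null character is appended after concatenation, so the size is 12 and the length is 11.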

A character string literal has type array of char and static storage duration. A wide-character string literal has type array of wchar_t and static storage duration.

Use the escape sequence \n to represent a new-line character as part of the string. Use the escape sequence \\ to represent a backslash character as part of the string. Within a string, the single quotation mark symbol (') can appear by itself, but the double quotation mark symbol must be written as the escape sequence \". For example:

#include <stdio.h>
int main(void)
{
      char *s = "Hi there! \n";
      char *p = "The backslash character \\.";
      char *q = "The double quotation mark \".\n";
      printf("%s%s\n%s", s, p, q);
      return 0;
}

This program produces the following output:

Hi there!
The backslash character \.
The double quotation mark ".

You should be careful when modifying string literals because the resulting behavior depends on whether your strings are stored in read/write static memory.
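One way to avoid depending on where string literals are stored is to initialize a char array from the literal and modify the array instead. The following is a minimal sketch of that approach; the function name is made up for this example.

```c
#include <string.h>

/* Hypothetical helper: modifies a private copy of the literal and
   returns 1 if the change took effect. */
int modify_array_copy(void)
{
    /* The array is initialized from the literal, so it is an ordinary
       writable object regardless of how literals are stored. */
    char greeting[] = "hello";

    greeting[0] = 'J';
    return strcmp(greeting, "Jello") == 0;
}
```

By contrast, writing through a pointer such as char *p = "hello" modifies the literal itself, which is the case the preceding paragraph warns about.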

Use the -qro compiler option or the #pragma strings preprocessor directive to change the default storage for string literals. The #pragma strings preprocessor directive can also be used to specify whether string literals are read-only or read/write.

The following are examples of string literals:

char titles[ ] = "Handel's \"Water Music\"";
char *mail_addr = "Last Name    First Name    MI   Street Address   \
   City     Province   Postal code ";
char *temp_string = "abc" "def" "ghi";  /* temp_string points to "abcdefghi\0" */
wchar_t *wide_string = L"longstring";

Escape Sequences
You can represent any member of the execution character set by an escape sequence. Escape sequences are primarily used to put nonprintable characters in character constants and string literals. For example, you can use escape sequences to put such characters as tab, carriage return, and backspace into an output stream.

An escape sequence contains a backslash (\) symbol followed by one of the escape sequence characters or an octal or hexadecimal number. A hexadecimal escape sequence contains an x followed by one or more hexadecimal digits (0-9, A-F, a-f). An octal escape sequence uses up to three octal digits (0-7). The value of the hexadecimal or octal number specifies the value of the desired character or wide character.
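The three-digit limit on octal escape sequences can be checked with a short sketch. The function names are hypothetical. In the literal "\1011", the escape sequence is \101 and the final 1 is an ordinary character.

```c
#include <string.h>

/* "\1011" is the octal escape \101 followed by the ordinary character 1,
   so the string holds two characters. */
size_t octal_escape_length(void)
{
    return strlen("\1011");
}

/* Hypothetical helper: the character that follows the octal escape. */
char char_after_octal_escape(void)
{
    return "\1011"[1];
}

/* Octal 101 and hexadecimal 41 both specify the value 65. */
int octal_and_hex_agree(void)
{
    return '\101' == 65 && '\x41' == 65;
}
```

A hexadecimal escape sequence has no such limit on the number of digits, so a following character that is a valid hexadecimal digit becomes part of the escape sequence.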

Note: The line continuation sequence (\ followed by a new-line character) is not an escape sequence. It is used in character strings to indicate that the current line continues on the next line.

The escape sequences and the characters they represent are:

Escape Sequence    Character Represented
\a                 Alert (bell, alarm)
\b                 Backspace
\f                 Form feed (new page)
\n                 New-line
\r                 Carriage return
\t                 Horizontal tab
\v                 Vertical tab
\'                 Single quotation mark
\"                 Double quotation mark
\?                 Question mark
\\                 Backslash

The value of an escape sequence represents the member of the character set used at run time. Escape sequences are translated during preprocessing. For example, the AIX Version 4 operating system uses the ASCII character set, where the value of the escape sequence \x56 is the letter V.
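The sketch below checks a few escape sequences against their ASCII values. It assumes that the execution character set is ASCII, as in the example above; on a system that uses a different character set, the function would return 0. The function name is invented for this example.

```c
/* Hypothetical helper: returns 1 if these escape sequences have their
   ASCII values in the execution character set. */
int escapes_match_ascii(void)
{
    return '\x56' == 'V'   /* hexadecimal 56 is the letter V */
        && '\n'   == 10    /* new-line                       */
        && '\t'   == 9     /* horizontal tab                 */
        && '\\'   == 92;   /* backslash                      */
}
```

Portable code normally writes the character or its named escape sequence rather than a numeric value, because the numeric values depend on the character set.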

Use escape sequences only in character constants or in string literals.

If an escape sequence is not recognized, the compiler removes the backslash and issues a warning message. For example, the string "abc\def" becomes "abcdef". Note that this behavior is implementation-defined.

When a hexadecimal escape sequence is longer than two digits, the compiler issues a warning. Only the rightmost two digits are used. For example, in the following statement

printf ("\x06asset \n");

the escape sequence is read as \x06a, because the a in asset is a valid hexadecimal digit, and only the digits 6a are retained.

In string and character sequences, when you want the backslash to represent itself (rather than the beginning of an escape sequence), you must use a \\ backslash escape sequence.

#include <stdio.h>
int main(void)
{
      char a,b,c,d,e;
      a='a';
      b=97;       /* ASCII integer value      */
      c='\141';   /* ASCII octal value        */
      d='\x61';   /* ASCII hexadecimal value  */
      e='\n';
      printf("%c %c %c %c %c\n", a, b, c, d, e);
      return 0;
}

 


