Computer architecture collection: The New C Standard document, P6 (docx)

6.2.5 Types
509
    void f(int *p1, long *p2, int *p3)
    { /* ... */ }

It might be assumed that the objects pointed to by p1 and p2 do not overlap because they are pointers to
different types, while the objects pointed to by p1 and p3 could overlap because they are pointers to the
same type.
Coding Guidelines
C does not provide any mechanism for developers to specify that two typedef names, defined using the same
integer type, are different types (see 1633, typedef is synonym). The benefits of such additional
type-checking machinery are usually lost on the C community.
Example
    typedef int APPLES;
    typedef int ORANGES;

    APPLES coxes;
    ORANGES jafa;

    APPLES totals(void)
    {
        return coxes + jafa; /* Adding apples to oranges is suspicious. */
    }
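One common workaround, sketched here as an assumption rather than anything the commentary recommends, is to wrap each integer in its own single-member structure. Structure types declared separately are distinct, so a translator must diagnose any attempt to mix them. The member name count and the functions add_apples and total_apples are hypothetical.

```c
/* Hypothetical wrapper types: each struct is a distinct type,
   so passing an ORANGES where an APPLES is expected must be diagnosed. */
typedef struct { int count; } APPLES;
typedef struct { int count; } ORANGES;

/* Only accepts APPLES operands. */
static APPLES add_apples(APPLES a, APPLES b)
{
    APPLES sum = { a.count + b.count };
    return sum;
}

int total_apples(int first, int second)
{
    APPLES coxes = { first };
    APPLES bramleys = { second };

    /* ORANGES jafa = { 3 }; add_apples(coxes, jafa); would not compile. */
    return add_apples(coxes, bramleys).count;
}
```

The cost is that every arithmetic operation has to go through a helper function, which is part of why this technique sees little use.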
509
31) The same representation and alignment requirements are meant to imply interchangeability as arguments
to functions, return values from functions, and members of unions.
Commentary
This interchangeability does not extend to being considered the same for common initial sequence purposes
(see 1038, common initial sequence). The sentence that references this footnote does not discuss any
alignment issues. This footnote is identical to footnote 39 (see 565, footnote 39).
Prior to C90 there were no function prototypes. Developers expected to be able to interchange arguments
that had signed and unsigned versions of the same integer type. Having to cast an argument, if the parameter
type in the function definition had a different signedness, was seen as counter to C’s easy-going
type-checking system and a little intrusive. The introduction of prototypes did not completely do away with
the issue of interchangeability of arguments. The ellipsis notation specifies that nothing is known about
the expected type of arguments (see 1601, ellipsis supplies no information).
Similarly, for function return values, prior to C99 it was explicitly specified that if no function
declaration was visible the translator provided one. These implicit declarations defaulted to a return type
of int. If the actual function happened to return the type unsigned int, such a default declaration might
have returned an unexpected result. A lot of developers had a casual attitude toward function declarations.
The rest of us have to live with the consequences of the Committee not wanting to break all the source code
they wrote. The interchangeability of function return values is now a moot point, because C99 requires that
a function declaration be visible at the point of call (a default declaration is no longer provided).
Having slid further down the slippery slope, we arrive at union types. From the efficiency point of view,
having to assign a member of a union to another member, having the corresponding (un)signed integer type,
knowing that the value is representable, seems overly cautious. If the value is representable in both types,
it is a big simplification not to have to be concerned about which member was last assigned to.
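The union case can be sketched as follows. The union tag, member names, and function name are hypothetical, and the result is only expected to match the stored value when that value is representable in both types.

```c
/* Hypothetical union with corresponding signed and unsigned members. */
union int_pair {
    signed int   as_signed;
    unsigned int as_unsigned;
};

/* Store through the unsigned member, read back through the signed one. */
int read_signed_after_unsigned_store(unsigned int value)
{
    union int_pair u;

    u.as_unsigned = value;
    return u.as_signed;   /* same value only if it fits in signed int */
}
```

Calling read_signed_after_unsigned_store(3u) yields 3. For an argument greater than INT_MAX the value read depends on the representation of signed integers.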
This footnote does not explicitly discuss casting pointers to the same signed/unsigned integer type. If
objects of these types have the same representation and alignment requirements, which they do, and the value
pointed at is within the range common to both types, everything ought to work. However, “meant to imply”
does not explicitly apply in this case.
June 24, 2009 v 1.2
DR #070
The program is not strictly conforming. Since many pre-existing programs assume that objects with the same
representation are interchangeable in these contexts, the C Standard encourages implementors to allow such
code to work, but does not require it.
The program referred to, in this DR, was very similar to the following:
    #include <stdio.h>

    void output(c)
    int c;
    {
        printf("C == %d\n", c);
    }

    void DR_070(void)
    {
        output(6);
        /*
         * The following call has undefined behavior.
         */
        output(6U);
    }
Other Languages
Few languages support unsigned types as such. Languages in the Pascal family allow subranges to be
specified, which could consist of nonnegative values only. However, such subrange types are not treated any
differently by the language semantics than when the subrange includes negative values. Consequently, other
languages tend to say nothing about the interchangeability of objects having the corresponding signed and
unsigned types.
Common Implementations
The standard does not require that this interchangeability be implemented. But it gives a strong hint to
implementors to investigate the issue. There are no known implementations that don’t do what is implied.
Coding Guidelines
If the guideline recommendation dealing with use of function prototypes is followed (see 1810.1, function
declaration: use prototype), the visible prototype will cause arguments to be implicitly converted to the
declared type of the parameter. The function return type will also always be known. However, for arguments
corresponding to the ellipsis notation, translators will perform only the default argument promotions, not
a conversion to any parameter type. If the promoted type of the argument is not compatible with the type
that appears in any invocation of the va_arg macro corresponding to that argument, the behavior is
undefined. Incompatibility between an argument type and its corresponding parameter’s type (when no
prototype is visible) is known to be a source of faults (hence the guideline recommendation dealing with the
use of prototypes; see 1810.1). So it is to be expected that the same root cause will also result in use of
the va_arg macro having the same kinds of fault. However, use of the va_arg macro is relatively uncommon
and for this reason no guideline recommendation is made here.
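The va_arg requirement can be illustrated with a minimal sketch. The function sum_ints is hypothetical; it assumes every variable argument has the promoted type int.

```c
#include <stdarg.h>

/* Hypothetical helper: sums 'count' variable arguments, each fetched as int. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* The type named here must be compatible with the promoted
           type of the argument actually passed. */
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

A call such as sum_ints(2, (short)4, (signed char)-1) is fine because both arguments are promoted to int before the call. A call such as sum_ints(1, 2.0) has undefined behavior, because the promoted type double is not compatible with int.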
Signed and unsigned versions of the same type may appear as members of union types. However, this
footnote does not give any additional access permissions over those discussed elsewhere (see 589, union
member when written to). Interchangeability of union members is rarely a good idea.
What about pointers to objects having different signed types? Accessing objects having different types,
signed or otherwise, may cause undefined behavior and is discussed elsewhere (see 948, effective type). The
interchangeability being discussed applies to values, not objects.
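As a hedged illustration of the pointer case: the effective type rules permit an object to be accessed through an lvalue whose type is the signed or unsigned type corresponding to its effective type. The sketch below reads an int through a pointer to unsigned int. The function names are hypothetical, and the value read is only expected to equal the stored value when it lies in the range common to both types.

```c
/* Read an int object through a pointer to the corresponding unsigned type. */
unsigned int read_as_unsigned(const int *p)
{
    const unsigned int *up = (const unsigned int *)p;

    return *up;
}

/* Returns 1 when a nonnegative int reads back unchanged as unsigned int. */
int common_range_example(void)
{
    int value = 42;

    return read_as_unsigned(&value) == 42u;
}
```

Accessing the same object through, say, a pointer to long would not be covered by this permission.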
Example
    union {
        signed int m_1;
        unsigned int m_2;
    } glob;

    extern int g(int, ...);

    void f(void)
    {
        glob.m_2 = 3;
        g(2, glob.m_1);
    }
510
32) See “future language directions” (6.11.1).
511
33) A specification for imaginary types is in informative annex G.
Commentary
This annex is informative, not normative, and is applicable to IEC 60559-compatible implementations (see
18, Normative references).
C++
There is no such annex in the C++ Standard.
512
34) An implementation may define new keywords that provide alternative ways to designate a basic (or any
other) type;
Commentary
Some restrictions on the form of an identifier used as a keyword are given elsewhere (see 490, footnote 28).
A new keyword, provided by an implementation as an alternative way of designating one of the basic types, is
not the same as a typedef name. Although a typedef name is a synonym for the underlying type (see 1633,
typedef is synonym), there are restrictions on how it can be used with other type specifiers (see 1378, type
specifier syntax); it also has a scope, which a keyword does not have. For instance, a vendor may supply
implementations for a range of processors and choose to support the keyword __int_32. On some processors
this keyword is an alternative representation for the type long, on others an alternative for the type int,
while on others it may not be an alternative for any of the basic types.
C90
Defining new keywords that provide alternative ways of designating basic types was not discussed in the C90
Standard.
C++
The object-oriented constructs supported by C++ remove most of the need for implementations to use
additional keywords to designate basic (or any other) types.
Other Languages
Most languages do not give explicit permission for new keywords to be added to them.
Common Implementations
Microsoft C supports the keyword __int64, which specifies the same type as long long.
Coding Guidelines
Another difference between an implementation-supplied alternative designation and a developer-defined
typedef name is that one is under the control of the vendor and the other is under the control of the
developer. For instance, if __int_32 had been defined as a typedef name by the developer, then it would
be the developer’s responsibility to ensure that it has the appropriate definition in each environment. As
an implementation-supplied keyword, the properties of __int_32 will be selected for each environment by the
vendor.
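The developer-controlled alternative might look like the following sketch, which selects a definition per environment using <limits.h>. The name DEV_INT32 and the function dev_int32_max are hypothetical; the sketch assumes that either int or long has a maximum value of exactly 2147483647.

```c
#include <limits.h>

/* DEV_INT32 is a hypothetical developer-defined name, not a vendor keyword.
   The developer, not the vendor, must keep these tests correct. */
#if INT_MAX == 2147483647
typedef int DEV_INT32;
#elif LONG_MAX == 2147483647
typedef long DEV_INT32;
#else
#error "No suitable 32-bit integer type known for this environment"
#endif

/* Largest value the selected definition is assumed to hold. */
DEV_INT32 dev_int32_max(void)
{
    return 2147483647;
}
```

C99’s <stdint.h> provides int32_t for the same purpose on implementations that have such a type, which moves the responsibility back to the vendor.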
The intent behind supporting new keywords that provide alternative ways to designate a basic type is to
provide a mechanism for controlling the use of different types. In the case of integer types the guideline
recommendation dealing with the use of a single integer type, through the use of a specific keyword, is
applicable here (see 480.1, object: int type only).
Example
    /*
     * Assume vend_int is a new keyword denoting an alternative
     * way of designating the basic type int.
     */
    typedef int DEV_INT;

    unsigned DEV_INT glob_1;  /* Syntax violation. */
    unsigned vend_int glob_2; /* Can combine with other type specifiers. */
513
this does not violate the requirement that all basic types be different.
Commentary
The implementation-defined keyword is simply an alternative representation, like trigraphs are an alternative
representation of some characters.
514
Implementation-defined keywords shall have the form of an identifier reserved for any use as described in
7.1.3.
Commentary
This sentence duplicates the wording in footnote 28 (see 490, footnote 28).
515
The three types char, signed char, and unsigned char are collectively called the character types.
Commentary
This defines the term character types.
C++
Clause 3.9.1p1 does not explicitly define the term character types, but the wording implies the same
definition as C.
Other Languages
Many languages have a character type. Few languages have more than one such type (because they do not
usually support unsigned types).
Coding Guidelines
This terminology is not commonly used by developers who sometimes refer to char types (plural), a usage that
could be interpreted to mean the type char. The term character type is not immune from misinterpretation
either (as also referring to the type char). While it does have the advantage of technical correctness,
there is no evidence that there is any cost/benefit in attempting to change existing, sloppy, usage.
Table 515.1: Occurrence of character types in various declaration contexts (as a percentage of all character
types appearing in all of these contexts). Based on the translated form of this book’s benchmark programs.

Type            Block Scope  Parameter  File Scope  typedef  Member  Total
char                   16.4        3.6         1.2      0.1     6.6   28.0
signed char             0.2        0.3         0.0      0.1     0.3    1.0
unsigned char          18.1       10.6         0.4      0.8    41.2   71.1
Total                  34.7       14.6         1.5      1.0    48.2
516
The implementation shall define char to have the same range, representation, and behavior as either
signed char or unsigned char.35)
Commentary
This is a requirement on the implementation. However, it does not alter the fact that the type char is a
different type than signed char or unsigned char.
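The distinctness of the three types can be observed with a generic selection. This is a sketch that assumes a C11 translator, since _Generic is not part of C99; the macro CHAR_KIND and the function plain_char_is_signed are hypothetical names. A generic selection may not list two compatible types, so the macro is only valid because char, signed char, and unsigned char are three different types.

```c
#include <limits.h>

/* C11 generic selection: yields 0, 1, or 2 depending on which of the
   three (distinct) character types the argument has. */
#define CHAR_KIND(x) _Generic((x),   \
    char:          0,                \
    signed char:   1,                \
    unsigned char: 2)

/* Returns 1 if plain char has the same range as signed char on this
   implementation, 0 if it has the same range as unsigned char. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Whatever plain_char_is_signed returns, CHAR_KIND((char)0) always selects the first association.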
C90
This sentence did not appear in the C90 Standard. Its intent had to be implied from wording elsewhere in that
standard.
C++
3.9.1p1
    A char, a signed char, and an unsigned char occupy the same amount of storage and have the same
    alignment requirements (3.9); that is, they have the same object representation.
    . . .
    In any particular implementation, a plain char object can take on either the same values as signed char
    or an unsigned char; which one is implementation-defined.
In C++ the type char can cause different behavior than if either of the types signed char or unsigned char
were used. For instance, an overloaded function might be defined to take each of the three distinct
character types. The type of the argument in an invocation will then control which function is invoked. This
is not an issue for C code being translated by a C++ translator, because it will not contain overloaded
functions.
517
An enumeration comprises a set of named integer constant values.
Commentary
There is no phase of translation where the names are replaced by their corresponding integer constant.
Enumerations in C are tied rather closely to their constant values. The language has never made the final
jump to treating such names as being simply that: an abstraction for a list of names.
Rationale
The C89 Committee considered several alternatives for enumeration types in C:
1. leave them out;
2. include them as definitions of integer constants;
3. include them in the weakly typed form of the UNIX C compiler;
4. include them with strong typing as in Pascal.
The C89 Committee adopted the second alternative on the grounds that this approach most clearly reflects
common practice. Doing away with enumerations altogether would invalidate a fair amount of existing code;
stronger typing than integer creates problems, for example, with arrays indexed by enumerations.
Enumeration types were first specified in a document listing extensions made to the base document (see 1,
base document).
Other Languages
Enumerations in the Pascal language family are distinct from the integer types. In these languages,
enumerations are treated as symbolic names (see 822, symbolic name), not integer values (although there is
usually a mechanism for getting at the underlying representation value). Pascal does not even allow an
explicit value to be given for the enumeration names; they are assigned by the implementation. Java did not
offer support for enumerated types until version 1.5 of its specification.
Coding Guidelines
The benefits of using a name rather than a number in the visible source to denote some property, state,
or attribute are discussed elsewhere (see 822, symbolic name). Enumerated types provide a mechanism for
calling attention to the association between a list (they may also be considered as forming a set) of
identifiers. This association is a developer-oriented one. From the translator’s point of view there is no
such association (unlike many other languages, which treat members as belonging to their own unique type).
The following discussion concentrates on the developer-oriented implications of having a list of
identifiers defined together within the same enumeration definition.
While other languages might require stronger typing checks on the use of enumeration constants and
objects defined using an enumerated type, there are no such requirements in C. Their usage can be freely
intermixed, with values having other integer types, without a diagnostic being required to be generated.
Enumerated types were not specified in K&R C and a developer culture of using macros has evolved. Because
enumerated types were not seen to offer any additional functionality, in particular no additional translator
checking, that macros did not already provide, they have not achieved widespread usage.
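The free intermixing can be seen in a short sketch. The enumeration colour, the array weight, its values, and the function mix_freely are all hypothetical; nothing here requires a diagnostic from a conforming translator.

```c
/* Hypothetical enumeration; COLOUR_COUNT relies on the constants
   being numbered 0, 1, 2, ... in order of definition. */
enum colour { RED, GREEN, BLUE, COLOUR_COUNT };

/* An array indexed by enumeration constants (illustrative values). */
static const int weight[COLOUR_COUNT] = { 10, 20, 30 };

int mix_freely(void)
{
    enum colour c = GREEN;
    int n = c + 1;        /* enumerated value used in integer arithmetic */

    c = 2;                /* integer literal assigned to an enum object */
    return weight[c] + n; /* enum object used as an array index */
}
```

Here mix_freely returns 32: weight[2] is 30 and n is 2.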
Some coding guideline documents recommend the use of enumerated types over macro names because
of the motivation that “using of the preprocessor is poor practice”.[809] Other guideline documents specify
ways of indicating that a sequence of macro definitions are associated with each other (by, for instance,
using comments at the start and end of the list of definitions). The difference between such macro
definition usage and enumerations is that the latter has an explicit syntax associated with it, as well as
established practices from other languages.
The advantage of using enumerated types, rather than macro definitions, is that there is an agreed-on
notation for specifying the association between the identifiers. Static analysis tools can (and do) use this
information to perform a number of consistency checks on the occurrence of enumeration constants and
objects having an enumerated type in expressions. Without tool support, it might be claimed that there is
no practical difference between the use of enumerated types and macro names. Tools effectively enforce
stricter type compatibility requirements based on the belief that the definition of identifiers in
enumerations can be taken as a statement of intent. The identifiers and objects having a particular
enumerated type are being treated as a separate type that is not intended to be mixed with literals or
objects having other types.
It is not known whether defining a list of identifiers in an enumeration type rather than as a macro
definition affects developer memory performance (e.g., whether developers more readily recall them, their
associated properties, or fellow group member names with fewer errors; see 792, identifier: learning a list
of). The issue of identifier naming conventions based on the language construct used to define them is
discussed elsewhere (see 792, source code context: identifier).
The selection of which, if any, identifiers should be defined as part of the same enumeration is based on
concepts that exist within an application (or at least within a program implementing it), or on usage
patterns of these concepts within the source code. There are a number of different methods that might be
used to measure the extent to which the concepts denoted by two identifiers are similar. The human-related
methods of similarity measuring, and mathematical methods based on concept analysis, are discussed
elsewhere (see 0, categorization; 1821, concept analysis). Resnick[1177] describes a measure of semantic
similarity based on the is-a taxonomy that is based on the idea of shared information content.
While two or more identifiers may share a common set of attributes, it does not necessarily mean that they
should, or can, be members of the same enumerated type. The C Standard places several restrictions on what
can be defined within an enumerated type, including:
• The same identifier, in a given scope, can only belong to one enumeration (Ada allows the same
  identifier to belong to more than one enumeration in the same scope; rules are defined for resolving
  the uses of such overloaded identifiers).
• The value of an enumeration constant must be representable in the type int (identifiers that denote
  floating-point values or string literals have to be defined as macro names); see 1440, enumeration
  constant representable in int.
• The values of an enumeration must be translation-time constants.
Given the premise that enumerated types have an interpretation for developers that is separate from the
C type compatibility rules, the kinds of operations supported by this interpretation need to be considered.
For instance, what are the rules governing the mixing of enumeration constants and integer literals in an
expression? If the identifiers defined in an enumeration are treated as symbolic names, then the operators
applicable to them are assignment (being passed as an argument has the same semantics); the equality
operators; and, perhaps, the relational operators, if the order of definition has meaning within the concept
embodied by the names (e.g., the baud rates that follow are ordered in increasing speed).
The following two examples illustrate how symbolic names might be used by developers (they are derived
from the clause on device- and class-specific functions in the POSIX Standard[667]). They both deal with the
attributes of a serial device.
• A serial device will have a single data-transfer rate (for simplicity, the possibility that the input rate
  may be different from the output rate is ignored) associated with it (e.g., its baud rate). The different
  rates might be denoted using the following definition:

      enum baud_rates {B_0, B_50, B_300, B_1200, B_9600, B_38400};

  where the enumerated constants have been ordered by data-transfer rate (enabling a test using the
  relational operators to return meaningful information).
• The following definition denotes various attributes commonly found in serial devices:

      enum termios_c_iflag {
          BRKINT, /* Signal interrupt on break */
          ICRNL,  /* Map CR to NL on input */
          IGNBRK, /* Ignore break condition */
          IGNCR,  /* Ignore CR */
          IGNPAR, /* Ignore characters with parity errors */
          INLCR,  /* Map NL to CR on input */
          INPCK,  /* Enable input parity check */
          ISTRIP, /* Strip character */
          IXOFF,  /* Enable start/stop input control */
          IXON,   /* Enable start/stop output control */
          PARMRK  /* Mark parity errors */
      };

  where it is possible that more than one of them can apply to the same device at the same time. These
  enumeration constants are members of a set. Given the representation of enumerations as integer
  constants, the obvious implementation technique is to use disjoint bit-patterns as the value of each
  identifier in the enumeration (POSIX requires that the enumeration constants in termios_c_iflag
  have values that are bitwise distinct, which is not met in the preceding definition). The bitwise
  operators might then be used to manipulate objects containing these values.
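A minimal sketch of the disjoint bit-pattern technique follows. The enumeration iflag_bits, its F_ prefixed constants, and the three helper functions are hypothetical; the values are illustrative only and are not those of any real <termios.h>.

```c
/* Each constant occupies its own bit, so any combination can be
   held in one object. Values are illustrative only. */
enum iflag_bits {
    F_BRKINT = 1 << 0,
    F_ICRNL  = 1 << 1,
    F_IGNBRK = 1 << 2,
    F_IGNCR  = 1 << 3
};

/* Add a member to the set. */
unsigned int set_flag(unsigned int flags, enum iflag_bits f)
{
    return flags | (unsigned int)f;
}

/* Remove a member from the set without disturbing the others. */
unsigned int clear_flag(unsigned int flags, enum iflag_bits f)
{
    return flags & ~(unsigned int)f;
}

/* Test for membership. */
int has_flag(unsigned int flags, enum iflag_bits f)
{
    return (flags & (unsigned int)f) != 0;
}
```

The set itself is held in an unsigned int rather than an object of the enumerated type, because a combination such as F_BRKINT | F_IGNCR is not one of the named constants.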
The order in which enumeration constants are defined in an enumerated type has a number of consequences,
including:
• If developers recognize the principle used to order the identifiers, they can use it to aid recall.
• The extent to which relational operators may be applied.
• Enhancements to the code need to ensure that any ordering is maintained when new members are
  added (e.g., if a new baud rate, say 4,800, is introduced, should B_4800 be added between B_1200 and
  B_9600 or at the end of the list?).
The extent to which a meaningful ordering exists (in the sense that subsequent readers of the source would be
capable of deducing, or predicting, the order of the identifiers given a description in an associated
comment) and can be maintained when applications are enhanced is an issue that can only be decided by the
author of the code.
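The maintenance issue can be made concrete with a sketch in which B_4800 has been inserted in rate order. The function at_least_as_fast is hypothetical.

```c
/* B_4800 inserted between B_1200 and B_9600 to preserve rate ordering. */
enum baud_rates { B_0, B_50, B_300, B_1200, B_4800, B_9600, B_38400 };

/* Meaningful only while the constants stay ordered by data-transfer rate. */
int at_least_as_fast(enum baud_rates a, enum baud_rates b)
{
    return a >= b;
}
```

Had B_4800 been appended after B_38400 instead, at_least_as_fast(B_4800, B_9600) would return 1 and every relational test involving the new rate would silently give the wrong answer.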
Rev 517.1
    When a set of identifiers is used to denote some application domain attribute using an integer constant
    representation, the possibility of them belonging to an enumeration type shall be considered.
Cg 517.2
    The value of an enumeration constant shall be treated as representation information.
Cg 517.3
    If either operand of a binary operator has an enumerated type, the other operand shall be declared
    using the same enumerated type or be an enumeration constant that is part of the definition of that
    type.
If an enumerated type is to be used to represent elements of a set, it is important that the values of all of its
enumeration constants be disjoint. Adding or removing one member should not affect the presence of any
other member.
Usage
A study by Gravley and Lakhotia[527] looked at ways of automatically deducing which identifiers, defined
as object-like macros denoting an integer constant (see 1931, macro: object-like), could be members of the
same, automatically created, enumerated type. The heuristics used to group identifiers were based either on
visual clues (a block of #defines bracketed by comments or blank lines), or the value of the macro body
(consecutive values in increasing or decreasing numeric sequence; bit sequences were not considered).
The 75 header files analyzed contained 1,225 macro definitions, of which 533 had integer constant bodies.
The heuristics using visual clues managed to find around 55 groups (average size 8.9 members) having more
than one member; the value-based heuristic found 60 such groups (average size 6.7 members).
518
Each distinct enumeration constitutes a different enumerated type.
Commentary
Don’t jump to conclusions. Each enumerated type is required to be compatible with some integer type (see
1447, enumeration type compatible with). The C type compatibility rules do not always require two types to
be the same (see 631, compatible type if). This means that objects declared to have an enumerated type
effectively behave as if they were declared with the appropriate, compatible integer type.
C++
The C++ Standard also contains this sentence (3.9.2p1). But it does not contain the integer compatibility
requirements that C contains. The consequences of this are discussed elsewhere (see 1447, enumeration type
compatible with).
Other Languages
Languages that contain enumerated types usually also treat them as different types that are not compatible
with an integer type (even though this is the most common internal representation used by implementations).
Coding Guidelines
These coding guidelines maintain this specification of enumerations being different enumerated types and
recommend that the requirement that they be compatible with some integer type be ignored (see 1447,
enumeration type compatible with).
519
The type char, the signed and unsigned integer types, and the enumerated types are collectively called
integer types.
Commentary
This defines the term integer types. Some developers also use the terminology integral types as used in the
C90 Standard.
C90
In the C90 Standard these types were called either integral types or integer types. DR #067 led to these two
terms being rationalized to a single term.
C++
3.9.1p7
    Types bool, char, wchar_t, and the signed and unsigned integer types are collectively called integral
    types.43) A synonym for integral type is integer type.
In C the type _Bool is an unsigned integer type and wchar_t is compatible with some integer type. In C++
they are distinct types (in overload resolution a bool or wchar_t will not match against their
implementation-defined integer type, but against any definition that uses these named types in its
parameter list).
In C++ the enumerated types are not integer types; they are a compound type, although they may be
converted to some integer type in some contexts (see 493, standard integer types).
Other Languages
Many other languages also group the character, integer, boolean, and enumerated types into a single
classification. Other terms used include discrete types and ordinal types.
Coding Guidelines
Both of the terms integer types and integral types are used by developers. Character and enumerated types
are not always associated, in developers’ minds, with this type category.
[Tree diagram: integer types divide into char, signed integer types, unsigned integer types, and enumerated
types. Signed integer types divide into standard signed integer types (signed char, signed short, signed
int, signed long, signed long long) and extended signed integer types (implementation defined). Unsigned
integer types divide into standard unsigned integer types (_Bool and the corresponding unsigned char,
unsigned short, unsigned int, unsigned long, unsigned long long) and extended unsigned integer types
(implementation defined).]
Figure 519.1: The integer types.
[Tree diagram: real types divide into integer types and real floating types (float, double, long double).]
Figure 520.1: The real types.
Table 519.1: Occurrence of integer types in various declaration contexts (as a percentage of all integer
types appearing in all of these contexts). Based on the translated form of this book’s benchmark programs.

Type            Block Scope  Parameter  File Scope  typedef  Member  Total
char                    1.8        0.4         0.1      0.0     0.7    3.1
signed char             0.0        0.0         0.0      0.0     0.0    0.1
unsigned char           2.0        1.2         0.0      0.1     4.6    7.9
short                   0.7        0.3         0.0      0.0     0.4    1.4
unsigned short          2.3        0.8         0.1      0.1     3.2    6.5
int                    28.4       10.6         4.2      0.1     6.4   49.7
unsigned int            5.6        3.6         0.3      0.1     4.2   13.8
long                    3.0        1.2         0.1      0.1     0.8    5.1
unsigned long           4.8        1.9         0.2      0.1     2.1    9.1
enum                    0.9        0.9         0.4      0.4     0.8    3.3
Total                  49.6       20.8         5.4      0.9    23.2
520
The integer and real floating types are collectively called real types.
Commentary
This defines the term real types.
C90
C90 did not include support for complex types and this definition is new in C99.
C++
The C++ Standard follows the C90 Standard in its definition of integer and floating types.
Coding Guidelines
This terminology is not commonly used outside of the C Standard. Are there likely to be any guideline
recommendations that will apply to real types but not arithmetic types? If there are, then writers of coding
guideline documents need to be careful in their use of terminology.
521
Integer and floating types are collectively called arithmetic types.
Commentary
This defines the term arithmetic types, so-called because they can appear as operands to the binary operators
normally thought of as arithmetic operators.
C90
Exactly the same wording appeared in the C90 Standard. Its meaning has changed in C99 because the
introduction of complex types has changed the definition of the term floating types (see 497, floating
types: three real).
C++
The wording in 3.9.1p8 is similar (although the C++ complex type is not a basic type).
The meaning is different for the same reason given for C90.

Friday, February 28, 2014
