Propagation of Data Types

I haven't seen this explained. If I do something like:

float a;
unsigned long int b;
long int c;
int d;

how is the following calculated? Or is mixing data types just bad practice?

a = b + c * d;

Taking things in order of precedence, it seems to me that the product is first calculated using the longer data type of c and d (long int). Then the addition is done using the longer data type of b and c*d (unsigned long int). And then that is finally converted to a float, the data type of a. Is this correct? Is there somewhere this is explained?

Thanks,

Dave

Or is mixing data types just bad practice?

Like that, yes, it is bad practice. The result is compiler specific and can and will vary between compilers.

The best thing to do is to cast everything to the size you want the result to be.

Grumpy_Mike:

Or is mixing data types just bad practice?

Like that, yes, it is bad practice. The result is compiler specific and can and will vary between compilers.

The best thing to do is to cast everything to the size you want the result to be.

No, type promotion is specified by the C and C++ standards. In C99, this is called the 'Usual arithmetic conversions' in section 6.3.1.8. Basically, you look at a particular operation and the types on each side:

  • If you have two floating point types, the operand with the smaller type is converted to the larger of the two types (not an issue on AVRs);
  • If you have a floating point type and an integral type, the integral operand is converted to the floating point type;
  • If you have integral types on both sides, first short/char types are promoted to the corresponding int type, and then both sides are converted to a common type (there are various rules for signed vs. unsigned); a quick sketch of the promotion step follows below.
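
For example, the short/char promotion in that last rule means 8-bit operands are actually computed as ints (a minimal sketch):

unsigned char x = 200;
unsigned char y = 100;
int sum = x + y;             // both operands are promoted to int, so sum is 300
unsigned char wrap = x + y;  // the int result 300 is narrowed back, giving 44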

So for:

float a;
unsigned long int b;
long int c;
int d;
// ...
a = b + c * d;

First c * d is done. c is long int and d is int, so d is converted to long int.

Then b + (c * d) is done. b is unsigned long int and (c * d) is long int, so the latter is converted to unsigned long int.

Then the result is assigned to a, converting the unsigned long int to float. In other words, the compiler effectively does:

long int tmp1 = c * (long int)d;                       // d is widened for the multiply
unsigned long int tmp2 = b + (unsigned long int)tmp1;  // signed product converted to unsigned
a = (float)tmp2;                                       // final conversion to float

I do tend to agree with Grumpy Mike that it is best to be explicit when you have multiple types.
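
Being explicit might look something like this (a sketch; pick the casts for the arithmetic you actually want):

a = (float)b + (float)c * (float)d;             // do all of the arithmetic in float
a = (float)(b + (unsigned long)(c * (long)d));  // or keep the integer math, with every conversion spelled out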

At a place where I worked, there was a line of code that was identical in two different programs, for converting the current time into the number of seconds since the year 2000. One fateful May the 4th, the two programs started producing different answers due to exactly this problem.

Grumpy_Mike:
At a place where I worked, there was a line of code that was identical in two different programs, for converting the current time into the number of seconds since the year 2000. One fateful May the 4th, the two programs started producing different answers due to exactly this problem.

Presumably it was due to the underlying types (whether ints were 32 bits or 64 bits) or unsigned vs. signed. Note that if you are using signed arithmetic and it overflows, the results are undefined. But the usual arithmetic conversions have been standardized since the 1989 ANSI C standard (which became the 1990 ISO C standard and was then modified for the 1999 and 2011 C standards, with similar rules in the C++ standards). Between the original C language defined in K&R and the 1989 C standard, there was a change in integral promotion (value preserving vs. signedness preserving) that could change results when signed and unsigned are mixed.
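
To make that trap concrete, here is a minimal sketch of signed/unsigned mixing silently changing an answer (hypothetical, not the actual line of code from that job):

long seconds = -30;             // e.g., a clock reading slightly behind the epoch
unsigned long elapsed = 60;
int late = (seconds < elapsed); // 0 (false)! 'seconds' is converted to unsigned long,
                                // so -30 wraps to a huge positive value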

Thanks. I had a feeling that mixing data types in statements could have unintended consequences, but had never really seen a warning about it. Guess I'll check for similar data types in the future.

Dave

Guess I'll check for similar data types in the future.

No need to check, just cast them.

The biggest place that causes problems is mixing signed and unsigned types. GCC in fact has warnings for this (-Wsign-compare for comparisons, -Wsign-conversion for implicit conversions), but I suspect the IDE does not enable them.
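
For instance, this loop (a hypothetical sketch) draws the warning, because strlen() returns the unsigned type size_t while i is a signed int:

#include <string.h>

// compile with: gcc -Wall -Wsign-compare demo.c
int count_chars(const char *s) {
    int n = 0;
    for (int i = 0; i < strlen(s); i++)  // warning: comparison between signed and unsigned
        n++;
    return n;
}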

The second biggest place that can trap people is overflow: either unintentional overflows, or places where you were depending on the overflow and it doesn't happen because 'int' and 'long' are different on your current system than the system the code was developed for (for example, going between an Uno and a Due, where ints are 16 bits on the Uno and 32 bits on the Due).
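
A classic Arduino example of the second trap (a sketch; Uno-class boards have 16-bit ints):

long ms = 1000 * 60;   // the multiply is done in 16-bit int on an Uno, so 60000
                       // overflows: formally undefined, in practice -5536;
                       // on a Due (32-bit int) the same line gives 60000
long ok = 1000L * 60;  // forcing a long operand gives 60000 on both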

'int' and 'long' are different on your current system than the system the code was developed for

Which is why I never use them.
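
One way to avoid them (a sketch, assuming C99's <stdint.h> is available) is to use the fixed-width types, so the size can't change under you:

#include <stdint.h>

uint32_t b;  // always 32 bits, unlike 'unsigned long'
int32_t  c;  // always 32 bits, unlike 'long'
int16_t  d;  // always 16 bits, unlike 'int'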

I spent days trying to port a large program to a Burroughs (I think) mainframe. Our company had a strict "K&R only" policy for the code used on our product, which almost always helped, but this computer had 9-bit bytes and we had some fancy hashing algorithms that bit-twiddled everything in lots of 8.

Oops.

I can't remember how the heck I figured that one out, probably mentioned something to a local guru and he said "Waddaya mean 8-bits?" :slight_smile:


Rob

Graynomad:

'int' and 'long' are different on your current system than the system the code was developed for

Which is why I never use them.

I spent days trying to port a large program to a Burroughs (I think) mainframe. Our company had a strict "K&R only" policy for the code used on our product, which almost always helped, but this computer had 9-bit bytes and we had some fancy hashing algorithms that bit-twiddled everything in lots of 8.

IIRC, Burroughs also used sign-magnitude to represent integers (Univac and CDC used one's complement), which is a different representation for negative numbers; both sign-magnitude and one's complement have a representation for -0 (though of course two's complement has a negative number that has no positive counterpart). Note that both sign-magnitude and one's complement make unsigned arithmetic much more expensive than it is on two's complement machines.

I was on the original ANSI C standards committee, and there were some companies that had existing machines that were challenging for C as people coded it (as compared to K&R C, which was a little stricter; the common C compilers on DEC/x86 machines of the day were more relaxed). Let's see:

  • My employer for the first few years (Data General) had two different pointer formats, one for pointers to characters and one for pointers to 16-bit words. The word pointer format had the top bit reserved for use as an indirection bit, while the byte pointer format was shifted left one bit and the bottom bit selected the byte. Just to add to the fun, arithmetic shifts that propagated the sign bit were expensive, and the machine used IBM System/360 floating point instead of IEEE;
  • Pr1me also had multiple pointer formats, but they had it worse in that their byte pointer was larger than the word pointer, and it did not fit in any default integer type;
  • Burroughs had 36 bit words, with 9-bit bytes, and sign magnitude arithmetic;
  • Univac also had 36 bit words and one's complement arithmetic -- during the years when the meetings were held, Burroughs and Univac combined, and the rep that then represented the merged company (Unisys) had two unusual platforms to deal with;
  • IBM had to deal with EBCDIC, which is an alternate character encoding from ASCII. EBCDIC was based on encodings from card punches, and an unfortunate consequence was that the alphabetic characters were not contiguous. Also, printable characters had the top bit set, which meant that by default chars were unsigned instead of signed. They also had the IBM floating point format that preceded IEEE 754, which differs in various details. Finally, the linker was maintained by a different organization in the company, and it had limits on the length of external names;
  • Digital Equipment Corporation (DEC) also had a linker that the compiler group had little control over, though it wasn't as bad as IBM (the IBM linker of the time only allowed 8 characters, all upper case, while the DEC linker allowed longer names, but still only one case).
  • Some vendors targeting the Intel 80286 had the concept of near and far pointers, and while the standard didn't add those keywords, the issue did come up in the discussions, notably that pointers to functions could be a different size from pointers to data.

Thanks Michael, good hysterical data there, must have been interesting to have been in on that lot.

I guess from what you say I was in fact on a Burroughs, I did so many ports in those days I lost track. My job was to fly in to a city with a tape under my arm and spend as long as required (or two days, whichever came first :)) getting the product working on the client's machine. I'd be placed in a cubicle with a terminal, a pile of user manuals, and the phone # of a sysadmin who could load my tape.

I'd be using a machine, user interface and tool chain I'd never seen before, half the time didn't know where to make coffee or even have a crap.

Not my favourite job but I sure learned a lot and learned it quick.

Also interesting you mention Pr1me, I worked for their R&D section (comms) in Canberra up until the time they closed it down in Aus and moved it all back to the US. 60-odd engineers on the job market overnight. Luckily I'd been head-hunted the week before :slight_smile:

I don't remember the pointer setup with Pr1me, but I normally worked well below that level, yes below that level, on the 290x bit slice processor with a custom (64-bit IIRC) instruction set that worked directly with ALUs and hardware registers.

Pr1me was eventually bought by corporate raiders I think and stripped of its assets. This was during the 80s, when the Wall St arseholes made money from destroying things (and people). I doubt that's changed much but it was rampant in the 80s. Greed is good, eh?


Rob

Thanks Michael, good hysterical data there,

Didn't even make me laugh, must be something about the Aussie sense of humor.

Yes, it's a very dry sense of humor, strange that, given the beer-swilling stereotype.

Anyway got to go, it's happy hour (somewhere).


Rob

Graynomad:
Thanks Michael, good hysterical data there, must have been interesting to have been in on that lot.

Yes, it was an interesting group of characters. I was on the committee for 10 1/2 years, first representing Data General and then the Open Software Foundation (and at OSF, my alternate for the meetings had been the Pr1me guy before we both moved on from our respective companies). These meetings lasted a week (Monday through Friday morning), and invariably you were with the same people all day, and at times tempers would flare.

We often met in hotel conference rooms, and one time one of those 'How to deal with difficult people' seminars was in the adjacent room. We thought about putting up a sign on our door saying our meeting was 'How to be a difficult person', since we had several people who could teach master-level classes in that regard.

We often ate at mall food courts for lunch, and we came to the conclusion that, just as we were standardizing the C language, there must be an ANSI standards committee for mall design, since after a while they all looked the same.

I do recall going home once and watching C-SPAN (a US cable/satellite channel that focuses on government proceedings) and being amused to watch the real pros use Robert's Rules of Order effectively. We were amateurs by comparison.

Graynomad:
I guess from what you say I was in fact on a Burroughs, I did so many ports in those days I lost track. My job was to fly in to a city with a tape under my arm and spend as long as required (or two days, whichever came first :)) getting the product working on the client's machine. I'd be placed in a cubicle with a terminal, a pile of user manuals, and the phone # of a sysadmin who could load my tape.

I'd be using a machine, user interface and tool chain I'd never seen before, half the time didn't know where to make coffee or even have a crap.

Not my favourite job but I sure learned a lot and learned it quick.

I've known consultants like that, and you have to be fairly adaptable. When I first started working at Data General, it was a cube farm, and I figured we really should have had a cube with a pile of cheese in it, since I felt like a rat in a psych maze.

Graynomad:
Also interesting you mention Pr1me, I worked for their R&D section (comms) in Canberra up until the time they closed it down in Aus and moved it all back to the US. 60-odd engineers on the job market overnight. Luckily I'd been head-hunted the week before :slight_smile:

I don't remember the pointer setup with Pr1me, but I normally worked well below that level, yes below that level, on the 290x bit slice processor with a custom (64-bit IIRC) instruction set that worked directly with ALUs and hardware registers.

Pr1me was eventually bought by corporate raiders I think and stripped of its assets. This was during the 80s, when the Wall St arseholes made money from destroying things (and people). I doubt that's changed much but it was rampant in the 80s. Greed is good, eh?

After I left, DG got bought by EMC for its disk unit, and they eventually closed the computer side of the business. However, from a compiler point of view, the DG hardware was a mess (particularly for the C compiler that I wrote, since that machine was C-hostile).

DG, Pr1me, Univac, Burroughs, DEC-10s, Cray-1, CDC, etc. were all word-oriented machines with byte addressing added in as an afterthought. C was designed on machines (initially the PDP-7, and then the PDP-11) where all addresses were uniform byte addresses. C programmers tended to be rather lax about passing pointers around and converting them to integral types, assuming that all pointers smell the same, though C++ is now a lot stricter than the C of the 1980s.

Totally fascinating... Rare insights into the industry and a good feel for "People Will Be People"...

Bob