Java Forum / General / May 2006
Integer to Floating-Point conversions
Kai Koehne - 13 May 2006 09:30 GMT Hi,
I need an int to real32 conversion with IEEE-754 round-towards-zero mode ... That means, a method x, so that
int i = 44554435 float f = x(i)
results in f = 3355432.0, and not f = 335546.0 .
Are you aware of any mathematical packages that provide such a function, or a way to do this with pure Java?
Regards,
Kai Koehne
joseph_daniel_zukiger@yahoo.com - 13 May 2006 10:12 GMT > Hi, > [quoted text clipped - 12 lines] > > Kai Koehne Huh?
I was going to mumble something automatic integer to floating point promotion should work as advertised, but your question does not seem to make sense.
If you absolutely must work with 32 bit floats instead of 64 bit doubles or java.math.BigDecimal, you will lose precision when promoting large integers. There's no way to squeeze 32 bits of precision into less than 32 bits of precision, expecially when a 32 bit float needs a few bits for the exponent.
But if you really need a method that converts 44554435 stored in an int to either 3355432.0 or 335546.0 in 32 bit floating point representation, you can always write your own. I'd prefer you give it a reasonable name, like "mungeNumber()" or "hashAFloat()" or my.own.strange.Promotions.conversion_x() so that if it ever gets close to any of my code my code doesn't get infected. But you can definitely build it. I'd point you to the xxxToYyyBits() and yyyBitsToXxx() methods of java.lang.Double and java.lang.Float, but I wouldn't want to be accused of contributing to the crime.
Kai Koehne - 13 May 2006 10:52 GMT joseph_daniel_zukiger@yahoo.com schrieb:
>> Hi, >> [quoted text clipped - 7 lines] > promotion should work as advertised, but your question does not seem to > make sense. Just to clarify my motivation: I am building an Occam 2 Java compiler ... and as it happens, occam supports two ways of converting an integer to a floating-point variable:
REAL32 ROUND n -- conversion with round-towards-middle REAL32 TRUNC n -- conversion with round-towards-zero
The first has the same semantics as (float)n , but I couldn't find a way to implement the second conversion.
> [...] > But if you really need a method that converts 44554435 stored in an int [quoted text clipped - 6 lines] > methods of java.lang.Double and java.lang.Float, but I wouldn't want to > be accused of contributing to the crime. Huh, this is the tough way ... but I will try to find my way through the implementation details, thank you for the idea :-)
Kind regards,
Kai Koehne
joseph_daniel_zukiger@yahoo.com - 13 May 2006 13:40 GMT > joseph_daniel_zukiger@yahoo.com schrieb: > >> Hi, [quoted text clipped - 18 lines] > The first has the same semantics as (float)n , but I couldn't find a way > to implement the second conversion. Ahh. I see.
> > [...] > > But if you really need a method that converts 44554435 stored in an int [quoted text clipped - 9 lines] > Huh, this is the tough way ... but I will try to find my way through the > implementation details, thank you for the idea :-) If what Chris suggests doesn't get there fast enough for you, you might want to consider implementing it borrowing from java.math.BigDecimal and BigInteger. If it turns out too slow, it could at least give you a good baseline.
> Kind regards, > > Kai Koehne Patricia Shanahan - 13 May 2006 16:28 GMT ...
> Huh, this is the tough way ... but I will try to find my way through the > implementation details, thank you for the idea :-) ...
Here's my attempt. The test cases shown in the main method are ALL the testing I've done.
public class FloatConvert {
public static void main(String[] args) { // Original example test(44554435, 44554432); // Negation of original example test(-44554435, -44554432); // No rounding needed test(44554432, 44554432); // Rounding goes in correct direction test(44554433, 44554432); }
private static void test(int input, int expected) { double actual = truncateIntToFloat(input); if (actual == expected) { System.out.printf("Success: input %d actual %f\n", input, actual); } else { System.out.printf("Failure: input %d actual %f expected %d\n", input, actual, expected); } }
/** * Convert int to float with truncation towards zero. * * @param input * Number to convert * @return Largest magnitude float that has the same sign as input * and no greater absolute value. */ public static float truncateIntToFloat(int input) { float rounded = (float) input; if (Math.abs((double) rounded) <= Math.abs((double) input)) { return rounded; } else { int roundedBits = Float.floatToIntBits(rounded); float truncated = Float.intBitsToFloat(roundedBits - 1); return truncated; } } }
Chris Uppal - 14 May 2006 13:50 GMT > Here's my attempt. The test cases shown in the main method > are ALL the testing I've done. It works for the other 4294967292 inputs too. I've tried 'em all ;-)
> int roundedBits = Float.floatToIntBits(rounded); > float truncated = Float.intBitsToFloat(roundedBits - 1); > return truncated; That's clever. At least, I /hope/ it's clever because I don't understand why it works. Do you care to expand a bit ? Please ?
I got interested in this myself and have tried out a number of approaches, including an attempt to convert rounding into truncation by consideration of ULPs (as I suggested to the OP earlier). The best, IMO, is to do the truncation in the integer domain before converting into a float. Its a little complicated to do, but it runs fast, and has the aesthetic advantage of not requiring double-precision operations to produce a single-precision result (which all the other things I tried do).
I've appended the code, including the test-harness, for anyone who's interested.
-- chris
=========== Test.java =============== /* * contains several static implementations of truncating a 32-bit integer * to a 32-bit float with rounding "towards zero" rather that to the nearest * float (as in the Java language spec for casting an int to a float). * Also contains an exhaustive test which checks all the implementations * against the simplest one (presumed to be correct). * * Approximate times when run on a 1.5.0_06-b05 JVM, in "client" mode on * a 1.5 GHz celeron WinXP Pro box. In each case the time is the time in * nanoseconds to convert one value averaged over the full 32-bit range of * integers. * * Slow: 130 * ULP: 63.5 * FastULP: 65 * Twiddle: 19 * Patricia: 74.5 * * Approximate times for inputs in the resticted range [-10,000,000, 10,000,000), * 83% of which is exactly representable as a float. * * Slow: 12 * ULP: 12 * FastULP: 20.5 * Twiddle: 18.5 * Patricia: 24 */ public class Test { public static void main(String[] args) { int i = 0; int errors = 0; for (;;) { if (i % 10000000 == 0) { // progress... System.out.printf("%11d...\r", i); System.out.flush(); } float shouldBe = slowTruncate(i); float is;
if ((is = ulpTruncate(i)) != shouldBe) { System.out.printf("Slow: %d: %f != %f%n", i, is, shouldBe); errors++; } if ((is = fastTruncate(i)) != shouldBe) { System.out.printf("Fast: %d: %f != %f%n", i, is, shouldBe); errors++; } if ((is = twiddleTruncate(i)) != shouldBe) { System.out.printf("Twiddle: %d: %f != %f%n", i, is, shouldBe); errors++; } if ((is = patriciaTruncate(i)) != shouldBe) { System.out.printf("Patricia: %d: %f != %f%n", i, is, shouldBe); errors++; }
if (errors > 10) break; if (i == -1) break; // we've wrapped all the way around i++; } System.out.printf("%ndone, %d errors%n", errors); }
/** * Slow implementation of truncating an int to a 32-bit floating point. * This implementation is intended to be the "so simple it can't possibly * be wrong" benchmark against which other implementations can be compared */ public static float slowTruncate(int i) { // since all +ve numbers have negations, but the reverse // isn't true, we work with -ve numbers if (i > 0) return -slowTruncate(-i);
// loop up (towards zero) until we find the right answer for (;;) { float f = (float)i; if ((double)f >= (double)i) return f; i++; } }
/** * Implementation of truncating an int to a 32-bit floating point based on * ULP manipulation */ public static float ulpTruncate(int i) { float f = (float)i; double error = (double)f - (double)i;
// no error ? if (error == 0) return f;
// or rounded towards 0 anyway ? if ((error < 0) == (i > 0)) return f;
// adjust by one ulp towards 0 // the double ulp() is because ulp is the distance to the next // number away from 0, we need the distance to the next float // nearest zero if (i > 0) { float ulp = Math.ulp(f-Math.ulp(f)); f -= ulp; } else { float ulp = Math.ulp(f+Math.ulp(f)); f += ulp; }
return f; }
/** * Faster implementation of truncating an int to a 32-bit floating point. * This implementation uses a simple test to catch a useful range of * "eaasy" cases. If the input is likely to be in that range then * this provides a useful optimisation, otherwise it's marginally * slower. * NB1: this version uses ulpTruncate() for the difficult cases, but * it could just as easily use a different one -- even slowTruncate(). * NB2: since this consists only of a simple test guarding simple operations, * it seems reasonable to hope that the JIT will inline it in callers */ public static float fastTruncate(int i) { // these are all exact if (i >= -8388608 && i <= 8388608) return (float)i; else return ulpTruncate(i); // or any other xxxTruncate() }
/** * Faster implementation of truncating an int to a 32-bit floating point based * on bit-twiddling. * The idea is to take the number, discover how many high-bits it has which * don't fit into the 24-bit mantissa of a float, and to mask off that many * low bits before converting to a float. I.e. we do the truncation in the * integer domain before converting to a float, so the conversion is always * is always exact. * The actual implementation is complicated by sign issues, but that's the * basic idea. * Note that we don't actually mess with the physical layout of an IEEE float, * but only use the fact that it has exactly 24 bits in which to represent the * mantissa */ public static float twiddleTruncate(int i) { if (i >= 0) { // because i is +ve the high bit is always zero, // and so (i >>> 24) is always in the range [0, 128) return (float)(i & MASKS[i >>> 24]); }
// nasty special case if (i == Integer.MIN_VALUE) return -2147483648.0f;
return -twiddleTruncate(-i); }
// helper data for twiddleTruncate(). A set of precomuted masks which // will be indexed by bits [24, 30] (zero-based) of the input integer. private static final int[] MASKS = new int[128]; static { for (int i = 0; i < 128; i++) MASKS[i] = -1; for (int bit = 0; bit < 7; bit++) { int threshold = 1 << bit; int mask = 1 << bit; for (int i = threshold; i < 128; i++) MASKS[i] ^= mask; } }
/** * Clever implementation of truncating an int to a 32-bit floating point posted * to comp.lang.java.programmer by Patricia Shanahan on 13 May 2006 * (Message-ID: <TWm9g.126$921.60@newsread4.news.pas.earthlink.net> ) */ public static float patriciaTruncate(int i) { float rounded = (float) i;
if (Math.abs((double) rounded) <= Math.abs((double) i)) return rounded;
int roundedBits = Float.floatToIntBits(rounded); float truncated = Float.intBitsToFloat(roundedBits - 1);
return truncated; } } ==========================
Patricia Shanahan - 14 May 2006 14:37 GMT >> Here's my attempt. The test cases shown in the main method are ALL >> the testing I've done. [quoted text clipped - 6 lines] > That's clever. At least, I /hope/ it's clever because I don't > understand why it works. Do you care to expand a bit ? Please ? Sure. First I split the problem into two cases. If the result did not require rounding, or the rounding was towards zero, the rounded result and truncated result are equal, so I just returned the rounded result.
The code you quote is for those cases that round away from zero. The float we want is the one with the same sign as rounded, and the largest magnitude that is less than rounded.
NaNs, infinities, and zeros would all be special cases in a general attempt to make the smallest possible reduction in the magnitude of a float, but can be ignored here. Zero can only result from converting int zero, and getting an exact result. The others cannot result from int conversion.
If rounded is not an exact power of two, the truncation has a mantissa one less than rounded. If it is an exact power of two, it has an exponent one less than rounded, and mantissa all ones. Either way, subtracting one from the rounded bit pattern gets the bit pattern for truncated.
> I got interested in this myself and have tried out a number of > approaches, including an attempt to convert rounding into truncation [quoted text clipped - 4 lines] > double-precision operations to produce a single-precision result > (which all the other things I tried do). If you prefer to avoid double, just do the abs in int:
public static float truncateIntToFloat(int input) { float rounded = (float) input; if (Math.abs((int) rounded) <= Math.abs(input)) { return rounded; } else { int roundedBits = Float.floatToIntBits(rounded); float truncated = Float.intBitsToFloat(roundedBits - 1); return truncated; } }
Both numbers whose absolute values are to be compared can be exactly represented in either int or double, so either type should work.
Patricia
Chris Uppal - 15 May 2006 14:22 GMT > If rounded is not an exact power of two, the truncation has a mantissa > one less than rounded. If it is an exact power of two, it has an > exponent one less than rounded, and mantissa all ones. Either way, > subtracting one from the rounded bit pattern gets the bit pattern for > truncated. Ah, I see. The IEEE fp layout is subtle.
Thank you.
> If you prefer to avoid double, just do the abs in int: > [quoted text clipped - 8 lines] > } > } Unfotunately, the comparison doesn't work for Integer.MAX_VALUE, nor for values near Integer.MIN_VALUE. Two example failure cases are:
2147483647 -2147483638
Some test code (which uses a routine from yesterday's post): ============= public class Test2 { public static void main(String[] args) { test(2147483647); test(-2147483638); }
static void test(int input) { System.out.printf("input: %d%n", input); System.out.printf("desired output: %f%n", Test.slowTruncate(input));
float rounded = (float)input; System.out.printf("rounded: %f%n", rounded);
System.out.printf("(int)rounded: %d%n", (int)rounded); System.out.printf("Math.abs((int)rounded): %d%n", Math.abs((int)rounded)); System.out.printf("Math.abs(input): %d%n", Math.abs(input)); System.out.println(); } } =============
Which produces output:
input: 2147483647 desired output: 2147483520.000000 rounded: 2147483648.000000 (int)rounded: 2147483647 Math.abs((int)rounded): 2147483647 Math.abs(input): 2147483647
input: -2147483638 desired output: -2147483520.000000 rounded: -2147483648.000000 (int)rounded: -2147483648 Math.abs((int)rounded): -2147483648 Math.abs(input): 2147483638
I find the negative output from Math.abs() particularly entertaining ;-)
-- chris
Patricia Shanahan - 15 May 2006 16:21 GMT >>If rounded is not an exact power of two, the truncation has a mantissa >>one less than rounded. If it is an exact power of two, it has an [quoted text clipped - 7 lines] > >>If you prefer to avoid double, just do the abs in int: ...
> Unfotunately, the comparison doesn't work for Integer.MAX_VALUE, nor for values > near Integer.MIN_VALUE. Two example failure cases are: ...
> I find the negative output from Math.abs() particularly entertaining ;-) Good point. Shows the folly of not testing at least the maximum and minimum value, as well as some ordinary values.
Patricia
Kai Koehne - 14 May 2006 15:12 GMT Wow! Thank you all for this thorough discussion ... Two days ago I was not even sure if truncating conversion can be done at all, and now I have five working solutions + performance tests. I have the strong feeling to owe you a beverage of your choice :-)
Thank you!
Kai Koehne
Chris Uppal - 13 May 2006 11:11 GMT > I need an int to real32 conversion with IEEE-754 round-towards-zero mode Java doesn't include a spectacularly complete set of IEEE-754 operations, that's one of the ones that's missing. I believe that more stuff will be added in 1.6 (see http://download.java.net/jdk6/docs/api/java/lang/Math.html ) but I don't know if that will provide what you need.
If not, or if you can't wait, then I think you'll just have to program it yourself. Math.ulp(float) may help.
-- chris
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|