## Sunday, September 30, 2007

## Wednesday, September 26, 2007

### Naked as We Came

One more video... According to the link, in addition to creating the music, the brilliant Sam Beam also directed the video. If you pay attention to the lyrics, you'll find it's a sad song, but it's one of the best sad songs I've heard.

Artist: Iron & Wine

Song: Naked as We Came

Disc: Our Endless Numbered Days

Year: 2004


## Tuesday, September 25, 2007

### Roots, Linear RGB and sRGB


Given an efficient implementation of cube root and quintic root, I want to note the implications for conversions between linear RGB and sRGB. These conversions involve powers of 2.4 and 1/2.4 for gamma correction.

Consider the case of x^2.4:

x^2.4 = x^1.2 * x^1.2

x^1.2 = x * x^(1/5)

Given a fast and accurate calculation of quintic roots, efficient calculation of x^2.4 is straightforward.

For the case of x^(1/2.4), since 1/2.4 = 5/12, we can consider some combination of powers such as the following:

x^(1/2.4) = ((x^(1/4))^5)^(1/3)

For the 1/4th power, if our target processor has an efficient square root built in, we might rely on sqrt(sqrt(x)); otherwise, we might opt for an efficient implementation of quartic root. With a little luck, we may wind up with something faster than calling pow( ).
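As a concrete illustration of that 5/12 decomposition, here is a minimal sketch using the standard library's square and cube roots (the function name `srgb_pow_5_12` is my own, not from any graphics API):

```cpp
#include <cassert>
#include <cmath>

// x^(1/2.4) = x^(5/12) = cbrt((x^(1/4))^5), for x >= 0.
// sqrt(sqrt(x)) gives the quartic root; two multiplies give its fifth power.
inline double srgb_pow_5_12(double x)
{
    const double q  = std::sqrt(std::sqrt(x)); // x^(1/4)
    const double q2 = q * q;                   // x^(1/2)
    return std::cbrt(q2 * q2 * q);             // (x^(5/4))^(1/3) = x^(5/12)
}
```

Whether this beats pow() depends on how fast the target's sqrt and cbrt are, but it agrees with pow(x, 5.0/12.0) to near machine precision.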

If we use a 2.2 gamma, a quintic root will also come in handy because:

x^2.2 = x^2 * x^(1/5)

The challenge with 2.2 gamma is the inverse, which amounts to x^(5/11). For x in the range 0 to 1, this amounts to the square root with a small correction: x^(5/11) = x^(1/2) / x^(1/22).
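A quick numeric sanity check of that last identity (here std::pow simply stands in for whatever fast root routines would ultimately be used; `inv_gamma22` is my name for it):

```cpp
#include <cassert>
#include <cmath>

// Check the decomposition x^(5/11) = x^(1/2) / x^(1/22):
// 1/2 - 1/22 = 11/22 - 1/22 = 10/22 = 5/11.
inline double inv_gamma22(double x)
{
    return std::sqrt(x) / std::pow(x, 1.0 / 22.0);
}
```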

I'll leave that as something to think about.

*Note: Hopefully none of the ideas here have been patented. If not, consider this another demonstration of prior art.*

### Quintic roots via Halley

In the previous post, I posted a couple of C++ template functions to approximate n-th roots via a bit hack. The double case can be optimized using a method such as Kahan's, but both the 32-bit and 64-bit cases appear to be working.

This time, I'm going to post a little function that uses Halley's method to calculate the quintic root of a real number.

```cpp
inline double halley_quint1d(double x, double R)
{
    // iterative quintic root approximation using Halley's method (double)
    const double x5 = x*x*x*x*x;
    return x*(3*R + 2*x5) / (2*R + 3*x5);
}
```

Given a guess 'x' for the quintic root of R, this function will return an approximation that is closer to the quintic root of R than 'x', and the function will exhibit cubic convergence.

You can combine calls to this function with the previously posted nth_roota( ) function to generate an approximation of a quintic root. The iteration, halley_quint1d( ), can be called repeatedly to a desired degree of accuracy.

For example:

```cpp
double quintica(double R)
{
    return halley_quint1d(nth_roota<5>(R), R);
}
```

Of course one could also use Newton-Raphson, but I'll leave that as an exercise to the reader.
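For the curious, a sketch of that Newton-Raphson step (my naming, following the pattern above; quadratic rather than cubic convergence, but no extra multiplies):

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for the quintic root of R:
// x' = x - (x^5 - R) / (5x^4) = (4x + R/x^4) / 5
inline double newton_quint1d(double x, double R)
{
    const double x4 = x*x*x*x;
    return (4.0*x + R/x4) * 0.2;
}
```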

Lazily, I used wxMaxima to work out the simplified version of Halley's method applied to quintic roots. If you'd like to verify it yourself, fire up wxMaxima (an excellent free program I highly recommend) and enter the following:

```
(%i1) x^5-R;
(%i2) x - 2*%o1*diff(%o1,x)/(2*diff(%o1,x)^2-%o1*diff(%o1,x,2));
(%i3) ratsimp(%);
```


### N-th root approximations

Continuing along with a previous post on fast calculation of cube roots, here are a couple of C++ template functions for bit hack n-th root approximations.

```cpp
template <int n>
__forceinline float nth_roota(float x)
{
    const int ebits = 8;
    const int fbits = 23;
    int& i = (int&) x;
    const int bias = (1 << (ebits-1)) - 1;
    i = (i - (bias << fbits)) / n + (bias << fbits);
    return x;
}

template <int n>
__forceinline double nth_roota(double x)
{
    const int ebits = 11;
    const int fbits = 52;
    __int64& i = (__int64&) x;
    const __int64 bias = (1 << (ebits-1)) - 1;
    i = (i - (bias << fbits)) / n + (bias << fbits);
    return x;
}
```

I'll discuss this more in a subsequent post. Sorry for the lack of indentation (Blogger), and for Blogger's handling of less-than and greater-than symbols, which seems to make it choke as well. If you see ampersands followed by 'gt' or 'lt', these need to be translated.
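For readers outside MSVC, here is a portable variant of the same double-precision bit hack (my adaptation, not from the original post): std::memcpy replaces the reference cast, int64_t replaces `__int64`, and IEEE 754 doubles with x > 0 are assumed:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Portable n-th root approximation via the exponent-bias bit hack.
// Assumes IEEE 754 binary64 and x > 0.
template <int n>
double nth_roota_portable(double x)
{
    const int ebits = 11;
    const int fbits = 52;
    int64_t i;
    std::memcpy(&i, &x, sizeof i);             // reinterpret bits safely
    const int64_t bias = (1LL << (ebits - 1)) - 1;
    i = (i - (bias << fbits)) / n + (bias << fbits);
    std::memcpy(&x, &i, sizeof x);
    return x;
}
```

For exact powers of two the approximation happens to be exact; elsewhere it is a rough seed for an iterative refinement.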


## Monday, September 24, 2007

## Sunday, September 23, 2007

### "Because I'm cheap..."

Both Spolsky and Graham can write, but my favorite geeky writer is Paul Ford:

"...Admitting that I lack real talent for it, I've turned computer science into a hobby, like weekend landscape watercolorists or dabblers in piano. I mostly like it because I'm cheap. Piano, calligraphy, swinging, macrame--those require trips to special stores, catalog orders, lessons. Computer science is only there inside the machine, waiting for you to sit down with a copy of some heavy book or eager tutorial (the Web has many pushers trying to hook you on a syntax) and start twiddling. It seems that the discipline is entirely about cheapness. The glory goes to parsimony, to the algorithms that invest the fewest CPU cycles for the greatest return..."

link


## Saturday, September 22, 2007

### Save Myself

It's about time for another Willy Mason tune...

Artist: Willy Mason

Song: Save Myself

Disc: If the Ocean Gets Rough

Year: 2007


### Social Networking

"I just don't get the social networking thing. It doesn't appeal to me. I have no interest in it."

So said a colleague of mine yesterday.

A thought crossed my mind.

There must be millions of people who feel the same way.

An untapped market, perhaps.

If only there were a place for these people to connect...

Some place online...


## Friday, September 21, 2007

### Clamping integers to bytes

This is yet another bit hack to avoid branching.

If you want to clamp an n-bit signed integer such that negative values become zero, you can do it with a bitwise AND of the integer and the bitwise NOT of the integer shifted right by n-1 bits. E.g., in the case of 32-bit integers, the following:

```cpp
int clamp_lower_to_zero(int i)
{
    return (i & ~(i>>31));
}
```

This works because in the case of negative numbers, the result of the shift is all 32 bits set to one. In the case of non-negative numbers the result is all 32 bits set to zero. Applying the NOT reverses the situation, so the negative case results in zeroes and the positive case results in ones. Once this is AND-ed with the original value, all negative cases result in zero.

You can use a similar trick to clamp the upper bound of a signed int to 255 (or some other case of 2^n - 1):

```cpp
unsigned char clamp_upper_to_255(int i)
{
    return (unsigned char) (((255-i) >> 31) | i);
}
```

When i <= 255, the result of the shift is 0. When i > 255, the result of the shift is 0xffffffff, which is cast to 255 (0xff).

You can combine the two to clamp a signed integer to a byte:

```cpp
unsigned char clamp_int_to_byte(int i)
{
    return (unsigned char) (((255-i)>>31) | (i & ~(i>>31)));
}
```

This isn't groundbreaking stuff by any means, but I haven't seen this technique posted anywhere--and people* patent the stupidest things--so, if this hasn't been patented yet, it's documented as prior art here.
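A quick exhaustive check over a range of inputs, restating the functions so the snippet is self-contained (this assumes 32-bit int and arithmetic right shift of negative values, which holds on typical compilers):

```cpp
#include <cassert>

// Branch-free lower clamp: negative values become zero.
// i>>31 is all-ones for negative i, all-zeros otherwise.
inline int clamp_lower_to_zero(int i)
{
    return i & ~(i >> 31);
}

// Combined clamp of a signed int to the byte range [0, 255].
inline unsigned char clamp_int_to_byte(int i)
{
    return (unsigned char)(((255 - i) >> 31) | (i & ~(i >> 31)));
}
```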


*Restraint is exercised in the word choice.*

## Thursday, September 20, 2007

### 1234

A break from cube roots. Via Morty Blog, the Feist tune in the new iPod nano commercial.

Artist: Feist

Song: 1234

Disc: The Reminder

Year: 2007


## Saturday, September 15, 2007

### Faster Cube Root III

This is post #3 on the subject of faster cube roots (#1, #2).

One of the challenges with efficiently leveraging iterative methods such as Newton-Raphson and Halley's is coming up with a good initial guess to feed into the iteration. This is the topic of this post.

One approach involves a bit hack exploiting properties of IEEE 754 floating point numbers. This sort of strategy is used in Kahan's implementation of cbrt( ) as found in the BSD sources. Some knowledge of the guts of floating point numbers is helpful.

Single precision numbers consist of three bit fields consuming 32 bits in the following way:

S[1] E[8] F[23]

The first bit, S, determines the sign of the number. If the bit is set, the floating point number is negative; otherwise, it's positive. The next 8 bits represent the exponent with a bias of 127. The last 23 bits form the fractional part such that:

value = 2^(E-127) * (1.0 + F/(2^23))

*See Wikipedia and attendant references for a more detailed explanation: IEEE 754.*

A key insight comes when the memory occupied by an IEEE 754 float is reinterpreted as an integer. When this is done, the integer in question is approximately logarithmic; after the exponent bias is subtracted, what remains is a log2(x) approximation scaled by 2^23.

For example, a function like the following can be used to approximate log2(x) by masking off the sign bit (i & 0x7fffffff), subtracting the exponent bias, and scaling appropriately:

```cpp
float log2_approx(float x)
{
    unsigned int& i = (unsigned int&) x;
    return float((i & 0x7fffffff) - (127<<23)) * (1.0f / (1<<23));
}
```

In the case of cube root, we'll divide the integer representation by 3 for an approximation of cbrt(x), which we can then refine using an iterative method such as Newton-Raphson or Halley's. The following function returns a cube root approximation with max relative error around 5% (in my tests). The exponent bias is subtracted, the integer representation is divided by three, and then the exponent bias is restored.

*Caveat: It is assumed x >= 0. If negative x is possible, the sign bit needs to be saved and restored.*

```cpp
float cbrta(float x)
{
    int& i = (int&) x;
    i = (i - (127<<23)) / 3 + (127<<23);
    return x;
}
```

Note: I've collected some notes on my investigations into faster cube roots here: In Search of a Fast Cube Root. Some sample C code is also available.

Homework:

1. Reduce to a divide and an add.

2. Replace the divide with a multiply.

3. Implement a double precision version.

4. Create general n-th root approximation.
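A sketch of homework item 3, the double-precision version (my adaptation: the constants change to the 11-bit exponent with bias 1023 and 52 fraction bits; std::memcpy is used instead of the reference cast for portability, and x >= 0 is assumed as above):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Double-precision cube root seed via the exponent-bias bit hack.
// Assumes IEEE 754 binary64 and x >= 0.
double cbrta_double(double x)
{
    int64_t i;
    std::memcpy(&i, &x, sizeof i);
    i = (i - (1023LL << 52)) / 3 + (1023LL << 52);
    std::memcpy(&x, &i, sizeof x);
    return x;
}
```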


## Friday, September 14, 2007

### Faster Cube Root II

Yesterday's post discussed an easily calculated cube root approximation with cubic convergence. More spelunking led me to realize the function is the result of a root-finding method known as Halley's method (the fellow with the comet named after him). So it's actually quite old, and, it seems, over the years it has been rediscovered a few times and attributed to others (Lambert, Bailey, etc).

If I ever studied Halley's method, I don't remember doing so. A major strength is that it does exhibit cubic convergence (in contrast to Newton's quadratic convergence). Even if it requires more ops per iteration than Newton's method, an iteration of either method requires a divide. In my experience, there are cases where Halley's method can pay off.

Here's the Wikipedia entry on it: Halley's method.

Halley's method:

x_{n+1} = x_n - 2 f(x_n) f'(x_n) / (2 f'(x_n)^2 - f(x_n) f''(x_n))

In the case of the cube root of R:

f(x) = x^3 - R

f'(x) = 3x^2

f''(x) = 6x

Plugging them into Halley:

x_{n+1} = x - (6x^2)(x^3 - R) / (18x^4 - 6x(x^3 - R))

Which simplifies to:

x_{n+1} = (2xR + x^4) / (R + 2x^3)

And it's only a little shuffling to get to the function I posted.

The next post looks at coming up with an initial guess for cube root.
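The simplified iteration is only a couple of lines of C++ (a sketch; the name `halley_cbrt1d` is mine, following the pattern of the quintic version elsewhere on this blog):

```cpp
#include <cassert>
#include <cmath>

// One Halley step for cbrt(R), using the equivalent factored form
// x(x^3 + 2R) / (2x^3 + R) = (2xR + x^4) / (R + 2x^3).
inline double halley_cbrt1d(double x, double R)
{
    const double x3 = x*x*x;
    return x * (x3 + 2.0*R) / (2.0*x3 + R);
}
```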


## Thursday, September 13, 2007

### Faster Cube Root

Over the weekend, I decided to do some spelunking for a faster cube root. After a lot of searching, I realized most of the existing implementations (Kahan/BSD, Turkowski, et al.) rely on Newton-Raphson.

I found a little gem in a 1942 (!) volume of the Journal of the American Statistical Association by Otis Lancaster, *Machine Method for the Extraction of Cube Root*. With a little massaging of Lancaster's equation**, I offer you:

```cpp
double Lancaster(const double a, const double R)
{
    const double a3 = a*a*a;
    const double b = a * (a3 + R + R) / (a3 + a3 + R);
    return b;
}
```

Given an approximation **a** of the cube root of **R**, the function will return an approximation of the cube root of **R** that is closer than **a**.

In my tests, at the cost of an extra add* and multiply, Lancaster converges faster than Newton-Raphson--the number of bits of precision seems to triple with each iteration, which is better than Newton-Raphson's doubling. With two iterations, it's the difference between a 4X improvement and a 9X improvement (quadratic vs. cubic convergence).

Consequences: Your initial guess doesn't have to be much better than 5 bits of precision to get 16 bits of precision with one iteration of Lancaster. With a 6-bit initial guess, two iterations of Lancaster will give you near double precision (52 bits). If an initial guess is in the 7-to-8-bit range, one iteration will give you single precision (23 bits).

*Note: Kahan's cbrt( ) includes a bit hack that gives a 5-bit initial guess.*

This is another subject I hope to write more about.

*The (a3 + R) add may be done only once.

**I've since realized Lancaster's equation is derived from Halley's method. More here.

## Tuesday, September 11, 2007

### Every day is like Friday

via my brother...

What if Morrissey were happier?*

Artist: Voxtrot

Song: The Start of Something


*Or maybe there's a little **Housemartins** channeling here.*

### Idiots

"Now I *know* somethin about idiots. Probly the only thing I do know bout, but I done read up on em--all the way from that Doy-chee-eveski guy's idiot, to King Lear's fool, an Faulkner's idiot, Benjie, an even ole Boo Radley in *To Kill a Mockingbird*--now he was a *serious* idiot. The one I like best tho is ole Lennie in *Of Mice and Men*. Mos of them writer fellers got it straight--cause their idiots always smarter than people give em credit for. Hell, I'd agree with that. Any idiot would. Hee Hee." --*Forrest Gump, p. 2.*

## Thursday, September 06, 2007

## Wednesday, September 05, 2007

### Cheby to the Levee...*

In the interest of the gentrification of mathematical knowledge and odd tricks with JavaScript, I threw together a Chebyshev Approximation Applet, something I haven't been able to find online.

So, if you've been champing at the bit for an accurate polynomial approximation of a function, give it a whirl--you just might get lucky.

*More to come in a subsequent post.*

*File under "Bad Puns"

## Saturday, September 01, 2007

### 8 Puzzle

Another great puzzle I picked up at SIGGRAPH.

Consider the following set of symbols:

**( ) . + - * / ^**

By adding these symbols in various combinations, each of the numbers 720 through 729 can be transformed into an expression yielding 8.

For example, the answer to 720 is:

7 + 2^0 = 8

The answers to 721, 722, 723, 724, 725, 726, 727, 728, 729 are left to you.

Some are easy. Some are pretty challenging.

Enjoy!

(Note to those less familiar with certain notation: The circumflex symbol indicates powers; i.e., x^2 = x to the power of 2 = x*x)
