##
Parallel Extensions - Using Futures to Calculate Pi in Hexadecimal

# Introduction

Last month we released the June 08 CTP of the Parallel Extensions to the .NET Framework. For more information and to get the latest updates, see http://msdn.microsoft.com/concurrency and the team blog at http://blogs.msdn.com/pfxteam. In this post (of a longer series) I want to highlight one particular feature called **futures**. But first, to get started: download the CTP, install it, open VS 2008 and create a new Console Application (I'll be using C#) and add a reference to the System.Threading.dll assembly that gets installed by the Parallel Extensions:

There's a bunch of stuff in this CTP, as you can see from the Object Browser:

I'll be covering quite some of this in subsequent posts, but for now we simply stick with **System.Threading.Tasks.Future<T>**. So, what's a future? In short, a future is a task that's supposed to produce a value some time in the *future*. Question becomes: what's a task? One can think of a task as a modern thread, maintained by a task manager which implements a (work-stealing) scheduler. Without going in details for now, it suffices to think of tasks as the parallel version of procedures while futures play the role of parallel equivalents to functions. Actually, a future is a task with just something more: a value.

# The Future<T> 101

Using a future is simple. To create one, use the Future.Create factory method:

All you have to supply is a function that returns a value of type T. That function can take all the time it wants to calculate that particular value without blocking other "work" in the application. This obviously only holds till someone needs the value being produced by the future. Let's make an easy sample:

Here we have a lottery mechanism that picks a ball with numbers from 1-42 which takes between 1 and 10 seconds to complete. To specify the future's code, we're using a C# 3.0 lambda expression although the 2.0 style "delegate" keyword would be appropriate here too. The important thing here is the fact the "Lottery in progress..." message will be shown while the future is calculating its value. In other words, the main thread isn't blocked by the background calculation, till we call the Value property on the future which waits for the value to be calculated in order to obtain it. Obviously, a lottery would be more complicated since the system has to wait for more balls to get picked (let's not discuss whether or not the numbers need to be unique; uniqueness gives some interesting concurrent collection access worries depending on how you implement it). A possible refinement is this:

Here we use 6 futures to represent the 6 balls in the lottery and we wait for all balls to get picked using the (new in the last CTP) **WaitAll** method. I'll leave it as an exercise to figure out what it would take to print the selection of balls to the screen immediately as soon as they arrive (tip: **WaitAny** goes in the right direction but the array might not be appropriate).

# An ancient passion: calculating PI

It has been done before but it's still an attractive business for math freaks: calculating PI. There are lots of algorithms that do this based on the development of series (e.g. by rewriting PI in terms of arctan functions which can be represented using infinite McLaurin series). One more recent algorithm is the **Bailey-Borwein-Plouffe** (shorthand BBP - On the Rapid Computation of Various Polylogarithmic Constants) algorithm which has some interesting characteristics:

**Non-decimal:**calculates PI in base 16 hexadecimal.**Random access:**allows to calculate an individual digit without having to calculate the digits that precede it.**Distribution friendly:**can be used to calculate portions of PI in parallel.

The formula looks like this and is based on infinite series:

The crux - due to Plouffe - is the 16^k in the denominator which is precisely what base 16 fractionals look like, e.g.:

In other words, to calculate the digit at a given position it suffices to calculate the following:

with the overbar notation standing for "fraction of". Obviously the left-hand side stands for the digit at position *pos* in the (hexadecimally represented) value of PI. The right-hand side is a simple rewrite of the original formula multiplied by 16^pos, allowing for the following refactoring:

where

This allows for efficient implementation using some binary exponentiation techniques for the power evaluations and with some modulo calculus. I tried to document the code as much as possible to illustrate how it works:

/// <summary>

/// Calculation of PI using the BBP (Bailey-Borwein-Plouffe) formula.

/// </summary>

static class Pi

{

/// <summary>

/// Powers of two. Used in the binary exponentiation algorithm (Knuth Vol II - 4.6.3).

/// </summary>

static int[] s_powsOf2;/// <summary>

/// Hexadecimal characters.

/// </summary>

static string HEX = "0123456789ABCDEF";/// <summary>

/// Static constructor.

/// </summary>

static Pi()

{

s_powsOf2 = new int[25];//

// Set powers of two for use in binary exponentiation algorithm.

//

s_powsOf2[0] = 1;

for (int i = 1; i < s_powsOf2.Length; i++)

s_powsOf2[i] = s_powsOf2[i - 1] * 2;

}

/// <summary>

/// Calculates a given number of hexadecimal digits of PI from the given start position.

/// </summary>

/// <param name="start">Hexadecimal position to start calculating hexadecimal digits from.</param>

/// <param name="length">Number of hexadecimal digits to calculate. Needs to be multiple of 10.</param>

/// <returns>Requested hexadecimal digits of PI.</returns>

public static string Calculate(int start, int length)

{

if (start < 0)

throw new ArgumentException("start");if (length < 0 || length % 10 != 0)

throw new ArgumentException("length");StringBuilder sb = new StringBuilder();

//

// Calculate in chunks of 10 given our precision.

//

for (int pos = start; pos < start + length; pos += 10)

{

double val = 4 * Sum(1, pos) - 2 * Sum(4, pos) - Sum(5, pos) - Sum(6, pos);

val = val - (int)val + 1;

sb.Append(ToHex(val));

}return sb.ToString();

}/// <summary>

/// This method calculates the fractional part of SUM[i = 0..+INF](1 / (16^i * (8 * i + j))).

/// </summary>

/// <param name="j">See formula.</param>

/// <param name="pos">Starting position for requested hexadecimal part of PI's fraction.</param>

/// <returns>See formula.</returns>

static double Sum(int j, int pos)

{

/*

* Implementation based on the following observation:

*

* SUM[i = 0..+INF](1 / (16^i * (8 * i + j)))

*

* = SUM[k = 0..d]((16^(d - k) mod (8 * k + j)) / (8 * k + j)))

* + SUM[k = d + 1..+INF](16^(d - k) / (8 * k + j)))

*/double res = 0.0;

int k = 0;

//

// First part of the sum using binary exponentiation with modulo operation.

//

for (; k < pos; k++)

{

double n = 8 * k + j;

res += PowMod(16.0, pos - k, n) / n;

res -= (int)res;

}//

// For the remainder terms, go till we reach the precision of 10^-17,

// which allows to reach the required precision in hexadecimals.

//

for (; ; k++)

{

double t = Math.Pow(16.0, pos - k) / (8 * k + j);

if (t < 1e-17)

break;

res += t;

res -= (int)res;

}return res;

}/// <summary>

/// Returns a specified number raised to the specified power, module the given modulus.

/// </summary>

/// <param name="x">A double-precision floating-point number to be raised to a power.</param>

/// <param name="y">A double-precision floating-point number that specifies a power.</param>

/// <param name="m">A double-precision floating-point number that specified the modulus.</param>

/// <returns>x^y mod m</returns>

static double PowMod(double x, double y, double m)

{

if (m == 1.0)

return 0.0;//

// We look for the largest bit position used by the exponent.

// E.g. 16^9 mod 9 = (16 * 16^(2^3)) mod 9

// ^

//

int i;

for (i = 0; i < s_powsOf2.Length; i++)

if (s_powsOf2[i] > y)

break;double pow2 = s_powsOf2[i - 1];

//

// Set remaining number of exponentiations and result to initial values.

//

double rem = y;

double res = 1.0;for (int j = 0; j < i; j++)

{

//

// "Remainder" steps that do not fit in binary exponentiation.

//

// j res res'

// -- -------------- ----

// 0 16^0 mod 9 = 1 16^1 mod 9 = (1 * 16) mod 9 = 7

// 3 16^8 mod 9 = 4 16^9 mod P = (4 * 16) mod 9 = 1

//

if (rem >= pow2)

{

res = (res * x) % m;

rem -= pow2;

}pow2 /= 2;

//

// Binary exponentation step.

//

// j res res'

// -- -------------- ------------------------------

// 0 16^1 mod 9 = 7 16^2 mod 9 = (7 * 7) mod 9 = 4

// 1 16^2 mod 9 = 4 16^4 mod 9 = (4 * 4) mod 9 = 7

// 2 16^4 mod 9 = 7 16^8 mod 9 = (7 * 7) mod 9 = 4

//

if (pow2 >= 1.0)

{

res = (res * res) % m;

}

}return res;

}/// <summary>

/// Converts the fractional part of the specified double-precision floating-point number

/// to hexadecimal representation.

/// </summary>

/// <param name="d">Double-precision floating-point number in base 10.</param>

/// <returns>String representation of the number's fractional part in base 16.</returns>

/// <example>

/// Sample: d := 0.14159265358979334

///

/// f := (d - (int)d ) 16 * f Hex((int)(16 * f))

/// ------------------- ------------------- ------------------

/// 0. 0.14159265358979334 2.2654824574366934 2

/// 1. 0.26548245743669340 4.2477193189870945 4

/// 2. 0.24771931898709450 3.9635091037935126 3

/// 3. 0.96350910379351260 15.4161456606962020 F

/// 4. 0.41614566069620200 6.6583305711392313 6

/// 5. 0.65833057113923130 10.5332891382277010 A

/// 6. 0.53328913822770100 8.5326262116432190 8

/// 7. 0.53262621164321900 8.5220193862915039 8

/// 8. 0.52201938629150390 8.3523101806640625 8

/// 9. 0.35231018066406250 5.6369628906250000 5

///

/// Reverse: h := 0.243F6A8885

///

/// Weight Value

/// ----------- -----------

/// 0. 2 * 16^-1 0.12500000000000000000

/// 1. 4 * 16^-2 0.01562500000000000000

/// 2. 3 * 16^-3 0.00073242187500000000

/// 3. F * 16^-4 0.00022888183593750000

/// 4. 6 * 16^-5 0.00000572204589843750

/// 5. A * 16^-6 0.00000059604644775391

/// 6. 8 * 16^-7 0.00000002980232238770

/// 7. 8 * 16^-8 0.00000000186264514923

/// 8. 8 * 16^-9 0.00000000011641532183

/// 9. 5 * 16^-10 0.00000000000454747351

/// +----------------------

/// 0.14159265358921400000

/// ^^^^^^^^^^^^

/// </example>

static string ToHex(double d)

{

char[] c = new char[10];for (int i = 0; i < c.Length; i++)

{

d = 16 * (d - (int)d);

c[i] = HEX[(int)d];

}return new string(c);

}

}

# Putting futures to work

The actual implementation is actually of little interest in this context; using it with futures is more of interest now. Here's a sample use of it:

And here's the result:

The parallel version only takes **58%** of the original sequential execution! Before going any further, there's a little interesting thing in the implementation above: some manual partitioning. The first future calculates from 0 to 7000 and the second one from 7000 to 10000. In general with work-stealing applied in the scheduler such partitioning can be eliminated as long as the individual work items that get executed by tasks or futures do not have a clear predictable relative execution duration. For this particular case however, we have intrinsic knowledge about the algorithm and the fact it actually scales linearly based on the position of the hexadecimal digits requested. In particular, if you'd plot the duration of calculations of chunks against the time it takes, you'd see something along those lines:

To partition the total work correctly it suffices to solve one quadratic equation derived from an integration so that the yellow (0 to x) and blue (x to 10000) portions have the same surface:

You'll find x to be about 7000 in this case which indicates the optimum partitioning for calculation of digits 0 to 10000. If you'd partition around 5000, it's clear that the second future would have a longer execution time than the first one as seen from the surface under the curve (line) in the graph above. Also, the fact we're partitioning in two halves implies we're not optimizing for anything beyond dual core (higher-order partitioning of the problem is less trivial but still possible). Also, with arbitrary inputs from the user we'd have to calculate the partitioning at runtime instead.

So, how does execution look like?

Clearly, in the sequential case (between the first two red lines) not all processors are busy but in the case with futures (between the last two red lines), both processors go up to 100% utilization and as one can observe, the algorithm runs faster. In terms of threads you'll see this in the Performance Monitor:

As soon as the parallel execution kicks in, three additional threads are created (of which two are doing work for our futures). In the curve above this is observed by the jump from 6 to 9 threads.

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicksFiled under: Parallel Extensions

## # 2 Static » Blog Archive » Parallel Extensions - Using Futures to Calculate Pi in Hexadecimal

Pingback from 2 Static » Blog Archive » Parallel Extensions - Using Futures to Calculate Pi in Hexadecimal

## # Reflective Perspective - Chris Alcock » The Morning Brew #129

Pingback from Reflective Perspective - Chris Alcock » The Morning Brew #129

## # » Parallel Extensions - Using Futures to Calculate Pi in Hexadecimal A Side: What The World Is Saying About A Side

Pingback from » Parallel Extensions - Using Futures to Calculate Pi in Hexadecimal A Side: What The World Is Saying About A Side

## # 2 Static » Blog Archive » 2 Static ?? Blog Archive ?? Parallel Extensions - Using Futures to …

Pingback from 2 Static » Blog Archive » 2 Static ?? Blog Archive ?? Parallel Extensions - Using Futures to …

## # Dew Drop - July 4, 2008 | Alvin Ashcraft's Morning Dew

Pingback from Dew Drop - July 4, 2008 | Alvin Ashcraft's Morning Dew

## # re: Parallel Extensions - Using Futures to Calculate Pi in Hexadecimal

Neat, but we're still in the O(N^2).. I wonder if it can be pushed closer to O(N logN)?

Have you checked to see how much gain there would be had with memoization? It seems to me that the first part of the calculation of res is needlessly duplicated between the two threads.

Obviously, memory would be consumed, but you could compromise and track every 1000th res(k), which later calculations could shortcut to. You would start k and res, basically, at a much later point in the calculation as you calculate for later pos.

## # Ancient » Blog Archive » Parallel Extensions - Using Futures to Calculate Pi in Hexadecimal

Pingback from Ancient » Blog Archive » Parallel Extensions - Using Futures to Calculate Pi in Hexadecimal