This quiz was derived from an old piece of code I reviewed for a student's project somewhere in the past. Actually, I was doing some perf-related work today, so I thought it might be useful to have a little quiz around this. Below is the code (reduced in size) of a matrix implementation in C#. Using operator overloads, it supports various operations of which I only kept the multiplication operation (implemented in a very naive way):

```using System;
using System.Diagnostics;

class Matrix
{
private double[,] m;

public Matrix(int dim0, int dim1)
{
m = new double[dim0,dim1];
}

public int Height { get { return m.GetLength(0); } }
public int Width  { get { return m.GetLength(1); } }

public double this[int x, int y]
{
get { return m[x,y];  }
set { m[x,y] = value; }
}

public static Matrix operator*(Matrix m1, Matrix m2)
{
if (m1.Width != m2.Height)
throw new InvalidOperationException("Matrices should have compatible dimensions for multiplication.");

Matrix m = new Matrix(m1.Height, m2.Width);
for(int i = 0; i < m.Height; i++)
{
for(int j = 0; j < m.Width; j++)
{
m[i,j] = 0;
for (int k = 0; k < m1.Width; k++)
m[i,j] += m1[i,k] * m2[k,j];
}
}
return m;
}
}

class Program
{
static void Main()
{
Random rand = new Random();
Matrix m1 = new Matrix(20,30);
for (int i = 0; i < m1.Height; i++)
for (int j = 0; j < m1.Width; j++)
m1[i,j] = rand.Next(-100,100);

Matrix m2 = new Matrix(30,40);
for (int i = 0; i < m2.Height; i++)
for (int j = 0; j < m2.Width; j++)
m2[i,j] = rand.Next(-100,100);

Matrix r = null;
Stopwatch sw = new Stopwatch();
sw.Start();
for (int k = 0; k < 10000; k++)
r = m1 * m2;
sw.Stop();
Console.WriteLine(sw.Elapsed);
}
}```

Question:

1. Measure the execution time of the code fragment above on your machine (compile with debugging disabled and optimization enabled, i.e. using csc /o).
2. Perform optimizations to the code (warning: the code should - duh - still produce the same multiplication result r). You can touch everything except for the Program::Main method.
3. Go back to 1.

Feel free to break the loop above at any time if you're satisfied with the result. What's the speed-up factor you can realize? (Tip: You even might want to change something in the process steps 1-3 outlined above to realize a performance boost...)  A follow-up post to this quiz question will follow later. Enjoy!

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under:

#### #re: C# Quiz - Need for speed

Wednesday, February 28, 2007 12:15 AM by Knaģis

Currently I have optimized to work 4x faster. Seems the most time consuming thing about the algorithm is to use indexer (and public properties). If I change those to just the private members, works much faster.

#### #re: C# Quiz - Need for speed

Wednesday, February 28, 2007 12:21 AM by Peter Petrov

Without changing the algorithm(using the naive) only perform some very simple optimization - avoid the method calls to GetLength and the method is 28% faster :)

<pre><code>

public static Matrix operator *(Matrix m1, Matrix m2)

{

int w1 = m1.Width;

if (w1 != m2.Height)

throw new InvalidOperationException("Matrices should have compatible dimensions for multiplication.");

int h = m1.Height;

int w = m2.Width;

Matrix m = new Matrix(h, w);

for (int i = 0; i < h; i++)

{

for (int j = 0; j < w; j++)

{

for (int k = 0; k < w1; k++)

m[i, j] += m1[i, k] * m2[k, j];

}

}

return m;

}

</code></pre>

#### #re: C# Quiz - Need for speed

Wednesday, February 28, 2007 12:50 AM by Knaģis

OK, enough for today. My result is 6.4 times faster (or 7.6 times faster if I use a struct instead of class - but that requires changing one line in Program::Main). The biggest suprise was that [,] is a lot slower than [][]...

#### #re: C# Quiz - Need for speed

Wednesday, February 28, 2007 2:27 AM by Peter Petrov

My best result is 7.2 faster :) Hee is the code.

private double[][] m;

public Matrix(int dim0, int dim1)

{

m = new double[dim0][];

for (int i = 0; i < m.Length; i++)

{

m[i] = new double[dim1];

}

}

public int Height { get { return m.Length; } }

public int Width { get { return m[0].Length; } }

public double this[int x, int y]

{

get { return m[x][y]; }

set { m[x][y] = value; }

}

public static Matrix operator *(Matrix m1, Matrix m2)

{

int w1 = m1.Width;

if (w1 != m2.Height)

throw new InvalidOperationException("Matrices should have compatible dimensions for multiplication.");

int h = m1.Height;

int w = m2.Width;

Matrix m = new Matrix(h, w);

for (int i = 0; i < h; i++)

{

for (int j = 0; j < w; j++)

{

for (int k = 0; k < w1; k++)

m.m[i][j] += m1.m[i][k] * m2.m[k][j];

}

}

return m;

}

#### #re: C# Quiz - Need for speed

Wednesday, February 28, 2007 2:41 AM by Peter Petrov

My previous method can be optimized even a little more - by not removing the initialization of the elements to zero. That's my biggest surprise because it is guarantied that the elements of the array are initialized to zero so the initialization m.m[i][j] = 0; is not need it but the method performs faster with the initialization. I'm really confused.

Total boost = x7.5

#### #re: C# Quiz - Need for speed

Wednesday, February 28, 2007 2:08 PM by c

I can improve the test's running time by a factor of more than 200.

class Matrix

{

protected double[,] m;

public Matrix(int height, int width)

{

m = new double[height, width];

}

public int Height { get { return m.GetLength(0); } }

public int Width { get { return m.GetLength(1); } }

public virtual double this[int x, int y]

{

get { return m[x, y]; }

set { m[x, y] = value; }

}

public static Matrix operator *(Matrix left, Matrix right)

{

return new ProductMatrix(left, right);

}

}

class ProductMatrix : Matrix

{

protected Matrix left;

protected Matrix right;

protected bool[,] m_calculated;

public ProductMatrix(Matrix left, Matrix right)

: base(left.Height, right.Width)

{

if (left.Width != right.Height)

throw new InvalidOperationException("Matrices should have compatible dimensions for multiplication.");

this.left = left;

this.right = right;

}

public override double this[int x, int y]

{

get

{

if (m_calculated == null)

{

m_calculated = new bool[left.Height, right.Width];

}

if (!m_calculated[x, y])

{

m[x, y] = 0;

for (int k = 0; k < left.Width; ++k)

m[x, y] += left[x, k] * right[k, y];

m_calculated[x, y] = true;

}

return m[x, y];

}

set

{

if (m_calculated == null)

{

m_calculated = new bool[left.Height, right.Width];

}

m_calculated[x, y] = true;

m[x, y] = value;

}

}

}

#### #re: C# Quiz - Need for speed

Wednesday, February 28, 2007 8:29 PM by Mitch Wheat

I don’t know how you guys are ‘picking’ your timing results.

I compiled as release with VS2005 SP1, with an unchanged program main method (NOTE: not sure if  or how that differs from csc /o) and ran exe from command line

I ran 10 x 10000 loops and averaged (to try to reduce fluxations idue to OS) to get  1.0146ms per matrix multiplication for the original code as published.

Peter Petrov’s published code (in comments)  resulted in 0.2738ms per matrix multiplication which represents a speedup of 3.7 times over original.

There are two improvements you can make to reduce the time further:

First you should use a temp variable in the inner loop summation.

Second you can shave off a few microseconds by running the inner loop backwards (this is a ‘trick’ from my C days. The complier can issue a simpler check against zero (can XOR register or stack value to zero with itself rather than loading a literal)

public static Matrix operator *(Matrix a, Matrix b)

{

int p = a.Width;

if (p != b.Height)

throw new InvalidOperationException("Matrices should have compatible dimensions for multiplication.");

int h = a.Height;

int w = b.Width;

Matrix c = new Matrix(h, w);

double tmp;

for (int i = 0; i < h; i++)

{

for (int j = 0; j < w; j++)

{

tmp = 0.0;

for (int k = p - 1; k >= 0; k--)

{

tmp += a.m[i][k] * b.m[k][j];

}

c.m[i][j] = tmp;

}

}

return c;

}

This results in 0.1287 ms per matrix multiplication. (7.88 times as fast) (modifications made to Peter’s published code)

I also implemented Winograd’s algorithm (reduces multiplications while increasing additions) but you need the matrices to be very large before any speedup is realised (n > 100).

Using structs would possibly also be an idea (as Knaģis mentioned)

#### #re: C# Quiz - Need for speed

Thursday, March 01, 2007 3:08 AM by bart

Hi Mitch,

A release build and csc /o are roughly the same due to the build config in the .csproj file:

<PropertyGroup Condition=" '\$(Configuration)|\$(Platform)' == 'Release|AnyCPU' ">

<DebugType>pdbonly</DebugType>

<Optimize>true</Optimize>

<OutputPath>bin\Release\</OutputPath>

<DefineConstants>TRACE</DefineConstants>

<ErrorReport>prompt</ErrorReport>

<WarningLevel>4</WarningLevel>

</PropertyGroup>

the <Optimize> portion maps to the /o compiler switch. Take a look at the %windir%\Microsoft.NET\Framework\v2.0.50727\Microsoft.CSharp.targets file to see this mapping.

-Bart

#### #Answers to C# Quiz - Need for speed

Tuesday, March 13, 2007 5:09 AM by B# .NET Blog

You can find the original quiz over here . There have been lots of great answers, thanks to all readers!