Wednesday, August 5, 2009

Silverlight 3 WriteableBitmap Performance Follow-Up

Two weeks ago I wrote the Silverlight 3 WriteableBitmap Performance blog post and got some good responses to it. One of them came from the author of Quakelight, Julien Frelat. He contacted me and asked if I had tested his PNG wrapper technique, which is used in the Silverlight port of the famous game Quake. I thought Quakelight used the WriteableBitmap and not a custom Stream hack, so I hadn’t considered it for the Speedtest. Quakelight does in fact use the Silverlight 3 WriteableBitmap, but the source code also includes a custom Stream for Silverlight 2.
For better performance the Quakelight PngWrapper and BitmapData classes use an 8-bit color palette instead of full 32-bit ARGB colors. This makes them not directly comparable to the other competitors, which all support 32-bit ARGB colors, but for certain problems 256 colors can be sufficient. That’s why I wrote this follow-up and integrated Quakelight’s Silverlight 2 Stream implementation.
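To make the 8-bit versus 32-bit trade-off concrete, here is a minimal sketch of a palette-based buffer. It is not Quakelight's actual code; the class and method names (PaletteSketch, PackArgb, Expand) are made up for illustration. Each pixel is stored as one byte that indexes a table of 256 ARGB colors, so the per-pixel buffer is a quarter of the size of a 32-bit one.

```csharp
using System;

public static class PaletteSketch
{
    // Packs four 8-bit channels into one 32-bit ARGB value (alpha in the top byte).
    public static int PackArgb(byte a, byte r, byte g, byte b)
    {
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    // Expands a buffer of palette indices (1 byte per pixel)
    // into 32-bit ARGB pixels (4 bytes per pixel).
    public static int[] Expand(byte[] indices, int[] palette)
    {
        if (palette.Length != 256)
            throw new ArgumentException("Palette must have 256 entries.", "palette");

        var pixels = new int[indices.Length];
        for (int i = 0; i < indices.Length; i++)
        {
            pixels[i] = palette[indices[i]];
        }
        return pixels;
    }
}
```

A 512 x 512 image needs 262,144 bytes as palette indices, compared to 1,048,576 bytes as 32-bit ARGB.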
The Speedtest generates an interference image and writes each pixel to a buffer, which is then used as the source of an Image. The effect can also be realized with a pixel shader, and since I was working on Speedtest v2 anyway, I implemented it as a pixel shader too.
Make sure to check the Update section at the bottom of this post.

The Competitors
  1. Silverlight 3 WriteableBitmap.
  2. RawPngBufferStream from the open source GameEngine Balder.
  3. Nikola's PngEncoder, which is an improved version of Joe Stegman's work.
  4. Ian Griffiths' SlDynamicBitmap library.
  5. NEW: Quakelight’s 8 bit BitmapData.
  6. NEW: A pixel shader.

Live

The application measures the time each implementation needs to draw the "Maximum Frames" and calculates the mean frames per second (fps). The third text column shows the performance relative to the WriteableBitmap.
If the tests complete very quickly, increase the "Maximum Frames" to get more reliable results.
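The two numbers shown per test boil down to simple arithmetic. The following is only a sketch of that arithmetic, not the Speedtest's code; the names FpsSketch, MeanFps and RelativeTo are hypothetical, and whether the application displays the relative value as a factor or as a percentage is an assumption left open here.

```csharp
using System;

public static class FpsSketch
{
    // Mean frames per second over a whole test run.
    public static double MeanFps(int frames, TimeSpan elapsed)
    {
        return frames / elapsed.TotalSeconds;
    }

    // Performance relative to a baseline (here: the WriteableBitmap result).
    // 1.0 means equally fast, 0.5 means half as fast.
    public static double RelativeTo(double fps, double baselineFps)
    {
        return fps / baselineFps;
    }
}
```

For example, 500 frames drawn in 10 seconds give a mean of 50 fps; a competitor reaching 25 fps would have a relative performance of 0.5.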


How it works
The image is still 512 x 512 pixels in size and the mean frames per second are still measured, but I changed the CalculateColor(int x, int y) effect a bit and removed the lookup table for the sine movement of the circles. The effect looks almost the same, but the code is much easier to understand. At least I hope so.

private void CalculateColor(int x, int y)
{
   // Normalize coordinates
   double xn = x * TexSizeInv;
   double yn = y * TexSizeInv;

   // Overlayed sine circle rings
   // I use member variables for argb, 
   // so I don't have to allocate and return a byte array in each call
   // Ugly, but much faster than return new byte[]{ }

   // Red
   double d = (xn - C1.X) * (xn - C1.X) + (yn - C1.Y) * (yn - C1.Y);
   r = (byte)((Math.Sin(d * Frequency)) > 0 ? 0 : 255);

   // Blue
   d = (xn - C2.X) * (xn - C2.X) + (yn - C2.Y) * (yn - C2.Y);
   b = (byte)((Math.Sin(d * Frequency)) > 0 ? 0 : 255);

   // Green fills the gaps
   g = (byte)~(r | b);
}

The calculation uses normalized texture coordinates and the circle centers are computed each frame and stored in the Points C1 and C2:

private void CalculateCenters()
{
   // Nice sine circle movement
   // Use normalized coordinates
   C1.X = Math.Sin(framesCount * 0.02) * Half + Quarter;
   C1.Y = Math.Sin(framesCount * 0.08) * Half + Quarter;
   C2.X = Math.Sin(framesCount * 0.10) * Half + Quarter;
   C2.Y = Math.Sin(framesCount * 0.04) * Half + Quarter;
}
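To see how the two methods fit together, here is a self-contained sketch that renders one frame into a 32-bit ARGB buffer. It is a simplified rewrite, not the Speedtest source: it uses locals instead of member fields, takes the image size as a parameter, and the values of Frequency, Half and Quarter are assumptions, since the post does not state them.

```csharp
using System;

public static class InterferenceSketch
{
    const double Frequency = 75.0; // assumed value, not given in the post
    const double Half = 0.5;       // assumed from the name
    const double Quarter = 0.25;   // assumed from the name

    // Renders one frame of the interference effect into an ARGB pixel buffer.
    public static int[] RenderFrame(int size, int framesCount)
    {
        double sizeInv = 1.0 / size;

        // Circle centers in normalized coordinates, recomputed once per frame
        double c1x = Math.Sin(framesCount * 0.02) * Half + Quarter;
        double c1y = Math.Sin(framesCount * 0.08) * Half + Quarter;
        double c2x = Math.Sin(framesCount * 0.10) * Half + Quarter;
        double c2y = Math.Sin(framesCount * 0.04) * Half + Quarter;

        var pixels = new int[size * size];
        for (int y = 0; y < size; y++)
        {
            double yn = y * sizeInv;
            for (int x = 0; x < size; x++)
            {
                double xn = x * sizeInv;

                // Red rings around the first center
                double d = (xn - c1x) * (xn - c1x) + (yn - c1y) * (yn - c1y);
                byte r = (byte)(Math.Sin(d * Frequency) > 0 ? 0 : 255);

                // Blue rings around the second center
                d = (xn - c2x) * (xn - c2x) + (yn - c2y) * (yn - c2y);
                byte b = (byte)(Math.Sin(d * Frequency) > 0 ? 0 : 255);

                // Green fills the gaps
                byte g = (byte)~(r | b);

                // Opaque ARGB pixel
                pixels[y * size + x] = (255 << 24) | (r << 16) | (g << 8) | b;
            }
        }
        return pixels;
    }
}
```

Calling RenderFrame(512, framesCount) once per frame yields a buffer in the layout a 32-bit ARGB bitmap expects; how that buffer reaches the screen is what differs between the competitors.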

The normalization of the center coordinates makes it easier to use the same values directly as parameters for the pixel shader:

// Parameters
float2 C1 : register(C0);
float2 C2 : register(C1);
float Frequency : register(C2);

// Shader
float4 main(float2 uv : TEXCOORD) : COLOR
{
   // Overlayed sine circle rings
   float4 color = 1;

   // Red
   float2 dist = uv - C1; 
   color.r = sin(dot(dist, dist) * Frequency) > 0 ? 0 : 1;

   // Blue
   dist = uv - C2; 
   color.b = sin(dot(dist, dist) * Frequency) > 0 ? 0 : 1;

   // Green fills the gaps
   color.g = color.r + color.b > 0 ? 0 : 1;
   return color;
}

Results

The results differ a bit from the first article because a different algorithm is used here: the colors are distributed differently, which results in slightly different drawing work. Silverlight's rendering has a great impact on performance. Unfortunately, Silverlight doesn't ship with the .NET Stopwatch class, so the only way to get usable data is to measure larger code blocks, including the drawing, with the imprecise DateTime struct.
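Because a single DateTime reading is too coarse to time one frame, the measurement has to span a whole batch of frames. The sketch below shows that idea in isolation; it is not the Speedtest's code. TimingSketch and Measure are hypothetical names, and the tight loop is a simplification, since in Silverlight the frames are drawn by the rendering pipeline and not in a blocking loop.

```csharp
using System;

public static class TimingSketch
{
    // Times a whole batch of frames with DateTime instead of timing each
    // frame on its own, so the coarse clock resolution matters less.
    public static TimeSpan Measure(Action drawFrame, int maxFrames)
    {
        DateTime start = DateTime.UtcNow;
        for (int i = 0; i < maxFrames; i++)
        {
            drawFrame();
        }
        return DateTime.UtcNow - start;
    }
}
```

The larger maxFrames is, the smaller the share of the clock's imprecision in the total, which is why raising "Maximum Frames" helps when the tests finish quickly.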
The Quakelight BitmapData is almost as fast as the Silverlight 3 WriteableBitmap, but at the cost of the 256 color limitation. Although the pixel shader is not executed on the GPU, it still runs ultra-fast compared to the other competitors.
If you want to implement a procedural image generation technique I recommend trying a pixel shader, but keep in mind that Silverlight 3 only supports the limited Shader Model 2. I noticed that the Silverlight pixel shaders seem to use SSE and are automatically executed in parallel if they run on a multi-core processor. The frame rate on a dual-core machine was twice as high as on a single-core CPU. Running the software shader in parallel is really the only sensible way to implement it and nothing special, since shaders are designed for parallel execution on the GPU.
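The reason a shader parallelizes so easily is that every pixel depends only on its own coordinates. The following sketch illustrates that property on the CPU. It is an assumption-laden illustration, not how Silverlight implements its shaders: it uses Parallel.For from the Task Parallel Library of desktop .NET 4 and later, which is not part of Silverlight 3, and the Shade function and its constant 75.0 are made up.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelRowsSketch
{
    // A per-pixel function with no shared state, like a pixel shader.
    static int Shade(int x, int y, int size)
    {
        double xn = (double)x / size - 0.5;
        double yn = (double)y / size - 0.5;
        double d = xn * xn + yn * yn;
        int v = Math.Sin(d * 75.0) > 0 ? 0 : 255;
        return (255 << 24) | (v << 16) | (v << 8) | v;
    }

    public static int[] RenderSequential(int size)
    {
        var pixels = new int[size * size];
        for (int y = 0; y < size; y++)
            for (int x = 0; x < size; x++)
                pixels[y * size + x] = Shade(x, y, size);
        return pixels;
    }

    // Rows are independent, so they can be distributed over all cores.
    public static int[] RenderParallel(int size)
    {
        var pixels = new int[size * size];
        Parallel.For(0, size, y =>
        {
            for (int x = 0; x < size; x++)
                pixels[y * size + x] = Shade(x, y, size);
        });
        return pixels;
    }
}
```

Both methods produce the same buffer; only the distribution of rows across cores differs.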

Source code
Download the C# code and the pixel shader from here.

Update 08-10-2009
Justin Harrell used my Speedtest source code and added tests for the Silverlight 3 MediaSource API. He contacted me via email and attached the source code:

Hello

I have been looking into Silverlight Graphics performance myself and was interested in the new SL3 features. So I read your blog entry for WriteableBitmap performance and was also interested in the new MediaSource managed codec abilities, which looked like another way to get pixels to the screen, but also audio as well which could be interesting for games etc.
So I took your source for the test sample and added two MediaSource tests, one using single threaded frame generation, the second using a background thread and a double buffer based on work from Pete Brown on his Commodore 64 emulator in Silverlight.
I also reworked the UI with checkboxes to turn on/off tests as well as adding 3 new types of tests beyond the Circle interference to test performance of the rendering method vs the pixel generation. These include a random noise, simple scrolling line, and a no op. I also refactored some to make it easier to add new test types. I did not implement shader based versions of my tests, so if you run the shader test on anything but circle interference it's basically a no op for now :).
I thought it might be an interesting addition to the sample as yet another way to generate dynamic bitmaps, the mediasource performs very well, although still behind WriteableBitmap.
Note for the non-double buffered version of MediaSource there is a padTime that can be tuned, it is currently set to 10ms and can be lowered to improve frame rate, but below a certain time it will get too short based on how fast your computer is and cause the media player to think it is losing frames and start skipping badly. I haven’t figured out a good way to compensate for this automatically yet, it has to do with how long the video render takes in addition to the pixel generation. The double buffer does not have this issue as it is a fixed frame rate set by frameTime.
I have attached the sample to this email, let me know what you think, and thanks for the great blog posts.

Justin Harrell

Thanks Justin for sharing your additions. That's why I <3 open source.

8 comments:

  1. Flash back from some great Amiga demos!
    Excellent


  2. Excellent. But. I suggest how to make it cooler=).

    Add a checkbox for each renderer (for ex _1, _2 ... _6), then in region Methods in private void Start() replace

    this.tests = new List<Test>{ ... bla bla bla .... };

    with something like this

    this.tests = new List<Test>{};

    // If no renderer is checked, fall back to the shader test
    if(_6.IsChecked == false && _5.IsChecked == false && _1.IsChecked == false
       && _2.IsChecked == false && _3.IsChecked == false && _4.IsChecked == false)
    {
       _6.IsChecked = true;
    }

    if(_1.IsChecked == true)
    {
       this.tests.Add( new Test
       {
          Execute = DrawWriteableBmp,
          ResultTextBlockAbs = TxtWriteableBmpAbs,
          ResultTextBlockRel = TxtWriteableBmpRel,
       });
    }
    if(_2.IsChecked == true)
    {
       this.tests.Add( new Test
       {
          Execute = DrawBalder,
          ResultTextBlockAbs = TxtBalderAbs,
          ResultTextBlockRel = TxtBalderRel,
       });
    }
    if(_3.IsChecked == true)
    {
       this.tests.Add( new Test
       {
          Execute = DrawNokola,
          ResultTextBlockAbs = TxtNokolaAbs,
          ResultTextBlockRel = TxtNokolaRel,
       });
    }
    if(_4.IsChecked == true)
    {
       this.tests.Add( new Test
       {
          Execute = DrawSiDynBmp,
          ResultTextBlockAbs = TxtSiDynBmpAbs,
          ResultTextBlockRel = TxtSiDynBmpRel,
       });
    }
    if(_5.IsChecked == true)
    {
       this.tests.Add( new Test
       {
          Execute = DrawQuakelight,
          ResultTextBlockAbs = TxtQLAbs,
          ResultTextBlockRel = TxtQLRel,
       });
    }
    if(_6.IsChecked == true)
    {
       this.tests.Add( new Test
       {
          Execute = DrawShader,
          PreExecute = AttachShader,
          PostExecute = RemoveShader,
          ResultTextBlockAbs = TxtShaderAbs,
          ResultTextBlockRel = TxtShaderRel,
       });
    }

  3. Great!
    Code is also working under SilverLight4 in VS2010Express.
    Thanks.

  4. With Silverlight 4, it goes a bit faster. I also suggest an improved method:

    private const double Frequency = 75/Math.PI;
    private void CalculateColor(int x, int y)
    {
       // Normalize coordinates
       double xn = x * TexSizeInv;
       double yn = y * TexSizeInv;

       // Overlayed sine circle rings
       // I use member variables for argb, so I don't have to allocate and return a byte array in each call
       // Ugly, but much faster than return new byte[]{ }
       // Red
       double d = (xn - Center1.X) * (xn - Center1.X) + (yn - Center1.Y) * (yn - Center1.Y);
       d = d * Frequency;
       r = (byte)(((int)d & 1) * 255);

       // Blue
       d = (xn - Center2.X) * (xn - Center2.X) + (yn - Center2.Y) * (yn - Center2.Y);
       d = d * Frequency;
       b = (byte)(((int)d & 1) * 255);

       // Green fills the gaps
       g = (byte)~(r | b);
    }
    The new shader looks like this:
    // Parameters
    float2 Center1 : register(C0);
    float2 Center2 : register(C1);
    float Frequency : register(C2);

    // Shader
    float4 main(float2 uv : TEXCOORD) : COLOR
    {
       // Overlayed sine circle rings
       float4 color = 1;

       // Red
       float2 dist = uv - Center1;
       color.r = frac(0.5 * floor(dot(dist, dist) * Frequency)) * 2.0;

       // Blue
       dist = uv - Center2;
       color.b = frac(0.5 * floor(dot(dist, dist) * Frequency)) * 2.0;

       // Green fills the gaps
       color.g = -color.r - color.b + 1.0;
       return color;
    }

    The new shader is only slightly better, because you have to construct integer operations.

    Under these circumstances, you only get a great performance improvement with shaders on multicore systems.

  5. Hi Andreas,

    I know that the sin and cos can be avoided. If you check the first blog post about this, you'll see that I use a look up table there. But the aim of this blog post was not "The fastest way to fill an image with rings" it was meant to be a performance comparison. And more real world operations result in a better and valuable performance measurement.

  6. Hi René,
    yes, you are right. For a good test, you should use the same operations based on real numbers. I found my test result with an optimized method at a single core system so amazing, that I had to post it. I read your first blog post and with sqrt, there also should be a great performance difference between a WriteableBitmap and a shader. It was interesting for me to find out where it makes sense to use a shader and where not.

  7. Can you also test speed for:
    http://writeablebitmapex.codeplex.com/

  8. Hi Andrey,

    I did this for the line drawing. See http://kodierer.blogspot.com/2009/10/drawing-lines-silverlight.html
