Our Mac App Store 'Installed' problem
Posted by Gavin at 20:10 0 comments
Categories: App Store, NSConference
Providing a Soft Landing with WebKit
id scriptObject = [webView windowScriptObject];
[scriptObject setValue:self forKey:@"cocoaObject"];
var cocoaObject = window.cocoaObject;
+ (BOOL)isSelectorExcludedFromWebScript:(SEL)aSelector;
+ (BOOL)isKeyExcludedFromWebScript:(const char*)name;
These methods should always return YES unless the passed-in selector (or key name) is one to which we want to provide access.
+ (NSString*)webScriptNameForSelector:(SEL)sel
This method lets us provide cleaner method names for JavaScript to use. By default the bridge replaces colons with underscores, so the Objective-C call:
[a setValue:b forKey:c];
would in JavaScript become:
a.setValue_forKey_( b, c );
The webScriptNameForSelector: method lets us override the default behaviour to provide our own method names.
applicationDidFinishLaunching: method on your app delegate is probably as good a place as any):
[OPSSoftLandingWindowController showWithDelegate:self onlyIfRequired:YES];
This method takes two parameters. The first parameter is a delegate object which will handle any notification messages from the JavaScript. The controller expects the delegate to implement the following method:
- (void)softLandingWindowController:(OPSSoftLandingWindowController*)softLanding didRecieveNotification:(NSString*)notification;
You may pass nil as the delegate if you only want the soft landing to display some HTML and do not expect any messages to be passed back.
- (IBAction)showWelcome:(id)sender
{
    [OPSSoftLandingWindowController showWithDelegate:self onlyIfRequired:NO];
}
h1 tags) are told to call the onClickHandler JavaScript code when the user clicks on them.
<h1 cocoaTagName="Welcome Clicked" onclick="onClickHandler(event)">Welcome to ShinyApp</h1>
This code then pulls the cocoaTagName attribute from the clicked element and passes it on to your application.
Posted by Gavin at 15:36 0 comments
Categories: Objective-C, WebKit
Unleashing your hidden supercomputer
The following is the content of a talk on OpenCL I gave last night at Aberdeen’s inaugural techmeetup. You should be able to watch it online shortly from the Techmeetup website.
http://techmeetup.co.uk/blog/
Well, as you’ll know, things more or less continued along these lines until a couple of years ago, when we started reaching processor speeds approaching 4GHz. But I was being a little disingenuous when I referred to Moore’s Law as a doubling of speed, because what Moore was actually talking about was the number of transistors which could be placed inexpensively on a chip. From this standpoint Moore’s Law still holds; these days the extra computing power comes in the form of multiple processing units.
For example, my current computer has a 2.4GHz Core 2 Duo processor capable of performing around 20 GFLOPS (or 2×10^10 floating point operations per second). This is about the same as Intel’s single core Pentium 4 running at 3.2GHz. If this is what 2 cores can do, imagine what 32 cores could do.
Well, it just so happens that I have just such a machine, and it’s not that uncommon: many of you will have one as well.
Of course I’m referring to the graphics card. In my case I have an nVIDIA GeForce 9600M GT. This particular card has 32 cores, each running at 500MHz and providing a staggering 120 GFLOPS; six times that of the CPU alone. In fact, counting also the second GPU built into the logic board I have a machine which is theoretically capable of almost 200 GFLOPS.
Taking another example, the nVIDIA GeForce GTX 285 has 240 cores each running at around 650MHz. We’re now talking about a GPU capable of around 1 TFLOPS - around 50 times more than my processor. What’s more, a card like this will set you back about £300, so we’re not talking about large fortunes here.
So the question is: How can we utilise this additional power? Looking at the traditional frameworks for concurrent programming such as POSIX threads or OpenMP there is a fairly obvious drawback: They only utilise the multiprocessor capabilities of the CPU.
This may change over time as the boundaries between GPUs and CPUs narrow. According to nVIDIA’s CEO, Jen-Hsun Huang, the future of computing may well see the CPU replaced by beefed up GPUs. Whether that happens remains to be seen. However, right now we need another approach. (I have to confess that I read about this in an article a couple of months back. However, looking at some more recent reports it appears that there was either some misreporting or some backtracking.)
So for the moment at least we need to move into the realm of GPGPU (General Purpose GPU programming). Let’s examine some of the options we have here.
Firstly there is CUDA. CUDA was developed by nVIDIA and stands for Compute Unified Device Architecture. CUDA programmers use a C based language to code for the GPU. I believe it is quite widely used and there are third party wrappers for languages such as Python or Java. However, being developed by nVIDIA means that only nVIDIA hardware will be supported.
A similar offering is FireStream, which is AMD’s equivalent to nVIDIA’s CUDA. I’m not sure whether FireStream has as much of a following as CUDA, but again the disadvantage is that it is hardware specific.
A third option is DirectCompute. This is an extension to Microsoft’s DirectX collection of APIs and is possibly worth considering if you’re only targeting Windows. But as a Mac developer it’s not what I’m looking for.
OpenCL stands for the Open Computing Language and is touted as “The Open Standard for Heterogeneous Parallel Programming”. It is intended to be to general purpose programming what OpenGL is to graphics programming or what OpenAL is to audio programming.
So why OpenCL? Well it is hardware and platform agnostic.
Some background: OpenCL was originally devised by Apple. However, in June 2008 Apple handed responsibility for the specification over to the not-for-profit consortium the Khronos Group. I think it would be fair to say that any major software or hardware company who have an interest in promoting open standards have some sort of input into the Khronos group. This list includes nVIDIA, AMD, Apple, Google, ARM, IBM, Intel and many more.

Image courtesy of Khronos Group
This is great if you’re an Apple developer, but it is important to note that OpenCL is not just for Macs. This would be equivalent to saying that just because Apple provide great OpenGL support OpenGL is just for Macs. SDKs are becoming available for other platforms, although as far as I can tell at this stage they are mostly in beta. This is fairly understandable as the whole specification has been pushed through in record time.
So, you will need to be a little more careful about supported devices on other platforms. Although you should be able to check hardware support on your system through the APIs it is something you should be aware of.
So how does it work?
An OpenCL application consists of a host which communicates with a set of Compute Devices. You can think of the host as your application and the compute devices as the individual CPUs, GPUs etc.
An OpenCL kernel may be compiled from code at runtime. By doing this the kernel may be optimised specifically for the hardware on which it is to be run. We use the API provided by the OpenCL SDK to perform this compilation and any developer who is familiar with OpenGL should feel quite comfortable with this API. I believe it is also possible to pre-compile kernels, but I have not done so and I believe this would be the exception rather than the rule.
Each compute device is further subdivided into Compute Units which are in turn divided into Processing Elements. So, for example, the nVIDIA GeForce GTX 285 with 30 streaming multiprocessors, each comprising 8 streaming processors would have 30 compute units and 240 processing elements.
The kernels you create will be executed on individual processing elements. A kernel executing on a single processing element is known in OpenCL as a work-item. You can think of this as a single thread of execution. Work-items are in turn grouped into work groups with a single work group running on a single compute unit. Hold that thought while I talk about memory quickly...
There are basically two types of memory which a kernel can access: global and local. Global memory is memory on the compute device, accessible by all work-items on that device. Local memory on the other hand is accessible only by work-items within the same work group. Going back to the graphics card you can see that the local memory corresponds to the local memory cache on each of the streaming multiprocessor units.
There are actually two other memory types: constant memory, which is really just constant global memory, and private memory, which is just memory defined within the scope of the kernel’s function.
Each executing kernel (work-item) can uniquely identify itself by a global id, unique across the device on which it is running. A work-item can also identify itself by a local id, unique within its work-group, together with the id of the work-group to which it belongs. These IDs will be used to access memory which the work-item needs to read from or write to. Memory in OpenCL may be allocated either as a single dimensional buffer or as a 2D or 3D image. Correspondingly the work-items have either 1, 2 or 3 global (or local) ids. For example the function get_global_id(0) may be used to get the x co-ordinate of a 2D image while get_global_id(1) will return the y co-ordinate.
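To make the indexing concrete, here is a minimal sketch in plain C (not OpenCL C) that emulates a 2D range of work-items on the host. The names pixel_index and fill_image are hypothetical and used only for illustration; in a real kernel the two loop counters would be replaced by the values returned from get_global_id(0) and get_global_id(1).

```c
#include <stddef.h>

/* Hypothetical helper: map a 2D global id (x, y) to an offset in a
   row-major buffer that is `width` elements wide. */
static size_t pixel_index(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/* Emulates launching width * height work-items on the host. Each loop
   iteration plays the part of one work-item: x stands in for
   get_global_id(0) and y for get_global_id(1). */
static void fill_image(unsigned char *buffer, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            /* Each work-item writes only to its own element. */
            buffer[pixel_index(x, y, width)] = (unsigned char)((x + y) & 0xFF);
        }
    }
}
```

Because every work-item touches a different element, the iterations are independent of one another, which is what allows them to be spread across processing elements.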

Another point to bear in mind is that transferring memory onto a graphics card is relatively slow. Any performance gain achieved by running the process on multiple cores may be negated by performing this memory transfer.
You will also need to take into account the fact that your code will only have access to those methods defined by the OpenCL specifications. printf statements, for example, will not work (and, somewhat related to that, debugging your kernel objects is going to be tough).
Finally, (and here I really have to confess to being fuzzy on the detail) because we are running on streaming processors, branching in your code is not going to behave the same as in normal code executing on the CPU. For instance, if you have an if-else statement in a kernel running in 8 work-items all occurrences of the if condition need to complete before the else conditions execute.
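One way to picture the cost of branching is the deliberately simplified model below, written in plain C. It assumes that the work-items in a group run in lockstep, so the group pays for a branch if any of its items takes it. The function name and the cost units are hypothetical, and real hardware will differ in the details.

```c
#include <stddef.h>

/* Simplified cost model, not a description of any particular GPU:
   work-items in a group run in lockstep, so if at least one item takes
   the "if" path the whole group spends if_cost on it, and likewise for
   the "else" path. */
static unsigned group_branch_cost(const int *takes_if, size_t count,
                                  unsigned if_cost, unsigned else_cost)
{
    int any_if = 0, any_else = 0;
    for (size_t i = 0; i < count; ++i) {
        if (takes_if[i])
            any_if = 1;
        else
            any_else = 1;
    }
    return (any_if ? if_cost : 0u) + (any_else ? else_cost : 0u);
}
```

Under this model a group whose work-items all take the same path pays for one branch only, while a group with mixed decisions pays for both. That is the reason to keep branching uniform within a work group where possible.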
To briefly explain: a complex number C is said to be in the Mandelbrot set if, starting from $Z_0 = 0$, the absolute value of $Z_n$ in the equation $Z_{n+1} = Z_n^2 + C$ is still less than a given value after n iterations. If a given point is not within the set we can assign it a colour based on the number of iterations required before $Z_n$ exceeds the given tolerance.
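The iteration described above can be sketched in a few lines of plain C. This is an illustrative version rather than the code from the demonstration: the function name is hypothetical, and the escape radius of 2 is an assumption standing in for the "given value".

```c
/* Returns the number of iterations before |Z| exceeds the escape radius
   of 2, or max_iterations if the point never escapes (and is therefore
   treated as being inside the set). */
static int mandelbrot_iterations(double c_re, double c_im, int max_iterations)
{
    double z_re = 0.0, z_im = 0.0;
    for (int n = 0; n < max_iterations; ++n) {
        /* |Z|^2 > 4 is the same test as |Z| > 2 without the sqrt. */
        if (z_re * z_re + z_im * z_im > 4.0)
            return n;
        /* Z = Z^2 + C, expanded into real and imaginary parts. */
        double next_re = z_re * z_re - z_im * z_im + c_re;
        z_im = 2.0 * z_re * z_im + c_im;
        z_re = next_re;
    }
    return max_iterations;
}
```

The returned count is what would be mapped to a colour for points outside the set. A point that reaches max_iterations is treated as being inside it.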
If we define the maximum number of iterations required to determine whether a value is within the set as 1500 (which is the value I have chosen to use in the demonstration) we could potentially have to perform this calculation 1500 times for each pixel. Each of these calculations could itself comprise 10 floating point operations. Applying this over a 1000 by 1000 pixel image it should be apparent that we could be carrying out around 15 billion operations.
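The estimate is just a product of the figures quoted above. Here it is as a tiny C helper (the name is hypothetical) so that it can be recomputed for other image sizes or iteration limits.

```c
/* Upper bound on floating point operations for one image:
   every pixel runs the full number of iterations. */
static unsigned long long max_operations(unsigned long long width,
                                         unsigned long long height,
                                         unsigned long long max_iterations,
                                         unsigned long long flops_per_iteration)
{
    return width * height * max_iterations * flops_per_iteration;
}
```

With the values from the text, max_operations(1000, 1000, 1500, 10) gives 15,000,000,000. This is a worst case, since points outside the set stop iterating early.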
It should also be clear that the calculations carried out on a single pixel should in no way influence the calculations carried out on adjacent pixels. So as well as being an interesting example, the Mandelbrot set should also provide a good candidate for optimisation using OpenCL.
I wrote the example to perform the calculation in 3 ways. The first example simply performs the calculation on the main thread of the application using standard C. The second two perform the same calculation using OpenCL, firstly running solely on the CPU and then running on the GPU.
The results (I’m happy to report) are fairly conclusive. Running on a single thread the calculation takes a little over 10 seconds on my machine. Using OpenCL on the CPU the time is reduced to just over 4 seconds. This makes sense: we are doubling the number of cores and roughly halving the time required for the calculation.
Running on the GPU reduces the time further to around 0.67 seconds. Again this is about a sixth of the time required by the CPU and back at the start I said that the GPU was capable of processing around six times more operations per second than the CPU.
The code for the example can be downloaded from here. Although the GUI is written in Objective-C, I have written the OpenCL part in C++, so if you’re not on a Mac you should still be able to give it a go.
I would dearly love to see this example running on the GTX 285 I talked about - if anyone does this please let me know how you get on!
http://www.khronos.org/opencl/
Or you can download the specifications from:
http://www.khronos.org/registry/cl/specs/opencl-1.0.48.pdf
I would also strongly recommend that you watch the video podcasts produced by Dave Gohara at MacResearch. Even if you’re not looking at OpenCL for the Mac these are definitely well worth watching.
http://feeds.feedburner.com/opencl
Posted by Gavin at 13:12 0 comments
Categories: OpenCL, TechMeetup
The First Aberdeen TechMeetup

Posted by Gordon at 11:56 0 comments
Categories: OpenCL, TechMeetup
Creating an Open Directory Replica
Posted by Gordon at 17:07 0 comments
Categories: Open Directory, OS X Server
Trying to get noticed


Posted by Gordon at 11:03 3 comments
Categories: App Store, Balcassa, iPhone, Mac Developer Network, Marketing
Fun with NSOperation
NSOperation was introduced with Leopard as an attempt by Apple to simplify the task of writing multi-threaded applications in Objective-C. This will become increasingly important as we move away from faster processors to more cores and you can be sure that NSOperation is only the start of what Apple has in store for us in this area. In this article I discuss a fun little use of NSOperation and, hey, there is a little Core Animation thrown in to boot.
When I was at school I read James Gleick's book Chaos and decided I would write an application on my Dad's PC to draw a Mandelbrot set. I have no idea now what it was written in (it might even have been QuickBasic), but what I do remember is how slow it was. I would type in the co-ordinates and leave it for an hour or so before a small monochrome Mandelbrot image the size of a postcard appeared. I was quite pleased with it.
Fast forward by more years than I care to mention and I am the proud new owner of a G5 iMac, looking for a little project with which to get to grips with Xcode and Objective-C. So I decided to write a Mandelbrot screensaver. The results were a lot faster and a lot more colourful than my previous attempt, and I must have spent hours sitting watching the strange and beautiful patterns it produced. But there was a definite pause of between 2 and 5 seconds while the calculations were carried out.
So finally I've re-written it once more for this blog post using NSOperation. In fact we thought it was actually good enough to be released properly as a free screensaver. A cut-down version of the code for this project can be found here (cut-down simply because the actual project contains Sparkle updates and the like) or if you just want the screensaver itself you can get that from the downloads section of our web-site. I won't go into detail about writing the screensaver itself, but if you are interested Brian Christensen's excellent two part article is as good now as it was when I first read it all that time ago.
As anyone who has ever written any multi-threaded application will know, things start to get complicated when different threads need to interact. Traditionally this problem was handled using synchronisation mechanisms such as signalling waiting threads. This is fine until two threads manage to get into a situation where each is waiting on a signal from the other - which usually won't happen until the application is deployed on a client's machine.
Apple have helped us mortal developers get around this problem by introducing the concept of an operation queue. Independent packages of work, operations, are added to the queue and synchronisation is achieved by adding dependencies between these operations; if operation B is dependent on operation A then we are guaranteed that operation A has run to completion by the time operation B starts. Without dependencies, other operations on the queue will execute whenever they get a chance.
This model translates into three Cocoa classes: NSOperationQueue, NSOperation and NSInvocationOperation. NSOperationQueue is responsible for handling the queuing of operations (hence the name - I really couldn't think of a better way to say that). Instances of NSOperation subclasses, NSInvocationOperation, or both are placed into the queue using the addOperation: message.
NSInvocationOperation provides the simplest method of adding an operation as no sub-classing is required. Instead, instances of the class are initialised using the initWithTarget:selector:object: method. For example the code:
NSInvocationOperation *pNotifyOperation = [[NSInvocationOperation alloc]
    initWithTarget:self selector:@selector(notifyImageComplete) object:nil];
will set up an operation that will call the notifyImageComplete method on self to perform its task.
NSOperation on the other hand is designed to be subclassed, with an overridden main method provided to carry out the operation's task.
So let's see how all this fits into the example of the Mandelbrot screensaver.
To display the Mandelbrot image we have three CALayers in the ScreenSaverView derived class: currentLayer, nextLayer and renderingLayer. Calculations are performed in the background and the results rendered to renderingLayer. When the application receives notification that it should refresh its view (via the animateOneFrame method) it checks to see whether this rendering has finished and, if so, replaces currentLayer with nextLayer, nextLayer with renderingLayer and renderingLayer with currentLayer. This lets us animate cross-fades between currentLayer and nextLayer without the risk of the rendering changing one of these layers halfway through.
Rendering to the CALayers takes place in the MandelbrotImageGenerator class, which is also responsible for handling interactions with the NSOperationQueue and its operations.
In our project we have a MandelbrotDrawingOperation class derived from NSOperation. The purpose of this class, as illustrated in the figure below, is to render a given portion of the Mandelbrot set into an image buffer. We derive a class (rather than using NSInvocationOperation) as we need a lot of additional information to carry out our task: the buffer to draw into, the buffer size, the visible portion to be drawn, etc.
We need to be notified once all the drawing operations have completed and we achieve this using an NSInvocationOperation which has a dependency on each of the drawing operations. Here is the relevant code (cut down to show only those bits of importance to the discussion).
// Create the final operation to send the notification once everything is complete
NSInvocationOperation *pNotifyOperation = [[NSInvocationOperation alloc]
    initWithTarget:self selector:@selector(notifyImageComplete) object:nil];
...
for( bottom = 0; bottom < height; bottom += dy )
{
    ...
    // Create the operations to do the drawing
    MandelbrotDrawingOperation* pOperation = [[MandelbrotDrawingOperation alloc]
        initWithBuffer:(UInt8*)_pImageBuffer + (bottom * width * 3)
        imageWidth:width imageHeight:dy
        extents:curExtents
        colourMap:_colourMaps[colourMap % 16] ];
    // Make the notification operation dependent on the drawing operation
    [pNotifyOperation addDependency:pOperation];
    // Add the drawing operation into the queue; the queue retains it, so we can release our reference
    [_operationQueue addOperation:pOperation];
    [pOperation release];
}
// Add the notification operation last, once every drawing operation is queued
[_operationQueue addOperation:pNotifyOperation];
An important point to note is that the final notification operation is not added to the queue until all of the drawing operations have been added. If the notification operation was added to the queue first it could have executed in a separate thread before we've had a chance to add any of the drawing operations to the queue.
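The loop in the listing hands each drawing operation its own horizontal strip of the image buffer. The arithmetic behind that split can be sketched in plain C as follows. The Strip type and make_strips function are hypothetical. The figure of 3 bytes per pixel is taken from the `* 3` in the listing. Clamping the final strip is my own addition, for the case where the height is not an exact multiple of dy.

```c
#include <stddef.h>

/* Hypothetical description of one horizontal strip of the image. */
typedef struct {
    size_t byte_offset; /* where the strip starts in the image buffer */
    size_t rows;        /* number of rows in the strip */
} Strip;

/* Splits an image of `height` rows into strips of at most `dy` rows.
   Writes up to max_strips entries and returns the number written.
   Assumes 3 bytes per pixel. */
static size_t make_strips(size_t width, size_t height, size_t dy,
                          Strip *strips, size_t max_strips)
{
    size_t count = 0;
    if (dy == 0)
        return 0;
    for (size_t bottom = 0; bottom < height && count < max_strips; bottom += dy) {
        size_t remaining = height - bottom;
        strips[count].byte_offset = bottom * width * 3;
        /* Clamp the last strip so it never runs past the end of the buffer. */
        strips[count].rows = remaining < dy ? remaining : dy;
        ++count;
    }
    return count;
}
```

Since the strips do not overlap, the drawing operations can write into the shared buffer concurrently without any locking. Only the final notification needs to wait for all of them.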
The notifyImageComplete method simply passes on the notification to the main thread. Drawing the image into its CALayer is safer on the main thread as we do not want to risk this memory being released halfway through.
-(void) notifyImageComplete
{
    [self performSelectorOnMainThread:@selector(notifyImageCompleteOnMainThead) withObject:nil waitUntilDone:NO];
}
During rendering of the image we can periodically check that the operation has not been cancelled by checking the isCancelled property of self, which may be the case if the application is terminating and we are in the process of clearing up. Operations may be cancelled directly. However, in our case we simply want to cancel all currently running operations and wait until they have finished. This takes place in the dealloc method of the MandelbrotImageGenerator class:
[_operationQueue cancelAllOperations];
[_operationQueue waitUntilAllOperationsAreFinished];
So that's it. Although we still need to think carefully about how concurrent operations access the same data, I hope you can see that much of the complexity involved in synchronising this has been removed.
Posted by Gavin at 19:18 0 comments
Categories: Objective-C