RiverTrail - JavaScript GPGPU Data Parallelism

Where is WebCL? The Khronos WebCL working group is developing a JavaScript binding to the OpenCL standard so that HTML5-compliant browsers can host GPGPU web apps (e.g. for image processing, or physics for WebGL games). While Nokia & Samsung have some prototype WebCL APIs, Intel has one-upped them with a higher level of abstraction: RiverTrail. Intro to RiverTrail Intel Labs JavaScript RiverTrail provides GPU accelerated SIMD data-parallelism in web applications ......

Posted On Thursday, November 29, 2012 9:02 AM | Comments (1)

HPC Server Dynamic Job Scheduling: when jobs spawn jobs

HPC Job Types HPC has 3 types of jobs: · Task Flow – vanilla sequence · Parametric Sweep – concurrently run multiple instances of the same program, each with a different work unit input · MPI – message passing between master & slave tasks. But when you try to go outside the box (job tasks that spawn jobs, blocking the parent task) you run the risk of resource starvation, deadlocks, and recursive, non-converging or exponential blow-up. ......

Posted On Wednesday, October 10, 2012 2:34 PM | Comments (1)

Low-Latency High-Performant Financial App Infrastructures

Financial apps feel the need for speed. This can come via parallelization, and via infrastructure: fast messaging and non-blocking distributed memory management. This blogpost gives an overview + examples of various technologies that can squeeze performance out of your trading apps and clock cycles out of your modeling apps. Low Latency via Infrastructure ZeroMQ · ZeroMQ is a messaging library - ‘messaging middleware’, ‘TCP on steroids’, ‘new layer on the networking stack’. Not a complete messaging ......

Posted On Monday, March 26, 2012 9:05 PM | Comments (2)

Daytona - Iterative MapReduce on Windows Azure

Overview MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of compute nodes. It is a generic mechanism that comprises 2 steps: Map step: The master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes. Each worker node processes its smaller problem and passes the answer back to its master node. Reduce step: The master node then collects the ......

Posted On Thursday, December 8, 2011 7:26 AM | Comments (1)

The Windows Azure HPC Scheduler SDK

Overview Windows HPC Server 2008 is infrastructure for high-end applications that require high performance computing clusters – i.e. for scaling out parallelizable workloads across many compute nodes in a grid. These compute nodes can be coordinated by a head node, which in turn can be proxied via a service broker node that exposes a SOA WCF interface for job scheduling. Additional functionality includes the ability to coordinate between job processes running on nodes via MPI (message passing interface). ......

Posted On Tuesday, December 6, 2011 6:36 AM | Comments (0)

Unit Testing a ConcurrentPriorityQueue

I’m leveraging a ConcurrentPriorityQueue. This class is basically a thread-safe IProducerConsumerCollection wrapper for a binary heap that prioritizes smaller values. You use it as you would a dictionary, where the priority is the key, except you can have duplicate keys (i.e. values with the same priority). I needed to demonstrate to a customer that it worked. I set up my queue and my priority enum values: var q = new ConcurrentPriorityQueue<... ......
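To make the idea concrete, here is a minimal sketch of a lock-protected binary heap that serves smaller priorities first and tolerates duplicate priorities. It is written in Python for brevity and is not the .NET class from the post: the class name `SimpleConcurrentPriorityQueue`, its method names, and the FIFO tie-breaking among equal priorities are all assumptions made for illustration.

```python
import heapq
import itertools
import threading


class SimpleConcurrentPriorityQueue:
    """Illustrative thread-safe min-heap; NOT the .NET ConcurrentPriorityQueue."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()
        # Assumption: equal priorities come out in insertion order.
        # The counter also stops heapq from ever comparing the values themselves.
        self._counter = itertools.count()

    def enqueue(self, priority, value):
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._counter), value))

    def try_dequeue(self):
        """Return (True, priority, value) for the smallest priority,
        or (False, None, None) when the queue is empty."""
        with self._lock:
            if not self._heap:
                return (False, None, None)
            priority, _, value = heapq.heappop(self._heap)
            return (True, priority, value)

    def __len__(self):
        with self._lock:
            return len(self._heap)


def drain(queue):
    """Dequeue everything and return the priorities in the order served."""
    served = []
    while True:
        ok, priority, _ = queue.try_dequeue()
        if not ok:
            return served
        served.append(priority)
```

A unit test in the spirit of the post would enqueue from several threads at once, then check that `drain` returns the priorities in non-decreasing order and that no item was lost.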

Posted On Sunday, December 4, 2011 1:05 PM | Comments (0)


Overview C++ AMP is a GPGPU API – it allows you to define functions (kernels) that take some input, perform an expensive calculation on the GPU and return the output to the CPU. GPUs support fast arithmetic operations across many SIMD-like cores - NVidia Tesla supports 512 cores compared to the paltry 10 cores available on the CPU today - even Intel's Knights Corner will only support 60 cores next year. Suitable only for certain classes of problems (i.e. data parallel algorithms) and not for others ......

Posted On Sunday, December 4, 2011 8:20 AM | Comments (1)

Post-Build C++ Skill Rebuild

· For the last decade, the majority of my dev work has leveraged the .NET Framework for construction of information systems. However, my interest has lain in numerical computing. · Is it possible to have an increasingly higher level of abstraction and at the same time achieve underlying high performance computing? The prevailing winds say no: C# is aimed at productivity, and C++ is for performance. Garbage collection was great, but do we still need it with the availability of smart pointers? Would ......

Posted On Saturday, October 29, 2011 9:59 PM | Comments (0)

Azure Grid Computing - Worker Roles as HPC Compute Nodes

Overview · With HPC 2008 R2 SP1 you can add Azure worker roles as compute nodes in a local Windows HPC Server cluster. · The Windows Azure subscription is billed like any other Azure service: you are charged for the time that the role instances are available, as well as for the compute and storage services that are used on the nodes. · Win-win? Azure charges the compute hour cost (according to VM size) amortized over a month, so you save on purchasing compute node hardware. Microsoft wins because you need ......

Posted On Tuesday, February 22, 2011 10:12 AM | Comments (1)

MapReduce in DryadLINQ and PLINQ

MapReduce The MapReduce pattern aims to handle large-scale computations across a cluster of servers, often involving massive amounts of data. "The computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The developer expresses the computation as two Func delegates: Map and Reduce. Map - takes a single input pair and produces a set of intermediate key/value pairs. The MapReduce function groups results by key and passes ......
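The map, group-by-key, reduce flow described above can be sketched in a few lines. This is a single-process Python illustration of the pattern only, not DryadLINQ or PLINQ code; the helper `map_reduce` and the word-count functions are hypothetical names chosen for the example, and the reduce step is assumed to fold each key's grouped values into one output value.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map_fn over each (key, value) input pair, group the intermediate
    pairs by key, then call reduce_fn once per intermediate key."""
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    return {ikey: reduce_fn(ikey, values) for ikey, values in groups.items()}


def word_count_map(doc_name, text):
    # Map: one input pair -> a set of intermediate (word, 1) pairs.
    for word in text.lower().split():
        yield word, 1


def word_count_reduce(word, counts):
    # Reduce: all intermediate values for one key -> a single output value.
    return sum(counts)


if __name__ == "__main__":
    docs = [("a.txt", "the quick fox"), ("b.txt", "the lazy dog the end")]
    print(map_reduce(docs, word_count_map, word_count_reduce))
```

In a real cluster the map calls and the per-key reduce calls run on different nodes; the sequential loops here only show the data flow.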

Posted On Friday, December 10, 2010 12:58 AM | Comments (1)

Full Parallelism Archive

Copyright © JoshReuben
