Geeks With Blogs
Josh Reuben Parallelism
Java Fork-Join
In the java.util.concurrent package (JDK 7). A framework for divide and conquer: it recursively divides a task into smaller subtasks until a threshold check indicates that the subtask is small enough to execute serially. The optimal threshold depends on the specific computational steps and is obtained through profiling; as a heuristic, it lies between 100 and 10000. It abstracts multithreading and scales up automatically. It leverages work-stealing: each worker thread maintains a queue of tasks. If one worker thread’s queue is empty, it ......
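The threshold-based splitting described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not code from the post: the `SumTask` class, its `parallelSum` helper, and the threshold of 1,000 are assumptions chosen for the example; only `RecursiveTask` and `ForkJoinPool` come from java.util.concurrent.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums a slice of a long[] by divide and conquer.
class SumTask extends RecursiveTask<Long> {
    // Assumed value inside the 100..10000 heuristic range; tune by profiling.
    static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        // Threshold check: small enough, so execute serially.
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        // Otherwise split in half, fork one half, compute the other here.
        int mid = from + (to - from) / 2;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();
        long rightSum = right.compute();
        return left.join() + rightSum;
    }

    // Hypothetical convenience wrapper: runs the task on a fresh pool.
    static long parallelSum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```

Forking one half and computing the other in the current thread keeps the calling worker busy instead of leaving it idle while it waits on both subtasks.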

Posted On Sunday, February 15, 2015 6:30 AM

OpenCL - An Overview
Overview OpenCL is a GPGPU API that abstracts over acceleration devices (be they CPU, GPU or FPGA) to provide data-parallel (as well as task-parallel) behavior. Heterogeneous portability is achieved by avoiding high-level abstractions and exposing the hardware in a context that explicitly defines its work scheduling capabilities. An OpenCL application consists of two parts: the host program that runs on the CPU - API functions to discover devices and their capabilities & create a context, ......

Posted On Saturday, February 7, 2015 10:20 PM

OpenGL Compute Shaders – an overview
Overview A compute shader is a programmable shader stage that expands OpenGL beyond graphics programming. Like other programmable shaders, a compute shader is designed and implemented with GLSL. A compute shader provides a single-stage SIMD pipeline parallelized on the GPU. It also provides memory sharing and thread synchronization features to allow more effective parallel programming methods. Create a Compute Shader Program: glCreateShader(GL_COMPUTE_S... - create a compute shader glShaderSource() ......

Posted On Friday, February 6, 2015 11:45 AM

RiverTrail - JavaScript GPGPU Data Parallelism
Where is WebCL? The Khronos WebCL working group is working on a JavaScript binding to the OpenCL standard so that HTML 5 compliant browsers can host GPGPU web apps – e.g. for image processing or physics for WebGL games - http://www.khronos.org/webcl/ . While Nokia & Samsung have some prototype WebCL APIs, Intel has one-upped them with a higher level of abstraction: RiverTrail. Intro to RiverTrail Intel Labs JavaScript RiverTrail provides GPU accelerated SIMD data-parallelism in web applications ......

Posted On Thursday, November 29, 2012 9:02 AM

HPC Server Dynamic Job Scheduling: when jobs spawn jobs
HPC Job Types HPC has 3 types of jobs http://technet.microsoft.co... · Task Flow – vanilla sequence · Parametric Sweep – concurrently run multiple instances of the same program, each with a different work unit input · MPI – message passing between master & slave tasks But when you try to go outside the box – job tasks that spawn jobs, blocking the parent task – you run the risk of resource starvation, deadlocks, and recursive, non-converging or exponential blow-up. ......

Posted On Wednesday, October 10, 2012 2:34 PM

Low-Latency High-Performant Financial App Infrastructures
Financial apps feel the need for speed – this can come via parallelization, and via infrastructure - fast messaging and non-blocking distributed memory management. This blog post gives an overview + examples of various technologies that can squeeze performance out of your trading apps and clock cycles out of your modeling apps. Low Latency via Infrastructure ZeroMQ · ZeroMQ is a messaging library - ‘messaging middleware’, ‘TCP on steroids’, ‘new layer on the networking stack’. Not a complete messaging ......

Posted On Monday, March 26, 2012 9:05 PM

Daytona - Iterative MapReduce on Windows Azure
Overview MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of compute nodes. It is a generic mechanism that comprises 2 steps: Map step: The master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes. Each worker node processes its smaller problem and passes the answer back to the master node. Reduce step: The master node then collects the ......
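The two steps named in this excerpt can be sketched in a single process as a word count. This is only an illustration of the general pattern under stated assumptions: it uses threads in place of worker nodes, it is not Daytona's API, and the class `WordCountMapReduce` with its `map`, `reduce`, and `run` methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-process sketch: threads stand in for worker nodes.
class WordCountMapReduce {

    // Map step (worker side): count the words in one partition of the input.
    static Map<String, Integer> map(List<String> partition) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : partition) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Reduce step (master side): merge the partial counts into one result.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    // The "master": partitions the lines, hands them to workers, then reduces.
    static Map<String, Integer> run(List<String> lines, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = Math.max(1, (lines.size() + workers - 1) / workers);
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunk) {
                List<String> partition =
                        lines.subList(start, Math.min(lines.size(), start + chunk));
                Callable<Map<String, Integer>> job = () -> map(partition);
                futures.add(pool.submit(job));
            }
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (Future<Map<String, Integer>> future : futures) {
                partials.add(future.get());
            }
            return reduce(partials);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `WordCountMapReduce.run(lines, 2)` on a small list of lines returns a sorted map from each word to its total count across all partitions.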

Posted On Thursday, December 8, 2011 7:26 AM

The Windows Azure HPC Scheduler SDK
Overview Windows HPC Server 2008 is infrastructure for high-end applications that require high performance computing clusters – i.e. for scaling out parallelizable workloads across many compute nodes in a grid. These compute nodes can be coordinated by a head node, which in turn can be proxied via a service broker node that exposes a SOA WCF interface for job scheduling. Additional functionality includes the ability to coordinate between job processes running on nodes via MPI (message passing interface). ......

Posted On Tuesday, December 6, 2011 6:36 AM

Unit Testing a ConcurrentPriorityQueue
I’m leveraging a ConcurrentPriorityQueue – from http://code.msdn.microsoft.... This class is basically a thread-safe IProducerConsumerCollection wrapper for a binary heap that prioritizes smaller values. You use it as you would a dictionary, where the priority is the key, except you can have duplicate keys (i.e. values with the same priority). I needed to demonstrate to a customer that it worked. I set up my queue and my priority enum values: var q = new ConcurrentPriorityQueue<... ......
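The kind of check described in this excerpt can be sketched in Java as follows. This is an analogy, not the post's C# test: Java's `PriorityBlockingQueue` stands in for the `ConcurrentPriorityQueue` sample, and the `PriorityQueueCheck` class, its `Item` type, and the `produceAndDrain` method are hypothetical names. Several producer threads enqueue items with duplicate priorities, and the drained order should be non-decreasing by priority.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical check: concurrent producers, then a single drain in priority order.
class PriorityQueueCheck {

    // A value tagged with a priority; smaller priority values come out first.
    static final class Item implements Comparable<Item> {
        final int priority;
        final String value;

        Item(int priority, String value) {
            this.priority = priority;
            this.value = value;
        }

        @Override
        public int compareTo(Item other) {
            return Integer.compare(priority, other.priority);
        }
    }

    // Fills the queue from several threads and returns the dequeued priorities.
    static List<Integer> produceAndDrain(int producers, int itemsPerProducer) {
        PriorityBlockingQueue<Item> queue = new PriorityBlockingQueue<>();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            threads[p] = new Thread(() -> {
                for (int i = 0; i < itemsPerProducer; i++) {
                    // i % 3 deliberately creates many duplicate priorities.
                    queue.add(new Item(i % 3, "p" + id + "-" + i));
                }
            });
            threads[p].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        List<Integer> order = new ArrayList<>();
        Item item;
        while ((item = queue.poll()) != null) {
            order.add(item.priority);
        }
        return order;
    }
}
```

A test can then assert that no item was lost (the list size equals producers times items per producer) and that each priority is greater than or equal to the one before it.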

Posted On Sunday, December 4, 2011 1:05 PM

C++ AMP
Overview C++ AMP is a GPGPU API – it allows you to define functions (kernels) that take some input, perform an expensive calculation on the GPU and return the output to the CPU. A GPU supports fast computation across many SIMD-like cores - NVIDIA Tesla supports 512 cores compared to the paltry 10 cores available on the CPU today - even Intel's Knights Corner will only support 60 cores next year. Suitable only for certain classes of problems (i.e. data parallel algorithms) and not for others ......

Posted On Sunday, December 4, 2011 8:20 AM

Copyright © JoshReuben