Geeks With Blogs

Josh Reuben

Programming Direct3D requires understanding where different types of resources are bound to the shader pipeline. The shader pipeline consists of configurable fixed function stages (Input Assembler, Tessellator, Stream Output, Rasterizer, Output Merger), and opt-in HLSL programmable shader stages (Vertex Shader, Hull Shader, Domain Shader, Geometry Shader, Pixel Shader, Compute Shader). Passing data into shaders involves creating & binding resources to the pipeline in C++ on the CPU, so that HLSL can manipulate them on the GPU, in parallel and at multiple stages. 3 main types of inputs can be passed in to HLSL: buffers, shader resource views and sampler state objects. The graphics pipeline programmable shaders are written in HLSL – a C / C++ derived language with a simplified API and semantics for specifying how data is passed between stages.

Shader Pipeline components

The shader pipeline consists of configurable fixed function stages and HLSL programmable shader stages. Different pipeline stages can perform multiple ops at the same time. Depending upon the use case scenario, different shader stages can be leveraged. Data is passed between HLSL programmable shader stages via matching output & input struct fields decorated with HLSL semantics, and by HLSL global system value semantics. Different shader stages are designed for processing different data granularities - Data can be processed at 3 levels: whole primitive, vertex, and pixel. The rendering pipeline programmable shaders provide opt-in functionality, but at the bare minimum, you should implement a Vertex Shader and a Pixel Shader.

Try to avoid amplification – the Pixel Shader works against interpolated fragments, of which there will be many more than there are vertices – so for performance, do calculations earlier in the pipeline, where there are fewer parallelized ops. For example, the tessellation shaders can increase the number of vertices to be rasterized. In the Rasterizer stage, culling, clipping and the scissor test can eliminate unnecessary fragments before they are passed to the Pixel Shader stage.

| Pipeline Stage | Type | Functionality | HLSL input | HLSL output |
|---|---|---|---|---|
| Input Assembler | configurable | Read bound input vertex buffers from CPU | | |
| Vertex Shader | programmable | Transform vertices | vertex | vertex |
| Hull Shader | programmable | Determine tessellation LOD & process convex hull patch control points | primitive | control points |
| Tessellator | configurable | Determine barycentric coordinates to be sampled from primitive | | |
| Domain Shader | programmable | Vertex generation | control points & barycentric coordinates | vertex |
| Geometry Shader | programmable | Modify primitive vertices | primitive | primitive |
| Rasterizer | configurable | Interpolate fragments from vertices; determine fragment depth value | | |
| Stream Output | configurable | Buffer resource output to CPU | | |
| Pixel Shader | programmable | Determine pixel color from texture, lights & normals | pixel | pixel |
| Output Merger | configurable | Write pixels to bound render target output; depth / stencil visibility tests; blending | | |

Computation pipeline – A single stage programmable shader for GPGPU. It provides structured threading with group shared memory granularity for intermediary calculation rollup. Compute Shader is called by invoking Dispatch op instead of Draw op.
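As a sketch of the host-side arithmetic behind a Dispatch call: a compute shader declared with [numthreads(64, 1, 1)] processes 64 elements per thread group, so the C++ code must round the group count up to cover all elements. The group size of 64 and the helper name below are illustrative assumptions, not D3D11 API.

```cpp
#include <cstdint>

// Illustrative: matches a hypothetical compute shader declared [numthreads(64, 1, 1)].
constexpr std::uint32_t kThreadsPerGroup = 64;

// Round up so the last partial group still covers the tail elements.
constexpr std::uint32_t groupCount(std::uint32_t elementCount) {
    return (elementCount + kThreadsPerGroup - 1) / kThreadsPerGroup;
}

// In real D3D11 code this would feed the call:
//   context->Dispatch(groupCount(n), 1, 1);
```

The shader then guards against over-dispatch by comparing its flattened thread id against the element count.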

Input Assembler

· IO: This stage inputs up to 16 vertex buffers & an index buffer, and outputs streams of individual vertices (for input into the Vertex, Hull & Domain shaders) and primitives assembled from vertices (for input into Hull, Domain & Geometry shaders).

· The array of input structures must match the vertex shader's input parameter struct field datatype layout via its vertex shader input semantics (A typical canonical input could match a vertex shader input struct with a float3 POSITION, a float3 NORMAL, and a float2 TEXCOORDS - see below in HLSL section). Note that vertex data can be split between multiple vertex buffers, each with a different struct (e.g. float[3] positions in one, float[3] norms in another).

· Each buffer is configured via a D3D11_INPUT_ELEMENT_DESC - specify SemanticName for binding; Format – int or float datatype; InputSlot: 0-15; AlignedByteOffset – where to start reading data; InputSlotClass & InstanceDataStepRate – for drawing multiple varied instances of a model mesh.

· PRIMITIVE_TOPOLOGY – specifies how the primitives are organized within the vertex buffer. Strips are more compact than lists and allow indexing of shared vertices:

  • o Point list – for particles
  • o Line list – for hair
  • o Line strip
  • o Triangle list
  • o Triangle strip – given a mesh, you are most likely to use this.
  • o Control point patch list (max 32 points) – for input to Hull Shader
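The layout matching described above can be sketched from the C++ side without the SDK: the AlignedByteOffset of each D3D11_INPUT_ELEMENT_DESC entry equals the field's offset within the vertex struct. The Vertex struct below is the canonical illustrative layout, not a required one.

```cpp
#include <cstddef>

// Illustrative canonical vertex: float3 position, float3 normal, float2 texcoords.
// Each field would map to one D3D11_INPUT_ELEMENT_DESC entry whose
// AlignedByteOffset equals the field's offset within this struct.
struct Vertex {
    float position[3]; // SemanticName "POSITION", offset 0
    float normal[3];   // SemanticName "NORMAL",   offset 12
    float texcoord[2]; // SemanticName "TEXCOORD", offset 24
};

// Offsets a hypothetical input-layout description would use.
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kNormalOffset   = offsetof(Vertex, normal);
constexpr std::size_t kTexcoordOffset = offsetof(Vertex, texcoord);
constexpr std::size_t kStride         = sizeof(Vertex); // vertex buffer stride
```

Splitting the same data across multiple vertex buffers simply means each struct (and its offsets) is described in a different InputSlot.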

Vertex Shader (HLSL)

· Invoked for every vertex input from the Input Assembler vertex stream – each vertex is processed in isolation (this won't deform mesh symmetry because the standard transforms – rotate, scale, translate – are affine).

· 3 types of matrices are often combined in the Vertex Shader to perform geometric affine transforms:

  • o World Matrix - convert the object space vertices into global vertices in the 3D scene, and apply rotate, translate, and scale transforms to them.
  • o view matrix - transform the scene from world space into camera (view) space, positioning everything relative to the camera.
  • o projection matrix - project the 3D camera-space scene into the 2D viewport clip space.

· Vertex shaders are used for:

  • o geometric affine transforms on vertex positions
  • o vertex skinning (object space bone based transforms for posturing)
  • o per vertex light calculations – vertex reflectivity, for later use in Pixel Shader
  • o control point manipulation

· Input can come from HLSL intrinsics generated from input assembler output stream & bound resources:

  • o SV_InstanceID – for variations
  • o SV_VertexID – each vertex has one!

· Output – dependent on next pipeline stage used:

  • o Some data must be produced by the last stage prior to the Rasterizer stage. For passing to the Pixel Shader (optionally passed through the Hull & Geometry shaders): provide SV_Position (vertex clip space position - i.e. post projection) and optionally provide SV_ClipDistance[n] and SV_CullDistance[n] (consumed by the Rasterizer stage). If using the Hull Shader, the output instead provides control points for a patch primitive (a convex hull).
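The three matrices described above are typically concatenated into one World*View*Projection matrix and applied per vertex. Below is a minimal row-major, row-vector sketch of that composition in plain C++ (mirroring HLSL's mul(v, WVP) convention); the matrix helpers are illustrative, not a D3D math library.

```cpp
#include <array>

// Minimal row-major 4x4 matrix, enough to sketch the WVP composition a
// vertex shader performs. Row-vector convention: v' = v * M.
using Mat4 = std::array<std::array<float, 4>, 4>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

// Translation lives in the last row under the row-vector convention.
Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[3][0] = x; m[3][1] = y; m[3][2] = z;
    return m;
}

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Transform a homogeneous point (w = 1), like HLSL's mul(float4(pos, 1), WVP).
std::array<float, 4> transform(const std::array<float, 4>& v, const Mat4& m) {
    std::array<float, 4> r{};
    for (int j = 0; j < 4; ++j)
        for (int k = 0; k < 4; ++k)
            r[j] += v[k] * m[k][j];
    return r;
}
```

A real shader would compose world, view and projection (here stubbed as identity) once per frame on the CPU and upload the result via a constant buffer.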

Hull Shader (HLSL)

· 2 required functions (unlike other programmable shaders which have only 1):

  • o Hull shading function: Invoked once for each control point - add (amplify) / remove (filter) / modify control points. Input comes from the Vertex Shader SV_Position combined with the Input Assembler stage primitive stream; output goes to the Domain Shader. 3 inputs: InputPatch<HSControlPointIn, #>, SV_OutputControlPointID, SV_PrimitiveID.
  • o Patch constant function: Invoked once for the entire control point patch - configures the Tessellator LOD (level of detail) heuristic – e.g. fewer triangles needed far from the camera, more triangles needed near a silhouette edge. Inputs SV_PrimitiveID; outputs SV_TessFactor and SV_InsideTessFactor.

· The Hull shader HLSL program is adorned with several config attributes for the Tessellator & Domain Shader stages:

  • o [domain] – input primitive type, typically "tri" for triangle; can also be isoline or quad
  • o [partitioning] – tessellation algorithm for the Tessellator stage; can be: fractional_even, fractional_odd, integer, pow2
  • o [outputtopology] - output primitive type; can be: triangle_cw, triangle_ccw (clockwise / anticlockwise – note that the Rasterizer stage culls triangles facing away from the camera) or line
  • o [outputcontrolpoints(n)] – output size (max 32)
  • o [patchconstantfunc] – names the patch constant function, whose per-patch output feeds the Domain Shader
  • o [maxtessfactor(n)] – driver hint for memory pre-allocation
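The distance-based LOD heuristic a patch constant function might apply can be sketched in plain C++. The linear falloff, the range parameters and the function name are illustrative assumptions; real heuristics often also consider silhouette edges and screen-space patch size.

```cpp
#include <algorithm>

// Illustrative LOD heuristic: near patches get maxFactor subdivisions,
// far patches degrade toward 1 (no subdivision).
float tessFactorForDistance(float distance, float minDist, float maxDist,
                            float maxFactor) {
    // Normalize distance into [0, 1] across the chosen range.
    float t = (distance - minDist) / (maxDist - minDist);
    t = std::min(std::max(t, 0.0f), 1.0f);
    // Linearly blend from maxFactor (near) down to 1 (far).
    return 1.0f + (1.0f - t) * (maxFactor - 1.0f);
}
```

In HLSL the equivalent result would be written to SV_TessFactor / SV_InsideTessFactor for each patch.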

Tessellator

· Subdivision of input geometry – generates points specifying where vertices are to be created by domain shader. Point count amplification / de-amplification is based upon edge factor to interior factor ratio.

· Receives input from Hull Shader patch constant function output and Hull Shader main function attributes.

· Outputs SV_DomainLocation[n] for input into the Domain Shader. Outputs the output topology to the Geometry Shader stage if used, or straight to the Rasterizer stage.

Domain Shader (HLSL)

· Inputs

  • o from Hull Shader stage: entire control point patch and config attributes – remains constant for the invocation sweep
  • o from Tessellator stage: SV_DomainLocation[n] coordinate points – varies across the invocation sweep → the Domain Shader is invoked once for each point.

· Creates vertices from sampled Tessellator stage coordinate points, positioned according to the surface curves defined by the Hull Shader stage control point patch output and the patch constant function config.

· Outputs SV_Position to the Rasterizer stage. Whilst the Vertex Shader could provide this to the Rasterizer stage directly, optionally using the tessellation stages allows mesh morphing and LOD amplification.
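The core of a triangle-domain Domain Shader invocation can be sketched as a barycentric blend: SV_DomainLocation supplies weights (u, v, w) that combine the patch control points into a new vertex position. A flat (linear) patch is assumed here; curved patches (e.g. Bezier triangles) blend more control points with polynomial weights.

```cpp
#include <array>

using Float3 = std::array<float, 3>;

// Illustrative: blend three control points by barycentric weights (u + v + w = 1),
// as a Domain Shader does with SV_DomainLocation for a "tri" domain.
Float3 domainPoint(const Float3& a, const Float3& b, const Float3& c,
                   float u, float v, float w) {
    Float3 p{};
    for (int i = 0; i < 3; ++i)
        p[i] = u * a[i] + v * b[i] + w * c[i];
    return p;
}
```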

Geometry Shader (HLSL)

· Explicitly modify / add / remove geometry vertices – takes vertices from the input stream and invokes Append() to add them to the output stream.

· Usages:

  • o partial model discard
  • o instance amplification & variation
  • o shadow volumes -discard non-edge primitives to generate just a silhouette
  • o point sprite particle generation – convert points to quads for texture application

· Inputs up to 6 vertices (for a triangle with adjacency) from either the Vertex Shader or Domain Shader stages, identified via SV_PrimitiveID. If no tessellation is used, it connects adjacent primitives according to the Index Buffer stream from the Input Assembler. An additional input is SV_GSInstanceID for primitive amplification copies. Note that because vertices can be shared by multiple primitives, a vertex may be processed multiple times – performance implications.

· adorned with 2 config attributes:

  • o maxvertexcount – for vertex output, specify driver hint for memory pre-allocation
  • o instance – create up to 32 copies, each of which can be transformed differently according to SV_GSInstanceID

· It has 3 possible output types: PointStream<T> (point list), LineStream<T> (line strip) or TriangleStream<T> (triangle strip). Up to 4 streams can be output – only one needs to be passed to the Rasterizer stage as SV_Position; the optional others can be passed to the Stream Output stage to send back to the CPU. If multiple output streams are used, they must all be PointStream<T>. The Geometry Shader optionally outputs SV_RenderTargetArrayIndex to determine which Texture2D bound render target a stream sent to the Stream Output stage should utilize. For split screen rendering, the Geometry Shader can optionally specify SV_ViewportArrayIndex to the Rasterizer to target a subregion within its render target Texture2D.
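The point sprite usage listed above can be sketched in plain C++: one input point is expanded into the four corners of a quad (emitted as a two-triangle strip in a real Geometry Shader). A camera-facing XY plane and the function name are simplifying assumptions; a real shader would orient the quad using the view matrix.

```cpp
#include <array>
#include <vector>

using Float3 = std::array<float, 3>;

// Illustrative point-sprite expansion: one point becomes four quad corners,
// ordered for a triangle strip (bottom-left, top-left, bottom-right, top-right).
std::vector<Float3> expandPointToQuad(const Float3& center, float halfSize) {
    return {
        {center[0] - halfSize, center[1] - halfSize, center[2]},
        {center[0] - halfSize, center[1] + halfSize, center[2]},
        {center[0] + halfSize, center[1] - halfSize, center[2]},
        {center[0] + halfSize, center[1] + halfSize, center[2]},
    };
}
```

In HLSL each corner would be passed to TriangleStream<T>::Append(), with texture coordinates added for the sprite texture.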

Stream Output

· Requires that the Geometry Shader be created with stream output via ID3D11Device::CreateGeometryShaderWithStreamOutput and that output buffers are bound via ID3D11DeviceContext::SOSetTargets

· Used to pass geometry data back to the CPU via Unordered Access Views bound to output buffer streams – e.g. for debugging, testing, offline inspection, or post-processing between multi-passing (via DrawAuto). Different streams can receive different output – e.g. a geometry's back & front faces. Up to 4 output buffer resource slots are available.

Rasterizer

· Primitive culling - cull primitives completely outside the normalized clip space cube, and according to SV_CullDistance received from the Vertex Shader stage. ID3D11RasterizerState config values control culling: FrontCounterClockwise & CullMode determine triangle vertex ordering, identify front & back faces, and specify which non-contributing face to cull; DepthBias, SlopeScaledDepthBias & DepthBiasClamp support depth-mapping techniques that identify scene objects visible to light sources as well as shadow artifacts for discarding. Culling is important to reduce amplification in interpolation.

· Primitive clipping - clip parts of primitives partially outside the normalized clip space cube, according to SV_ClipDistance received from the Vertex Shader stage. The ID3D11RasterizerState config value DepthClipEnable specifies whether to remove primitives outside the frustum (near & far clip planes).

· Normalize the viewport target - The C++ code must provide at least one D3D11_VIEWPORT defining a render target sub-region rectangle and near & far planes. The viewport is normalized from (-1, -1, 0) to (1, 1, 1) for clip space mapping.

· Multiple render targets - Rasterizer can optionally identify a specific render target viewport and texture slice according to SV_ViewportArrayIndex and SV_RenderTargetArrayIndex – up to 8 render targets are supported.

· Interpolation - Interpolate SV_Position vertex data received from a previous stage into generated fragments (pixels with interpolated attributes) to pass to the Pixel Shader stage. ID3D11RasterizerState config values control interpolation: FillMode – specify solid fill (default) or wireframe (fragments are only interpolated for polygon edges); AntiAliasedLineEnable varies edge pixel color according to the percentage of a pixel covered by a line. By default the interpolation mode is perspective-correct linear, which takes depth into account, but other modes can be set: centroid (considers pixel coverage), no-interpolation (pass constants, for a faceted appearance), no-perspective (ignores depth), sample (MSAA).

· Multi-sample anti-aliasing (MSAA) - ID3D11RasterizerState MultisampleEnable reduces edge aliasing in a performant manner, by utilizing multiple sub-samples stored in a depth-stencil buffer instead of increasing the Pixel Shader resolution.

· Scissor test – discard any generated fragments outside the render target's viewport (rectangular window region). Applied if ID3D11RasterizerState ScissorEnable is set.
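The difference between plain linear and perspective-correct interpolation mentioned above can be sketched along a single edge: the rasterizer interpolates attribute/w and 1/w linearly in screen space, then divides. The function names and the 1D simplification are illustrative.

```cpp
// Plain linear interpolation of an attribute between two endpoints.
float lerp(float a0, float a1, float t) {
    return (1.0f - t) * a0 + t * a1;
}

// Perspective-correct interpolation: a0/a1 are a vertex attribute at the two
// endpoints, w0/w1 their clip-space w. Interpolate a/w and 1/w linearly in
// screen space, then divide (what the Rasterizer does by default).
float perspectiveCorrect(float a0, float w0, float a1, float w1, float t) {
    float num = (1.0f - t) * (a0 / w0) + t * (a1 / w1);
    float den = (1.0f - t) * (1.0f / w0) + t * (1.0f / w1);
    return num / den;
}
```

When both endpoints share the same depth (w0 == w1) the two functions agree; with differing depths the perspective-correct result is biased toward the nearer endpoint, which is why texture coordinates interpolated without it appear to swim.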

Pixel Shader (HLSL)

· Invoked for each fragment - processes each fragment independently. Receives SV_Position processed by Rasterizer stage to hold the X,Y render target coordinates and normalized depth value.

· SV_Depth can optionally be used to substitute an arbitrary depth value. Conservative depth output can clamp minimum / maximum depth values, and the billboarding technique allows a textured quad perpendicular to the camera to have a depth complexity that enables partial occlusion.

· If the Rasterizer stage was configured for MSAA, SV_SampleIndex is passed into the Pixel Shader so it is invoked for each subsample of each pixel. If the Rasterizer passes specific render target texture slice data via SV_RenderTargetArrayIndex, the Pixel Shader can evaluate whether to process the fragment. The Pixel Shader can be executed once per pixel, according to which sample passes the coverage test, or it can run once per sample per pixel (supersampling).

· For each fragment, calculates the pixel out color to pass to the Output Merger stage. The calculation uses basic trigonometry & linear algebra & is based on:

  • o Sampled texture – external resource file loaded & bound via shader resource view.
  • o light - type, color, directional vector - bound via constant buffer.
  • o material reflectivity factor
  • o vertex normal vectors - specified in vertex buffer
  • o color - bound via constant buffer.

· The Pixel Shader stage passes SV_Target[n] (color) and SV_Depth (depth) to the Output Merger stage. For MSAA, SV_Coverage is also passed.

Output Merger

· Inputs SV_Target[n] (color) and SV_Depth (depth) from the Pixel Shader stage

· Blending – post-process combining of texels from multiple 2D render targets via a color value combination function against an input pixel color – e.g. modifying alpha channel transparency. Configured using D3D11_RENDER_TARGET_BLEND_DESC and D3D11_BLEND_DESC.

· Depth test – for each fragment, use the Z-buffer algorithm: if the normalized z-coordinate from the Rasterizer stage (or the clamped value from the Pixel Shader stage) is less than the bound depth stencil resource value, then it is in visibility range, else it can be discarded. Ignored if the Pixel Shader stage targeted an ID3D11UnorderedAccessView.

· Stencil test – for each fragment, determine whether the area of the render target is masked and cannot be written to. Can be configured for both front face & back face of a primitive. Configured via D3D11_DEPTH_STENCIL_DESC & D3D11_BLEND_DESC; set test evaluation & pass / fail actions using D3D11_COMPARISON_FUNC and D3D11_STENCIL_OP. Ignored if the Pixel Shader stage targeted an ID3D11UnorderedAccessView.

· Output – the Output Merger stage merges results into bound output render target(s). Up to 8 ID3D11RenderTargetViews and 1 ID3D11DepthStencilView can be bound using ID3D11DeviceContext::OMSetRenderTargets. If the Pixel Shader stage targeted an ID3D11UnorderedAccessView, use ID3D11DeviceContext::OMSetRenderTargetsAndUnorderedAccessViews. Multiple render targets allow different versions of the same scene. These must have the same size (height, width, depth, sample count, array size) & type (e.g. Texture2D, Texture2DArray) so that the depth stencil can match the render target. You can use multiple render target Texture2Ds (MRT - requires only a single parallelized Pixel Shader invocation to split), or texture slices in a single Texture2DArray (individual slice rasterization allows geometry primitives to be rasterized to different target locations).
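The Z-buffer test described above can be sketched as a per-fragment compare-and-write against a depth buffer, here with the common LESS comparison. The buffer representation and function name are illustrative; the real test runs in fixed-function hardware.

```cpp
#include <cstddef>
#include <vector>

// Illustrative Z-buffer test (LESS comparison): keep the fragment only if its
// normalized depth is nearer than what the depth buffer already holds.
bool depthTestAndWrite(std::vector<float>& depthBuffer, std::size_t index,
                       float fragmentDepth) {
    if (fragmentDepth < depthBuffer[index]) {
        depthBuffer[index] = fragmentDepth; // fragment wins; record its depth
        return true;                        // pass: color will be written
    }
    return false;                           // fail: fragment discarded
}
```

A buffer cleared to 1.0 (the far plane in normalized depth) accepts the first fragment at each pixel and then rejects anything farther.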

Resources

Passing data into shaders involves creating & binding resources to the pipeline in C++ on CPU, so that HLSL can manipulate them on GPU, in parallel and at multiple stages. 3 main types of inputs can be passed in to HLSL: buffers, shader resource views (more memory, but slower access) and sampler state objects.

For the rendering pipeline, bound resources are either RO or WO; for the compute pipeline, bound resources can be RO, WO or RW.

Resource Creation

Resources are created by specifying a specifically typed resource description (via config flags) & a D3D11_SUBRESOURCE_DATA structure describing the type of data loaded.

The following config flag enums are leveraged in the resource description:

· D3D11_USAGE

  • o DEFAULT (GPU RW) – for pipeline outputs: Output Merger stage render target Texture2D and Stream Output stage vertex buffers
  • o IMMUTABLE (GPU RO) – for static buffers created at initialization
  • o DYNAMIC (CPU WO, GPU RO) – used for passing data from C++ to shader programs on each render frame update – e.g. cbuffer scalar values, & matrices for transforms
  • o STAGING (CPU RW, GPU RW) – for DirectCompute GPGPU

· D3D11_CPU_ACCESS_FLAG – read, write or both

· D3D11_BIND_FLAG – specify pipeline location(s) that have access

  • o VERTEX_BUFFER , INDEX_BUFFER – bind geometry to Input Assembler stage – these binding types do not require a resource view.
  • o RENDER_TARGET, DEPTH_STENCIL – bind raster output from Output Merger stage. Note that render target requires a resource view to facilitate binding.
  • o STREAM_OUTPUT – bind geometry output from Stream Output stage
  • o CONSTANT_BUFFER, SHADER_RESOURCE, UNORDERED_ACCESS – bind input to any programmable shader stage. Note that these 3 binding types require resource views to facilitate binding.

· D3D11_RESOURCE_MISC_FLAG – miscellaneous options

Resource view types

4 resource interfaces derive from ID3D11Resource (Buffer, Texture1D, Texture2D, Texture3D), and 4 types of resource views specify how a resource will be used:

  • · ID3D11RenderTargetView – bind a Texture2D for output to the double buffer swap chain
  • · ID3D11DepthStencilView – bind output for depth & stencil tests.
  • · ID3D11ShaderResourceView – RO bound for HLSL shaders at any stage. Multiple shader resource views can access the same resource.
  • · ID3D11UnorderedAccessView - RW bound for HLSL shaders at the Compute Shader or Pixel Shader stage. Only a single unordered access view can access a given resource.

Resource View descriptions use a structured union to specify the type of resource data structure:

  • · Buffer - 1D linear block of memory
  • · BufferEx – raw buffer freeform structure
  • · Texture1D – a vector of texels – typically used to implement lookup tables of float values
  • · Texture2D – a matrix of texels, - typically used for standard image representation: render targets, depth target, normal maps (RGB values map to normal vector XYZ), displacement maps.
  • · Texture2DMS – multi-sampled version
  • · Texture3D – voxels (memory intensive) can be used for isosurface modeling and global illumination (fixed resolution makes it more performant than ray tracing).
  • · TextureCube – for CubeMaps – 6 Textures can be applied together to a model for reflective effects.
  • · Texture1DArray , Texture2DArray , Texture2DMSArray, TextureCubeArray – arrays of textures can be bound in single ops.

Resource data structures are supported by different resource view types:

| Resource structure | RenderTargetView | DepthStencilView | ShaderResourceView | UnorderedAccessView |
|---|---|---|---|---|
| Buffer | X | | X | X |
| Texture1D | X | X | X | X |
| Texture1DArray | X | X | X | X |
| Texture2D | X | X | X | X |
| Texture2DArray | X | X | X | X |
| Texture2DMS | X | X | X | |
| Texture2DMSArray | X | X | X | |
| Texture3D | X | | X | X |
| TextureCube | | | X | |
| TextureCubeArray | | | X | |
| BufferEx | | | X | |

Depending on the resource view type and data structure combination, different size, index & offset properties are specifiable for each resource data structure:

  • · ElementOffset, ElementWidth
  • · MipSlice
  • · FirstArraySlice, ArraySize
  • · FirstWSlice, WSize
  • · MostDetailedMip, MipLevels
  • · FirstElement, NumElements
  • · First2DArrayFace, NumCubes

Buffers

· 1D linear block of memory.

· Can contain scalar, vector, or matrix values, structures of these types, or arrays of these structures – e.g. angles & transform matrices for vertex shaders. If passing custom data structures via vertex buffers or constant buffers, the C++ structure layout and datatypes must match the corresponding HLSL cbuffer. C++ code binds to a HLSL cbuffer by its name, but HLSL does not use this name internally – instead it uses shader reflection mapped against the fields of the cbuffer data structure.

· Confusingly, while "buffer" describes a transferable memory block structure, the term is also overloaded to describe the intent of usage when passing polygon mesh model structures to the pipeline:

  • o Vertex Buffers - array of customizable vertex structures – a typical vertex structure contains a float[3] position vector, a float[3] norm vector, and a float[2] texture coord vector. A model is typically represented as a triangle polygon mesh – each corner of each triangle is a vertex. Multiple models can be combined into 1 triangle strip vertex buffer to reduce draw calls from CPU. Typically bound to Input Assembler stage as input, can also be bound to Stream Output stage for output debug.
  • o Index Buffers - Allow reuse of shared vertices, reducing the vertex buffer size & thus the amount of shader processing.
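The vertex reuse an index buffer buys can be sketched with the smallest interesting case: a quad drawn as two triangles needs 6 unindexed vertices, but only 4 unique vertices plus 6 indices when indexed. The winding order below is an illustrative assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative indexed quad: two triangles sharing the diagonal edge.
// Unindexed, the same quad would duplicate vertices 1 and 2.
constexpr std::array<std::uint16_t, 6> kQuadIndices = {
    0, 1, 2,  // first triangle
    2, 1, 3,  // second triangle reuses vertices 1 and 2
};
constexpr std::size_t kUniqueVertices = 4; // vs. 6 for an unindexed list
```

The saving grows with mesh size: in a typical closed triangle mesh each vertex is shared by several triangles, so indexing cuts both buffer size and Vertex Shader invocations (via the post-transform vertex cache).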

Constant Buffers

· The mechanism for data transfer of RO data structures from C++ host app on CPU to cbuffer structures in programmable shaders on GPU.

· The value may vary between each C++ draw or dispatch call, but remain constant across all parallel shader invocations for that call. Treated as a global constant in GPU memory for that parallel sweep and is accessible from multiple shader stages as well as parallel invocations across each shader

· HLSL cbuffer structures - constant buffers are globally accessible and immutable across parallel instances of a shader program invocation. Must match the CPU host app constant buffer struct. Note that a correct match may require explicit padding for offset sizes – fields can be annotated with packoffset. Bound via *SSetConstantBuffers (e.g. VSSetConstantBuffers). static const variables are added to the $Globals constant buffer, and shader entry function uniform params are added to the $Params constant buffer.

· HLSL tbuffer structures - texture buffers have similar syntax to cbuffer structures. However, they are used as mapping targets for large array inputs from bound shader resource views, and are accessed via the texture load mechanism. Bound via *SSetShaderResources (e.g. VSSetShaderResources).
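The 16-byte packing match mentioned above can be sketched from the C++ side: HLSL packs cbuffer fields into float4 registers, so the C++ mirror struct must insert explicit padding and keep the total size a multiple of 16 bytes. The field names below are illustrative.

```cpp
#include <cstddef>

// Illustrative C++ mirror of a hypothetical HLSL cbuffer:
//   cbuffer LightConstants { float3 direction; float4 color; };
// HLSL places `direction` in the first 12 bytes of a float4 register and
// starts `color` on the next register, so the C++ side pads explicitly.
struct LightConstants {
    float direction[3]; // 12 bytes of the first float4 register
    float pad0;         // explicit padding to close out that register
    float color[4];     // fills a whole register
};

// D3D11 requires constant buffer sizes to be multiples of 16 bytes.
static_assert(sizeof(LightConstants) % 16 == 0,
              "constant buffer size must be a multiple of 16 bytes");
```

Without the padding field, `color` would straddle a register boundary on the C++ side and every subsequent field would be read at the wrong offset in the shader.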

Structured Buffers

· The mechanism for data transfer of arrays of data structures from the C++ host app on the CPU to StructuredBuffer<TStruct> or RWStructuredBuffer<TStruct> variables in programmable shaders on the GPU, and for passing data between pipeline stages:

  • o HLSL uses Buffer<T> variables for DXGI_FORMAT RO values accessed via Load, and StructuredBuffer<TStruct> for RO arbitrary structs. For RO use in multiple stages, bind via a shader resource view to a StructuredBuffer<TStruct>.
  • o HLSL RWBuffer<T> and RWStructuredBuffer<TStruct> support RW values – these require manual thread sync, and writing is accomplished via the array accessor. For RW use in a single stage, bind via an unordered access view to a RWStructuredBuffer<TStruct>. If bound to multiple stages, the buffer must be RO.
  • o Like C++, HLSL supports bracketed array indexing and the dot operator. It also supports a GetDimensions() method.
  • o An unordered access view binding to a HLSL AppendStructuredBuffer<TStruct> or ConsumeStructuredBuffer<TStruct> can also act as a LIFO stack via the HLSL Append() & Consume() ops.

Byte Address Buffers

· Use 4-byte offsets instead of a fixed structure – for GPGPU algorithms, can access custom data structures – trees, linked lists etc. Part of each item can specify the (+/-) offset to the next item.

· HLSL ByteAddressBuffer provides RO values read via the Load / Load2 / Load3 / Load4 methods – the buffer is accessed via a 4-byte-aligned byte offset.

· HLSL RWByteAddressBuffer uses the Store / Store2 / Store3 / Store4 methods to write values.
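Byte-address access can be sketched in plain C++: the buffer is raw bytes, and every read takes a byte offset rather than a typed index, mirroring HLSL's Load. The helper name is illustrative.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative ByteAddressBuffer-style read: fetch a 32-bit value at a
// caller-supplied byte offset (4-byte aligned in real HLSL usage).
std::uint32_t loadAt(const std::vector<std::uint8_t>& raw,
                     std::size_t byteOffset) {
    std::uint32_t value = 0;
    std::memcpy(&value, raw.data() + byteOffset, sizeof(value));
    return value;
}
```

Because offsets are data, an item can store the offset of the next item, which is how linked structures are walked on the GPU.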

Indirect argument buffers

· Reduce CPU overhead of passing params into the Pixel Shader or Compute Shader on each execution pass, by instead passing values from within a resource.

· Use any of the following methods: DrawInstancedIndirect, DrawIndexedInstancedIndirect (the indexed version) and DispatchIndirect.

· Obviously requires GPU RW access

Geometry & Tessellation Buffers

· HLSL PointStream<Tstruct>, LineStream<Tstruct>, TriangleStream<Tstruct> – stream output buffers for the Geometry Shader to emit vertices for a primitive. Use Append() to add a vertex to a strip, RestartStrip() to begin a new strip.

· HLSL InputPatch<T, n> - input control point array for the Hull Shader entry point function & patch constant function; also an input for the Geometry Shader.

· HLSL OutputPatch<T, n> - input control point array for the Domain Shader.

Texture Resources

· Image-like 1-3D texel arrays. E.g. a Texture3D has X,Y,Z coordinates as well as UVW normalized coordinates.

· RO resource types: Texture1D , Texture1DArray , Texture2D , Texture2DArray , Texture2DMS , Texture2DMSArray , Texture3D, TextureCube, TextureCubeArray

· Textures are bound to texture-specific registers via register(t#).

· Textures support the following filtering functions (mipmap interpolation , minification / magnification):

  • o Mip-map levels - multiple mip levels allow different resolution granularities in 1-3 dimensions
  • o slices - sub-selections across 1 to 3 dimensions
  • o MSAA - (multisample anti-aliasing) - a quality technique whereby each texel can be composed of up to 32 subsamples, controlled by sample count & quality

· HLSL TextureX resource exposes several methods:

  • o Sample methods - apply filtering – take a SamplerState and normalized float texture coordinates as parameters. The SampleBias, SampleGrad & SampleLevel methods support mipmapping.
  • o SampleCmp & Gather methods – Boolean comparison & RGBA functions for shadow mapping
  • o Load methods – array indexer for accessing texture raw RO pixel subsamples – for MSAA
  • o Get methods – for querying metadata – e.g. dimensions, mip levels, MSAA sample positions etc.
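The mip chain mentioned above has a predictable length: each level halves the resolution until 1x1. A plain C++ sketch (the function name is illustrative):

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative full mip chain length for a 2D texture: each mip level halves
// the larger dimension until it reaches 1.
std::uint32_t mipLevelCount(std::uint32_t width, std::uint32_t height) {
    std::uint32_t size = std::max(width, height);
    std::uint32_t levels = 1;
    while (size > 1) {
        size /= 2;
        ++levels;
    }
    return levels;
}
```

Passing 0 as MipLevels in a texture description asks D3D11 for this full chain.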

Sampler State Objects

Used for filtering textures for pixel fragments. D3D11_SAMPLER_DESC can specify up to 16 texture resource sampling configurations for each pipeline stage.

· A sampler modifies how the pixels are written to the polygon face when shaded – determine which combination of pixels will be drawn from the original texture e.g. based on position, screen depth.

· HLSL SamplerState objects are mapped to s# sampler registers.

· Sampler details:

  • o sampling location – AddressU, AddressV, AddressW – control wrapping, flipping, mirroring, value range clamping.
  • o level of detail – MinLOD , MaxLOD, MipLODBias
  • o filtering - D3D11_FILTER subtype can specify minification (eliminate sparkle effects) / magnification (eliminate blockiness) / mip-mapping (multi-resolution representations). Sampling quality can be point (raw) , linear (interpolated average smoothing) or anisotropic (angle based interpolation).
  • o border color - float BorderColor[4]
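The linear filtering quality described above can be sketched as bilinear interpolation: blend the four nearest texels by the fractional texel position. Single-channel texels and the function name are simplifying assumptions.

```cpp
// Illustrative bilinear filter over a 2x2 texel neighborhood:
// t00/t10 are the top row, t01/t11 the bottom row; fx/fy are the fractional
// position within the neighborhood, each in [0, 1].
float bilinear(float t00, float t10, float t01, float t11, float fx, float fy) {
    float top    = (1.0f - fx) * t00 + fx * t10; // blend along x, top row
    float bottom = (1.0f - fx) * t01 + fx * t11; // blend along x, bottom row
    return (1.0f - fy) * top + fy * bottom;      // blend the rows along y
}
```

Point sampling simply returns the nearest of the four texels instead, which is cheaper but produces the blockiness magnification filtering exists to hide.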

Graphics Card memory Allocation

HLSL supports a register access mapping scheme for structure fields. An explicit register location can be specified for a data structure using register – e.g. cbuffer X : register(b0) {}. The shader register schemes are as follows:

  • · v# - inputs (RO)
  • · r#, x# - temp data
  • · t# - textures
  • · cb#[i] – constant buffers
  • · icb#[i] – immediate constant buffers
  • · u# – unordered access
  • · o# – output to be passed to next stage input

HLSL – High Level Shader Language

The graphics pipeline programmable shaders are written in HLSL - C / C++ derived with a simplified API. No support for dynamic memory alloc, recursive functions, pointers or templates.

Inter-stage data passing – binding semantics specify metadata (stage IO data) which consist of up to 4 float or int vectors. The output attributes of the previous stage must match the input attributes of the next stage. HLSL variable & system value semantics (SV_ prefix) adorn parameters and provide the pipeline with binding of the required matching values.

Syntax

Primitive types: bool, int, uint, half (16-bit float, for backwards compatibility), float, double

Vectors and matrices: support 1-4 components. Can use the verbose type syntax (e.g. vector<int, 2>, matrix<float, 4, 4>) or the compressed syntax (e.g. int2, float4x4). Can be initialized via array initializers or constructors that take scalar or vector params. Components can be accessed via array index syntax or via the xyzw or rgba ordered swizzle properties. Single array-indexed matrices implicitly cast to vectors. Use the mul intrinsic for matrix multiplications. To enforce row major layout, use the row_major modifier.

Data structures – a struct, interface (implicitly pure virtual class i.e. abstract) or class can contain primitive, vector and matrix members.

Function param semantics – in, out, inout, uniform (constant in)

Control flow – supports loops (for, while, do while) and conditionals (if, switch). If a branch is coherent (simultaneous invocations of a shader program all choose the same branch) then dynamic branching occurs (control flow executes a single branch); otherwise predication occurs (all branches are executed, taking more compute cycles).

Attributes – GPU compiler hints for flow control (branch / flatten, loop / unroll) tessellation shaders (domain , maxtessfactor, outputcontrolpoints, outputtopology, partitioning, patchconstantfunc), the geometry shader (maxvertexcount, instance) , the compute shader (numthreads) or the pixel shader (earlydepthstencil)

Intrinsic Functions - HLSL contains various mathematical functions (mapped to graphics card instruction set): general math, vector & matrix manipulations, casting, synch and thread atomicity, pixel & tessellation manipulation.

Reflection - HLSL also supports reflection for querying metadata via ID3D11ShaderReflection, ID3D11ShaderReflectionConstantBuffer, ID3D11ShaderReflectionVariable, ID3D11ShaderReflectionType interfaces.

HLSL Semantics

· semantic strings decorate shader variables, function parameters & struct fields. They specify the intended I/O binding for passing matching parameters between shader pipeline stages.

· Commonly used vertex shader input semantics include POSITION (vertex position in object space), NORMAL (normal vector), COLOR (diffuse / specular color) or TEXCOORD (texture coordinates). boneId and boneweight are used for vertex skinning. Commonly used vertex shader output semantics include POSITION (vertex position transformed into screen space), COLOR (pass through) or TEXCOORD (pass through). If tessellation shaders are used, TESSFACTOR is also passed.

IO Semantics

· Vertex Shader input semantics: BINORMAL[n] (float4), BLENDINDICES[n] (uint), BLENDWEIGHT[n] (float), COLOR[n] (float4), NORMAL[n] (float4), POSITION[n] (float4), POSITIONT (float4), PSIZE[n] (float), TANGENT[n] (float4), TEXCOORD[n] (float4), boneId (uint4), boneweight (float4)

· Vertex Shader output semantics: COLOR[n] (float4), FOG (float), POSITION[n] (float4), PSIZE (float), TESSFACTOR[n] (float), TEXCOORD[n] (float4)

· Pixel Shader input semantics: COLOR[n] (float4), TEXCOORD[n] (float4)

· Pixel Shader output semantics: COLOR[n] (float4), DEPTH[n] (float)
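A minimal vertex / pixel shader pair showing how matching semantics pass data between stages (cbuffer, struct and entry-point names are illustrative):

```hlsl
cbuffer PerObject : register(b0)
{
    float4x4 worldViewProj;
};

struct VSInput
{
    float4 pos   : POSITION;    // object-space position
    float4 color : COLOR0;
};

struct VSOutput
{
    float4 pos   : SV_Position; // clip-space position (system value)
    float4 color : COLOR0;      // matched by semantic in the pixel shader
};

VSOutput VSMain(VSInput input)
{
    VSOutput output;
    output.pos   = mul(input.pos, worldViewProj); // transform to clip space
    output.color = input.color;                   // pass through
    return output;
}

float4 PSMain(VSOutput input) : SV_Target        // write to render target 0
{
    return input.color;                          // interpolated by the rasterizer
}
```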

System-Value Semantics - begin with an SV_ prefix. Pixel shaders can only write to SV_Depth and SV_Target parameters. SV_VertexID, SV_InstanceID & SV_IsFrontFace can only be input to the first active shader stage in the pipeline that can interpret them; to be consumed later they must be explicitly passed down through subsequent stages.

Read-only (input) and write-only (output) system-value semantics by stage:

· Input Assembler – generates: SV_VertexID (uint), SV_InstanceID (uint – for DrawInstanced calls), SV_OutputControlPointID (uint), SV_PrimitiveID (uint)

· Vertex Shader – reads: SV_VertexID (uint), SV_InstanceID (uint); writes: SV_ClipDistance[n] (float), SV_CullDistance[n] (float), SV_Position (float4)

· Hull Shader – reads: SV_VertexID (uint), SV_OutputControlPointID (uint), SV_PrimitiveID (uint); writes: SV_InsideTessFactor (float / float[2]) – how much to tessellate patch non-edge polygons, SV_TessFactor (float[2|3|4]) – how much to tessellate patch edge polygons

· Domain Shader – reads: SV_DomainLocation (float2|3), SV_PrimitiveID (uint), SV_InsideTessFactor (float / float[2]), SV_TessFactor (float[2|3|4])

· Geometry Shader – reads: SV_GSInstanceID (uint), SV_PrimitiveID (uint); writes: SV_ClipDistance[n] (float), SV_CullDistance[n] (float), SV_Position (float4), SV_RenderTargetArrayIndex (uint), SV_ViewportArrayIndex (uint)

· Pixel Shader – reads: SV_IsFrontFace (bool), SV_Position (float4), SV_RenderTargetArrayIndex (uint), SV_ViewportArrayIndex (uint), SV_SampleIndex (uint); writes: SV_Coverage (uint mask), SV_Depth (float), SV_Target[0..7] (float4)

· Output Merger – reads: SV_Coverage, SV_Depth, SV_Target[0..7], SV_SampleIndex

· Compute Shader – reads: SV_DispatchThreadID (uint3), SV_GroupID (uint3), SV_GroupIndex (uint), SV_GroupThreadID (uint3)
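The compute-shader thread IDs are related to each other as this sketch illustrates (entry-point name is illustrative):

```hlsl
[numthreads(8, 8, 1)]
void CSMain(uint3 gid  : SV_GroupID,          // which thread group in the dispatch
            uint3 gtid : SV_GroupThreadID,    // thread position within its group
            uint3 dtid : SV_DispatchThreadID, // global thread position
            uint  gi   : SV_GroupIndex)       // flattened SV_GroupThreadID
{
    // dtid == gid * uint3(8, 8, 1) + gtid
    // gi   == gtid.z * 8 * 8 + gtid.y * 8 + gtid.x
}
```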

Compilation & linkage

· HLSL compiles into a SIMD vectorized bytecode, which the driver translates into the GPU architecture's native instruction set. Shaders are compiled before binding – offline via fxc.exe, or in code via D3DX11CompileFromFile / D3DCompile – and bound via ID3D11Device::Create*Shader, used to create Vertex, Hull, Domain, Geometry, Pixel or Compute shader objects

· To prevent branched shader program combinatorial explosion, dynamic shader linkage allows selecting the appropriate shader program interface implementation during binding for each Draw or Dispatch invocation.
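For example, the entry point VSMain above could be compiled offline with `fxc /T vs_5_0 /E VSMain /Fo simple_vs.cso simple.hlsl`, or at runtime as in this sketch (file and variable names are illustrative, error handling elided):

```cpp
// compile HLSL source to bytecode at runtime (d3dcompiler.h)
ID3DBlob* blob = nullptr;
D3DCompileFromFile(L"simple.hlsl", nullptr, nullptr,
                   "VSMain", "vs_5_0", 0, 0, &blob, nullptr);

// create the shader object from the compiled bytecode
ID3D11VertexShader* vs = nullptr;
device->CreateVertexShader(blob->GetBufferPointer(),
                           blob->GetBufferSize(), nullptr, &vs);
```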


Accessing the Shader pipeline from CPU

A Win32 message pump app is used to host the DX COM API for calling into the pipeline.

Two DirectX interfaces serve as the API entry points:

· ID3D11Device (the device) is used to create & configure resources and attach shader programs.
· ID3D11DeviceContext (the DC) is used to bind & manipulate resources and invoke bound shaders. A default single immediate context serves the main rendering thread, but deferred contexts can record command lists for async resource creation.

Creating resources - The ID3D11Device interface supports the Create family of ops for creating buffers and resources

Binding input resources - The ID3D11DeviceContext can dynamically manipulate resources via the Map & Unmap ops for reading & writing, and via UpdateSubresource for efficient writing to a resource. Resources can be copied via the CopyResource, CopySubresourceRegion & CopyStructureCount ops. MipMaps can be dynamically generated from shader resource views using the GenerateMips op.
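For example, a per-frame constant buffer update might look like this sketch (variable names are illustrative, error handling elided):

```cpp
// cbPerObject was created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE
D3D11_MAPPED_SUBRESOURCE mapped = {};
dc->Map(cbPerObject, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
memcpy(mapped.pData, &perObjectData, sizeof(perObjectData)); // write new constants
dc->Unmap(cbPerObject, 0);

// for a default-usage resource, UpdateSubresource writes without mapping:
dc->UpdateSubresource(cbDefault, 0, nullptr, &perObjectData, 0, 0);
```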

Initialization process – get the device & DC, create resources, setup windows message loop and render frame function. The windows message loop essentially conducts 2 ops repeatedly : update and render.

Update - Timer-based animation often requires calculating angles which are bound to constant buffers – derive values with sinf, and take the angle modulo 360 to prevent overflow as it accumulates.
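A minimal C++ sketch of such an update calculation (function names are illustrative):

```cpp
#include <cmath>

// Wrap an accumulating rotation angle into [0, 360) so that a long-running
// animation never loses float precision through unbounded growth.
float wrapAngle(float degrees)
{
    return std::fmod(degrees, 360.0f);
}

// Per-frame update: advance the angle by elapsed time, then wrap it.
// The result typically feeds sinf/cosf values written to a constant buffer.
float updateAngle(float angle, float degreesPerSecond, float dtSeconds)
{
    return wrapAngle(angle + degreesPerSecond * dtSeconds);
}
```

Each frame the render loop would call e.g. `angle = updateAngle(angle, 90.0f, dt);` and bind `sinf(angle * 3.14159265f / 180.0f)` into a constant buffer.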

Render - The render frame function makes use of several function family groups in ID3D11DeviceContext:

· SetRenderTargets - bind RenderTargetView & DepthStencilView
· Clear - clear the render target(s) & depth stencil
· IASet - bind vertex / index buffer data & input layout
· xSSet (VSSet, PSSet etc.) - set inputs: shaders, constant buffers, shader resource views & samplers for specific pipeline stages
· Draw - submit the draw call
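A typical render-frame function built from these families might look like this sketch (variable names are illustrative, error handling elided):

```cpp
void RenderFrame(ID3D11DeviceContext* dc)
{
    // SetRenderTargets & Clear families
    dc->OMSetRenderTargets(1, &rtv, dsv);
    float clearColor[4] = { 0.0f, 0.0f, 0.2f, 1.0f };
    dc->ClearRenderTargetView(rtv, clearColor);
    dc->ClearDepthStencilView(dsv, D3D11_CLEAR_DEPTH, 1.0f, 0);

    // IASet family - bind buffer data
    UINT stride = sizeof(Vertex), offset = 0;
    dc->IASetInputLayout(inputLayout);
    dc->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
    dc->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R32_UINT, 0);
    dc->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

    // xSSet family - set shaders & their inputs per stage
    dc->VSSetShader(vs, nullptr, 0);
    dc->VSSetConstantBuffers(0, 1, &cbPerObject);
    dc->PSSetShader(ps, nullptr, 0);
    dc->PSSetShaderResources(0, 1, &textureSRV);
    dc->PSSetSamplers(0, 1, &sampler);

    // Draw
    dc->DrawIndexed(indexCount, 0, 0);

    swapChain->Present(0, 0);
}
```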

Draw calls pass a stream of vertices into the Input Assembler stage:

· Auto – make primitive order dependent on vertex order, and set up the GPU to do 2 passes: in the 1st pass the Vertex Shader & Geometry Shader stages process the vertices, then the Stream Output stage feeds the data stream back into the Input Assembler stage for a 2nd pass. Multiple rendering passes are useful for applying vertex skinning or tessellation on a 1st pass, mitigating the amplification effects.
· Indexed – leverage the Index buffer to reuse shared vertices
· Instanced – draw multiple copies of the same mesh model primitive, introducing per-instance variations
· Indirect – let the GPU construct draw arguments from a previously loaded resource, reducing CPU-to-GPU data flow
Posted on Wednesday, March 14, 2012 10:28 AM in Graphics Programming, C++


Comments on this post: Direct3D 11 Programming in a Nutshell

# re: Direct3D 11 Programming in a Nutshell
DS cannot read SV_OutputControlPointID, can it?
Left by ON on Aug 14, 2014 3:31 AM

