WARNING: This is a tutorial for DirectX 12 containing technical and spelling errors. This is not a finished document and most likely won’t be finished.

Table of Contents
Preface
Guide to Readers
Introduction
Creating the project
Creating a window
The Graphics Pipeline
Initializing Direct3D12
Rendering Triangles
Depth and Stencil Testing
Indexed Rendering
Constant Buffers
Optimizing Root Signatures
Residency
- Evict
- Why and When to Control Residency

Preface

This paper isn’t an introduction to graphics programming. There are many other books that do a good job of that. This book assumes you are reasonably proficient in at least the basics of graphics programming, although not having this knowledge shouldn’t prevent you from benefiting from this paper. All keywords are indexed at the end of the book with a reference to an explanation (either a page and line number or an external source). I also won’t be covering any of the generic mathematics used in graphics programming. Whenever I use mathematics that I won’t be explaining, I’ll reference one or more external sources you can use to learn more about the magic in question.

This paper is structured in such a way that it is possible to skip the basic principles of D3D12 and jump straight to specific rendering techniques. Due to the structure of the example code it should be easy to refer back to previously explained subjects.

The code examples may contain some unusual language features or programming tricks. I hope this complexity will actually make the parts of the code that matter easier to read. The code examples make heavy use of the standard library and modern C++ features.

As a word of warning and encouragement: don’t worry if you don’t understand everything on your first reading. I didn’t understand everything when I was writing this paper and was continuously researching questions that popped up. This paper isn’t necessarily meant to be read only once. I hope you will find yourself referring to it again and again.

I welcome comments on this paper, whether criticisms of the examples and writing, references and issues I’ve missed, or rendering techniques I should have included. You can contact me by e-mail at d3d12@vzout.com. I’m also open to pull requests on the examples gitlab page.

Guide to Readers

This paper has two main parts. The first part (Chapters 1 to 4) introduces you to DirectX and teaches you the main building blocks for most applications. It is recommended you follow the paragraphs in order during your first reading. Because these chapters explain concepts that keep recurring, there is little abstraction in the code examples.

The second part (Chapters …) discusses and implements specific rendering techniques or features. These chapters can be read in any order. The code examples for these chapters abstract previously explained features. The functions and objects that are abstracted will be contained in a namespace called “tut” followed by the number of the paragraph where the code was explained and written without abstraction, for example “tut1”, “tut2”, etc.

If you want to run the code snippets and examples you will need a computer which supports DirectX 12 (feature level 11.0) and Windows 10. If you want to compile the snippets you will need the Windows 10 SDK. The snippets and examples accompanying this article work with version 10.0.14393.795 or later of the Windows 10 SDK which can be downloaded here.

It is possible to follow this book without the accompanying code without any issues. However, if you do get confused or want to test something, the code examples can be a great help.

As mentioned in the preface: at the end of the book you can find a list of keywords with either a page and line number of the explanation or a link to an external source if I felt like the keyword was outside of the scope of this paper.

Introduction

Choosing The Right API

What DirectX Isn’t

DirectX isn’t a graphics API. It is a collection of APIs. I will only talk about Direct3D12 (D3D12), which is the rendering part of the collection. From here on out I’ll stop calling Direct3D DirectX.

The History of DirectX

Creating the project

I will assume you will want to use Visual Studio. If I am wrong, good luck getting the Windows 10 SDK to work with any compiler other than MSVC.

You can create an empty Win32 Project. To select the correct Windows 10 SDK you can right click on your solution and click “Retarget solution”, or you can go to the properties page of your project, open Configuration Properties > General and change the Target Platform Version.

I won’t be using precompiled headers in my examples but it’s completely up to you if you want to enable them.

Now let’s create two files named game.cpp and game.h, create a WinMain entry point and include game.h in game.cpp. As I said before I will wrap all my code in the namespace tut, but I won’t include the namespace in the snippets to reduce their size.

#include "game.h"

int CALLBACK WinMain(HINSTANCE inst, HINSTANCE prev_inst, LPSTR arg, int show_cmd) {

	return 0;
}

You can find the source of this paragraph here

Creating a window

To use D3D12 we will obviously need a window to render to. You can create one using the WinAPI. We will also need a game loop. Let’s define a wrapper function for the creation of the window, a function to capture window events and a function to start the game loop. If you want, you can also define a render and an init function that will be passed to the tut::StartLoop function.

HWND InitWindow(const char* name, HINSTANCE inst, int show_cmd, int width, int height, bool fullscreen = false);
LRESULT CALLBACK WindowProc(HWND hWnd,	UINT msg, WPARAM w_param, LPARAM l_param);
void StartLoop(std::function<void()> init, std::function<void()> render);

void Init();
void Render();

Let’s start by implementing tut::InitWindow.

HWND InitWindow(const char* name, HINSTANCE inst, int show_cmd, int width, int height, bool fullscreen) {
	HWND hwnd = nullptr;

	if (fullscreen) {
		HMONITOR hmon = MonitorFromWindow(hwnd, MONITOR_DEFAULTTONEAREST);
		MONITORINFO mi = { sizeof(mi) };
		GetMonitorInfo(hmon, &mi);

		width = mi.rcMonitor.right - mi.rcMonitor.left;
		height = mi.rcMonitor.bottom - mi.rcMonitor.top;
	}

	WNDCLASSEX wc;
	wc.cbSize = sizeof(WNDCLASSEX);
	wc.style = CS_HREDRAW | CS_VREDRAW;
	wc.lpfnWndProc = WindowProc;
	wc.cbClsExtra = NULL;
	wc.cbWndExtra = NULL;
	wc.hInstance = inst;
	wc.hIcon = LoadIcon(NULL, IDI_APPLICATION);
	wc.hCursor = LoadCursor(NULL, IDC_ARROW);
	wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 2);
	wc.lpszMenuName = NULL;
	wc.lpszClassName = name;
	wc.hIconSm = LoadIcon(NULL, IDI_APPLICATION);

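	// Adding a number to a string literal is pointer arithmetic, so build the message explicitly.
	// Requires <stdexcept> and <string>.
	if (!RegisterClassEx(&wc))
		throw std::runtime_error("Failed to register class with error: " + std::to_string(GetLastError()));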

	hwnd = CreateWindowEx(NULL,
		name, name,
		WS_OVERLAPPEDWINDOW,
		CW_USEDEFAULT, CW_USEDEFAULT,
		width, height,
		NULL,
		NULL,
		inst,
		NULL);

	// Requires <stdexcept> and <string>.
	if (!hwnd)
		throw std::runtime_error("Failed to create window with error: " + std::to_string(GetLastError()));

	if (fullscreen) {
		SetWindowLong(hwnd, GWL_STYLE, 0);
	}

	ShowWindow(hwnd, show_cmd);
	UpdateWindow(hwnd);

	return hwnd;
}
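
The failure paths in InitWindow combine a description with a numeric error code. As a minimal, portable sketch of that step (the helper names FormatError and ThrowError are hypothetical and not part of the tutorial’s code; on Windows the code would come from GetLastError), the message can be built with std::to_string before it is thrown:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: joins a description with a numeric error code.
// On Windows the code would be the value returned by GetLastError().
inline std::string FormatError(const std::string& what, unsigned long code) {
	assert(!what.empty());
	return what + " with error: " + std::to_string(code);
}

// Hypothetical helper: throws a std::runtime_error carrying the formatted message.
[[noreturn]] inline void ThrowError(const std::string& what, unsigned long code) {
	throw std::runtime_error(FormatError(what, code));
}
```

Writing "text" + number directly would instead offset the pointer to the string literal, which is why the conversion to std::string matters here.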

The window’s event callback function will be quite empty since we will just use it to close the window.

LRESULT CALLBACK WindowProc(HWND handle, UINT msg, WPARAM w_param, LPARAM l_param) {
	switch (msg) {
		case WM_DESTROY:
			PostQuitMessage(0);
			return 0;
		case WM_KEYDOWN:
			// Use the handle passed to the callback to destroy the window.
			if (w_param == VK_ESCAPE)
				DestroyWindow(handle);
			return 0;
	}

	return DefWindowProc(handle, msg, w_param, l_param);
}

Now let’s write our function to start the game loop. We will pass 2 functions as arguments: one for initialization and one that gets called every frame for rendering. We will leave the init and render functions empty for now.

void StartLoop(std::function<void()> init, std::function<void()> render) {
	MSG msg;
	ZeroMemory(&msg, sizeof(MSG));

	init();

	while (true) {
		if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
			if (msg.message == WM_QUIT)
				break;

			TranslateMessage(&msg);
			DispatchMessage(&msg);
		}
		else {
			render();
		}
	}
}

void Init() {
}

void Render() {
}
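
The control flow of StartLoop can be tried out without a window. The following is only a sketch under stated assumptions: the Message enum and RunLoop are hypothetical stand-ins for the Win32 message queue and for StartLoop, and the max_frames parameter exists purely so the sketch terminates. It shows the same rule as above: pending messages are handled first and a frame is rendered only when the queue is empty.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Hypothetical stand-ins for Win32 messages, for illustration only.
enum class Message { Paint, KeyDown, Quit };

// Mirrors the shape of StartLoop: run init once, then drain messages and
// render only when no message is pending. Returns the number of rendered frames.
// max_frames guards the sketch against running forever when no Quit arrives.
inline int RunLoop(std::queue<Message>& messages, const std::function<void()>& init,
                   const std::function<void()>& render, int max_frames) {
	assert(init && render);
	init();
	int frames = 0;
	while (frames < max_frames) {
		if (!messages.empty()) {
			Message msg = messages.front();
			messages.pop();
			if (msg == Message::Quit)
				break;
			// A real loop would call TranslateMessage and DispatchMessage here.
		} else {
			render();
			++frames;
		}
	}
	return frames;
}
```

Because messages take priority, a queue that already contains a quit message ends the loop before a single frame is rendered.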

Now that we have implemented those functions, let’s update our application’s entry point and define a window width and height.

static unsigned int WIDTH = 640;
static unsigned int HEIGHT = 480;

int CALLBACK WinMain(HINSTANCE inst, HINSTANCE prev_inst, LPSTR arg, int show_cmd) {

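	// Call our own wrapper, not the WinAPI CreateWindow macro. WINDOW_HANDLE is the global HWND
	// used by later snippets (assumed to be declared in game.h); the window title is arbitrary.
	WINDOW_HANDLE = InitWindow("D3D12 Tutorial", inst, show_cmd, WIDTH, HEIGHT);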
	StartLoop(&Init, &Render);

	return 0;
}

We now have a window to render to! You may have noticed I am not explaining what an HINSTANCE is or what show_cmd is used for. I won’t explain WinAPI functions and parameters since that is not the purpose of this article.

You can find the source of this paragraph here

The Graphics Pipeline

Before we get to the actual initialization of D3D12 I want to explain the graphics pipeline. The order, access and type of pipeline stages is wonderfully shown in this image provided by Microsoft.

D3D12 Graphics Pipeline

But you may not fully understand all the buzzwords used in this image. So let’s dip our toes into the main stages of the graphics pipeline!

Input Assembler

The input assembler (IA) reads the buffers created by the user to create primitives for other stages. The IA can assemble vertices into several different primitive types.

Adjacency information is available in the geometry shader. If a geometry shader were invoked with a triangle including adjacency, for instance, the input data would contain 3 vertices for each triangle and 3 vertices for adjacency data per triangle.

The secondary purpose of the input assembler is to attach system-generated values to help shaders run faster. System-generated values are also called semantics. All three shader stages are constructed from a common shader core, and the shader core uses system-generated values (such as a primitive id, an instance id, or a vertex id) so that a shader stage can reduce processing to only those primitives, instances, or vertices that have not already been processed.

Vertex Shader

The vertex shader takes the data received from the IA (vertices and indices) and GPU buffers and modifies the data. We use this to create effects such as transformations, lighting and displacement mapping.

Hull and Domain Shader

The hull and domain shaders are both part of the GPU’s tessellation process. These shaders are generally used to create high detail geometry from “patches” (low detail geometry). The hull shader takes an input patch and returns an output patch. The output of the hull shader runs through the tessellation stage, which produces domains.

The domain shader takes those domains in order to compute the actual vertex positions.

Geometry Shader

Unlike vertex shaders, which operate on a single vertex, the geometry shader’s inputs are the vertices of a full primitive. The input primitive can, for example, be expanded into one or more other primitives. The geometry shader can stream out vertex data into a buffer which can later be drawn. Example uses are things like grass, geometry tessellation and volumetric shadows.

Pixel Shader

The pixel shader is the final stage of the pipeline before we merge everything together. This shader is executed for every pixel fragment and is used to determine the color of said pixel. This can return a constant output or something more advanced like per-pixel lighting, reflections and shadows.

Output Merger

This stage may reject some of the pixel fragments from the pixel shader based on the results of the depth and stencil tests. The remaining pixels are drawn to the back buffer. Blending is also done in this stage.

Initializing Direct3D12

Overview

I guess now is a good time to start explaining what Direct3D actually is and why it’s better than higher level APIs.

D3D12 delivers much better performance due to the low level control over the graphics hardware. This low level control allows programmers to make better use of multithreading by filling command lists on different threads. Sadly this low level control does have some disadvantages. We will have to do a lot more bookkeeping, like managing memory and CPU/GPU synchronization.

D3D12 also minimizes CPU overhead by using pre-compiled pipeline state objects and command lists. During the initialization we create as many pipeline state objects as we need. We give the pipeline state objects things such as the shaders, blending description, rasterizer description, primitive topology and more. Previously the driver had to create the pipelines during runtime, which is obviously less efficient. We can also create a command list that we can reuse without having to populate it again. Those command lists are called bundles.

You can find the source of this paragraph here.

Initialization Order

We will structure our application like this:

Create Window
Create Device
Create Command Queue
Create Swap Chain
Create Back Buffers
Create Root Signature
Create Pipeline State Object
Create Command Lists
Create Vertex Buffers and etc.
Loop
- Populate Command Lists.
- Execute Command Lists.
- Wait(fence)
- Display a beautifully rendered frame.
- Reset the Command Lists and Allocators.
Begin clean up by using Wait(fence).
Release all D3D12 objects.

We are starting our game loop from our application’s entry point but we don’t want to initialize all those things inside our entry point. So we are going to define a function called InitD3D12 which we will be calling from the entry point instead.

void InitD3D12();

void InitD3D12() {
	// Device
	// Cmd Queue
	// Swapchain
	// Back Buffers
}

We will create our root signatures, and command lists inside our Init function we passed to our StartLoop function because they are very game specific.

Device

The device is represented by the ID3D12Device interface. The device is a virtual adapter which we use to create command lists, pipeline state objects, root signatures, command allocators, command queues, fences, resources, descriptors and descriptor heaps. I know that is a lot of concepts I haven’t explained yet, but I’ll get to them when we actually need them.

Now let’s create our device! To create our device we need a physical adapter (similar to Vulkan’s physical device). We specifically want the main GPU and not some software device. We also require the adapter to support at least feature level 11.0, which is required by D3D12. If you have an SLI or Crossfire setup you could initialize your device with multiple physical adapters. I won’t be going over this since I don’t have a machine with multiple GPUs to test my code on.

So let’s write a function to find a compatible physical adapter. To create a device we also need an IDXGIFactory5, which we will also use to create our swap chain. We can use CreateDXGIFactory1 to create our factory. It takes a REFIID and a void**, which is a pointer to a void pointer. In this case we want to pass an IDXGIFactory5**. The REFIID is the globally unique identifier (GUID) of the IDXGIFactory5 interface. We don’t really want to look up that GUID by hand. Luckily the Windows SDK has a macro called IID_PPV_ARGS which expands to both arguments: the GUID of the interface type and the pointer converted to a void**. We will be using IID_PPV_ARGS a lot since many D3D12 functions require a REFIID.

Let’s define our device creation function like this:

void CreateDevice(IDXGIFactory5** out_factory, ID3D12Device** out_device);

Now let’s implement our function, starting with the creation of an IDXGIFactory5.

void CreateDevice(IDXGIFactory5** out_factory, ID3D12Device** out_device) {
	HRESULT hr = CreateDXGIFactory1(IID_PPV_ARGS(out_factory));
	if (FAILED(hr)) {
		throw "Failed to create DXGIFactory.";
	}

	...

Finding an adapter is a bit more interesting. We can enumerate our adapters with the method IDXGIFactory1::EnumAdapters1. Like I said before, we don’t want a software renderer, so we need to get the description of the adapter via the IDXGIAdapter1::GetDesc1 method. This method outputs a DXGI_ADAPTER_DESC1 which has a Flags attribute that we test against the DXGI_ADAPTER_FLAG_SOFTWARE flag. If this flag is set we can skip the adapter.

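	// Find a compatible adapter.
	IDXGIAdapter1* adapter = nullptr;
	UINT adapterIndex = 0;
	while ((*out_factory)->EnumAdapters1(adapterIndex, &adapter) != DXGI_ERROR_NOT_FOUND) {
		DXGI_ADAPTER_DESC1 desc;
		adapter->GetDesc1(&desc);

		// Skip software adapters (and release the reference EnumAdapters1 gave us).
		if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) {
			adapter->Release();
			adapter = nullptr;
			adapterIndex++;
			continue;
		}

		...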

Now because we want to initialize Direct3D 12, the adapter has to support feature level 11.0 to be able to run the application. Sadly there is no elegant way to test for this, so we will have to make a trial call to D3D12CreateDevice. If this function fails we know the adapter doesn’t suit our needs.

D3D12CreateDevice is defined as follows:

HRESULT WINAPI D3D12CreateDevice(
  _In_opt_  IUnknown          *pAdapter,
            D3D_FEATURE_LEVEL MinimumFeatureLevel,
  _In_      REFIID            riid,
  _Out_opt_ void              **ppDevice
);

The minimum feature level we want is D3D_FEATURE_LEVEL_11_0. You can find a list of feature levels here, which will also tell you what shader model is supported. Since we are only calling this function to see if the adapter is suitable we can pass a nullptr as the device. This will cause issues with IID_PPV_ARGS, so we will have to pass our own REFIID, for the first time and the last.

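		hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, __uuidof(ID3D12Device), nullptr);
		if (SUCCEEDED(hr))
			break;

		// Not compatible: release it so a failed search leaves adapter == nullptr.
		adapter->Release();
		adapter = nullptr;
		adapterIndex++;
	}

	if (adapter == nullptr) {
		throw "No compatible adapter found.";
	}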

	...
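
The selection rule used above (skip software adapters, then take the first adapter that supports the minimum feature level) can be sketched independently of DXGI. Everything in this sketch is an assumption made for illustration: AdapterInfo, kNoAdapter and FindCompatibleAdapter are hypothetical names, and feature levels are encoded as plain integers (110 for 11.0) instead of D3D_FEATURE_LEVEL values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for what the real code learns from GetDesc1 and D3D12CreateDevice.
struct AdapterInfo {
	bool is_software;       // corresponds to DXGI_ADAPTER_FLAG_SOFTWARE being set
	int max_feature_level;  // e.g. 110 for feature level 11.0, 121 for 12.1
};

constexpr int kNoAdapter = -1;

// Returns the index of the first hardware adapter that supports
// min_feature_level, or kNoAdapter when none qualifies.
inline int FindCompatibleAdapter(const std::vector<AdapterInfo>& adapters, int min_feature_level) {
	assert(min_feature_level > 0);
	for (std::size_t i = 0; i < adapters.size(); ++i) {
		if (adapters[i].is_software)
			continue; // skip software adapters
		if (adapters[i].max_feature_level >= min_feature_level)
			return static_cast<int>(i);
	}
	return kNoAdapter;
}
```

Returning an explicit "not found" value plays the same role as the nullptr check in the real function: the caller can tell a failed search apart from a valid adapter.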

Now that we have successfully found an adapter we can create the device! This is done exactly the same way as we did before when testing the adapter. But instead of passing our own REFIID and a nullptr we can use IID_PPV_ARGS again.

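	hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(out_device));
	adapter->Release(); // We no longer need our own reference to the adapter.
	if (FAILED(hr))
		throw "Failed to create device.";
}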

Now let’s call this from the tut::InitD3D12 function.

void InitD3D12() {
	CreateDevice(&dxgi_factory, &device);
}

Command Queue

A command queue submits work to the GPU. This is done by filling command lists and executing them using a command queue. Any thread may submit a command list to any command queue at any time, and the runtime will automatically serialize submission of the command list in the command queue while preserving the submission order. I’ll go more in-depth on command lists in the paragraph appropriately named “Command Lists (and Allocators)”.

To create a command queue we will have to fill out the D3D12_COMMAND_QUEUE_DESC structure. This structure is defined as follows:

typedef struct D3D12_COMMAND_QUEUE_DESC {
	D3D12_COMMAND_LIST_TYPE   Type;
	INT                       Priority;
	D3D12_COMMAND_QUEUE_FLAGS Flags;
	UINT                      NodeMask;
} D3D12_COMMAND_QUEUE_DESC;

We will skip the priority and the node mask for now since they are optional. If you want to know what those attributes do you can check out the docs.

Let’s start by specifying the type. Oddly enough the enum is called D3D12_COMMAND_LIST_TYPE. This is probably done to save a few lines of code but in practice it’s kind of annoying. I’ll explain why in a second, but first I need to explain the difference between the types.

D3D12_COMMAND_LIST_TYPE_DIRECT (Direct Command Queue) - This command queue accepts all types of commands.
D3D12_COMMAND_LIST_TYPE_COMPUTE (Compute Command Queue) - This command queue accepts only copy and compute commands. This is interesting for GPGPU programming.
D3D12_COMMAND_LIST_TYPE_COPY (Copy Command Queue) - This command queue only accepts copy commands. This allows you to use a separate command queue for doing initialization of data. It won’t be any faster since you have to allocate 2 command queues instead of 1 direct command queue, but it will prevent you from having to deal with reusing command queues, which can cause errors when done improperly. The validation layer will catch those errors rather well though.
D3D12_COMMAND_LIST_TYPE_BUNDLE (Bundle Command Queue???) - To my knowledge this doesn’t exist, and this is the reason why I find it weird that there isn’t a separate enum for command queue types. You can try creating a command queue with it but I doubt it works, and if it does I don’t see a reason why you would want a separate type of command queue that can only execute bundles. Maybe GPU drivers could optimize a bit but that’s just speculation.
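
The acceptance rules from the list above can be summarized in a few lines. This is only a sketch: QueueType, CommandKind, QueueAccepts and CheckSubmission are hypothetical names that model the three queue types described here, not D3D12 API.

```cpp
#include <cassert>

// Hypothetical model of the three queue types and three broad kinds of work.
enum class QueueType { Direct, Compute, Copy };
enum class CommandKind { Graphics, Compute, Copy };

// Direct accepts everything, compute accepts compute and copy, copy accepts only copy.
inline bool QueueAccepts(QueueType queue, CommandKind command) {
	switch (queue) {
		case QueueType::Direct:  return true;
		case QueueType::Compute: return command != CommandKind::Graphics;
		case QueueType::Copy:    return command == CommandKind::Copy;
	}
	return false;
}

// Debug-only check, comparable in spirit to an error reported for a wrong submission.
inline void CheckSubmission(QueueType queue, CommandKind command) {
	assert(QueueAccepts(queue, command) && "command kind not accepted by this queue type");
}
```

Each queue type accepts a subset of the previous one, which is why a single direct queue is enough for a simple application.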

Now let’s get to the D3D12_COMMAND_QUEUE_FLAGS attribute. This only has 2 flags, called D3D12_COMMAND_QUEUE_FLAG_DISABLE_GPU_TIMEOUT and D3D12_COMMAND_QUEUE_FLAG_NONE. The GPU will time out when a single command list takes too long. If you have an application that renders a single frame every 30 seconds you will run into this feature. For game development it is best to leave it on, because if the GPU’s work takes 30 seconds to render, something is probably wrong.

We will be using a direct command queue which we will reuse for copying data and rendering, and we will leave the GPU timeout on for now. Creating a command queue takes very little code, which is why I won’t make a separate function and will just write it inside the tut::InitD3D12 function after device creation.

// TODO: Turn this into a function.

	D3D12_COMMAND_QUEUE_DESC cmd_queue_desc = {};
	cmd_queue_desc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
	cmd_queue_desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;

Creating the command queue will be easy. Just call the ID3D12Device::CreateCommandQueue method with the self-explanatory arguments and voila! We have a command queue.

	HRESULT hr = device->CreateCommandQueue(&cmd_queue_desc, IID_PPV_ARGS(&cmd_queue)); // create the command queue
	if (FAILED(hr))
		throw "Failed to create direct command queue.";

Swap Chain

To avoid flickering between frames we draw an entire frame into an off-screen texture called the back buffer. When the frame is drawn it is presented to the screen. This prevents the viewer from seeing the frame being drawn. To do this 2 buffers are maintained by the hardware: the back and the front buffer. The front buffer stores the data currently displayed while the next frame is being drawn to the back buffer. When the back buffer is ready the back buffer becomes the front buffer and the front buffer becomes the back buffer. This process is called presenting. Presenting is an efficient operation because only the pointers to the front and back buffer need to be swapped. Here is an image to visualize the process:

Visualization of page flipping by www.yaldex.com
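
The swapping described above comes down to cycling an index through the available buffers. The sketch below assumes the buffers are used in round-robin order; NextBackBufferIndex is a hypothetical helper (in the real application the current index is read from the swap chain).

```cpp
#include <cassert>

// Index of the buffer that becomes the back buffer after a present,
// assuming the buffers are used in round-robin order.
inline unsigned int NextBackBufferIndex(unsigned int current, unsigned int buffer_count) {
	assert(buffer_count >= 2 && current < buffer_count);
	return (current + 1) % buffer_count;
}
```

With 2 buffers the index alternates between 0 and 1; with 3 buffers it cycles 0, 1, 2 and then wraps around to 0.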

DirectX supports two ways to avoid tearing. Let’s start with back buffering. When back buffering you render the scene to an off-screen buffer, and the swap chain will only swap when the scene is fully rendered, thus preventing tearing between frames. The second technique is called vertical retrace (or V-Sync). The following explanation is given by the MS Docs. It’s way better than I could do.

We are going to be implementing a back buffering approach. To create the swap chain we will need to describe the back buffers and the swap chain itself. We do this by filling out this descriptor:

typedef struct _DXGI_SWAP_CHAIN_DESC1 {
  UINT             Width;
  UINT             Height;
  DXGI_FORMAT      Format;
  BOOL             Stereo;
  DXGI_SAMPLE_DESC SampleDesc;
  DXGI_USAGE       BufferUsage;
  UINT             BufferCount;
  DXGI_SCALING     Scaling;
  DXGI_SWAP_EFFECT SwapEffect;
  DXGI_ALPHA_MODE  AlphaMode;
  UINT             Flags;
} DXGI_SWAP_CHAIN_DESC1;

Let’s break down this structure:

Width, Height - The size of the buffers.
Format - The format of the buffer. I’ll explain the format types after this breakdown.
Stereo - This is used for quad buffering. The term quad buffering means the use of double buffering for each of the left and right eye images in stereoscopic implementations, so 4 buffers means we are double buffering for each eye. If we want to triple buffer a stereo scene we need 6 back buffers.
SampleDesc - Describes the multisampling properties of the back buffers. DXGI_SAMPLE_DESC.Count is the number of multisamples per pixel and DXGI_SAMPLE_DESC.Quality is the image quality level, ranging from 0 to one less than the number of quality levels the device reports for the chosen format and sample count.
BufferUsage - Tells D3D12 how we want to use the buffers. You can find a list of possible usages here. We want DXGI_USAGE_RENDER_TARGET_OUTPUT since we will be rendering into the back buffers.
BufferCount - The number of back buffers we want. 2 = double buffering, 3 = triple buffering, 4 = quad (2 x double) buffering, etc.
Scaling - Specifies the resize behaviour of the buffers for when the buffer size doesn’t equal the size of the output target. We will be using DXGI_SCALING_STRETCH, which is the default. It won’t be relevant in our application since our target will be the same size as our back buffers. You can find the other scaling modes here.
SwapEffect - This is quite a lot to cover, so let’s do that after I have explained the last 2 attributes.
AlphaMode - Identifies the transparency behaviour of the swap chain’s back buffer. We will just use the default DXGI_ALPHA_MODE_UNSPECIFIED.
Flags - There are many flags and you can find the list here. We will just be using DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH since we might want to toggle between windowed and fullscreen.

Image Formats

There are many image formats, but their naming convention is very consistent so it will be easy to get the format you want. For example, a common format is DXGI_FORMAT_R32G32B32A32_FLOAT, which is a four-component, 128-bit floating-point format that supports 32 bits per channel including alpha. DXGI_FORMAT_R32G32B32_UINT is a three-component, 96-bit unsigned-integer format that supports 32 bits per color channel. The format we want is DXGI_FORMAT_B8G8R8A8_UNORM, which is a four-component, 32-bit unsigned-normalized-integer format that supports 8 bits for each color channel and 8-bit alpha.
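
To illustrate how regular the naming convention is, the total size of a pixel can be read straight from the name. The helper below is a hypothetical sketch, not part of DXGI, and it only handles simple names of the form DXGI_FORMAT_<channel><bits>..._<type> such as the three formats mentioned above.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Sums the per-channel bit counts encoded in a DXGI-style format name,
// e.g. "DXGI_FORMAT_R32G32B32A32_FLOAT" gives 32 + 32 + 32 + 32.
inline int BitsPerPixel(const std::string& format_name) {
	const std::string prefix = "DXGI_FORMAT_";
	assert(format_name.compare(0, prefix.size(), prefix) == 0);
	int total = 0;
	std::size_t i = prefix.size();
	// Walk the channel section, which ends at the next underscore.
	while (i < format_name.size() && format_name[i] != '_') {
		++i; // skip the channel letter (R, G, B, A, ...)
		int bits = 0;
		while (i < format_name.size() && std::isdigit(static_cast<unsigned char>(format_name[i]))) {
			bits = bits * 10 + (format_name[i] - '0');
			++i;
		}
		total += bits;
	}
	return total;
}
```

The order of the channel letters also matters in practice: B8G8R8A8 stores blue first, while R8G8B8A8 stores red first, even though both are 32 bits wide.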

Swap Effects

A DXGI_SWAP_EFFECT describes the presentation model that is used by the swap chain and options for handling the contents of the presentation buffer after presenting a surface. Let’s go over the different types of swap effects:

DXGI_SWAP_EFFECT_DISCARD - Specify this if you want DXGI to discard the back buffer after you call IDXGISwapChain::Present. This flag will work with multiple buffers although the application only has read and write access to buffer 0. This is the most efficient presentation technique.
DXGI_SWAP_EFFECT_SEQUENTIAL - Use this flag if you want the back buffer’s content to persist after swapping. This swap effect allows you to present the back buffers in order, from the first buffer (buffer 0) to the last buffer. This does not work with multisampling.
FLIP variants - Both discard and sequential have a flip variant which allows for the flip presentation model. If you want to know exactly what the flip model is and why it’s good for games I recommend you read this article. After you read that you will hopefully understand why we will be using the DXGI_SWAP_EFFECT_FLIP_DISCARD flag.

So now that we know what swap chains are and how we want to configure them, let’s start with defining a function called tut::CreateSwapChain.

void CreateSwapChain(IDXGISwapChain4** out_swap_chain);

Let’s start implementing the function by filling in the sample descriptor. We don’t want multisampling since our swap effect won’t support it, so the quality should be 0 and the number of samples 1 (one sample per pixel means no multisampling).

void CreateSwapChain(IDXGISwapChain4** out_swap_chain) {
	// Describe multisampling capabilities.
	DXGI_SAMPLE_DESC sample_desc = {};
	sample_desc.Count = 1;
	sample_desc.Quality = 0;

	...

Now let’s create our DXGI_SWAP_CHAIN_DESC1 and set the width, height, format and sample descriptor.

	// Describe the swap chain
	DXGI_SWAP_CHAIN_DESC1 swap_chain_desc = {};
	swap_chain_desc.Width = WIDTH;
	swap_chain_desc.Height = HEIGHT;
	swap_chain_desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
	swap_chain_desc.SampleDesc = sample_desc;

To show you how easy it is to change between double and triple buffering, let’s define a variable called num_back_buffers in our header and set it to 3.

const static unsigned int num_back_buffers = 3;

	swap_chain_desc.BufferCount = num_back_buffers;
	swap_chain_desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;

	...

And finally the swap effect, alpha mode and flags.

	swap_chain_desc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
	swap_chain_desc.AlphaMode = DXGI_ALPHA_MODE_UNSPECIFIED;
	swap_chain_desc.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH;

	...

Now let’s create our swap chain for our HWND using the method “IDXGIFactory2::CreateSwapChainForHwnd”. This method is defined as follows:

HRESULT CreateSwapChainForHwnd(
  [in]                 IUnknown                        *pDevice,
  [in]                 HWND                            hWnd,
  [in]           const DXGI_SWAP_CHAIN_DESC1           *pDesc,
  [in, optional] const DXGI_SWAP_CHAIN_FULLSCREEN_DESC *pFullscreenDesc,
  [in, optional]       IDXGIOutput                     *pRestrictToOutput,
  [out]                IDXGISwapChain1                 **ppSwapChain
);

All arguments should be self-explanatory, except for the fullscreen descriptor and the IDXGIOutput. These are currently not very important to us so I will skip over them and pass a nullptr to them.

Since we want an IDXGISwapChain4 and IDXGIFactory2::CreateSwapChainForHwnd gives us an IDXGISwapChain1, we will need to store our swap chain temporarily and query it for our preferred swap chain interface. We also want to store our current back buffer index in an integer we define in our header for later use.

	IDXGISwapChain1* temp_swap_chain;
	HRESULT hr = dxgi_factory->CreateSwapChainForHwnd(
		cmd_queue,
		WINDOW_HANDLE,
		&swap_chain_desc,
		NULL,
		NULL,
		&temp_swap_chain
	);
	if (FAILED(hr))
		throw "Failed to create swap chain.";

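	// A plain cast would leave the caller's pointer untouched, so query for the
	// IDXGISwapChain4 interface and release the temporary IDXGISwapChain1 reference.
	hr = temp_swap_chain->QueryInterface(IID_PPV_ARGS(out_swap_chain));
	temp_swap_chain->Release();
	if (FAILED(hr))
		throw "Failed to query IDXGISwapChain4.";

	frame_index = (*out_swap_chain)->GetCurrentBackBufferIndex();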
}

static unsigned int frame_index;

At this point we haven’t actually created the back buffers (Render Target Views), so that’s what we will do next. (Here we are going to need d3dx12.h.) We will store the descriptors for our back buffers in a descriptor heap.

Descriptor Heaps

A descriptor heap is a collection of contiguous allocations of descriptors, one allocation for every descriptor.

Descriptor heaps contain many objects that are not part of the Pipeline State Object (PSOs will be explained in a future chapter), such as Render Target Views, Unordered Access Views, Constant Buffer Views and Samplers. (“Views” is a legacy name for descriptors from D3D11. I am not sure why that naming scheme is still being used in the API.)

The Microsoft docs have a very good explanation of why you would use descriptor heaps. To understand their explanation it is important to know what descriptor tables are. A descriptor table is an array of descriptors. Each descriptor table stores descriptors of one or more types: SRVs, UAVs, CBVs, and Samplers. A descriptor table is not an allocation of memory; it is simply an offset and length into a descriptor heap. Here you can read more about descriptor tables.

To create our heap we need to fill out the D3D12_DESCRIPTOR_HEAP_DESC descriptor. The type should be D3D12_DESCRIPTOR_HEAP_TYPE_RTV (Render Target View). We don’t want any flags, so we use D3D12_DESCRIPTOR_HEAP_FLAG_NONE, and the number of descriptors should be the number of back buffers we specified for the swap chain. We will skip the node mask since that is only relevant for multi-adapter applications. We will create the render target views inside a function conveniently called tut::CreateRenderTargetViews.

void CreateRenderTargetViews() {
	D3D12_DESCRIPTOR_HEAP_DESC back_buffer_heap_desc = {};
	back_buffer_heap_desc.NumDescriptors = num_back_buffers;
	back_buffer_heap_desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV;
	back_buffer_heap_desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;

	...

void CreateRenderTargetViews();

Now it’s time to create the descriptor heap. But first we need some variables to store our heap, the descriptor handle increment size and our render targets. We store our render targets inside an ID3D12Resource* array, the descriptor heap in an ID3D12DescriptorHeap*, and we define the descriptor increment size as an unsigned integer.

static ID3D12Resource* render_targets[num_back_buffers];
static ID3D12DescriptorHeap* rtv_descriptor_heap;
static unsigned int rtv_descriptor_increment_size;

Creating the heap is again quite simple now that all the setup is finished. We just call ID3D12Device::CreateDescriptorHeap, pass our descriptor and use IID_PPV_ARGS again to pass our ID3D12DescriptorHeap pointer. Finally we get the RTV descriptor increment size using ID3D12Device::GetDescriptorHandleIncrementSize. The increment size allows us to step through the array of descriptors.

	HRESULT hr = device->CreateDescriptorHeap(&back_buffer_heap_desc, IID_PPV_ARGS(&rtv_descriptor_heap));
	if (FAILED(hr))
		throw "Failed to create descriptor heap.";

	rtv_descriptor_increment_size = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);

	...

To create our render target views we need a handle to a descriptor in our heap. Since an RTV heap is only accessed from the CPU we will be using a structure called CD3DX12_CPU_DESCRIPTOR_HANDLE. We initialize this struct with D3D12’s native D3D12_CPU_DESCRIPTOR_HANDLE, which we obtain via ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart. This handle will be used both to traverse the array of descriptors and to create our RTVs.

	CD3DX12_CPU_DESCRIPTOR_HANDLE rtv_handle(rtv_descriptor_heap->GetCPUDescriptorHandleForHeapStart());

	...

We need to create the number of RTVs specified in num_back_buffers. We will do this with a simple for loop so you can see the difference between 2 and 3 back buffers without having to change any code. Previously I said the swap chain didn’t create the back buffers yet. This is true, but the swap chain did allocate the memory. We will use IDXGISwapChain::GetBuffer to point our ID3D12Resource*s to that memory.

	for (unsigned int i = 0; i < num_back_buffers; i++) {
		hr = swap_chain->GetBuffer(i, IID_PPV_ARGS(&render_targets[i]));
		if (FAILED(hr))
			throw "Failed to get swap chain buffer.";

		...

Finally! Time to create the render target views. This is going to be a breeze now that we have done all the prep work. Just call ID3D12Device::CreateRenderTargetView and pass the correct ID3D12Resource*, a nullptr for the D3D12_RENDER_TARGET_VIEW_DESC (which will cause D3D12 to create a default descriptor) and our descriptor handle. After we have created an RTV we want to offset our handle before the next RTV is created. We will use CD3DX12_CPU_DESCRIPTOR_HANDLE::Offset for this. We will offset our handle by 1 descriptor using the descriptor increment size.

		device->CreateRenderTargetView(render_targets[i], nullptr, rtv_handle);

		rtv_handle.Offset(1, rtv_descriptor_increment_size);
	}
}
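
To make the handle arithmetic concrete, here is a minimal sketch that does not need the D3D12 headers. CpuHandleSketch and OffsetHandle are hypothetical stand-ins introduced only for illustration; I am assuming CD3DX12_CPU_DESCRIPTOR_HANDLE::Offset does the equivalent arithmetic on the ptr member of D3D12_CPU_DESCRIPTOR_HANDLE.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which wraps one address-sized value.
struct CpuHandleSketch {
    std::size_t ptr;
};

// Assumed equivalent of Offset(index, increment_size): ptr += index * increment_size.
inline CpuHandleSketch OffsetHandle(CpuHandleSketch start, int index, unsigned int increment_size) {
    start.ptr = static_cast<std::size_t>(
        static_cast<std::int64_t>(start.ptr) +
        static_cast<std::int64_t>(index) * static_cast<std::int64_t>(increment_size));
    return start;
}
```

With a made-up heap start of 1000 and a made-up increment size of 32, the descriptor of back buffer 2 would sit at 1064. In real code the increment size must always come from GetDescriptorHandleIncrementSize.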

If we present our swap chain something will actually happen. The grey screen will turn black! The IDXGISwapChain::Present method takes 2 arguments. The first, SyncInterval, controls vertical synchronization and accepts the following values:

0 - Cancel the remaining time on the previously presented frame and discard this frame if a newer frame is queued.
1-4 - Synchronize presentation for at least n vertical blanks.

The second argument, Flags, takes a combination of DXGI_PRESENT flags; 0 means no flags. Set them to whatever you want. I’ll set them both to 0 since I prefer to be able to show rendering performance. I’ll call Present in our tut::Render function. We also need to update our tut::InitD3D12 function.

void Render() {
	swap_chain->Present(0, 0);
}

void InitD3D12() {
	CreateDevice(&dxgi_factory, &device);

	// Create a direct command queue.
	D3D12_COMMAND_QUEUE_DESC cmd_queue_desc = {};
	cmd_queue_desc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
	cmd_queue_desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;

	HRESULT hr = device->CreateCommandQueue(&cmd_queue_desc, IID_PPV_ARGS(&cmd_queue));
	if (FAILED(hr))
		throw "Failed to create direct command queue.";

	CreateSwapChain(&swap_chain);

	CreateRenderTargetViews();
}

Debugging

If you are using Visual Studio you can go to Debug->Graphics->Start Graphics Debugging to get detailed information about your program. (Don’t worry, it’s “normal” that this doesn’t work all of the time…) I highly recommend you do some research and play around with it once we have got something rendering on the screen. But there are 2 other debugging tools that we want to use. Both of them are enabled via code.

The first is the debug layer. This gives you insanely useful output when running the program with the VS debugger. If you have any errors they will show up in the Output panel of VS. You shouldn’t have any errors at this point, but I recommend trying to make some breaking changes to descriptors to see what the debug layer has to say.

The other useful thing to do is to report any live objects before exiting the game. This allows you to detect memory leaks on the GPU, meaning objects that were never released. It should output a few leaks in the Output panel since we haven’t done any cleaning up yet. We will get to the cleanup after we have initialized everything.

Now let’s implement those helpful tools. We will define 2 functions called EnableDebugLayer and ReportLiveObjects. The implementation is quite uninteresting so I won’t go into detail.

void EnableDebugLayer();
void ReportLiveObjects();

void EnableDebugLayer() {
	Microsoft::WRL::ComPtr<ID3D12Debug> debugController;
	if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController)))) {
		debugController->EnableDebugLayer();
	}
}

void ReportLiveObjects() {
	Microsoft::WRL::ComPtr<IDXGIDebug> dxgiController;
	if (SUCCEEDED(DXGIGetDebugInterface1(0, IID_PPV_ARGS(&dxgiController)))) {
		dxgiController->ReportLiveObjects(DXGI_DEBUG_ALL, DXGI_DEBUG_RLO_FLAGS(DXGI_DEBUG_RLO_DETAIL | DXGI_DEBUG_RLO_IGNORE_INTERNAL));
	}
}

Let’s call those functions in our application’s entry point.

int WINAPI WinMain(HINSTANCE inst, HINSTANCE prev_inst, LPSTR cmd_line, int show_cmd) {

	ALLOC_DEBUG_CONSOLE

	// The debug layer is enabled through EnableDebugLayer() below, before the device is created.

	try { WINDOW_HANDLE = sample1::InitWindow(EXAMPLE_NAME, inst, show_cmd, WIDTH, HEIGHT, FULLSCREEN); }
	CATCH_EXCEPTS

	EnableDebugLayer();

	try { InitD3D12(); }
	CATCH_EXCEPTS

	StartLoop(&Init, &Render);

	ReportLiveObjects();

	return 0;
}

Resource Barriers

Before we continue I need to explain resource barriers. To reduce overall CPU usage and enable driver multi-threading and pre-processing, D3D12 moves the responsibility for per-resource state management from the driver to the application. An example of per-resource state is whether a texture resource is currently being accessed through a Shader Resource View or through a Render Target View. In Direct3D 11, drivers were required to track this state in the background. This is expensive from a CPU perspective and significantly complicates any sort of multi-threaded design. In D3D12, most per-resource state is managed by the application with ID3D12GraphicsCommandList::ResourceBarrier.

A swap chain back buffer starts out in the state D3D12_RESOURCE_STATE_PRESENT. To do stuff like setting the render target, the resource needs to be in the state D3D12_RESOURCE_STATE_RENDER_TARGET. Transitioning between states is done via the ID3D12GraphicsCommandList::ResourceBarrier method. This method allows for multiple transitions at once, so we need to pass the number of transitions we want to perform and an array of D3D12_RESOURCE_BARRIERs. We can initialize a D3D12_RESOURCE_BARRIER thanks to d3dx12.h’s CD3DX12_RESOURCE_BARRIER. Its Transition helper needs to be given the current render target, the state the resource is currently in and the state you want to transition to. Here is an example:

CD3DX12_RESOURCE_BARRIER begin_transition = CD3DX12_RESOURCE_BARRIER::Transition(
		render_target,
		D3D12_RESOURCE_STATE_PRESENT,
		D3D12_RESOURCE_STATE_RENDER_TARGET
);
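
Since the application now owns the state tracking, it helps to see what “the before state has to match” means in practice. The following is a minimal sketch without any D3D12 types. StateSketch, TrackedResourceSketch and ApplyTransition are hypothetical names, and the rejection of a mismatched transition only models what I assume the debug layer would complain about.

```cpp
// Hypothetical subset of the resource states used in this chapter.
enum class StateSketch { Present, RenderTarget };

// Hypothetical resource that remembers the state it is currently in.
struct TrackedResourceSketch {
    StateSketch current = StateSketch::Present;  // back buffers start in PRESENT
};

// Applies a transition only if `before` matches the tracked state.
// Returns false for a mismatched barrier and leaves the state untouched.
inline bool ApplyTransition(TrackedResourceSketch& resource, StateSketch before, StateSketch after) {
    if (resource.current != before) {
        return false;
    }
    resource.current = after;
    return true;
}
```

Transitioning PRESENT to RENDER_TARGET twice in a row would be rejected the second time, which is why every frame has to transition back to PRESENT before presenting.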

Command Lists (and Allocators)

Now of course we don’t want to have just a boring black screen. We want fancy colors! We do this by clearing the render target with a clear color. We need to tell the GPU to execute a command for this. To do this we will fill a command list with our command and execute it using the command queue.

To create our command list we need to create command allocators first. A command allocator allows the app to manage the memory that is allocated for command lists. We can create an allocator via the ID3D12Device::CreateCommandAllocator method. It requires the type of command list you want to use this allocator for and a pointer to the command allocator we want to fill using IID_PPV_ARGS. We have 3 back buffers and we want to be able to record the commands for the next frame while the GPU is still executing the previous frame. An allocator can’t be reset while the GPU is still executing commands stored in it, so we need to create as many allocators as we have back buffers. This way we don’t need multiple command lists.

Command List Types

The explanation of the different command list types is very similar to that of the queue types (they even use the same enum), except for the bundle.

D3D12_COMMAND_LIST_TYPE_DIRECT (Direct Command List) - This command list accepts all types of commands.
D3D12_COMMAND_LIST_TYPE_COMPUTE (Compute Command List) - This command list accepts only copy and compute commands. This is interesting for GPGPU programming.
D3D12_COMMAND_LIST_TYPE_COPY (Copy Command List) - This command list only accepts copy commands. This allows you to use a separate command list for the initialization of data. It won’t be any faster since you have to allocate 2 command lists instead of 1 direct command list. But it will prevent you from having to deal with reusing command lists, which can cause errors when done improperly. The debug layer will catch those errors rather well though.
D3D12_COMMAND_LIST_TYPE_BUNDLE (Bundle Command List) - A bundle command list allows you to record a bunch of commands into a list and reuse that list over and over again without having to repopulate it. This can save a lot of performance.
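
The acceptance rules of the first three types can be summarized in a few lines. This is only a sketch of the list above: ListTypeSketch, CommandKindSketch and Accepts are hypothetical names and not part of D3D12, and bundles are left out on purpose because their restrictions don’t fit this simple model.

```cpp
// Hypothetical mirrors of the direct, compute and copy list types.
enum class ListTypeSketch { Direct, Compute, Copy };

// Hypothetical coarse grouping of commands.
enum class CommandKindSketch { Draw, Dispatch, Copy };

// Returns true when a list of the given type accepts the given kind of command.
inline bool Accepts(ListTypeSketch list, CommandKindSketch command) {
    switch (list) {
    case ListTypeSketch::Direct:  return true;                                // everything
    case ListTypeSketch::Compute: return command != CommandKindSketch::Draw;  // compute and copy
    case ListTypeSketch::Copy:    return command == CommandKindSketch::Copy;  // copy only
    }
    return false;
}
```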

We shall declare a function called CreateCommandList in our header, and in the implementation of said function we loop over the number of back buffers. For every back buffer we call ID3D12Device::CreateCommandAllocator and give the allocator a name with ID3D12CommandAllocator::SetName so we can identify the allocators in the debugger. We also need to define an array of ID3D12CommandAllocator*s and an ID3D12GraphicsCommandList*.

static ID3D12CommandAllocator** cmd_allocators;
static ID3D12GraphicsCommandList* cmd_list;

void CreateCommandList();

void CreateCommandList() {
	HRESULT hr;

	// Create Allocators
	cmd_allocators = new ID3D12CommandAllocator*[num_back_buffers];
	for (int i = 0; i < num_back_buffers; i++) {
		hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&cmd_allocators[i]));
		if (FAILED(hr)) {
			throw "Failed to create command allocator";
		}

		cmd_allocators[i]->SetName(L"Direct CommandList allocator.");
	}

	...

So now we should be able to create our command list using ID3D12Device::CreateCommandList, which is defined as follows:

HRESULT CreateCommandList(
  [in]           UINT                    nodeMask,
  [in]           D3D12_COMMAND_LIST_TYPE type,
  [in]           ID3D12CommandAllocator  *pCommandAllocator,
  [in, optional] ID3D12PipelineState     *pInitialState,
                 REFIID                  riid,
  [out]          void                    **ppCommandList
);

We can set the node mask to zero again. Like I explained before it’s for multiple GPUs. The type should be the most versatile type, D3D12_COMMAND_LIST_TYPE_DIRECT. For the allocator we pass the one at the current frame index in our cmd_allocators array. We switch between command allocators every frame, when we reset the command list before we fill it again. We can pass a nullptr as the pipeline state since we will cover that in a future paragraph. There is very little reason to pass the pipeline state anyway because there is almost no performance benefit, so I recommend passing a nullptr whenever passing one is hard to do with your engine architecture. You will want to pass the pipeline state if you’re using bundles, since there it significantly impacts performance. And again the last two arguments can be filled using the fantastic IID_PPV_ARGS. And of course we want to set a name for the command list for debugging purposes.

	// Create the command lists
	hr = device->CreateCommandList( // hr was already declared at the top of this function
		0,
		D3D12_COMMAND_LIST_TYPE_DIRECT,
		cmd_allocators[frame_index],
		NULL,
		IID_PPV_ARGS(&cmd_list)
	);
	if (FAILED(hr)) {
		throw "Failed to create command list";
	}
	cmd_list->SetName(L"Native Commandlist");
}

Playing With Colors

Now let’s fill the command list with a command that sets our render target and a command that clears it with a clear color. Since we don’t need any pre-initialization for our command list (for example creating a root signature, which doesn’t have to happen every frame) we can just close our command list in our game’s tut::Init function. Note that command lists are opened automatically after creation. If you had filled the command list you would of course also have to call ID3D12CommandQueue::ExecuteCommandLists.

void Init() {
	cmd_list->Close();
}

Moving to our tut::Render function, the first thing we want to do is reset our allocator and command list. This is kind of redundant right now since we didn’t do any initialization, but at least we can’t forget to do this in the future when we will be initializing some stuff. Both ID3D12GraphicsCommandList and ID3D12CommandAllocator have a Reset method which we can call. ID3D12GraphicsCommandList::Reset takes an allocator and a pipeline state. Just like when creating the command list, the performance impact of the pipeline state is negligible except when using bundles.

We reset the allocator we want to use. Keep in mind this will fail when you are not updating the frame index after every IDXGISwapChain::Present. You could increment it yourself, but we will update it with IDXGISwapChain3::GetCurrentBackBufferIndex just in case something really strange happens and a frame is somehow skipped (which should be impossible). Resetting the command list also opens it for recording.

void Render() {
	// Reset command allocators
	HRESULT hr = cmd_allocators[frame_index]->Reset();
	if (FAILED(hr)) {
		throw "Failed to reset cmd allocators";
	}

	// Reset command list
	hr = cmd_list->Reset(cmd_allocators[frame_index], NULL);
	if (FAILED(hr)) {
		throw "Failed to reset command list";
	}

	...

Before we can set our render target we need to transition the resource from D3D12_RESOURCE_STATE_PRESENT to D3D12_RESOURCE_STATE_RENDER_TARGET. We use the render target located at position frame_index in our render_targets array, since that is the back buffer we are about to render to.

	// Transition to RENDER_TARGET
	CD3DX12_RESOURCE_BARRIER begin_transition = CD3DX12_RESOURCE_BARRIER::Transition(
		render_targets[frame_index],
		D3D12_RESOURCE_STATE_PRESENT,
		D3D12_RESOURCE_STATE_RENDER_TARGET
	);
	cmd_list->ResourceBarrier(1, &begin_transition);

Drum roll… It’s time! We can populate our command list! For now we just populate it with 2 commands: ID3D12GraphicsCommandList::OMSetRenderTargets and ID3D12GraphicsCommandList::ClearRenderTargetView. (Note that OM in OMSetRenderTargets means Output Merger.)

I think it’s very obvious what ID3D12GraphicsCommandList::OMSetRenderTargets is supposed to do. In case you have questions, check the docs. We need to pass the number of render targets (the number we want to bind at once, so 1 in our case), a handle to our RTV, a boolean that tells D3D12 whether the handle is the start of a contiguous range of descriptors (I recommend checking the docs) and a handle to a depth stencil view. We will pass false for the boolean and a nullptr for the DSV since we haven’t created a depth stencil yet.

	// Populate Command List
	CD3DX12_CPU_DESCRIPTOR_HANDLE rtv_handle(rtv_descriptor_heap->GetCPUDescriptorHandleForHeapStart(), frame_index, rtv_descriptor_increment_size);
	cmd_list->OMSetRenderTargets(1, &rtv_handle, false, nullptr);
	...

Now let’s call ID3D12GraphicsCommandList::ClearRenderTargetView. The first argument is again a handle to our current RTV, the second argument is a 4-component color array, the third is the number of rects (0 in our case since we won’t define a rect) and the last is an array of rectangles to clear (we will pass a nullptr since I’d like to clear the entire screen). Let’s define the color in our header as clear_color.

static float clear_color[4] = { 0.568f, 0.733f, 1.0f, 1.0f };

cmd_list->ClearRenderTargetView(rtv_handle, clear_color, 0, nullptr);

...

Before we execute the command list and present we need to transition back from D3D12_RESOURCE_STATE_RENDER_TARGET to D3D12_RESOURCE_STATE_PRESENT and close the command list.

	// Close and transition the cmd list
	CD3DX12_RESOURCE_BARRIER end_transition = CD3DX12_RESOURCE_BARRIER::Transition(
		render_targets[frame_index],
		D3D12_RESOURCE_STATE_RENDER_TARGET,
		D3D12_RESOURCE_STATE_PRESENT
	);
	cmd_list->ResourceBarrier(1, &end_transition);
	cmd_list->Close();

	...

ID3D12CommandQueue::ExecuteCommandLists takes an array of command lists so that several lists can be submitted to the queue in a single call. We only have one, so we will just create a temporary array and put our command list in it. The first argument of the method specifies the number of command lists we want to execute. The second argument is the array itself.

	// A stack array avoids leaking a heap allocation every frame.
	ID3D12CommandList* cmd_lists[] = { cmd_list };
	cmd_queue->ExecuteCommandLists(1, cmd_lists);

	...

Now we can call present and update our frame_index. (Don’t forget to remove or re-use the present call we used to display the black screen. We don’t want to present twice per frame)

	swap_chain->Present(0, 0);

	// Update our frame index
	frame_index = swap_chain->GetCurrentBackBufferIndex();
}

If everything went according to plan you should now be able to run the application and see a beautiful color. The debug layer however (unless your GPU finishes its tasks faster than your CPU submits them, which is unlikely) won’t be very happy… Our CPU is so incredibly fast that it is resetting command allocators whose commands haven’t finished executing on the GPU yet. (Remember? The GPU and CPU run in parallel.) A way to fix this is to call Sleep() after every Present call. This solution is horrible and I heard the death penalty is given to people who do this. That was of course a lie. What isn’t a lie is that I’ll go over some proper methods to solve this synchronization issue in the next chapter.

Synchronization

Because we are working with 2 processors running in parallel we will run into some synchronization issues. To prevent the CPU from reusing a command allocator before the GPU has finished executing the previous command list, we need to make the CPU wait for the GPU. We can do this with a fence. When the GPU finishes it will signal the CPU that it is done via a fence signal. This is not ideal since the CPU is idling while it is waiting for the GPU. There is another solution which allows you to flush the command queue at any point. I probably won’t be able to explain this in much detail since there are many other things to work on before this, but it’s an interesting topic to research in the future.

Fences

A fence is a synchronization construct determined by a monotonically increasing UINT64 value. Fence values are set by the application. A signal operation increases the fence value and a wait operation blocks until the fence has reached the requested value. An event can be fired when a fence reaches a certain value.

The following functions are used to manipulate fences:

GetCompletedValue - returns the current value of the fence.
SetEventOnCompletion - causes an event to fire when the fence reaches a given value.
Signal - sets the fence to the given value.

There are two signal methods: ID3D12Fence::Signal, which signals from the CPU, and ID3D12CommandQueue::Signal, which signals from the GPU.

The application is required to increment the fence. This is not done automatically.
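
The difference between a CPU signal and a GPU signal is easier to see in a small simulation. The sketch below uses no D3D12 at all: FenceSketch and QueueSketch are hypothetical stand-ins, and RetireOne is an invented way of letting the simulated GPU make progress. The point it tries to show is that a value signaled through the queue only becomes visible once the GPU reaches that point in the queue.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical fence: just the value the (simulated) GPU has reached so far.
struct FenceSketch {
    std::uint64_t completed = 0;
};

// Hypothetical command queue that only models queued fence signals.
class QueueSketch {
public:
    // Models a GPU side signal: nothing is written to the fence yet.
    void Signal(FenceSketch& fence, std::uint64_t value) {
        pending_.push(Pending{&fence, value});
    }

    // Simulated GPU progress: retire the oldest pending signal.
    // Returns false when there was nothing left to retire.
    bool RetireOne() {
        if (pending_.empty()) {
            return false;
        }
        pending_.front().fence->completed = pending_.front().value;
        pending_.pop();
        return true;
    }

private:
    struct Pending {
        FenceSketch* fence;
        std::uint64_t value;
    };
    std::queue<Pending> pending_;
};
```

After Signal(fence, 1) the completed value is still 0. Only after the simulated GPU retires the signal does it read 1, which is the moment a real wait on that value would return.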

Now that you understand the basics of fences we can start by defining an ID3D12Fence* and a fence value (UINT64) for every command allocator, plus a single fence event (HANDLE).

static ID3D12Fence* fences[num_back_buffers];
static HANDLE fence_event;
static UINT64 fence_values[num_back_buffers];

Let’s start by creating our fences and fence event in a function called CreateFences. The creation itself should be fairly simple. We iterate over the number of back buffers and call ID3D12Device::CreateFence for every fence. For the fence event we call CreateEvent. We only need 1 event since we never wait for more than 1 fence at a time. CreateEvent is not part of D3D12 but of the Win32 API. We can pass nullptr or FALSE for all of its arguments since we only need a very basic event. The first argument of ID3D12Device::CreateFence allows us to set the initial value (we want 0). We don’t really care about shared or cross-adapter fences right now, so we pass D3D12_FENCE_FLAG_NONE as the second argument. And finally we use IID_PPV_ARGS again. Don’t forget to call tut::CreateFences in the tut::InitD3D12 function.

void CreateFences();

void CreateFences() {
	HRESULT hr;

	// create the fences
	for (int i = 0; i < num_back_buffers; i++) {
		hr = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fences[i]));
		if (FAILED(hr)) {
			throw "Failed to create fence.";
		}  
		fence_values[i] = 1; // first value we will signal; the fence itself starts at 0, so the first wait is a real wait
	}

	// create a handle to a fence event
	fence_event = CreateEvent(nullptr, FALSE, FALSE, nullptr);
	if (fence_event == nullptr) {
		throw "Failed to create fence event.";
	}
}

void InitD3D12() {
	CreateDevice(&dxgi_factory, &device);

	// Create a direct command queue.
	D3D12_COMMAND_QUEUE_DESC cmd_queue_desc = {};
	cmd_queue_desc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
	cmd_queue_desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;

	HRESULT hr = device->CreateCommandQueue(&cmd_queue_desc, IID_PPV_ARGS(&cmd_queue));
	if (FAILED(hr))
		throw "Failed to create direct command queue.";

	CreateSwapChain(&swap_chain);
	CreateRenderTargetViews();

	CreateCommandList();
	CreateFences();
}

Now let’s write a function called WaitForPrevFrame that pauses the application until the GPU has reached our fence value. We can check whether that has happened by comparing ID3D12Fence::GetCompletedValue with our fence value. Before we start idling we need to call ID3D12Fence::SetEventOnCompletion so that we are woken up when the fence reaches the value. After we have set that we can use the Windows function WaitForSingleObject to pause our application. After we have waited for the fence we need to increase the current fence value. Here is the entire implementation of the function:

void WaitForPrevFrame() {
	if (fences[frame_index]->GetCompletedValue() < fence_values[frame_index]) {
		// we have the fence create an event which is signaled once the fence's current value is "fence_value"
		HRESULT hr = fences[frame_index]->SetEventOnCompletion(fence_values[frame_index], fence_event);
		if (FAILED(hr)) {
			throw "Failed to set fence event.";
		}

		WaitForSingleObject(fence_event, INFINITE);
	}

	// increment fenceValue for next frame
	fence_values[frame_index]++;
}

We can call this function after we have presented our screen and before we have incremented our frame_index value in our render function.

void Render() {
	[Previous code snippets]

	swap_chain->Present(0, 0);

	WaitForPrevFrame();

	// Update our frame index
	frame_index = swap_chain->GetCurrentBackBufferIndex();
}

At this point we have never actually signaled from the GPU. If you ran the application it would get stuck within the first few frames, waiting on a fence that never gets signaled, and eventually Windows would tell you the application has stopped responding. Let’s signal after we have executed our command lists so there is work on the GPU to be processed.

void Render() {
	[Previous code snippets]

	// execute the array of command lists
	ID3D12CommandList* cmd_lists[] = { cmd_list };
	cmd_queue->ExecuteCommandLists(1, cmd_lists);

	// GPU Signal
	hr = cmd_queue->Signal(fences[frame_index], fence_values[frame_index]);
	if (FAILED(hr)) {
		throw "Failed to set fence signal.";
	}

	swap_chain->Present(0, 0);

	[Previous code snippets]
}

Now you can remove your Sleep call and try running the application. There shouldn’t be any errors in your debugger except for memory leaks. And Windows won’t think the application stopped responding.
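
Waiting right after every Present keeps the CPU idle for most of the frame. One possible alternative, which is not what the code above does, is to wait only when a frame slot is about to be reused, so the CPU can run up to num_back_buffers frames ahead. The sketch below simulates that policy without D3D12. FramePacerSketch and all of its members are hypothetical, it uses one shared fence value instead of one fence per frame, and the simulated GPU only makes progress when the CPU is forced to wait for it.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

constexpr unsigned kFrames = 3;  // stands in for num_back_buffers

// Hypothetical CPU/GPU pacing model with one shared fence value.
struct FramePacerSketch {
    std::uint64_t completed = 0;               // value the simulated GPU has reached
    std::uint64_t next_value = 1;              // next value to signal
    std::uint64_t frame_values[kFrames] = {};  // last value signaled per frame slot
    std::deque<std::uint64_t> gpu_queue;       // signals the GPU has not reached yet
    unsigned stalls = 0;                       // how often the CPU had to wait

    // Before reusing a slot, wait until the GPU has passed that slot's last signal.
    void BeginFrame(unsigned index) {
        while (completed < frame_values[index]) {
            assert(!gpu_queue.empty());
            completed = gpu_queue.front();     // "wait": let the GPU retire one signal
            gpu_queue.pop_front();
            ++stalls;
        }
    }

    // After submitting the frame, queue a signal and remember its value.
    void EndFrame(unsigned index) {
        frame_values[index] = next_value;
        gpu_queue.push_back(next_value++);
    }
};
```

Running 5 frames through this model stalls twice: the first 3 frames are recorded without waiting, and frames 4 and 5 each have to wait for the frame that last used their slot.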

Rendering Triangles

To render a triangle to the screen we need 3 things: a root signature, a pipeline state object and a vertex buffer.

You can find the source of this paragraph here.

Constant Buffers

I’m going to be throwing the term constant buffer around and it’s important to know what I mean by it. A constant buffer is a buffer of values you can read in a shader, for example per-object data. You can compare them to uniforms from GLSL. I’ll go into more detail when we actually need constant buffers.

Root Signature

A root signature links the resources the shaders require. There is a graphics root signature and a compute root signature. For now we don’t need to worry about compute root signatures since we won’t do any GPGPU programming. It’s important to note that these root signatures are independent of each other.

Root Parameters and Arguments

A root signature requires root parameters to be useful. A root parameter determines the type of data a shader should expect and does not define the actual memory or data.

The actual values of the root parameters are called root arguments.

Root Constants, Descriptors and Tables

A root signature can contain 3 types of root parameters: root constants, root descriptors and descriptor tables.

Let’s start off with the root constants. Root constants are inlined 32-bit values which are accessible in the shader as a constant buffer. They are however faster than normal constant buffers. You could use them for the projection view matrix for example. Due to the limited size and the fact they don’t support arrays they are not always the way to go.

Root descriptors are also inlined. They should contain the descriptors that are accessed most often. However, the descriptors are limited to CBVs (Constant Buffer Views), UAVs (Unordered Access Views) and SRVs (Shader Resource Views).

A descriptor table is a range of descriptors. You can use them to store any number of descriptors, unlike the limited number of root descriptors you can have. Sadly there is a performance cost associated with them: they have an extra indirection. The descriptor table points to a descriptor inside a descriptor heap, which in turn points to the actual resource data.
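
The “limited size” can be put into numbers. According to Microsoft’s documentation on root signature limits, a root signature can be at most 64 DWORDs, where a root constant costs 1 DWORD, a root descriptor costs 2 DWORDs and a descriptor table costs 1 DWORD. The sketch below just adds those costs up. RootLayoutSketch, CostInDwords and FitsInRootSignature are hypothetical helpers for illustration, not D3D12 API.

```cpp
// Hypothetical description of a root signature layout.
struct RootLayoutSketch {
    unsigned root_constants;     // number of 32-bit root constants
    unsigned root_descriptors;   // number of root CBVs, SRVs and UAVs
    unsigned descriptor_tables;  // number of descriptor tables
};

// Documented maximum size of a root signature in DWORDs.
const unsigned kMaxRootSignatureDwords = 64;

// 1 DWORD per root constant, 2 per root descriptor, 1 per descriptor table.
inline unsigned CostInDwords(const RootLayoutSketch& layout) {
    return layout.root_constants + 2 * layout.root_descriptors + layout.descriptor_tables;
}

inline bool FitsInRootSignature(const RootLayoutSketch& layout) {
    return CostInDwords(layout) <= kMaxRootSignatureDwords;
}
```

A 4x4 matrix as root constants (16 values), 2 root descriptors and 1 descriptor table cost 21 DWORDs, which fits comfortably. 60 root constants plus 3 root descriptors cost 66 DWORDs and would not fit.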

Creating Our Root Signature

We don’t actually need a root signature at this moment since we are not going to access anything in our shaders for now. Why we can’t create our pipeline state without one I don’t know. Let’s declare a function called CreateRootSignature and an ID3D12RootSignature* in our header

static ID3D12RootSignature* root_signature;
void CreateRootSignature();

and let’s call it from our Init function. (Not InitD3D12 because this is game specific.)

void Init() {
	CreateRootSignature();

	cmd_list->Close();
}

To create our root signature we need to take three steps. First we describe our root signature using the D3D12_ROOT_SIGNATURE_DESC descriptor. Then we use this descriptor to serialize our root signature (D3D12SerializeRootSignature). And then finally we can use the blob returned from the serialization function to create our root signature using ID3D12Device::CreateRootSignature.

Let’s start at the beginning: filling in the descriptor. We will use the d3dx12.h version of the descriptor called CD3DX12_ROOT_SIGNATURE_DESC. Its Init function takes 5 arguments: the number of root parameters, an array of root parameters, the number of static samplers, an array of static sampler descriptors and the root signature flags. The flags allow you, among other things, to deny shader stages access to the root signature. This is interesting because it can save you some performance. I’ll just skip over the sampler since it’s quite uninteresting and self-explanatory (you don’t even need one for this article). We also don’t want any SRVs, CBVs or UAVs at the moment since we are just drawing a set of vertices with a hardcoded color in the shader.

void CreateRootSignature() {
	D3D12_STATIC_SAMPLER_DESC sampler[1] = {};
	sampler[0].Filter = D3D12_FILTER_MIN_MAG_MIP_POINT;
	sampler[0].AddressU = D3D12_TEXTURE_ADDRESS_MODE_BORDER;
	sampler[0].AddressV = D3D12_TEXTURE_ADDRESS_MODE_BORDER;
	sampler[0].AddressW = D3D12_TEXTURE_ADDRESS_MODE_BORDER;
	sampler[0].MipLODBias = 0;
	sampler[0].MaxAnisotropy = 0;
	sampler[0].ComparisonFunc = D3D12_COMPARISON_FUNC_NEVER;
	sampler[0].BorderColor = D3D12_STATIC_BORDER_COLOR_TRANSPARENT_BLACK;
	sampler[0].MinLOD = 0.0f;
	sampler[0].MaxLOD = D3D12_FLOAT32_MAX;
	sampler[0].ShaderRegister = 0;
	sampler[0].RegisterSpace = 0;
	sampler[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;

	CD3DX12_ROOT_SIGNATURE_DESC root_signature_desc;
	root_signature_desc.Init(0,
		nullptr,
		1,
		sampler,
		D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT
	);

	...

Now the second step: serialization. By serialized I mean self-contained and pointer free. We’ll store the serialized root signature in an ID3DBlob*. We are creating a root signature of version 1. Since the Windows 10 Anniversary Update there is a versioned alternative for the serialization (D3D12SerializeVersionedRootSignature), but to keep it simple I’m going to skip over it. I’ll also pass a nullptr for the error argument. If you want to catch the error message you can pass the address of an ID3DBlob* instead.

	ID3DBlob* signature;
	ID3DBlob* error = nullptr;
	HRESULT hr = D3D12SerializeRootSignature(&root_signature_desc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, nullptr);
	if (FAILED(hr)) {
		throw "Failed to create a serialized root signature";
	}

	...

The third and final step should be straightforward. Now that we have the serialized root signature we can call ID3D12Device::CreateRootSignature, which is defined as follows:

HRESULT CreateRootSignature(
  [in]        UINT   nodeMask,
  [in]  const void   *pBlobWithRootSignature,
  [in]        SIZE_T blobLengthInBytes,
              REFIID riid,
  [out]       void   **ppvRootSignature
);

We can again pass zero for the node mask since we are not using SLI or Crossfire. Of course we pass the blob’s data as the second argument and the size of the blob as the third, and it’s time for IID_PPV_ARGS again. We’ll name the root signature for debugging purposes as mentioned before.

	hr = device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&root_signature));
	if (FAILED(hr)) {
		throw "Failed to create root signature";
	}
	root_signature->SetName(L"Our epic D3D12RootSignature");
}

Pipeline State Object

When geometry is submitted to the GPU for drawing, there are a lot of hardware settings that determine how the input data is interpreted and rendered. All these settings together are called the graphics pipeline state. They include the rasterizer state, blend state, depth stencil state, primitive topology and all types of shaders. The graphics pipeline state is set by using a pipeline state object (PSO).

You can create a near infinite number of PSOs. Creation is mainly done during initialization, because creating them can be rather expensive and doing it at runtime will impact your performance greatly. Luckily you can switch between PSOs at runtime. The switching is done by the command lists using the ID3D12GraphicsCommandList::SetPipelineState method. This change still costs some performance, so if you have for example 6 PSOs you can create a command list for each and gain a small but not irrelevant amount of performance.

An interesting bit of information provided by the Microsoft docs is why they decided to use PSOs to set the graphics pipeline state instead of setting the different settings individually using, for example, ID3D11DeviceContext::OMSetBlendState. You can find it here. It’s also interesting to know that there are still some graphics pipeline state settings that are set in the “old” D3D11 way (there are obvious reasons why they are not included in the PSO). Look for example at ID3D12GraphicsCommandList::OMSetRenderTargets.

Creating Our Pipeline State Object

Describing the Pipeline State Object

I have defined a function called CreatePSO and defined the following variable: ID3D12PipelineState* pipeline;.

static ID3D12PipelineState* pipeline;

Just like every other time we have created something, we need a descriptor. The descriptor of a pipeline state object is called D3D12_GRAPHICS_PIPELINE_STATE_DESC. It takes 4 “sub descriptors” and a bunch of other settings. The 4 sub descriptors are D3D12_BLEND_DESC, D3D12_DEPTH_STENCIL_DESC, D3D12_RASTERIZER_DESC and DXGI_SAMPLE_DESC. We will initialize the first three with the defaults provided by the CD3DX12_BLEND_DESC, CD3DX12_DEPTH_STENCIL_DESC and CD3DX12_RASTERIZER_DESC helpers in d3dx12.h. For the sample descriptor we set the number of samples to 1 and the quality to 0 because we don’t want multisampling (our flip type wouldn’t even support it).

void CreatePSO() {
	D3D12_BLEND_DESC blend_desc = CD3DX12_BLEND_DESC(D3D12_DEFAULT);
	D3D12_DEPTH_STENCIL_DESC depth_stencil_state = CD3DX12_DEPTH_STENCIL_DESC(D3D12_DEFAULT);
	D3D12_RASTERIZER_DESC rasterize_desc = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT);
	DXGI_SAMPLE_DESC sampleDesc = {1, 0};

	...

Now we need an array of D3D12_INPUT_ELEMENT_DESCs. We use these descriptors to describe what data will be passed to the shaders. In newer DX12 versions you don’t have to do this, but I have no way to test it because my laptop is too old. The descriptor is defined like this:

typedef struct D3D12_INPUT_ELEMENT_DESC {
  LPCSTR                     SemanticName;
  UINT                       SemanticIndex;
  DXGI_FORMAT                Format;
  UINT                       InputSlot;
  UINT                       AlignedByteOffset;
  D3D12_INPUT_CLASSIFICATION InputSlotClass;
  UINT                       InstanceDataStepRate;
} D3D12_INPUT_ELEMENT_DESC;

The members should be self-explanatory, and the ones that aren’t we won’t use. If you’re interested in those members, check out the docs. Let’s create a descriptor with “POSITION” as the semantic name, DXGI_FORMAT_R32G32B32_FLOAT as the format and D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA as the input slot class. The rest of the members we can leave at zero.

	std::vector<D3D12_INPUT_ELEMENT_DESC> inputs = {
		{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
	};

	...
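We can leave AlignedByteOffset at zero because our vertex only has a position. Once a vertex has more than one attribute, each element needs the byte offset of its member. The sketch below shows how those offsets could be derived with offsetof. ExampleVertex, ElementLayout and kExampleLayout are hypothetical stand-ins I made up so the example compiles without the D3D12 headers; they are not part of the tutorial code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vertex with a position and a color.
struct ExampleVertex {
    float position[3]; // would use DXGI_FORMAT_R32G32B32_FLOAT
    float color[4];    // would use DXGI_FORMAT_R32G32B32A32_FLOAT
};

// Simplified stand-in for the name/offset part of D3D12_INPUT_ELEMENT_DESC.
struct ElementLayout {
    const char* semantic_name;
    unsigned aligned_byte_offset;
};

// Offsets come straight from the struct, so they stay correct when
// members are added or reordered.
constexpr ElementLayout kExampleLayout[] = {
    { "POSITION", static_cast<unsigned>(offsetof(ExampleVertex, position)) },
    { "COLOR",    static_cast<unsigned>(offsetof(ExampleVertex, color)) },
};

// The stride between two vertices is the size of the whole struct.
constexpr unsigned kExampleStride = sizeof(ExampleVertex);
```

Here the position starts at byte 0, the color at byte 12 and the stride is 28 bytes, assuming 4-byte floats.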

HLSL Shaders

Before we can fill out the pipeline’s descriptor we need to load some shaders. We will use the D3DCompileFromFile function provided by d3dcompiler.h. I won’t go into detail here, except that “vs_5_0” specifies the shader type and version (a vertex shader using shader model 5.0) and “main” specifies the shader entry point.

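// TODO: Move to separate function.

	ID3DBlob* vs = nullptr; // d3d blob for holding vertex shader bytecode
	ID3DBlob* error = nullptr; // may stay null, for example when the file can't be opened
	HRESULT hr = D3DCompileFromFile(L"vertex.hlsl",
		nullptr,
		nullptr,
		"main",
		"vs_5_0",
		D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION,
		0,
		&vs,
		&error);
	if (FAILED(hr)) {
		throw error ? static_cast<const char*>(error->GetBufferPointer()) : "Failed to compile vertex.hlsl";
	}

	ID3DBlob* ps = nullptr; // d3d blob for holding pixel shader bytecode
	error = nullptr;
	hr = D3DCompileFromFile(L"pixel.hlsl",
		nullptr,
		nullptr,
		"main",
		"ps_5_0",
		D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION,
		0,
		&ps,
		&error);
	if (FAILED(hr)) {
		throw error ? static_cast<const char*>(error->GetBufferPointer()) : "Failed to compile pixel.hlsl";
	}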

	D3D12_SHADER_BYTECODE vs_bytecode = {};
	vs_bytecode.BytecodeLength = vs->GetBufferSize();
	vs_bytecode.pShaderBytecode = vs->GetBufferPointer();

	D3D12_SHADER_BYTECODE ps_bytecode = {};
	ps_bytecode.BytecodeLength = ps->GetBufferSize();
	ps_bytecode.pShaderBytecode = ps->GetBufferPointer();

Now we can finally fill in our pipeline’s descriptor! D3D12_GRAPHICS_PIPELINE_STATE_DESC is defined as follows:

typedef struct D3D12_GRAPHICS_PIPELINE_STATE_DESC {
  ID3D12RootSignature                *pRootSignature;
  D3D12_SHADER_BYTECODE              VS;
  D3D12_SHADER_BYTECODE              PS;
  D3D12_SHADER_BYTECODE              DS;
  D3D12_SHADER_BYTECODE              HS;
  D3D12_SHADER_BYTECODE              GS;
  D3D12_STREAM_OUTPUT_DESC           StreamOutput;
  D3D12_BLEND_DESC                   BlendState;
  UINT                               SampleMask;
  D3D12_RASTERIZER_DESC              RasterizerState;
  D3D12_DEPTH_STENCIL_DESC           DepthStencilState;
  D3D12_INPUT_LAYOUT_DESC            InputLayout;
  D3D12_INDEX_BUFFER_STRIP_CUT_VALUE IBStripCutValue;
  D3D12_PRIMITIVE_TOPOLOGY_TYPE      PrimitiveTopologyType;
  UINT                               NumRenderTargets;
  DXGI_FORMAT                        RTVFormats[8];
  DXGI_FORMAT                        DSVFormat;
  DXGI_SAMPLE_DESC                   SampleDesc;
  UINT                               NodeMask;
  D3D12_CACHED_PIPELINE_STATE        CachedPSO;
  D3D12_PIPELINE_STATE_FLAGS         Flags;
} D3D12_GRAPHICS_PIPELINE_STATE_DESC;

Let’s break it down.

pRootSignature - A pointer to the root signature object.
VS - Describes the vertex shader.
PS - Describes the pixel shader.
DS - Describes the domain shader.
HS - Describes the hull shader.
GS - Describes the geometry shader.
StreamOutput - A D3D12_STREAM_OUTPUT_DESC structure that describes a streaming output buffer.
BlendState - A descriptor that describes the blend state.
SampleMask - The sample mask for the blend state.
RasterizerState - A descriptor that describes the rasterizer state.
DepthStencilState - A descriptor that describes the depth stencil state.
InputLayout - A descriptor that describes the input-buffer data for the input-assembler stage.
IBStripCutValue - Specifies the properties of the index buffer in a D3D12_INDEX_BUFFER_STRIP_CUT_VALUE structure.
PrimitiveTopologyType - Specifies the primitive topology type.
NumRenderTargets - The number of render targets.
RTVFormats - An array of DXGI_FORMAT-typed values for the render target formats.
DSVFormat - A DXGI_FORMAT-typed value for the depth-stencil format.
SampleDesc - A descriptor that describes the multi-sampling settings.
NodeMask - For single GPU operation, set this to zero. If there are multiple GPU nodes, set bits to identify the nodes.
CachedPSO - A cached pipeline state object, as a D3D12_CACHED_PIPELINE_STATE structure.
Flags - Pipeline state flags. We leave this at zero (D3D12_PIPELINE_STATE_FLAG_NONE).

That’s quite a lot to fill in, but I hope you’ll be able to do most of it yourself, so I won’t go over every decision I made.

	D3D12_GRAPHICS_PIPELINE_STATE_DESC pso_desc = {};
	pso_desc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
	pso_desc.RTVFormats[0] = DXGI_FORMAT_B8G8R8A8_UNORM;
	pso_desc.DSVFormat = DXGI_FORMAT_D32_FLOAT;
	pso_desc.SampleDesc = sampleDesc;
	pso_desc.SampleMask = 0xffffffff;
	pso_desc.RasterizerState = rasterize_desc;
	pso_desc.BlendState = blend_desc;
	pso_desc.DepthStencilState = depth_stencil_state;
	pso_desc.NumRenderTargets = 1;
	pso_desc.pRootSignature = root_signature;
	pso_desc.VS = vs_bytecode;
	pso_desc.PS = ps_bytecode;
	pso_desc.InputLayout.NumElements = static_cast<UINT>(inputs.size());
	pso_desc.InputLayout.pInputElementDescs = inputs.data();

Now, finally, let’s call ID3D12Device::CreateGraphicsPipelineState and give our pipeline a name!

	hr = device->CreateGraphicsPipelineState(&pso_desc, IID_PPV_ARGS(&pipeline));
	if (FAILED(hr)) {
		throw "Failed to create graphics pipeline";
	}
	pipeline->SetName(L"My sick pipeline object");
}

And let’s call it from our Init function after our root signature:

void Init() {
	CreateRootSignature();
	CreatePSO();

	cmd_list->Close();
}

Defining Where To Draw

Viewport

The viewport specifies the area of the render target which we will draw to, so of course we will need one. We define our viewport by setting 6 values: top left X, top left Y, width, height, near Z (MinDepth) and far Z (MaxDepth).

The top left X and Y are relative to the top left of the render target. The width and height define the size of the viewport, and with it its right and bottom edges. Finally, the near Z and far Z define the depth range the scene is mapped to; we use the full range from 0.0 to 1.0. Geometry whose depth falls outside the visible depth range will not be drawn.

The viewport converts normalized device coordinates to screen space, where screen space is in pixels and normalized device coordinates run from -1.0 to 1.0 from left to right, and from 1.0 to -1.0 from top to bottom. We will define our vertex positions directly in normalized device coordinates.
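The mapping the viewport performs can be written down in a few lines. The following is a minimal sketch of that transform in plain C++. Viewport and ScreenPoint are simplified stand-ins for D3D12_VIEWPORT that I made up so the example compiles without the D3D12 headers, and NdcToScreen is a hypothetical helper, not an API function.

```cpp
#include <cassert>

// Simplified stand-in for D3D12_VIEWPORT.
struct Viewport {
    float top_left_x, top_left_y, width, height, min_depth, max_depth;
};

struct ScreenPoint {
    float x, y, depth;
};

// Maps a point from the -1..1 range (with Y pointing up) to pixels
// (with Y pointing down) and maps its depth into [min_depth, max_depth].
ScreenPoint NdcToScreen(const Viewport& vp, float ndc_x, float ndc_y, float ndc_z) {
    ScreenPoint p;
    p.x = vp.top_left_x + (ndc_x + 1.0f) * 0.5f * vp.width;
    p.y = vp.top_left_y + (1.0f - ndc_y) * 0.5f * vp.height;
    p.depth = vp.min_depth + ndc_z * (vp.max_depth - vp.min_depth);
    return p;
}
```

For an 800 by 600 viewport at the origin, the point (-1, 1) lands on pixel (0, 0), the point (1, -1) on (800, 600) and the center (0, 0) on (400, 300).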

Scissor Rectangle

The scissor rectangle specifies the area which we will be drawing to. Any pixels outside this area won’t be drawn.

The scissor rect has only four values to define: left, right, top and bottom. These coordinates are relative to the top left of the render target.
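As a small illustration of what the scissor test does per pixel, here is a sketch in plain C++. Rect is a simplified stand-in for D3D12_RECT and PassesScissor is a hypothetical helper I made up for this example. It follows the usual RECT convention where left and top are inclusive and right and bottom are exclusive.

```cpp
#include <cassert>

// Simplified stand-in for D3D12_RECT.
struct Rect {
    long left, top, right, bottom;
};

// Returns true when the pixel (x, y) lies inside the scissor rectangle.
// left/top are inclusive, right/bottom are exclusive.
bool PassesScissor(const Rect& rect, long x, long y) {
    assert(rect.left <= rect.right && rect.top <= rect.bottom);
    return x >= rect.left && x < rect.right &&
           y >= rect.top && y < rect.bottom;
}
```

With a rectangle of (0, 0, 800, 600) the pixel (799, 599) passes, while (800, 0) is rejected.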

Let’s start implementing our viewport and scissor rect by defining some new variables and functions:

static D3D12_VIEWPORT viewport;
static D3D12_RECT scissor_rect;

...

void CreateViewport();

Setting the values will be a breeze:

void CreateViewport() {
	// Define viewport.
	viewport.TopLeftX = 0;
	viewport.TopLeftY = 0;
	viewport.Width = WIDTH;
	viewport.Height = HEIGHT;
	viewport.MinDepth = 0.0f;
	viewport.MaxDepth = 1.0f;

	// Define scissor rect
	scissor_rect.left = 0;
	scissor_rect.top = 0;
	scissor_rect.right = WIDTH;
	scissor_rect.bottom = HEIGHT;
}

Vertex Buffers

I assume you are familiar with vertex buffers and how they work in general. We will need 2 ID3D12Resource* objects: one for staging and one to actually hold the data. The staging buffer is only needed while uploading, so it will be a local variable in our function. Besides those 2 buffers we need a D3D12_VERTEX_BUFFER_VIEW variable, which describes the vertex buffer and acts as a handle for it. We will also need a function called CreateVertexBuffer.

static ID3D12Resource* vertex_buffer;
static int vertex_buffer_size;
static D3D12_VERTEX_BUFFER_VIEW vertex_buffer_view;

void CreateVertexBuffer(std::vector<glm::vec3> vertices);

Let’s start by creating the normal buffer. We create it in the D3D12_RESOURCE_STATE_COPY_DEST state because we will be copying data from the upload (staging) buffer into it.

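void CreateVertexBuffer(std::vector<glm::vec3> vertices) {
	// sizeof(vertices) would only give the size of the std::vector object itself.
	vertex_buffer_size = static_cast<int>(vertices.size() * sizeof(glm::vec3));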

	device->CreateCommittedResource(
		&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
		D3D12_HEAP_FLAG_NONE,
		&CD3DX12_RESOURCE_DESC::Buffer(vertex_buffer_size),
		D3D12_RESOURCE_STATE_COPY_DEST,
		nullptr,
		IID_PPV_ARGS(&vertex_buffer));

	vertex_buffer->SetName(L"Vertex Buffer Resource Heap");


	...
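A common pitfall when computing the buffer size is applying sizeof to the std::vector itself. That gives the size of the container object, which is the same no matter how many elements it holds. The number of bytes we have to upload is the element count times the element size. Here is a minimal sketch; Float3 is a stand-in for glm::vec3 and ByteSize is a hypothetical helper, both made up so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for glm::vec3: three tightly packed floats.
struct Float3 {
    float x, y, z;
};

// Number of bytes occupied by the elements of the vector,
// which is what a vertex buffer of these elements has to hold.
template <typename T>
std::size_t ByteSize(const std::vector<T>& values) {
    return values.size() * sizeof(T);
}
```

A triangle of three Float3 vertices needs 36 bytes and a quad of four needs 48, while sizeof reports the same value for both vectors.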

The upload (staging) buffer is created in a very similar fashion, except that its initial state is D3D12_RESOURCE_STATE_GENERIC_READ and its heap is of the type D3D12_HEAP_TYPE_UPLOAD, because uploading is all this buffer will do.

	ID3D12Resource* vb_upload_heap;
	device->CreateCommittedResource(
		&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
		D3D12_HEAP_FLAG_NONE,
		&CD3DX12_RESOURCE_DESC::Buffer(vertex_buffer_size),
		D3D12_RESOURCE_STATE_GENERIC_READ,
		nullptr,
		IID_PPV_ARGS(&vb_upload_heap));
		
	vb_upload_heap->SetName(L"Vertex Buffer Upload Resource Heap");

	...

We of course need to describe the data we will be uploading. We use D3D12_SUBRESOURCE_DATA for that.

	D3D12_SUBRESOURCE_DATA vertex_data = {};
	vertex_data.pData = vertices.data(); // pointer to the first vertex
	vertex_data.RowPitch = vertex_buffer_size;
	vertex_data.SlicePitch = vertex_buffer_size;

	...

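Now we need to copy the data from our upload buffer into our normal buffer. The UpdateSubresources helper from d3dx12.h records the copy, and afterwards we of course need a resource barrier to transition the buffer to the vertex buffer state.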

	UpdateSubresources(cmd_list, vertex_buffer, vb_upload_heap, 0, 0, 1, &vertex_data);
	cmd_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(vertex_buffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER));

	...

And finally we need to initialize the members of our D3D12_VERTEX_BUFFER_VIEW.

	vertex_buffer_view.BufferLocation = vertex_buffer->GetGPUVirtualAddress();
	vertex_buffer_view.StrideInBytes = sizeof(glm::vec3); // size of a single vertex
	vertex_buffer_view.SizeInBytes = vertex_buffer_size;
}

Update Command Lists

Now we just need to set the root signature, then the graphics pipeline, followed by the vertex buffer, and finally we can call DrawInstanced to draw our object.

...
// Set viewport & scissor rect
cmd_list->RSSetViewports(1, &viewport); // set the viewports
cmd_list->RSSetScissorRects(1, &scissor_rect); // set the scissor rects

cmd_list->SetGraphicsRootSignature(root_signature);
cmd_list->SetPipelineState(pipeline);
cmd_list->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST); // interpret the vertices as a triangle list
cmd_list->IASetVertexBuffers(0, 1, &vertex_buffer_view);
cmd_list->DrawInstanced(3, 1, 0, 0); // 3 vertices, 1 instance
...

And there we have it. A beautiful triangle:

Depth and Stencil Testing

As I explained before, the output merger may reject fragments from the pixel shader based on the depth and stencil tests. The remaining pixels will be drawn to the back buffer.

The Depth Test

With depth testing, objects closer to the camera appear in front of objects that are farther away, regardless of the order in which they are drawn. If you don’t use depth testing, anything that is drawn will appear in front of whatever was drawn before it. You could sort the objects by their distance to the camera, which may actually be faster in some circumstances. But sorting won’t work when objects overlap each other. It also breaks down within a single object: if you render a model with some nice Phong shading and back face culling disabled, the object will render quite strangely because you might be looking straight through it (parts of the object will appear inside-out).
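To make the idea concrete, here is a tiny software version of the depth test in plain C++. It is only a sketch of the principle, not of how the hardware implements it. Fragment, Framebuffer and DrawWithDepthTest are hypothetical names I made up, and the comparison used is “less”, which I believe matches the D3D12 default depth stencil state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A fragment produced by the pixel shader for one pixel.
struct Fragment {
    std::size_t pixel; // index of the pixel it covers
    float depth;       // 0.0 = near, 1.0 = far
    int color;
};

struct Framebuffer {
    std::vector<int> color;
    std::vector<float> depth;
    // The depth buffer is cleared to 1.0, the farthest value.
    explicit Framebuffer(std::size_t pixels)
        : color(pixels, 0), depth(pixels, 1.0f) {}
};

// Writes the fragment only when it is closer than what is already stored.
void DrawWithDepthTest(Framebuffer& fb, const Fragment& fragment) {
    assert(fragment.pixel < fb.color.size());
    if (fragment.depth < fb.depth[fragment.pixel]) {
        fb.depth[fragment.pixel] = fragment.depth;
        fb.color[fragment.pixel] = fragment.color;
    }
}
```

Drawing a near fragment and a far fragment onto the same pixel gives the near color in both draw orders, which is exactly the order independence described above.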

The Stencil Test

The stencil test allows you to discard pixels based on a pattern. Let’s say we want a checkerboard transition between 2 scenes. To do this we could animate the stencil values to represent a checkerboard. In this example the values are either 0 or 1. I can’t say whether 0 or 1 will reject the pixel; this depends on your comparison function and reference value.
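The checkerboard idea can be sketched on the CPU as follows. MakeCheckerboardStencil and StencilPassesEqual are hypothetical helpers I made up for this illustration. The comparison shown is “equal”, which is just one of the possible comparison functions, so treat the choice as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills a width x height stencil buffer with a checkerboard of 0s and 1s.
// cell is the size of one square in pixels.
std::vector<std::uint8_t> MakeCheckerboardStencil(int width, int height, int cell) {
    assert(width > 0 && height > 0 && cell > 0);
    std::vector<std::uint8_t> stencil(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t index = static_cast<std::size_t>(y) * width + x;
            stencil[index] = static_cast<std::uint8_t>(((x / cell) + (y / cell)) % 2);
        }
    }
    return stencil;
}

// "Equal" comparison: the pixel passes when the stored value matches
// the reference value.
bool StencilPassesEqual(std::uint8_t stored, std::uint8_t reference) {
    return stored == reference;
}
```

Rendering scene A with reference value 0 and scene B with reference value 1 would then fill the two colors of the checkerboard with different scenes.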

The Depth / Stencil Buffer

The depth and stencil values are stored in the so-called depth/stencil buffer. The depth/stencil buffer is often created with the format DXGI_FORMAT_D24_UNORM_S8_UINT. 24 bits will be used for the depth values and the remaining 8 bits for the stencil values.
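To show how 24 bits of depth and 8 bits of stencil fit into a single 32-bit value, here is a small sketch. The bit order used (depth in the upper 24 bits) is purely an assumption for illustration; how the GPU actually lays out the memory is up to the hardware and driver. PackD24S8, DepthBits and StencilBits are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Packs a depth in [0, 1] as a 24-bit unsigned normalized value together
// with an 8-bit stencil value. The bit layout is chosen for illustration.
std::uint32_t PackD24S8(float depth01, std::uint8_t stencil) {
    assert(depth01 >= 0.0f && depth01 <= 1.0f);
    const double max24 = 16777215.0; // 2^24 - 1
    const std::uint32_t depth_bits =
        static_cast<std::uint32_t>(static_cast<double>(depth01) * max24 + 0.5);
    return (depth_bits << 8) | stencil;
}

// Extracts the 24-bit depth value again.
std::uint32_t DepthBits(std::uint32_t packed) {
    return packed >> 8;
}

// Extracts the 8-bit stencil value again.
std::uint8_t StencilBits(std::uint32_t packed) {
    return static_cast<std::uint8_t>(packed & 0xFFu);
}
```

The 24-bit depth can only represent 16777216 distinct values, which is one reason to prefer a 32-bit float depth format when you don’t need the stencil bits.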

I however don’t care about the stencil test so in this article I’ll be making just a depth buffer. This means we can use 32 bits for depth so we will use the format DXGI_FORMAT_D32_FLOAT.

Creating The Depth Stencil Buffer

For this example I will only cover how to create and use the depth buffer. To create and use a depth buffer we will need to define 2 things: an ID3D12DescriptorHeap to hold the view into the depth stencil resource and an ID3D12Resource. To create these we will define 2 functions: tut::CreateDepthStencilHeap and tut::CreateDepthStencilBuffer. We will call those functions after tut::CreateSwapchain in the tut::InitD3D12 function.

static ID3D12Resource* depth_stencil_buffer;
static ID3D12DescriptorHeap* depth_stencil_heap;

...

void CreateDepthStencilHeap();
void CreateDepthStencilBuffer();

void InitD3D12() {
	
	...

	CreateSwapChain(&swap_chain);

	CreateDepthStencilHeap();
	CreateDepthStencilBuffer();

	...
}

Let’s start with tut::CreateDepthStencilHeap. Just like with the render target views we will use ID3D12Device::CreateDescriptorHeap and a D3D12_DESCRIPTOR_HEAP_DESC to create the heap. However, since we are only creating 1 buffer we set our number of descriptors to 1, and we have to change the type to D3D12_DESCRIPTOR_HEAP_TYPE_DSV (our heap will hold views into the depth stencil buffer). Our tut::CreateDepthStencilHeap function will look like this:

void CreateDepthStencilHeap() {
	D3D12_DESCRIPTOR_HEAP_DESC heap_desc = {};
	heap_desc.NumDescriptors = 1;
	heap_desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_DSV;
	heap_desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
	HRESULT hr = device->CreateDescriptorHeap(&heap_desc, IID_PPV_ARGS(&depth_stencil_heap));

	if (FAILED(hr)) {
		throw "Failed to create descriptor heap for the depth stencil buffer";
	}
}

Now we are able to create the actual depth stencil resource. We will use ID3D12Device::CreateCommittedResource with a D3D12_CLEAR_VALUE as optimized clear value. We have used this function before so I won’t go over it again.

Unlike when creating render targets, the optimized clear value isn’t a color. Now it just holds the format and the depth and stencil values. As mentioned before we will use the format DXGI_FORMAT_D32_FLOAT because we don’t care about the stencil test. We will clear the depth to 1, the farthest possible value, and the stencil value to 0. Just like with the render target views we will use the helper function CD3DX12_RESOURCE_DESC::Tex2D provided by d3dx12.h to make it easier to create our resource descriptor. We will use the width and height of the window and we will pass the flag D3D12_RESOURCE_FLAG_ALLOW_DEPTH_STENCIL so we can use it as a depth stencil resource. The other parameters should be self-explanatory.

void CreateDepthStencilBuffer() {
	D3D12_CLEAR_VALUE optimized_clear_value = {};
	optimized_clear_value.Format = DXGI_FORMAT_D32_FLOAT;
	optimized_clear_value.DepthStencil.Depth = 1.0f;
	optimized_clear_value.DepthStencil.Stencil = 0;

	HRESULT hr = device->CreateCommittedResource(
		&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
		D3D12_HEAP_FLAG_NONE,
		&CD3DX12_RESOURCE_DESC::Tex2D(DXGI_FORMAT_D32_FLOAT, WIDTH, HEIGHT, 1, 0, 1, 0, D3D12_RESOURCE_FLAG_ALLOW_DEPTH_STENCIL),
		D3D12_RESOURCE_STATE_DEPTH_WRITE,
		&optimized_clear_value,
		IID_PPV_ARGS(&depth_stencil_buffer)
	);
	if (FAILED(hr)) {
		throw "Failed to create committed resource.";
	}
	depth_stencil_buffer->SetName(L"Depth/Stencil Buffer");

	...

Now we have the resource, but we are still missing our view into the resource. This is what we will do next, inside the same function. To create our view we will use the ID3D12Device::CreateDepthStencilView method, which is defined like this:

void CreateDepthStencilView(
  [in, optional]       ID3D12Resource                *pResource,
  [in, optional] const D3D12_DEPTH_STENCIL_VIEW_DESC *pDesc,
  [in]                 D3D12_CPU_DESCRIPTOR_HANDLE   DestDescriptor
);

Should be quite easy, right? We will pass our depth_stencil_buffer to pResource, and since we only have 1 depth buffer we will just use ID3D12DescriptorHeap::GetCPUDescriptorHandleForHeapStart for DestDescriptor.

D3D12_DEPTH_STENCIL_VIEW_DESC is defined like this:

typedef struct D3D12_DEPTH_STENCIL_VIEW_DESC {
  DXGI_FORMAT         Format;
  D3D12_DSV_DIMENSION ViewDimension;
  D3D12_DSV_FLAGS     Flags;
  union {
    D3D12_TEX1D_DSV         Texture1D;
    D3D12_TEX1D_ARRAY_DSV   Texture1DArray;
    D3D12_TEX2D_DSV         Texture2D;
    D3D12_TEX2D_ARRAY_DSV   Texture2DArray;
    D3D12_TEX2DMS_DSV       Texture2DMS;
    D3D12_TEX2DMS_ARRAY_DSV Texture2DMSArray;
  };
} D3D12_DEPTH_STENCIL_VIEW_DESC;

We can completely forget about the union: zero-initializing the descriptor selects mip slice 0 of Texture2D, which is what we want. The format should of course be DXGI_FORMAT_D32_FLOAT and the view dimension D3D12_DSV_DIMENSION_TEXTURE2D. Finally, we don’t need any flags so we use D3D12_DSV_FLAG_NONE.

	D3D12_DEPTH_STENCIL_VIEW_DESC view_desc = {};
	view_desc.Format = DXGI_FORMAT_D32_FLOAT;
	view_desc.ViewDimension = D3D12_DSV_DIMENSION_TEXTURE2D;
	view_desc.Flags = D3D12_DSV_FLAG_NONE;

	device->CreateDepthStencilView(depth_stencil_buffer, &view_desc, depth_stencil_heap->GetCPUDescriptorHandleForHeapStart());
}

Binding The Depth Stencil Buffer

We are obviously not using our depth stencil buffer yet. We will need to bind it first. Do you remember when we called ID3D12GraphicsCommandList::OMSetRenderTargets? You might have noticed one of the parameters of that function is a CPU descriptor handle called “pDepthStencilDescriptor”. Let’s go back to that call inside our tut::Render function, get the handle to our view into the depth buffer and pass it to the ID3D12GraphicsCommandList::OMSetRenderTargets function.

Just like the render target views, we need to clear the depth stencil view every frame. We use ID3D12GraphicsCommandList::ClearDepthStencilView for that, which is defined as follows:

void ClearDepthStencilView(
  [in]       D3D12_CPU_DESCRIPTOR_HANDLE DepthStencilView,
  [in]       D3D12_CLEAR_FLAGS           ClearFlags,
  [in]       FLOAT                       Depth,
  [in]       UINT8                       Stencil,
  [in]       UINT                        NumRects,
  [in] const D3D12_RECT                  *pRects
);

We pretty much already decided what everything should be. The clear flag is D3D12_CLEAR_FLAG_DEPTH because we don’t have a stencil element in our buffer. The depth value should be 1 and the stencil 0. And we don’t have a specific area we want to clear, so we pass 0 rects and a null pointer.

The part of the render function where we set the render target and depth stencil buffer and clear both of them should now look similar to this:

void Render() {
	...

	CD3DX12_CPU_DESCRIPTOR_HANDLE rtv_handle(rtv_descriptor_heap->GetCPUDescriptorHandleForHeapStart(), frame_index, rtv_descriptor_increment_size);
	CD3DX12_CPU_DESCRIPTOR_HANDLE dsv_handle(depth_stencil_heap->GetCPUDescriptorHandleForHeapStart());
	cmd_list->OMSetRenderTargets(1, &rtv_handle, false, &dsv_handle);
	cmd_list->ClearRenderTargetView(rtv_handle, clear_color, 0, nullptr);
	cmd_list->ClearDepthStencilView(dsv_handle, D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, nullptr);

	...
}

And that is it! We now have a working depth stencil buffer. If you run the program now you shouldn’t see any difference. However, if you render a second triangle behind or in front of the first one you should be able to determine whether the depth buffer is working.

Indexed Rendering

Indexed rendering is going to be a breeze. We are going to render a rectangle instead of a triangle in this chapter! I don’t think it is necessary to explain what indexed rendering is, so I’ll get straight to the meat of it.

We only need 3 new variables: an index buffer (ID3D12Resource*), a view to the index buffer (D3D12_INDEX_BUFFER_VIEW) and an integer to store the buffer’s size, just like with the vertex buffer.

static ID3D12Resource* index_buffer;
static int index_buffer_size;
static D3D12_INDEX_BUFFER_VIEW index_buffer_view;

We are actually going to create the index buffer in the same way we created the vertex buffer. We do need to change the resource transition’s target from D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER to D3D12_RESOURCE_STATE_INDEX_BUFFER. The view definition also changes since it is now of the type D3D12_INDEX_BUFFER_VIEW instead of D3D12_VERTEX_BUFFER_VIEW. Instead of a stride variable we now specify a format. Because our indices are 32-bit unsigned integers, we will specify the format as DXGI_FORMAT_R32_UINT.
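The tutorial always uses 32-bit indices, but small meshes also fit in 16-bit indices, which halves the size of the index buffer. Below is a sketch of how you could pick between the two. IndexFormat is a made-up stand-in for DXGI_FORMAT_R16_UINT and DXGI_FORMAT_R32_UINT, and ChooseIndexFormat and IndexSizeInBytes are hypothetical helpers. Keeping index 0xFFFF unused is a cautious assumption on my part, because that value can act as a strip cut marker.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for DXGI_FORMAT_R16_UINT and DXGI_FORMAT_R32_UINT.
enum class IndexFormat { UInt16, UInt32 };

// Picks the smallest index format that can address every vertex.
// With at most 0xFFFF vertices the highest index is 0xFFFE, so the
// value 0xFFFF itself is never used as an index.
IndexFormat ChooseIndexFormat(std::size_t vertex_count) {
    return vertex_count <= 0xFFFFu ? IndexFormat::UInt16 : IndexFormat::UInt32;
}

// Size of a single index in bytes for the given format.
std::size_t IndexSizeInBytes(IndexFormat format) {
    return format == IndexFormat::UInt16 ? 2 : 4;
}
```

For our rectangle the 6 indices take 24 bytes with 32-bit indices and would take 12 bytes with 16-bit indices. If you switch formats, the type of the index array on the CPU has to change along with it.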

The entire tut::CreateIndexBuffer will look like this:

void CreateIndexBuffer() {
	// Rectangle indices
	uint32_t indices[] = {
		0, 1, 2, // first triangle
		0, 3, 1, // second triangle
	};

	index_buffer_size = sizeof(indices);

	device->CreateCommittedResource(
		&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
		D3D12_HEAP_FLAG_NONE,
		&CD3DX12_RESOURCE_DESC::Buffer(index_buffer_size),
		D3D12_RESOURCE_STATE_COPY_DEST,
		nullptr,
		IID_PPV_ARGS(&index_buffer));

	index_buffer->SetName(L"Index Buffer Resource Heap");

	ID3D12Resource* ib_upload_heap;
	device->CreateCommittedResource(
		&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
		D3D12_HEAP_FLAG_NONE,
		&CD3DX12_RESOURCE_DESC::Buffer(index_buffer_size),
		D3D12_RESOURCE_STATE_GENERIC_READ,
		nullptr,
		IID_PPV_ARGS(&ib_upload_heap));
	ib_upload_heap->SetName(L"Index Buffer Upload Resource Heap");

	// store index buffer in upload heap
	D3D12_SUBRESOURCE_DATA index_data = {};
	index_data.pData = reinterpret_cast<BYTE*>(indices);
	index_data.RowPitch = index_buffer_size;
	index_data.SlicePitch = index_buffer_size;

	UpdateSubresources(cmd_list, index_buffer, ib_upload_heap, 0, 0, 1, &index_data);

	// transition the index buffer data from copy destination state to index buffer state
	cmd_list->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(index_buffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_INDEX_BUFFER));

	// create a index buffer view for the rectangle. We get the GPU memory address to the index buffer using the GetGPUVirtualAddress() method
	index_buffer_view.BufferLocation = index_buffer->GetGPUVirtualAddress();
	index_buffer_view.SizeInBytes = index_buffer_size;
	index_buffer_view.Format = DXGI_FORMAT_R32_UINT;
}

The other difference from the vertex buffer is the function we use to bind it. This function is ID3D12GraphicsCommandList::IASetIndexBuffer. So lets add that to our command list in the render function.

void Render() {
	...

	cmd_list->IASetVertexBuffers(0, 1, &vertex_buffer_view);
	cmd_list->IASetIndexBuffer(&index_buffer_view);

	cmd_list->DrawInstanced(3, 1, 0, 0);

	...
}

And finally we need to replace the function we used to draw with ID3D12GraphicsCommandList::DrawIndexedInstanced. We will now be drawing 2 triangles to create a rectangle. This means we now need to draw 6 indices instead of 3 vertices.

void Render() {
	...

	cmd_list->DrawIndexedInstanced(6, 1, 0, 0, 0);

	...
}
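If you want to convince yourself what the index buffer does, the sketch below expands an indexed mesh into the plain vertex list the GPU effectively processes. Float3 is a stand-in for glm::vec3 and ExpandIndexed is a hypothetical helper; neither is part of the tutorial code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for glm::vec3.
struct Float3 {
    float x, y, z;
};

// Looks up every index in the vertex list, producing one vertex per index.
std::vector<Float3> ExpandIndexed(const std::vector<Float3>& vertices,
                                  const std::vector<std::uint32_t>& indices) {
    std::vector<Float3> expanded;
    expanded.reserve(indices.size());
    for (const std::uint32_t index : indices) {
        assert(index < vertices.size());
        expanded.push_back(vertices[index]);
    }
    return expanded;
}
```

With 4 vertices and the 6 indices from above we get 6 expanded vertices. For a position-only rectangle the indexed version isn’t actually smaller (4 * 12 + 6 * 4 = 72 bytes versus 6 * 12 = 72 bytes); the savings show up once vertices carry more data or are shared by many triangles.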

And now we have a pretty rectangle:

Constant Buffers

I briefly explained constant buffers before. This time however I am going to explain them in more detail and we will be implementing them.

Versioning

Versioning is a term for when data that is bound to the pipeline needs to be updated multiple times in a single command list. We need to version the constant buffer so we can update the buffer while making sure the previous data isn’t lost.

We get free versioning with root constants and root descriptors. When the data changes, D3D12 versions the root arguments for us, making sure the data previously set is still available to the draws that were recorded earlier.

Since we don’t have a lot of data we want to access from the shader, a descriptor table would be overkill. A root constant would be too small. So we will use a root descriptor initialized as a constant buffer view.

The constant buffer data is stored in a resource heap. Even though root descriptors have free versioning, the data they point to does not, so we still need to version our resource heap. We do this by creating a resource heap for every back buffer.
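The idea of one buffer per back buffer can be sketched without any D3D12 code. VersionedConstants and Color are hypothetical names I made up, and kFrameCount = 3 mirrors the three back buffers used in this tutorial. The sketch assumes that the GPU has finished with a frame before its slot is reused three frames later; in a real application that is something you enforce with fences.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kFrameCount = 3; // one slot per back buffer

struct Color {
    float r, g, b, a;
};

// Keeps one copy of the constants per frame in flight, so writing the
// data for the current frame never overwrites what an earlier frame
// might still be reading.
class VersionedConstants {
public:
    void Write(std::size_t frame_number, const Color& color) {
        slots_[frame_number % kFrameCount] = color;
    }
    const Color& Read(std::size_t frame_number) const {
        return slots_[frame_number % kFrameCount];
    }

private:
    std::array<Color, kFrameCount> slots_{};
};
```

Writing frame 1 leaves the data of frame 0 untouched. Only frame 3 reuses the slot of frame 0.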

Constant Buffer Alignment

A resource heap that holds a single texture or buffer must have a size that is a multiple of 64KB.

Constant buffers are stored at 256 byte offsets from the beginning of a resource heap. When you set a root descriptor you give it the memory location of the data you want to use. That memory address must be the start of the resource heap plus a multiple of 256 bytes. This means that if we store 2 constant buffers in the same resource heap, the second one can’t simply start right after the first; it has to start at the next multiple of 256 bytes. This problem can be solved by padding the constant buffer to be 256 byte aligned.
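The padding comes down to rounding a size up to the next multiple of 256. Here is a minimal sketch of that arithmetic. AlignUp and ConstantBufferOffset are hypothetical helpers, and the bit trick only works because 256 is a power of two.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kConstantBufferAlignment = 256;

// Rounds size up to the next multiple of alignment.
// alignment must be a power of two for the bit mask to work.
constexpr std::size_t AlignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
}

// Byte offset of the index-th constant buffer when several buffers of
// element_size bytes are stored back to back in one resource heap.
constexpr std::size_t ConstantBufferOffset(std::size_t index, std::size_t element_size) {
    return index * AlignUp(element_size, kConstantBufferAlignment);
}

static_assert((kConstantBufferAlignment & (kConstantBufferAlignment - 1)) == 0,
              "alignment must be a power of two");
```

A 16-byte constant buffer is padded to 256 bytes, so a second copy would start at offset 256. A 300-byte buffer is padded to 512 bytes.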

Implementation

Let’s start again by defining some variables. We decided we need multiple resource heaps for versioning, so let’s define an array of resource heaps with the size of the number of back buffers. We will also need an array of UINT8 pointers of the same size. Each pointer stores the CPU address of a mapped constant buffer, which we need to update the data stored in that constant buffer.

static ID3D12Resource* const_buffers[3];
static UINT8* const_buffer_adresses[3];

Now we need to create a struct defining the data we want to use in our shader. For now we will just want a float4 specifying the color of our quad.

struct ConstantBufferObject {
	std::array<float, 4> color;
};

Now let’s define and implement a function to create our constant buffers and map them to our addresses. We will call ID3D12Device::CreateCommittedResource to create each resource. The heap type must be D3D12_HEAP_TYPE_UPLOAD because the CPU writes to these buffers every frame; we create the heap properties with the helper provided by d3dx12.h. We don’t need any flags. We use CD3DX12_RESOURCE_DESC::Buffer to provide the resource descriptor and we only need to specify the size, which is required to be 256-byte aligned. The resource state should be D3D12_RESOURCE_STATE_GENERIC_READ because the GPU will read the data. We obviously don’t need a clear value and we can use IID_PPV_ARGS again to specify the output of the function.

After we have created the 3 constant buffers we need to map them to our addresses. We will call ID3D12Resource::Map for this. The first argument of this function is the sub-resource index which is in our case 0. The second argument is the read range. We can specify a range via CD3DX12_RANGE. The beginning and end of the range should be 0 because we are not reading on the CPU. So this is what the code should look like:

...

static ID3D12Resource* const_buffers[3];
static UINT8* const_buffer_adresses[3];

...

void CreateConstantBuffers();

void CreateConstantBuffers() {
	unsigned int mul_size = (sizeof(ConstantBufferObject) + 255) & ~255;

	for (unsigned int i = 0; i < 3; ++i) {
		HRESULT hr = device->CreateCommittedResource(
			&CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
			D3D12_HEAP_FLAG_NONE,
			&CD3DX12_RESOURCE_DESC::Buffer(mul_size),
			D3D12_RESOURCE_STATE_GENERIC_READ,
			nullptr,
			IID_PPV_ARGS(&const_buffers[i]));
		if (FAILED(hr)) {
			throw "Failed to create constant buffer resource";
		}
		const_buffers[i]->SetName(L"Constant Buffer Upload Resource Heap");

		CD3DX12_RANGE readRange(0, 0);
		hr = const_buffers[i]->Map(0, &readRange, reinterpret_cast<void**>(&const_buffer_adresses[i]));
		if (FAILED(hr)) {
			throw "Failed to map constant buffer";
		}
	}
}

And we should not forget to call tut::CreateConstantBuffers in the tut::InitD3D12 function.

void InitD3D12() {
	...

	CreateCommandList();
	CreateConstantBuffers();

	...
}

For this tutorial I want the colors of the quad to change over time, so we will need to update 1 of our constant buffers every frame. We will do this at the beginning of our tut::Render function. We just create an object of the type ConstantBufferObject and set the data. We copy the data to the buffer using memcpy. The source of the memcpy call should obviously be the ConstantBufferObject object and the target should be the constant buffer address we saved earlier. This is how the beginning of the tut::Render function now looks:

void Render() {
	auto now = std::chrono::high_resolution_clock::now();
	float time = std::chrono::duration<double>(now - app_start_time).count();
	ConstantBufferObject b = { { sin(time), cos(time), 1, 1 } };
	memcpy(const_buffer_adresses[frame_index], &b, sizeof(b));

	...
}
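The memcpy only works if the CPU-side struct has exactly the layout the shader expects, so it can be worth checking that at compile time. The sketch below does that for our single float4 and copies it into a plain byte array that stands in for the mapped constant buffer. MakeAnimatedColor and UploadConstants are hypothetical helpers. Remapping sin and cos from [-1, 1] to [0, 1] is an optional variation of mine and not what the tutorial code does.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstring>
#include <type_traits>

struct ConstantBufferObject {
    std::array<float, 4> color; // matches a float4 in HLSL
};

// memcpy is only safe for trivially copyable types, and the size has to
// match the 16 bytes a float4 occupies in the constant buffer.
static_assert(std::is_trivially_copyable<ConstantBufferObject>::value,
              "constant buffer data must be trivially copyable");
static_assert(sizeof(ConstantBufferObject) == 16,
              "ConstantBufferObject must be exactly one float4");

// Builds a color that changes over time, with every channel in [0, 1].
ConstantBufferObject MakeAnimatedColor(float time) {
    return { { 0.5f * std::sin(time) + 0.5f, 0.5f * std::cos(time) + 0.5f, 1.0f, 1.0f } };
}

// Copies the constants to the CPU address of a mapped buffer.
void UploadConstants(unsigned char* mapped, const ConstantBufferObject& data) {
    assert(mapped != nullptr);
    std::memcpy(mapped, &data, sizeof(data));
}
```

In the tutorial code the destination would be const_buffer_adresses[frame_index] instead of a local byte array.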

Before we can bind the constant buffer we have to update our root signature and shader. Before, we had no root parameters, but now we need a root descriptor. This is quite easy thanks to d3dx12.h. We will create an array of CD3DX12_ROOT_PARAMETERs and initialize the first element as a constant buffer view which is only visible to the pixel shader. We will also need to update the call to CD3DX12_ROOT_SIGNATURE_DESC::Init by telling it which parameters to use and how many we have. This is the updated tut::CreateRootSignature function:

void CreateRootSignature() {
	...

	std::array<CD3DX12_ROOT_PARAMETER, 1> parameters;
	parameters[0].InitAsConstantBufferView(0, 0, D3D12_SHADER_VISIBILITY_PIXEL);

	CD3DX12_ROOT_SIGNATURE_DESC root_signature_desc;
	root_signature_desc.Init(parameters.size(),
		parameters.data(), // a pointer to the beginning of our root parameters array
		1,
		sampler,
		D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);

	...
}

For the shader we need to define the constant buffer and we should return the new color instead of the hardcoded color.

cbuffer ConstantBuffer : register(b0) {
	float4 color;
};

float4 main() : SV_TARGET {
    return color;
}

The only thing left is binding the constant buffer in the tut::Render function after we set the root signature. We do this with the ID3D12GraphicsCommandList::SetGraphicsRootConstantBufferView method. The root parameter index is 0 and the buffer location is the GPU virtual address of the current constant buffer.

void Render() {
	...

	cmd_list->SetGraphicsRootSignature(root_signature);
	cmd_list->SetGraphicsRootConstantBufferView(0, const_buffers[frame_index]->GetGPUVirtualAddress());

	...
}

And that’s it! We now use a constant buffer to tell the shader what color the quad should be. Here is a gif:

Optimizing Root Signatures

Ordering root constants

You should prefer to place constants and CBVs directly in the root signature because they have fewer indirections, as I mentioned before. However, the order in which you place these entries can impact performance quite heavily thanks to driver optimizations (specifically on NVIDIA).

There are 2 rules to follow when ordering our constants:

Entries for the pixel shader go first.
The more frequently an entry is used, the higher it should be on the list.

Rule 1 is more important than rule 2. Meaning the first entries should always be for the pixel shader.
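The two rules translate directly into a sort with two keys. The sketch below orders a list of root entries that way. RootEntry and OrderRootEntries are hypothetical names, and updates_per_frame is a made-up measure for how frequently an entry is used.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one entry in the root signature.
struct RootEntry {
    std::string name;
    bool used_by_pixel_shader;
    int updates_per_frame; // made-up measure of how often the entry is used
};

// Rule 1: pixel shader entries first.
// Rule 2: within each group, more frequently used entries first.
void OrderRootEntries(std::vector<RootEntry>& entries) {
    std::stable_sort(entries.begin(), entries.end(),
        [](const RootEntry& a, const RootEntry& b) {
            if (a.used_by_pixel_shader != b.used_by_pixel_shader) {
                return a.used_by_pixel_shader;
            }
            return a.updates_per_frame > b.updates_per_frame;
        });
}
```

Note that a vertex shader entry ends up last even when it is updated far more often than the pixel shader entries, because rule 1 wins over rule 2.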

Limit changes

It is highly recommended to limit the changes to the root signature’s content. This can be quite easily done by cashing the current values and only update it when a true change is detected. Or by designing your code in a better way.
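A minimal sketch of such a cache for a single root CBV slot could look like this. RootCbvCache is a hypothetical class I made up. It only remembers the last GPU virtual address that was bound and tells the caller whether recording the bind again is necessary.

```cpp
#include <cassert>
#include <cstdint>

// Remembers the last address bound to one root CBV slot so redundant
// binds can be skipped.
class RootCbvCache {
public:
    // Returns true when the caller should actually record the bind,
    // for example with SetGraphicsRootConstantBufferView.
    bool ShouldBind(std::uint64_t gpu_address) {
        if (has_value_ && gpu_address == last_address_) {
            return false; // nothing changed, skip the call
        }
        last_address_ = gpu_address;
        has_value_ = true;
        return true;
    }

    // Call this when the cached state no longer reflects the command
    // list, for example after setting a different root signature or
    // after starting a new command list.
    void Invalidate() {
        has_value_ = false;
    }

private:
    std::uint64_t last_address_ = 0;
    bool has_value_ = false;
};
```

Binding the same address twice in a row only records one bind. After Invalidate the next bind is always recorded, even if the address is the same as before.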

The same goes for actual root signature changes. Initializing root signatures (and pipelines) can be quite expensive. You should prefer to create root signatures at initialization time and not in real time. I always keep a queue of root signatures to create and create all of them in a batch on a separate thread.

Shader Visibility

The shader visibility of CBVs, SRVs and UAVs should be limited to the shaders that use them. There is overhead in the driver and on the GPU for each stage that needs to see those views. I’m not sure whether this overhead still exists when those stages are not used at all.
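A root parameter is visible either to a single shader stage or to all of them, so picking the narrowest visibility comes down to checking whether exactly one stage uses the view. The sketch below shows that decision. The Stage bitmask and the NarrowestVisibility function are hypothetical, and the Visibility enum only mirrors some of the D3D12_SHADER_VISIBILITY values by name.

```cpp
#include <cstdint>

// Hypothetical bitmask of the stages that reference a view. The real API has
// no such flags type; this exists only for the sketch.
enum Stage : std::uint32_t {
	kVertex   = 1u << 0,
	kHull     = 1u << 1,
	kDomain   = 1u << 2,
	kGeometry = 1u << 3,
	kPixel    = 1u << 4,
};

// Mirrors a subset of the D3D12_SHADER_VISIBILITY values by name only.
enum class Visibility { All, Vertex, Hull, Domain, Geometry, Pixel };

// Returns the single matching stage when exactly one stage uses the view.
// Any other combination (including none) falls back to All.
constexpr Visibility NarrowestVisibility(std::uint32_t used_stages) {
	switch (used_stages) {
		case kVertex:   return Visibility::Vertex;
		case kHull:     return Visibility::Hull;
		case kDomain:   return Visibility::Domain;
		case kGeometry: return Visibility::Geometry;
		case kPixel:    return Visibility::Pixel;
		default:        return Visibility::All;
	}
}
```

For example, a view used only by the pixel shader maps to Pixel, while a view shared by the vertex and pixel shader has to fall back to All (or be split into two root parameters, one per stage).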

Hardware Tiers

Feature levels are strict sets of features required by certain versions of D3D12, as well as additional optional feature levels available within the same API version. Hardware tiers are different generations of hardware. The higher the tier, the more resources are available to the pipeline. Tier 3 is the newest tier, released in April 2017, and has no resource restrictions.

The hardware tier of your device can make a massive performance difference if you have set up your root signature poorly. On tier 1 and tier 2 hardware you should fill all descriptors defined in the root signature, even if the shaders in use don’t reference all of them.

On tier 3 hardware (such as AMD GCN and Intel Skylake) you should keep your unused descriptors bound. Unbinding them can cause state thrashing bottlenecks.

Residency

An interesting yet complicated feature of DirectX 12 is the ability to manage residency. But what is residency?

If you are using a large amount of textures and meshes, they may no longer all fit in the GPU’s memory. We can remove resources from memory and make them resident again using the ID3D12Device::Evict and ID3D12Device::MakeResident functions.

Evict

After calling Evict it is not safe to touch a heap until you make it resident again by calling MakeResident. When a heap is evicted, the Video Memory Manager can repurpose its memory. However, when you call Evict the heap is only marked for eviction. The Video Memory Manager will try really hard not to evict it unless it really needs the memory. In that case the MakeResident call is almost instantaneous and won’t impact performance significantly.

Why and When to Control Residency

When making a heap resident again, the Video Memory Manager promises to return the data and page table mappings to their original state, ensuring the descriptors on the CPU are still valid and you don’t need to copy the data again. This is vastly more efficient than having to recreate the heap/committed resource, re-copy the data into the heap, and execute a GPU copy. But you mustn’t evict heaps too often, since it still impacts performance.
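To give an idea of how an application might decide what to evict, here is a minimal sketch of a least-recently-used residency tracker. Everything in it is an assumption made for illustration: ResidencyTracker is a hypothetical class, heaps are identified by plain integers, the budget is a number you pick yourself, and the two callbacks stand in for the places where a real renderer would call ID3D12Device::MakeResident and ID3D12Device::Evict. The sketch also ignores synchronization. Real code must not evict a heap that in-flight GPU work still uses.

```cpp
#include <cstdint>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical LRU tracker. It only does the bookkeeping; the callbacks are
// where the actual MakeResident / Evict calls would go.
class ResidencyTracker {
public:
	using Callback = std::function<void(int heap_id)>;

	ResidencyTracker(std::uint64_t budget, Callback make_resident, Callback evict)
		: budget_(budget),
		  make_resident_(std::move(make_resident)),
		  evict_(std::move(evict)) {}

	// Call before recording commands that use the heap.
	void Use(int heap_id, std::uint64_t size) {
		auto it = lookup_.find(heap_id);
		if (it != lookup_.end()) {
			// Already resident: mark it as most recently used.
			lru_.splice(lru_.begin(), lru_, it->second);
			return;
		}
		// Evict the least recently used heaps until the new heap fits.
		while (!lru_.empty() && used_ + size > budget_) {
			const Entry victim = lru_.back();
			evict_(victim.id);
			used_ -= victim.size;
			lookup_.erase(victim.id);
			lru_.pop_back();
		}
		make_resident_(heap_id);
		lru_.push_front({heap_id, size});
		lookup_[heap_id] = lru_.begin();
		used_ += size;
	}

	bool IsResident(int heap_id) const { return lookup_.count(heap_id) != 0; }

private:
	struct Entry {
		int id;
		std::uint64_t size;
	};

	std::uint64_t budget_;
	std::uint64_t used_ = 0;
	Callback make_resident_;
	Callback evict_;
	std::list<Entry> lru_; // front = most recently used
	std::unordered_map<int, std::list<Entry>::iterator> lookup_;
};
```

Because heaps are only evicted when the budget would otherwise be exceeded, a heap that is used every frame is never evicted, which matches the advice not to evict too often.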
