Graphics API
API stands for Application Programming Interface: a set of conventions and standards that computer engineers have agreed on for software to be written against. We need to pick sides here.
Choosing a graphics API to base our software upon is one of the most fundamental design decisions we are going to make. For all practical purposes (read: sunk man-month reasons), once we choose an API we will be “stuck” with it forever. This is one of the topics where I intentionally choose performance over development velocity. We could speed up software development by choosing ready-built frameworks and engines such as the open source Dear ImGui, Godot, Qt etc. Though “engines” isolate the software from the underlying APIs, we may get constrained by the engine itself at some point in the future. We rule out closed source engines such as Unity and Unreal Engine for political reasons! Fun fact: this attitude is sometimes called NIH Syndrome, i.e. Not-Invented-Here Syndrome. ;) So, coming back to lower level APIs, each operating system offers only a limited set of them.
On Windows, we have DirectX 9 / 10 / 11 / 12, OpenGL and Vulkan. OpenGL is no longer actively developed, and newer graphics features such as ray tracing aren’t supported by it. Vulkan is generally a 2nd class citizen on Windows compared to DirectX. Hence we choose the most modern flavor, DirectX12. Remember, DirectX12 was announced in 2014 and shipped with Windows 10 in 2015, so setting it as a baseline requirement for our software is a reasonable decision. DirectX12 is our ONLY graphics API for the Windows operating system. We support both Windows 10 and 11 for now (2025). This covers perhaps 90% of our target worldwide users. We also presume support for Resource Heap Tier 2 (D3D12_RESOURCE_HEAP_TIER_2) in DirectX12. Note: Tier 2 hardware started appearing in the 2015/2016 timeline. Which Shader Model level? To be figured out. If you are feeling hyped enough to go deep down, read the 1st (of 4) tutorial on DirectX12 here. It is ~100 pages!
The next operating system by market share is macOS on Apple devices. In the Apple world, Metal is the only recommended (non-deprecated) graphics API, hence we go with Metal. Vulkan also works through a translation layer such as MoltenVK. Still, for performance and 1st party support, we choose the Metal API. The Mac graphics / Metal code shall also be partially reusable on iPhone / iPad devices, since they also have Metal as the preferred API.
Next up is the Linux (Ubuntu) operating system. This being an open source operating system, the open standard Vulkan is preferred here. We want our software to be available even on free operating systems. Hence we must have a Vulkan based UI as well. Another reason for keeping this Vulkan interface is the overlap with the Android mobile operating system. For Android phones, we have only 2 options: the older OpenGL ES or modern Vulkan. Hence we choose Vulkan, in a version released within the last 10 years, i.e. Vulkan 1.1.
The above 3 APIs are for desktop applications. Next up is the browser based engine. Here the upcoming (as of 2025) API named WebGPU is the chosen one. It is supported by all major web browser vendors, i.e. Google Chrome, Apple Safari and Mozilla Firefox.
Having made the above decisions, we have to be realistic about our software developers, who hold core engineering degrees. We can’t expect developers from a chemical / civil / electrical / instrumentation / mechanical background to be familiar with such deep computer science concepts. Hence we structure our code as a sort of mini-engine (NIH?), where adding a new UI element doesn’t involve fiddling deep down in the graphics APIs. This will be sorted out progressively as our software matures.
Our software installer will verify that all the relevant APIs are present on the system before installation. This way, inside the application, we don’t check every time whether a particular feature is supported by the available hardware, unless the initially installed hardware itself changes. By default this check shouldn’t take more than a few microseconds during application startup.
More graphics design decisions are specified in our source code!
// Copyright (c) 2025-Present : Ram Shanker: All rights reserved.

/*
Windows Desktop C++ DirectX12 application for CAD / CAM use.

This file is our Architecture. Primitive data structures common to all platforms may be added here.

At startup, pick the GPU with the highest VRAM. All rendering happens on that GPU only. Only 1 device is supported for rendering.
However, the OS may send the display frame to monitors connected to another / integrated GPU.

Vertex layout common to all geometry:
3x4 Bytes for Position, 4 Bytes for Normal, 4 Bytes for Color RGBA / 8 Bytes if HDR Monitor present. = 20 / 24 Bytes per vertex.
Anyway, go with the 24 Bytes format ONLY. Tone mapping (HDR -> SDR) should happen in the Pixel Shader.

Initially Hemispheric Ambient Lighting:
Factor = (Normal.z * 0.5) + 0.5
AmbientLight = Lerp(GroundColor, SkyColor, Factor)
Screen Space Ambient Occlusion (SSAO) to darken creases and corners in a future revision.
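
A minimal CPU-side sketch of the hemispheric ambient formula above, written in C++ only so that it can be checked outside a shader. Color3, Lerp and HemisphericAmbient are illustrative names, not part of the engine. It assumes Z is the up axis and that the normal is unit length.

```cpp
#include <algorithm>
#include <cassert>

// Illustrative RGB color type (hypothetical, not an engine type).
struct Color3 { float r, g, b; };

// Linear interpolation between two colors: t = 0 gives a, t = 1 gives b.
inline Color3 Lerp(const Color3& a, const Color3& b, float t) {
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t };
}

// normalZ is the Z component of a unit-length normal (assumes Z is "up").
// A normal pointing straight up receives the sky color, straight down the ground color.
inline Color3 HemisphericAmbient(float normalZ, const Color3& groundColor, const Color3& skyColor) {
    float factor = std::clamp(normalZ * 0.5f + 0.5f, 0.0f, 1.0f);
    return Lerp(groundColor, skyColor, factor);
}
```

A vertical wall (Normal.z = 0) gets Factor = 0.5, i.e. an even blend of both colors.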

Separate render threads (1 per monitor) and a single Copy thread. Copy thread is the ringmaster of VRAM!
Each per-monitor render thread is in VSync with its monitor's unique refresh rate. Hence a separate render queue per monitor.

We use the ExecuteIndirect command with a start vertex location instead of DrawIndexedInstanced per object.

I want per-tab VRAM isolation; each tab will be completely separate. Except for the unclosable tab 0, which stores common textures and UI elements.

Since I want to support 100s of simultaneous tabs, I want to start with a small heap, say 4MB per tab, and grow the heap size only when necessary,
instead of allocating 1 giant 256MB buffer. Don't manually destroy heaps on tab switch. Use Evict. It allows the OS to handle the caching.
If the user clicks back to a heavy tab, MakeResident is faster than re-creating heaps. Tab 0 is always resident.
Eviction happens with a time lag of a few seconds. An advanced eviction strategy based on the system memory budget comes after the rest of the spec is implemented.

There will be multiple views per tab. Each View will maintain a pair ( double buffered ) of ExecuteIndirect command buffers.
When an object is deleted, the copy thread receives a command from the engineering thread.
The copy thread then updates the next double buffer and records the hole in the Vertex/Index buffer. Except for the currently filling head buffer,

Maintain a Free-List Allocator (e.g., a Segregated Free List) on the CPU. Per Tab.
The Allocator knows: "I have a 12KB middle gap in Page 3, and a 40KB middle gap in Page 8." When a 10KB request comes in,
the Allocator immediately returns "Page 3". No iterating through Page objects.
If the free list says none of the existing pages can accommodate the new geometry, then create a new heap / placed resource buffer.
The free list does not track internal holes created from deleting objects. Only the middle empty space. Aggregate holes are tracked per page. Defragmented occasionally.
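
The lookup described above can be sketched as follows. This is a sketch under assumptions, not the engine's allocator: it uses a sorted std::multimap (best fit) in place of true segregated size classes, and MiddleGapIndex, SetGap and FindPage are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical CPU-side index of the "middle gap" of every page in one tab.
class MiddleGapIndex {
public:
    // Record (or update) the free middle gap of a page, in bytes.
    void SetGap(uint32_t pageId, uint64_t gapBytes) {
        auto old = gapOfPage.find(pageId);
        if (old != gapOfPage.end()) { EraseEntry(old->second, pageId); }
        gapOfPage[pageId] = gapBytes;
        pagesByGap.emplace(gapBytes, pageId);
    }

    // Returns the page with the smallest gap that still fits the request.
    // Cost is O(log pages); no Page object is visited.
    std::optional<uint32_t> FindPage(uint64_t requestBytes) const {
        auto it = pagesByGap.lower_bound(requestBytes);
        if (it == pagesByGap.end()) { return std::nullopt; } // caller creates a new page
        return it->second;
    }

private:
    void EraseEntry(uint64_t gapBytes, uint32_t pageId) {
        auto range = pagesByGap.equal_range(gapBytes);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == pageId) { pagesByGap.erase(it); return; }
        }
    }

    std::multimap<uint64_t, uint32_t> pagesByGap; // gap size -> page id, sorted by size
    std::map<uint32_t, uint64_t> gapOfPage;       // page id -> current gap size
};
```

With gaps of 12KB in Page 3 and 40KB in Page 8, a 10KB request resolves to Page 3 and a 20KB request to Page 8. A request larger than every gap returns no page, which is the cue to create a new heap.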

When a buffer gets >25% holes, it creates a new defragmented buffer and, once complete, switches over to the new buffer
for new geometry additions. At most 1 buffer is defragmented at a time (between 2 frames). Since the max page size is 64MB,
this will not produce a high-latency stall during sync with the copy thread.

Root Signature puts the "Constants" (View/Proj matrix) in root constants or a very fast descriptor table,
as these don't change between pages. Only the VBV/IBV and the EI Argument Buffer change per batch/page.

Here is the realistic "Worst Case" Hierarchy for a CAD Frame:
• Index Depth (2): 16-bit vs 32-bit (Hardware Requirement). Examples: Nuts/Bolts (16) vs Engine Blocks (32).
• Transparency (2): Opaque vs Transparent (Sorting Requirement). Transparent objects must be drawn last for alpha blending.
• Topology (2): Triangles (Solid) vs Lines (Wireframe) (PSO Requirement). You cannot draw lines and triangles in the same call.
• Culling (2): Single-Sided vs Double-Sided (PSO Requirement). Solids vs Sheet metal.
• Buffer Pages (N): How many pages (up to 64MB each) you are using.
Total Unique Batches = 2 x 2 x 2 x 2 x N = 16 x N

This ensures no pipeline state reset while rendering a single Page. One ExecuteIndirect call for every Page.
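
To make the 16 x N count concrete, here is a small sketch that numbers every batch. BatchTraits, BatchIndex and TotalBatches are hypothetical names, and the bit assignment is an assumption chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// One flag per binary split listed above; names are illustrative.
struct BatchTraits {
    bool index32;      // 32-bit indices instead of 16-bit
    bool transparent;  // drawn after opaque geometry
    bool lines;        // line topology instead of triangles
    bool doubleSided;  // back-face culling disabled
};

constexpr uint32_t kVariantsPerPage = 2 * 2 * 2 * 2; // 16

// Packs the four flags into bits 0..3 and places the page index above them.
constexpr uint32_t BatchIndex(const BatchTraits& t, uint32_t pageIndex) {
    uint32_t variant = (t.index32 ? 1u : 0u)
                     | (t.transparent ? 2u : 0u)
                     | (t.lines ? 4u : 0u)
                     | (t.doubleSided ? 8u : 0u);
    return pageIndex * kVariantsPerPage + variant;
}

// Worst-case number of distinct batches for a given number of pages.
constexpr uint32_t TotalBatches(uint32_t pageCount) {
    return kVariantsPerPage * pageCount;
}
```

For 5 pages the worst case is 80 batches, and the largest index returned by BatchIndex is 79.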

----------------------------------------------------------------
A widely used solution for Normals is not 16-bit floats, but Packed 10-bit Integers.
We use the format: DXGI_FORMAT_R10G10B10A2_UNORM.
• X: 10 bits (0 to 1023) • Y: 10 bits (0 to 1023) • Z: 10 bits (0 to 1023) • Padding: 2 bits (unused) • Total: 32 bits (4 Bytes)
Why this is perfect for Normals:
• Size: It is 3x smaller than a 12-byte float normal (4 bytes vs 12 bytes).
• Precision: 10 bits gives you 2^10 = 1024 steps. Since normal components are always between -1.0 and 1.0, this gives a precision of roughly 0.002.
This is visually indistinguishable from 32-bit floats for lighting, even in high-end CAD.

Vertex Shader: Normal = Input.Normal * 2.0 - 1.0.
----------------------------------------------------------------
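
A minimal sketch of the CPU-side packing that matches the shader decode above. Float3, ToUnorm10, PackNormal and UnpackNormal are illustrative names; the bit positions follow the documented layout of DXGI_FORMAT_R10G10B10A2_UNORM (R in the lowest 10 bits).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Illustrative vector type (hypothetical, not an engine type).
struct Float3 { float x, y, z; };

// Maps one component from [-1, 1] to a 10-bit unsigned integer in [0, 1023].
inline uint32_t ToUnorm10(float v) {
    float clamped = std::fmin(std::fmax(v, -1.0f), 1.0f);
    return static_cast<uint32_t>(std::lround((clamped * 0.5f + 0.5f) * 1023.0f));
}

// R (X) in bits 0-9, G (Y) in bits 10-19, B (Z) in bits 20-29, A (padding) in bits 30-31.
inline uint32_t PackNormal(const Float3& n) {
    return ToUnorm10(n.x) | (ToUnorm10(n.y) << 10) | (ToUnorm10(n.z) << 20);
}

// CPU mirror of the vertex shader decode: Normal = Input.Normal * 2.0 - 1.0.
inline Float3 UnpackNormal(uint32_t packed) {
    auto decode = [](uint32_t bits) { return (bits & 0x3FFu) / 1023.0f * 2.0f - 1.0f; };
    return { decode(packed), decode(packed >> 10), decode(packed >> 20) };
}
```

Note that 0.0 does not round-trip exactly with a UNORM encoding (it decodes to about 0.001), which stays within the 0.002 precision stated above.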

Vertex and Index buffer in the same Page: a superior architectural choice for these reasons:
• Halves the Allocation Overhead: You only manage 1 heap/resource per 4MB page instead of 2.
• Cache Locality: When the GPU fetches a mesh, the vertices and indices are physically close in VRAM (same memory page).
  This can slightly improve cache hit rates.
Layout inside a Page:
• Vertices start at Offset 0 and grow UP.
• Indices start at Offset Max (4MB) and grow DOWN.
• Free Space is always the gap in the middle.
• Page is Full when Vertex_Head_Ptr meets or crosses Index_Tail_Ptr.
• 32 Bytes mandatory gap in the middle to address alignment concerns.
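
The page layout above can be sketched as CPU-side bookkeeping. PageCursor and Placement are hypothetical names, offsets are in bytes, and per-allocation alignment is deliberately left out of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical bookkeeping for one page: vertices grow up, indices grow down.
struct PageCursor {
    static constexpr uint64_t kGuardGap = 32; // mandatory gap kept in the middle

    uint64_t pageSize;
    uint64_t vertexHead; // first free byte above the vertex region
    uint64_t indexTail;  // first used byte of the index region

    explicit PageCursor(uint64_t size) : pageSize(size), vertexHead(0), indexTail(size) {}

    // Usable bytes in the middle gap after reserving the guard gap.
    uint64_t FreeBytes() const {
        uint64_t gap = indexTail - vertexHead;
        return gap > kGuardGap ? gap - kGuardGap : 0;
    }

    struct Placement { uint64_t vertexOffset; uint64_t indexOffset; };

    // Reserves both regions or neither, so a mesh never straddles two pages.
    std::optional<Placement> Allocate(uint64_t vertexBytes, uint64_t indexBytes) {
        if (vertexBytes + indexBytes > FreeBytes()) { return std::nullopt; }
        Placement p{ vertexHead, indexTail - indexBytes };
        vertexHead += vertexBytes;
        indexTail -= indexBytes;
        return p;
    }
};
```

When Allocate returns no placement, the caller would move on to another page or create a new one.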

------------------------------------------------
Lazy Creation.
• When a user creates a new Tab, allocated memory = 0 MB.
• User draws a Bolt (Solid): Allocate Solid_Page_0 (4MB).
• User draws a Glass Window: Allocate Transparent_Page_0 (4MB).
• User never draws a Wireframe: Wireframe_Page remains null.

Resource state is combined, i.e. D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER |
D3D12_RESOURCE_STATE_INDEX_BUFFER
------------------------------------
Feature         | Decision            | Benefit
Page Content    | Single Type Only    | Zero PSO switching during Draw.
Growth Logic    | Chained Doubling    | 4->8->16->32->64. No moving old data.
Max Page Size   | 64 MB               | Prevents fragmentation failure on low-VRAM GPUs.
Allocation      | Lazy (On Demand)    | Keeps "Hello World" tabs lightweight.
Sub-Allocation  | Double-Ended Stack  | Maximizes usage for varying ratio of Vertex/Index Buffers.
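
The "Chained Doubling" and "Lazy" rows can be sketched as one sizing function. NextPageSize and the constant names are assumptions made for illustration; the sizes come from the table above.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kMiB = 1024ull * 1024ull;
constexpr uint64_t kFirstPageSize = 4 * kMiB;  // first page of a type
constexpr uint64_t kMaxPageSize = 64 * kMiB;   // cap from the table above

// Size of the next page to chain, given the size of the newest existing page.
// Pass 0 when the tab has no page of this type yet (lazy creation).
// Existing pages are never resized, so old data does not move.
constexpr uint64_t NextPageSize(uint64_t newestPageSize) {
    if (newestPageSize == 0) { return kFirstPageSize; }
    uint64_t doubled = newestPageSize * 2;
    return doubled > kMaxPageSize ? kMaxPageSize : doubled;
}
```

Once a 64 MB page exists, every further page of that type is also 64 MB.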

------------------------------
New geometry is appended (in the middle) only if both the new vertex and index data fit inside. Otherwise allocate a new buffer.
The Copy thread also does batching. It aggregates all objects (that fit in the current buffer) coming from the engineering thread into a single GPU upload.
The Copy Thread should consume batches of updates, coalescing them into single ExecuteCommandLists calls where possible to reduce API overhead.

"Big Buffer" fallback: If Allocation_Size > Max_Page_Size, allocate a dedicated Committed Resource just for that object, bypassing the paging system.
This handles a large STL or terrain map. Treat "Big Buffers" as a special Page Type. Add a "Large Object List" to your loop.
Do not try to jam them into the standard EI logic if they require unique resource bindings per object. 1 separate draw command for each such Jumbo object.

Create a separate std::vector<BigObject> in the Tab structure. Rendering:
• Loop through Pages (ExecuteIndirect).
• Loop through BigObjects (Standard DrawIndexedInstanced or EI with count 1).
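
A sketch of the routing decision behind the "Big Buffer" fallback. TabGeometry, PagedObject and the fields of BigObject are hypothetical, no GPU resource is created here, and the 32-byte guard gap is ignored for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t kMaxPageSize = 64ull * 1024 * 1024;

struct BigObject   { uint64_t objectId; uint64_t byteSize; }; // would own a dedicated resource
struct PagedObject { uint64_t objectId; uint64_t byteSize; }; // would be sub-allocated in a page

struct TabGeometry {
    std::vector<PagedObject> paged;    // drawn through the per-page ExecuteIndirect loop
    std::vector<BigObject> bigObjects; // drawn one by one after the pages

    // Returns true when the object took the "Big Buffer" path.
    bool Add(uint64_t objectId, uint64_t vertexBytes, uint64_t indexBytes) {
        uint64_t total = vertexBytes + indexBytes;
        if (total > kMaxPageSize) {
            bigObjects.push_back({ objectId, total });
            return true;
        }
        paged.push_back({ objectId, total });
        return false;
    }
};
```

The render loop would then walk the pages first and the bigObjects list second, as in the two bullets above.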

Defragmentation Logic:
The copy queue marks the page for defragmentation. All frames of that tab freeze. Keep presenting the previous render output.
Any 1 of the rendering threads/queues reads the mark, transitions the resource to Common and signals a fence.
The copy queue picks it up and, once defragmented, returns the new resource. I am willing to accept a freeze of a few frames on screen.
This is a recognised engineering tradeoff. Acceptable to CAD users.

EI Argument Buffers are tightly coupled to the Memory Pages. When you defragment a Page, you must simultaneously rebuild its corresponding Argument Buffer.
Do not try to "patch" the Argument buffer; regenerate it for that Page.

Growth Logic: Similar to the above defragmentation. How does my copy queue handle the async (without blocking the render thread)
addition of 1 small geometry, say 10KB, to an already existing 64MB heap out of which 50MB is filled up?
All Views/frames of that particular tab freeze. However, other tabs being handled by the render thread keep processing.
No thread stall. Transition that page to copy destination. Copy the new data. Transition back to render state for the render thread to pick up.

--------------------------
RenderToTexture to implement frame freeze since the swap chain is FLIP_DISCARD.
Side benefits? ✔ HDR handling ✔ UI composition ✔ Multi-monitor flexibility ✔ Eviction safety ✔ Clean defrag freezes

-------------------------------------------------------------------------------
Known Issues / Limitations (to be resolved in a later revision):
1. Transparency sorting. Accept imperfect sorting for "Glass" pages during rotation, and do a CPU Sort + Args Rebuild only when the camera stops.
2. Hot page for object drag / active mutation.
3. Evict logic.
4. Compute shader frustum culling.
5. Telemetry. Per-tab VRAM usage graphs. Page fragmentation heatmap. Eviction frequency counters. Copy queue stall tracking.
6. Selection Highlighter methodology.
7. Mesh Shader on supported hardware (RTX 2000 onwards, RX 6000 onwards).
8. Instance-based LOD optimization. Optionally using a compute shader.

Find out any other architectural pitfalls / challenges I need to look after. Think over it for long and reply.

With this our graphics design document ends.
-------------------------------------------------------------------------------
Miscellaneous aspects of Specification:
There will be a uniform object ID (64 bit), unique across all objects in the entire process memory.
Each object can have up to 16? different simultaneous variations of vertex geometry / graphics representation.
I am expecting 1000 to 5000 draw calls per frame?
How should I handle multiple partially overlapping windows? Each window can be independently resized or maximized / minimized.
The lowest distance between an object and ALL the different view camera positions shall be used by logic threads to decide the Level of Detail.
There will be some mechanism to manage memory over-pressure: a signal to the logic threads to reduce the level of detail within some distance.
Our GPU Memory manager will be a singleton. There will be only 1 instance of that class managing the entire GPU memory.
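
The Level of Detail rule above (use the lowest distance to any view camera) can be sketched like this. Point3, NearestCameraDistance and SelectLod are hypothetical names, and the distance thresholds are placeholder values that do not come from the specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Illustrative position type (hypothetical, not an engine type).
struct Point3 { float x, y, z; };

inline float Distance(const Point3& a, const Point3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Smallest distance from the object to any active view camera.
// Returns infinity when there is no camera at all.
inline float NearestCameraDistance(const Point3& object, const std::vector<Point3>& cameras) {
    float nearest = std::numeric_limits<float>::infinity();
    for (const Point3& cam : cameras) { nearest = std::min(nearest, Distance(object, cam)); }
    return nearest;
}

// LOD 0 is the most detailed. Thresholds are placeholders, not from the spec.
inline int SelectLod(float nearestDistance) {
    if (nearestDistance < 10.0f) { return 0; }
    if (nearestDistance < 100.0f) { return 1; }
    return 2;
}
```

Using the nearest camera means an object keeps full detail as long as at least one view is looking at it up close.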


Conceptual Questions: Ask this to any contemporary AI:
Consider a Desktop PC. It has 2 discrete graphics cards and 1 integrated graphics card. 1 Monitor is connected and active on each of these 3 devices.
Can Windows 11 handle moving an application window from 1 screen to another smoothly?
What if the application is attached to the 1st device, and the window is moved from the monitor connected to the 2nd device to the monitor connected to the integrated GPU?

*/


/*
-------------------------------------------------------------------------------
TO DO LIST : As things get completed, they will be removed from this pending list and get incorporated appropriately in design document.
-------------------------------------------------------------------------------
Phase 1: The Visual Baseline (Get these out of the way)
Do this first so you aren't fighting "black screen" bugs later.
[Done] Release Downloads (New Repository).
[Done] Update Vertex format to include Normals. (Required for lighting).
[Done] Hemispherical Lighting in shader. (Verify normals are correct).
[ ] Mouse Zoom/Pan/Rotate (Basic).
    Advice: Move this UP. You need to be able to move the camera to debug the complex culling/memory bugs you are about to create in Phase 2 & 3.

Phase 2: The "Freeze" Infrastructure
Before you break the memory model, build the mechanism that hides the breakage.
[ ] Render To Texture (RTT) & Full-Screen Quad.
    Goal: Detach the "Drawing" from the "Presenting."
    Success State: You can resize the window, and the inner "Canvas" scales or freezes independently of the window border.
[ ] Face-wise Geometry colors. (Implementation detail).
[ ] Upgrade Vertices to HDR + Tonemapping. (Do this now while touching pixel shaders).

Phase 3: The API Pivot (The Hardest Part)
Switching to ExecuteIndirect changes how you pass data. Do this BEFORE implementing custom heaps to isolate variables.
[ ] [MISSING] Implement Structured Buffer for World Matrices.
    Critical: You cannot do ExecuteIndirect for multiple objects without a way to tell the shader which object is being drawn. You need a StructuredBuffer<float4x4> and a root constant index.
[ ] DrawIndexedInstanced → ExecuteIndirect (EI).
    Advice: Implement this using your current committed resources first. Just get the API call working.
[ ] Double buffered ExecuteIndirect Arguments.

Phase 4: The Memory Manager (The "Vishwakarma" Core)
Now that EI is working, replace the backing memory.
[ ] [MISSING] Global Upload Ring Buffer.
    Critical: Your copy thread needs a staging area. If you don't build this, your "VRAM Pages" step will stall waiting on CreateCommittedResource for uploads.
[ ] VRAM Pages per Tab (The Stack Allocator).
    Advice: Implement the "Double-Ended Stack" (Vertex Up, Index Down) here.
[ ] CPU-Side Free List Allocator. (The logic that tracks the holes).
[ ] Tab Management / View Management. (Integrating the heaps into the UI).

Phase 5: Advanced Features & Polish
[ ] VRAM Defragmentation. (Now safe to implement because RTT exists).
[ ] Click Selection / Window Selection. (Requires Raycasting against your CPU Free List/Data structures).
[ ] Instanced optimization for Pipes.
[ ] SSAO.

*/
Actual Code of our graphics engine.
// Copyright (c) 2025-Present : Ram Shanker: All rights reserved.
#pragma once

// DirectX 12 headers. The best place to learn DirectX12 is the original Microsoft documentation.
// https://learn.microsoft.com/en-us/windows/win32/direct3d12/direct3d-12-graphics
// You need a good dose of prior C++ knowledge and Computer Fundamentals before learning DirectX12.
// Expect to read it at least 2 times before you start grasping it!

// Tell the HLSL compiler to include debug information into the shader blob.
#define D3DCOMPILE_DEBUG 1 //TODO: Remove from production build.
#include <d3d12.h> //Main DirectX12 API. Included from %WindowsSdkDir\Include%WindowsSDKVersion%\\um
// Helper structures library. MIT Licensed. Added to the project as a git submodule.
// https://github.com/microsoft/DirectX-Headers/blob/main/include/directx/d3dx12.h
#include <d3dx12.h>
#include <dxgi1_6.h>
#include <dxgidebug.h>
#include <wrl.h>
#include <d3dcompiler.h>
#include <DirectXMath.h> //Where from? https://github.com/Microsoft/DirectXMath ?
#include <vector>
#include <string>
#include <unordered_map>
#include <random>
#include <ctime>
#include <iostream>
#include <thread>
#include <chrono>
#include <map>
#include <list>
// Standard headers required by the declarations below (std::atomic, std::condition_variable,
// std::mutex, std::optional, std::queue, std::runtime_error).
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <stdexcept>

#include "डेटा.h"

using namespace Microsoft::WRL;
using namespace DirectX;

// DirectX12 Libraries.
#pragma comment(lib, "d3d12.lib") //%WindowsSdkDir\Lib%WindowsSDKVersion%\\um\arch
#pragma comment(lib, "dxgi.lib")
#pragma comment(lib, "d3dcompiler.lib")
#pragma comment(lib, "dxguid.lib")

/* Double buffering is preferred for CAD applications due to low input lag. Caveat: If rendering time
exceeds the frame refresh interval, then stuttering will appear. However,
low input latency outweighs the slightly smoother frames of triple buffering.
Double buffering (2x) also uses one third less memory than triple buffering (3x). */
const UINT FRAMES_PER_RENDERTARGETS = 2; //Initially we are going with double buffering.

// Constants
const UINT MaxPyramids = 100; // MODIFICATION: Define a max pyramid count for pre-allocation.
const UINT MaxVertexCount = MaxPyramids * 4;
const UINT MaxIndexCount = MaxPyramids * 12;
const UINT MaxVertexBufferSize = MaxVertexCount * sizeof(Vertex);
const UINT MaxIndexBufferSize = MaxIndexCount * sizeof(UINT16);

/* DirectX 12 resources are organized at 3 levels:
1. The Data   : Per Tab (Jumbo Buffers for geometry data, materials, textures, etc.)
2. The Target : Per Window (Swap Chain, Render Targets, Command Queue, Command List, etc.)
3. The Worker : Per Render Thread (Resources shared across multiple windows on the same monitor)
*/

struct DX12ResourcesPerTab { // (The Data) Geometry Data
    // Since data is isolated per tab, these live here. We use a "Jumbo" buffer approach to reduce switching.
    ComPtr<ID3D12Resource> vertexBuffer;
    ComPtr<ID3D12Resource> indexBuffer;

    // Upload Heaps (CPU -> GPU Transfer)
    // Moved here because the Copy Thread writes to these when adding objects to the TAB.
    ComPtr<ID3D12Resource> vertexBufferUpload;
    ComPtr<ID3D12Resource> indexBufferUpload;

    // Persistent Mapped Pointers (CPU Address)
    UINT8* pVertexDataBegin = nullptr; // Pointer for mapped vertex upload buffer
    UINT8* pIndexDataBegin = nullptr;  // Pointer for mapped index upload buffer

    // Views into the buffers (to be bound during Draw)
    D3D12_VERTEX_BUFFER_VIEW vertexBufferView;
    D3D12_INDEX_BUFFER_VIEW indexBufferView;

    // TODO: We will generalize this to hold materials, shaders, textures etc. unique to this project/tab
    ComPtr<ID3D12DescriptorHeap> srvHeap;

    // Track how much of the jumbo buffer is used
    uint64_t vertexDataSize = 0;
    uint64_t indexDataSize = 0;
};

struct DX12ResourcesPerWindow { // Presentation Logic
    int WindowWidth = 800; // Current ViewPort (rendering area) size, excluding task-bar etc.
    int WindowHeight = 600;
    ID3D12CommandQueue* creatorQueue = nullptr; // Track which queue this window was created with. To assist with migrations.

    ComPtr<IDXGISwapChain3> swapChain; // The link to the OS Window
    //ComPtr<ID3D12CommandQueue> commandQueue; // Moved to OneMonitorController
    ComPtr<ID3D12DescriptorHeap> rtvHeap;
    ComPtr<ID3D12Resource> renderTargets[FRAMES_PER_RENDERTARGETS];
    UINT rtvDescriptorSize = 0;

    ComPtr<ID3D12RootSignature> rootSignature;
    ComPtr<ID3D12PipelineState> pipelineState;

    ComPtr<ID3D12Resource> depthStencilBuffer; // Depth Buffer (Sized to the window dimensions)
    ComPtr<ID3D12DescriptorHeap> dsvHeap;

    D3D12_VIEWPORT viewport; // Viewport & Scissor (Dependent on Window Size). Not used yet.
    D3D12_RECT scissorRect;

    ComPtr<ID3D12Resource> constantBuffer;
    ComPtr<ID3D12DescriptorHeap> cbvHeap;
    UINT8* cbvDataBegin = nullptr;

    UINT frameIndex = 0; // Remember this is different from allocatorIndex in Render Thread. It can change even during window resize.
};

struct DX12ResourcesPerRenderThread { // Execution Context
    // For convenience only. It simply points to OneMonitorController.commandQueue
    ComPtr<ID3D12CommandQueue> commandQueue;

    // Note that there are as many render threads as the number of monitors attached.
    // Command Allocators MUST be unique to the thread.
    // We need one per frame-in-flight to avoid resetting while the GPU is reading.
    ComPtr<ID3D12CommandAllocator> commandAllocators[FRAMES_PER_RENDERTARGETS];
    UINT allocatorIndex = 0; // Remember this is different from frameIndex available per Window.

    // The Command List (The recording pen). Can be reset and reused for multiple windows within the same frame.
    ComPtr<ID3D12GraphicsCommandList> commandList;

    // Synchronization (Per Window VSync)
    HANDLE fenceEvent = nullptr;
    ComPtr<ID3D12Fence> fence;
    UINT64 fenceValue = 0;
};

struct OneMonitorController { // Variables stored per monitor.
    // System Fetched information.
    bool isScreenInitalized = false;
    int screenPixelWidth = 800;
    int screenPixelHeight = 600;
    int screenPhysicalWidth = 0;  // in mm
    int screenPhysicalHeight = 0; // in mm
    int WindowWidth = 800; // Current ViewPort (rendering area) size, excluding task-bar etc.
    int WindowHeight = 600;

    HMONITOR hMonitor = NULL;         // Monitor handle
    std::wstring deviceName;          // Monitor device name (e.g., "\\\\.\\DISPLAY1")
    std::wstring friendlyName;        // Human readable name (e.g., "Dell U2720Q")
    RECT monitorRect;                 // Full monitor rectangle
    RECT workAreaRect;                // Work area (excluding task bar)
    int dpiX = 96;                    // DPI X
    int dpiY = 96;                    // DPI Y
    double scaleFactor = 1.0;         // Scale factor (100% = 1.0, 125% = 1.25, etc.)
    bool isPrimary = false;           // Is this the primary monitor?
    DWORD orientation = DMDO_DEFAULT; // Monitor orientation
    int refreshRate = 60;             // Refresh rate in Hz
    int colorDepth = 32;              // Color depth in bits per pixel

    bool isVirtualMonitor = false; // To support headless mode.

    // DirectX12 Resources.
    ComPtr<ID3D12CommandQueue> commandQueue; // Persistent. Survives thread restarts.
    // We need to know if this specific monitor is currently being serviced by a thread
    bool hasActiveThread = false;
};

// Commands sent from Generator thread(s) to the Copy thread
enum class CommandToCopyThreadType { ADD, MODIFY, REMOVE };
struct CommandToCopyThread
{
    CommandToCopyThreadType type;
    std::optional<GeometryData> geometry; // Present for ADD and MODIFY
    uint64_t id; // Always present
};
// Thread synchronization between Main Logic thread and Copy thread
extern std::mutex toCopyThreadMutex;
extern std::condition_variable toCopyThreadCV;
extern std::queue<CommandToCopyThread> commandToCopyThreadQueue;

extern std::atomic<bool> pauseRenderThreads; // Defined in Main.cpp
// Represents the complete vertex and index data associated with 1 engineering object.
// This structure holds information about a resource allocated in GPU memory (VRAM)
struct GpuResourceVertexIndexInfo {
    ComPtr<ID3D12Resource> vertexBuffer;
    D3D12_VERTEX_BUFFER_VIEW vertexBufferView;
    ComPtr<ID3D12Resource> indexBuffer;
    D3D12_INDEX_BUFFER_VIEW indexBufferView;
    UINT indexCount;

    // TODO: Later on we will generalize this structure to hold textures, materials, shaders etc.
    // Currently we are letting the Driver manage the GPU memory fragmentation. Later we will manage it ourselves.
    //uint64_t vramOffset; // Simulated VRAM address
    //uint64_t size;
    // In a real DX12 app, this would hold ID3D12Resource*, D3D12_VERTEX_BUFFER_VIEW, etc.
};

extern std::mutex objectsOnGPUMutex;
// Copy thread will update the following map whenever it adds/removes/modifies an object on GPU.
extern std::map<uint64_t, GpuResourceVertexIndexInfo> objectsOnGPU;

// Packet of work for a Render Thread for one frame
struct RenderPacket {
    uint64_t frameNumber;
    std::vector<uint64_t> visibleObjectIds;
};

class HrException : public std::runtime_error // Simple exception helper for HRESULT checks
{
public:
    HrException(HRESULT hr) : std::runtime_error("HRESULT Exception"), hr(hr) {}
    HRESULT Error() const { return hr; }
private:
    const HRESULT hr;
};

inline void ThrowIfFailed(HRESULT hr)
{
    if (FAILED(hr)) { throw HrException(hr); }
}


class ThreadSafeQueueGPU {
public:
    void push(CommandToCopyThread value) {
        std::lock_guard<std::mutex> lock(mutex);
        fifoQueue.push(std::move(value));
        cond.notify_one();
    }

    // Non-blocking pop
    bool try_pop(CommandToCopyThread& value) {
        std::lock_guard<std::mutex> lock(mutex);
        if (fifoQueue.empty()) {
            return false;
        }
        value = std::move(fifoQueue.front());
        fifoQueue.pop();
        return true;
    }

    // Shuts down the queue, waking up any waiting threads
    void shutdownQueue() {
        std::lock_guard<std::mutex> lock(mutex);
        shutdown = true;
        cond.notify_all();
    }

private:
    std::queue<CommandToCopyThread> fifoQueue; // fifo = First-In First-Out
    std::mutex mutex;
    std::condition_variable cond;
    bool shutdown = false;
};

inline ThreadSafeQueueGPU g_gpuCommandQueue;

// VRAM Manager
// This class handles the GPU memory dynamically.
// There will be exactly 1 object of this class in the entire application. Hence the special name.
// May Lord Shankar's grace remain with us. The corresponding object is named "gpu".
class शंकर {
public:
    std::vector<OneMonitorController> screens;

    ComPtr<IDXGIFactory4> factory; // The OS-level display system manager. Can iterate over GPUs.
    //ComPtr<IDXGIFactory6> dxgiFactory;
    ComPtr<IDXGIAdapter1> hardwareAdapter; // Represents a physical GPU device.

    // Represents 1 logical GPU device on the above GPU adapter. Helps create all DirectX12 memory / resources / commands etc.
    ComPtr<ID3D12Device> device; // Very Important: We support EXACTLY 1 GPU device only in this version.
    bool isGPUEngineInitialized = false; //TODO: To be implemented.

    // Following to be added later.
    //ID3D12DescriptorHeapMgr ← Global descriptor allocator
    //Shader& PSO Cache ← Shared by all threads
    //AdapterInfo ← For device selection / VRAM stats

    /* We will have 1 Render Queue per monitor, which is local to the Render Thread.
    IMPORTANT: Most GPUs expose a single graphics (3D) hardware engine, so even if 4 command lists
    are submitted to 4 independent queues, the graphics driver / WDDM generally serializes them.
    Still, we need to have 4 separate queues to properly handle different refresh rates.

    Ex: If we put all 4 windows on the same queue: Window A (60Hz) submits a Present command. The Queue STALLS
    waiting for Monitor A's VSync interval. Window B (144Hz) submits a draw command.
    Window B cannot be processed because the Queue is blocked by Window A's VSync wait.
    By using 4 Queues, Queue A can sit blocked waiting for VSync,
    while Queue B immediately pushes work to the GPU for the faster monitor. */

    ComPtr<ID3D12CommandQueue> renderCommandQueue; // Only used by Monitor No. 0 i.e. 1st Render Thread.
    ComPtr<ID3D12Fence> renderFence; // Synchronization for Render Queue
    UINT64 renderFenceValue = 0;
    HANDLE renderFenceEvent = nullptr;

    ComPtr<ID3D12CommandQueue> copyCommandQueue; // There is only 1 across the application.
    ComPtr<ID3D12Fence> copyFence; // Synchronization for Copy Queue
    UINT64 copyFenceValue = 0;
    HANDLE copyFenceEvent = nullptr;

public:
    UINT8* pVertexDataBegin = nullptr; // MODIFICATION: Pointer for mapped vertex upload buffer
    UINT8* pIndexDataBegin = nullptr;  // MODIFICATION: Pointer for mapped index upload buffer

    // Maps our CPU ObjectID to its resource info in VRAM
    std::unordered_map<uint64_t, GpuResourceVertexIndexInfo> resourceMap;

    // Simulates a simple heap allocator with 16MB chunks
    uint64_t m_nextFreeOffset = 0;
    const uint64_t CHUNK_SIZE = 16 * 1024 * 1024;
    uint64_t m_vram_capacity = 4 * CHUNK_SIZE; // Simulate 64MB VRAM

    // When an object is updated, the old VRAM is put here to be freed later.
    struct DeferredFree {
        uint64_t frameNumber; // The frame it became obsolete
        GpuResourceVertexIndexInfo resource;
    };
    std::list<DeferredFree> deferredFreeQueue;

    // Allocate space in VRAM. Returns the handle. What is this used for?
    // std::optional<GpuResourceVertexIndexInfo> Allocate(size_t size);

    void ProcessDeferredFrees(uint64_t lastCompletedRenderFrame);

    शंकर() {} // Our Main function initializes DirectX12 global resources by calling InitD3DDeviceOnly().
    void InitD3DDeviceOnly();
    void InitD3DPerTab(DX12ResourcesPerTab& tabRes); // Call this when a new Tab is created
    void InitD3DPerWindow(DX12ResourcesPerWindow& dx, HWND hwnd, ID3D12CommandQueue* commandQueue);
    void PopulateCommandList(ID3D12GraphicsCommandList* cmdList, // Called by the per-monitor render thread.
        DX12ResourcesPerWindow& winRes, const DX12ResourcesPerTab& tabRes);
    // NOTE: dx is passed by value; confirm that copying the ComPtrs and fenceValue is intended.
    void WaitForPreviousFrame(DX12ResourcesPerRenderThread dx);

    // Called when a monitor is unplugged or a window is destroyed. Destroys SwapChain/RTVs but KEEPS Geometry.
    void CleanupWindowResources(DX12ResourcesPerWindow& winRes);
    // Called when a TAB is closed by the user. Destroys the Jumbo Vertex/Index Buffers.
    void CleanupTabResources(DX12ResourcesPerTab& tabRes);
    // Called ONLY at application exit (wWinMain end). Destroys the Device, Factory, and Global Copy Queue.
    // Thread resources are cleaned up by the Render Thread itself before exit.
    void CleanupD3DGlobal();
};

extern int g_monitorCount; // Global variable. We support as many monitors as the system has.

void FetchAllMonitorDetails();
BOOL CALLBACK MonitorEnumProc(HMONITOR hMonitor, HDC hdcMonitor, LPRECT lprcMonitor, LPARAM dwData);

/*
IID_PPV_ARGS is a macro used in DirectX (and COM programming in general) to retrieve interface pointers
safely and correctly during object creation or querying. It also reduces repetitive typing.
COM interfaces are identified by unique GUIDs. The macro derives the GUID from the pointer's type and
converts the pointer's address to the void** that COM functions expect.

Ex: IID_PPV_ARGS(&device) supplies two arguments, conceptually equivalent to the following:
IID iid = __uuidof(ID3D12Device);
void** ppv = reinterpret_cast<void**>(&device);
*/

// Structure to hold transformation matrices
struct ConstantBuffer {
    DirectX::XMFLOAT4X4 worldViewProjection;
    DirectX::XMFLOAT4X4 world;
};

// Externs for communication
extern std::atomic<bool> shutdownSignal;
extern ThreadSafeQueueGPU g_gpuCommandQueue;

// Logic Thread "Fence"
extern std::mutex g_logicFenceMutex;
extern std::condition_variable g_logicFenceCV;
extern uint64_t g_logicFrameCount;

// Copy Thread "Fence"
extern std::mutex g_copyFenceMutex;
extern std::condition_variable g_copyFenceCV;
extern uint64_t g_copyFrameCount;

// TODO: Implement this. In a real allocator, we would manage free lists and possibly defragment memory.
/*
std::optional<GpuResourceVertexIndexInfo> शंकर::Allocate(size_t size) {

    if (m_nextFreeOffset + size > m_vram_capacity) {
        std::cerr << "VRAM MANAGER: Out of memory!" << std::endl;
        // Here, the Main Logic thread would be signaled to reduce LOD.
        return std::nullopt;
    }
    GpuResourceVertexIndexInfo info{ m_nextFreeOffset, size };
    m_nextFreeOffset += size; // Simple bump allocator
    return info;
}*/
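
The commented-out Allocate above describes a simple bump allocator, and the class also keeps a DeferredFree list keyed by frame number. A minimal, portable sketch of how the two could fit together is given here. It assumes nothing about real D3D12 resources: VramRange, BumpAllocatorSketch, DeferFree and PendingFrees are hypothetical names invented for illustration, and the "VRAM" is only a counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <optional>

// Hypothetical stand-in for a VRAM range; not the project's GpuResourceVertexIndexInfo.
struct VramRange {
    uint64_t offset = 0;
    uint64_t size = 0;
};

// Illustrative bump allocator with frame-based deferred frees.
class BumpAllocatorSketch {
public:
    explicit BumpAllocatorSketch(uint64_t capacity) : capacity_(capacity) {}

    // Returns a range, or std::nullopt when the simulated VRAM is exhausted.
    std::optional<VramRange> Allocate(uint64_t size) {
        if (size > capacity_ - nextFreeOffset_) return std::nullopt;
        VramRange range{ nextFreeOffset_, size };
        nextFreeOffset_ += size; // Bump the offset; nothing is ever reused.
        return range;
    }

    // Queue a range for release once the GPU has finished the given frame.
    void DeferFree(uint64_t frameNumber, VramRange range) {
        deferred_.push_back({ frameNumber, range });
    }

    // Drops every entry whose frame has completed; returns the bytes released.
    uint64_t ProcessDeferredFrees(uint64_t lastCompletedRenderFrame) {
        uint64_t released = 0;
        for (auto it = deferred_.begin(); it != deferred_.end();) {
            if (it->frameNumber <= lastCompletedRenderFrame) {
                released += it->range.size;
                it = deferred_.erase(it);
            } else {
                ++it;
            }
        }
        return released;
    }

    std::size_t PendingFrees() const { return deferred_.size(); }

private:
    struct Deferred {
        uint64_t frameNumber;
        VramRange range;
    };
    uint64_t capacity_;
    uint64_t nextFreeOffset_ = 0;
    std::list<Deferred> deferred_;
};
```

With a 64-byte capacity, Allocate(48) succeeds at offset 0 and a following Allocate(32) returns std::nullopt. ProcessDeferredFrees(5) releases only entries queued for frame 5 or earlier. A pure bump allocator cannot reuse the released bytes; that would need the free-list management mentioned in the TODO.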

// =================================================================================================
// Utility Functions
// =================================================================================================

// Waits for the previous frame to complete rendering.
inline void WaitForGpu(DX12ResourcesPerWindow dx)
{ // Where are we using this function?
    /*
    dx.commandQueue->Signal(dx.fence.Get(), dx.fenceValue);
    dx.fence->SetEventOnCompletion(dx.fenceValue, dx.fenceEvent);
    WaitForSingleObjectEx(dx.fenceEvent, INFINITE, FALSE);
    dx.fenceValue++;*/
}

// Waits for a specific fence value to be reached.
inline void WaitForFenceValue(DX12ResourcesPerWindow dx, UINT64 fenceValue)
{ // Where are we using this?
    /*
    if (dx.fence->GetCompletedValue() < fenceValue)
    {
        ThrowIfFailed(dx.fence->SetEventOnCompletion(fenceValue, dx.fenceEvent));
        WaitForSingleObjectEx(dx.fenceEvent, INFINITE, FALSE);
    }*/
}
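
The disabled WaitForFenceValue body above follows a common pattern: return at once if the fence has already reached the requested value, otherwise block until it does. As a rough CPU-only analogue (not D3D12 code), the same idea can be sketched with a mutex, a condition variable and a monotonically increasing counter, similar in spirit to the g_logicFence* and g_copyFence* trios declared earlier. CpuFenceSketch and its member names are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical CPU-side analogue of a monotonically increasing fence.
class CpuFenceSketch {
public:
    // Publishes a new completed value (never moves backwards) and wakes all waiters.
    void Signal(uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > completed_) completed_ = value;
        }
        cv_.notify_all();
    }

    uint64_t GetCompletedValue() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return completed_;
    }

    // Returns immediately if the value was already reached, otherwise blocks.
    void WaitForValue(uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    uint64_t completed_ = 0;
};
```

A producer thread would call Signal(n) after finishing frame n, while a consumer calls WaitForValue(n) before touching that frame's data. The predicate form of wait guards against spurious wakeups.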

// =================================================================================================
// Thread Functions
// =================================================================================================
// Thread synchronization between Main Logic thread and Copy thread
inline std::mutex toCopyThreadMutex;
inline std::condition_variable toCopyThreadCV;
inline std::queue<CommandToCopyThread> commandToCopyThreadQueue;
inline std::mutex objectsOnGPUMutex;
// Copy thread will update the following map whenever it adds/removes/modifies an object on GPU.
inline std::map<uint64_t, GpuResourceVertexIndexInfo> objectsOnGPU;

// Thread Functions - Just Declaration!
void GpuCopyThread();
void GpuRenderThread(int monitorId, int refreshRate);
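
The mutex, condition variable and command queue declared above imply a producer/consumer handoff between the Main Logic thread and the Copy thread. One way such a handoff could look is sketched below, under stated assumptions: SketchCommand is a simplified, hypothetical command without geometry, and BlockingCommandQueueSketch and RunCopyLoopSketch are invented names, not the project's GpuCopyThread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical, simplified command; the real CommandToCopyThread also carries geometry.
enum class SketchCommandType { ADD, MODIFY, REMOVE };
struct SketchCommand {
    SketchCommandType type;
    uint64_t id;
};

class BlockingCommandQueueSketch {
public:
    void Push(SketchCommand cmd) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(cmd);
        }
        cv_.notify_one();
    }

    // Blocks until a command is available or Shutdown() was called.
    // Returns std::nullopt only after shutdown once the queue is drained.
    std::optional<SketchCommand> WaitPop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return std::nullopt;
        SketchCommand cmd = queue_.front();
        queue_.pop();
        return cmd;
    }

    void Shutdown() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            shutdown_ = true;
        }
        cv_.notify_all();
    }

private:
    std::queue<SketchCommand> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool shutdown_ = false;
};

// Consumer loop in the spirit of a copy thread: handle commands until shutdown.
// Here "handling" only records the object id in arrival order.
inline std::vector<uint64_t> RunCopyLoopSketch(BlockingCommandQueueSketch& q) {
    std::vector<uint64_t> seen;
    while (auto cmd = q.WaitPop()) {
        seen.push_back(cmd->id);
    }
    return seen;
}
```

Commands pushed before Shutdown() are still delivered, in FIFO order, so the consumer does not lose pending work when the application exits. A blocking pop like this avoids the busy polling that a try_pop-only queue would require.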